
Christian Balbin
@C_Balbin
122 posts
Medical AI research | PhD ‘24 @UUtah
Joined August 2025
267 Following · 37 Followers
Christian Balbin retweeted

Gemma 4 31B vs Qwen3.5 27B: Inference Throughput
RTX Pro 6000 (vLLM; 8k input tokens; 64k output tokens):
- Qwen3.5 27B: 45 tok/sec with MTP (17.9 tok/sec without MTP)
- Gemma 4 31B: 12.3 tok/sec
The gap is large. But still, end-to-end, Gemma 4 31B remains faster since it generates fewer tokens. Will publish more about this soon.
Eagle support for Gemma 4 31B is also coming to vLLM and should significantly accelerate inference.
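
A minimal sketch of how a throughput number like this can be measured with vLLM's offline API; the model id and prompt below are placeholders, not the exact benchmark setup:

import time
from vllm import LLM, SamplingParams

# Hypothetical model id; substitute the checkpoint being benchmarked.
llm = LLM(model="<model-id>", max_model_len=72_000)

prompt = "..."  # stand-in for the ~8k-token input used in the benchmark
params = SamplingParams(temperature=0.0, max_tokens=64_000)

start = time.perf_counter()
out = llm.generate([prompt], params)[0]
elapsed = time.perf_counter() - start

n_generated = len(out.outputs[0].token_ids)
print(f"{n_generated / elapsed:.1f} output tok/sec")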
Christian Balbin retweeted

OpenAI nerfed Codex and now you hit the limit after one hour of coding. Obviously they want to sell you the new $100 Pro plan instead.
Not cool, @OpenAI 😠
Christian Balbin retweeted

@denuvocracker @theo @JoshRadDev Nah, that’s correct. Another nuance is that what people are leaking is actually the developer prompt. The sys prompt sits at a higher level and contains these parameters, so it makes sense that the leaked “sys” prompts don’t have a reasoning effort parameter in them

@C_Balbin @theo @JoshRadDev iirc gpt-oss takes a string for the reasoning effort; they don’t use the “juice” number that proprietary OpenAI models use. Could be wrong on this tho

nvm I was wrong. Repro'd this 3 times in a row.
I need to stop assuming Anthropic is competent. Burns me every time I do 🙃

Theo - t3.gg@theo
Fun fact: LLMs have zero idea how they are configured. They don't know what GPUs they're running on. They don't know what temperature or reasoning level they have set. They don't know if they've been quantized or not. They're just doing next-token prediction. As always.

@JoshRadDev Any evidence of this? The leaked system prompts don't include any information about this

@theo Idk if that’s a hallucination or not, but it makes some sense. GPT-OSS’ reasoning effort is literally just controlled by the system prompt
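
For reference, a minimal sketch of what that looks like in gpt-oss's harmony prompt format, where the effort level is plain text inside the system message; the header below is abridged and the exact wording varies by deployment:

# Sketch of a harmony-format system message for gpt-oss. Reasoning effort
# is a line of text in the prompt, not an API parameter.
REASONING_EFFORT = "high"  # one of "low", "medium", "high"

system_message = (
    "<|start|>system<|message|>"
    "You are ChatGPT, a large language model trained by OpenAI.\n"
    f"Reasoning: {REASONING_EFFORT}\n"
    "# Valid channels: analysis, commentary, final.<|end|>"
)
print(system_message)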

@sama That’s awful, sorry that happened to you. I really hope the anti-AI sentiment will abate as AI helps us discover new science, cures, etc. I hope people see the positive

I wrote this early this morning and I wasn't sure if I would actually publish it, but here it is:
blog.samaltman.com/2279512

@jardel1307 @LLMJunky Same here … wish they had 2x usage off-peak or something. The $20 plan isn’t usable for me anymore

the duality of man
what do you think of the new rebalancing of the plus plan? the Codex team does listen...
keep in mind that you are getting the same usage as before the promo, just in smaller, more frequent chunks.

Aibra@aibra
I rarely hit my usage limit on the $20/month ChatGPT plus tier what the heck am I going to do with 5x more usage??
Christian Balbin retweeted

@RedHat_AI @_akhaliq Wow - my biggest disappointment was Google gatekeeping MTP away from the safetensors files. This is amazing
Christian Balbin retweeted

Speculative decoding for Gemma 4 31B (EAGLE-3)
A 2B draft model predicts tokens ahead; the 31B verifier validates them. Same output, faster inference.
Early release. vLLM main branch support is in progress (PR #39450). Reasoning support coming soon.
huggingface.co/RedHatAI/gemma…
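
A minimal sketch of wiring up an EAGLE-style draft model in vLLM, assuming its speculative_config interface; both model ids are placeholders (the Hugging Face link above is truncated):

from vllm import LLM, SamplingParams

# Placeholder ids; substitute the verifier and EAGLE-3 draft checkpoints.
llm = LLM(
    model="<verifier-model-id>",
    speculative_config={
        "method": "eagle3",           # draft-and-verify speculative decoding
        "model": "<eagle3-draft-id>",
        "num_speculative_tokens": 3,  # tokens drafted per verification step
    },
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)  # same output, fewer forward passes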

@simonw i see this all the time. i have to tell all my friends not to judge AI by the voice model.

@PrunaAI @replicate I really wish someone would figure out MTP for the dense Gemma model

Models are getting fast. We make them faster. 🚀
We just deployed optimized inference for Google's Gemma 4 26B on @replicate, and we've squeezed out significantly more performance than other deployments:
⚡ +20% throughput
⏱️ -50% time to first token
We made Gemma more production-ready by applying a bag of tricks: kernel tuning, quantization, backend selection, and the secret sauce that makes good models shine in fast, affordable deployments.
Gemma 4 is a remarkable open model, and we're proud to help it reach its full potential.
👉 Try it yourself: replicate.com/prunaai/gemma-…
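
A minimal sketch of calling a deployment like this from Replicate's Python client; the model slug is a placeholder, since the link above is truncated:

import replicate

# Hypothetical slug; the post's replicate.com link is truncated.
output = replicate.run(
    "prunaai/<gemma-model-slug>",
    input={"prompt": "Summarize speculative decoding in one sentence."},
)
print("".join(output))  # text models stream back chunks of a string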

@phuctm97 I would just leave it on high until it gets stumped (rare) and then move it to xhigh. If you leave it on xhigh it will routinely overcomplicate things.

@TheAhmadOsman 18 tps with Qwen’s terrible reasoning efficiency is rough :/

took them LESS THAN A MONTH to ship this @heliumbrowser, you are the best
x.com/heliumbrowser/…
Igor bedesqui@bedesqui
day 01 of asking @uwukko to make the compact mode more compact

@theo @cmgriffing I’ve extensively benchmarked these and Qwen3.5 27B is simply the best open model in this size class.

@cmgriffing Things are ramping up fast! I have been super unimpressed with Gemma 4 though. Horrible numbers in all my benches :(

Emergency @theo stream incoming titled "Was I wrong about local models?"
I'm not sure it will sway his opinion that much but this is a pretty compelling graph considering Minimax 2.7 is really good for the stuff I have been doing code-wise.