Christian Balbin

123 posts

@C_Balbin

Medical AI research | PhD ‘24 @UUtah

Joined August 2025
268 Following · 37 Followers
@jason
@jason@Jason·
If anything remotely close to this CyberSUV concept drops, it will take over the entire SUV category. Would love to have the hyper-lifted feature in this, in order to navigate Tahoe blizzards!
@jason tweet media
559 replies · 217 reposts · 6.2K likes · 315.4K views
Christian Balbin retweeted
Benjamin Marie
Benjamin Marie@bnjmn_marie·
Gemma 4 31B vs Qwen3.5 27B: Inference Throughput, RTX Pro 6000 (vLLM; 8k input tokens; 64k output tokens):
- Qwen3.5 27B: 45 tok/sec with MTP (17.9 tok/sec without MTP)
- Gemma 4 31B: 12.3 tok/sec
The gap is large. But still, end-to-end, Gemma 4 31B remains faster since it generates fewer tokens. Will publish more about this soon. Eagle support for Gemma 4 31B is also coming to vLLM and should significantly accelerate inference.
17 replies · 7 reposts · 159 likes · 22.7K views
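The end-to-end claim above comes down to simple arithmetic: decode time is output tokens divided by throughput, so a model with lower tok/sec can still finish first if it emits far fewer tokens. A minimal sketch using the throughput figures above, but with hypothetical output token counts (the tweet doesn't give them):

```python
def end_to_end_seconds(output_tokens: int, tok_per_sec: float) -> float:
    """Decode time for a fixed number of output tokens at a given throughput."""
    return output_tokens / tok_per_sec

# Throughputs from the benchmark above; token counts are assumed for illustration:
# suppose Qwen3.5 "thinks" 4x longer than Gemma on the same task.
qwen_time = end_to_end_seconds(output_tokens=4000, tok_per_sec=45.0)   # ~88.9 s
gemma_time = end_to_end_seconds(output_tokens=1000, tok_per_sec=12.3)  # ~81.3 s

print(f"Qwen3.5 27B:  {qwen_time:.1f} s")
print(f"Gemma 4 31B:  {gemma_time:.1f} s")
print("Gemma finishes first despite lower throughput:", gemma_time < qwen_time)
```

Under these assumed token counts, the slower-per-token model still wins end-to-end, which is the point being made.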
Christian Balbin retweeted
Mark Kretschmann
Mark Kretschmann@mark_k·
OpenAI nerfed Codex and now you hit the limit after one hour of coding. Obviously they want to sell you the new $100 Pro plan instead. Not cool, @OpenAI 😠
179 replies · 85 reposts · 1.4K likes · 67.7K views
Christian Balbin retweeted
Tyler
Tyler@rezoundous·
Dear OpenAI, I am hitting the 5h session limit within 10 minutes on my Codex Plus plan. You have made it completely unusable overnight. Please fix it. Thanks.
276 replies · 148 reposts · 2.5K likes · 194.4K views
Christian Balbin
Christian Balbin@C_Balbin·
@denuvocracker @theo @JoshRadDev Nah, that's correct. Another nuance is that what people are leaking is actually the developer prompt. The sys prompt is a higher level and contains these parameters, so it makes sense that the leaked "sys" prompts don't have a reasoning effort parameter in them.
0 replies · 0 reposts · 1 like · 36 views
Denuvo Cracker
Denuvo Cracker@denuvocracker·
@C_Balbin @theo @JoshRadDev IIRC gpt-oss gives a string for the reasoning effort; they don't use the "juice" number that proprietary OpenAI models use. Could be wrong on this tho.
1 reply · 0 reposts · 1 like · 45 views
Theo - t3.gg
Theo - t3.gg@theo·
@JoshRadDev Any evidence of this? The leaked system prompts don't include any information about this
12 replies · 0 reposts · 21 likes · 12K views
Christian Balbin
Christian Balbin@C_Balbin·
@theo Idk if that's a hallucination or not, but it makes some sense. GPT-OSS' reasoning effort is literally just controlled by the system prompt.
1 reply · 0 reposts · 3 likes · 1.8K views
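As the thread notes, gpt-oss takes its reasoning effort as a plain string inside the system prompt (the published harmony format includes a `Reasoning: <level>` line) rather than as a numeric parameter. A minimal sketch of constructing such a prompt; the surrounding wording here is illustrative, and only the `Reasoning:` line reflects the documented format:

```python
# Sketch: gpt-oss-style reasoning effort set via a string in the system prompt.
VALID_EFFORTS = ("low", "medium", "high")

def build_system_prompt(reasoning_effort: str = "medium") -> str:
    """Build an illustrative system prompt with a harmony-style Reasoning line."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning effort must be one of {VALID_EFFORTS}")
    return (
        "You are a helpful assistant.\n"   # placeholder wording, not the real prompt
        f"Reasoning: {reasoning_effort}\n"
    )

print(build_system_prompt("high"))
```

This is why a leaked developer prompt wouldn't show the parameter: the effort string lives one level up, in the system message.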
kache
kache@yacineMTB·
Welp. The cat's out of the bag
32 replies · 3 reposts · 210 likes · 52.3K views
Christian Balbin
Christian Balbin@C_Balbin·
@sama That's awful, sorry that happened to you. I really hope that the anti-AI sentiment will abate as AI helps us discover new science, cures, etc. I hope people see the positive.
0 replies · 0 reposts · 0 likes · 541 views
Jardel
Jardel@jardel1307·
@LLMJunky no, not even close. it is not the same usage amount. i am 100% sure.
2 replies · 0 reposts · 6 likes · 161 views
Christian Balbin retweeted
kache
kache@yacineMTB·
he's right
kache tweet media
95 replies · 184 reposts · 3.2K likes · 144.3K views
Christian Balbin retweeted
Red Hat AI
Red Hat AI@RedHat_AI·
Speculative decoding for Gemma 4 31B (EAGLE-3) A 2B draft model predicts tokens ahead; the 31B verifier validates them. Same output, faster inference. Early release. vLLM main branch support is in progress (PR #39450). Reasoning support coming soon. huggingface.co/RedHatAI/gemma…
13 replies · 32 reposts · 328 likes · 19.9K views
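The draft/verify split described above can be sketched in a few lines: the draft proposes a handful of tokens ahead, and the verifier keeps the longest prefix it agrees with and supplies the first correction, so the final sequence is identical to what the verifier alone would produce (same output, fewer expensive steps). A toy sketch with deterministic stand-in "models", not real networks:

```python
def draft_model(context):
    # Cheap guesser: proposes the next few tokens (last guess is deliberately wrong).
    return [context[-1] + 1, context[-1] + 2, context[-1] + 4]

def verifier_next(context):
    # Expensive model: the ground truth the output must match exactly.
    return context[-1] + 1

def speculative_step(context, num_speculative=3):
    """One draft-then-verify step; returns the tokens actually emitted."""
    proposed = draft_model(context)[:num_speculative]
    accepted = []
    for tok in proposed:
        if verifier_next(context + accepted) == tok:
            accepted.append(tok)  # draft agreed with verifier: keep it for free
        else:
            # Draft diverged: emit the verifier's token instead and stop.
            accepted.append(verifier_next(context + accepted))
            break
    return accepted

print(speculative_step([0]))  # [1, 2, 3]: two free accepts, one correction
```

Here two of three drafted tokens are accepted per verifier call, which is where the speedup comes from; the output is still bit-identical to greedy decoding with the verifier alone.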
Christian Balbin
Christian Balbin@C_Balbin·
@simonw i see this all the time. i have to tell all my friends not to judge AI by the voice model.
0 replies · 0 reposts · 0 likes · 60 views
Simon Willison
Simon Willison@simonw·
I think it's non-obvious to many people that the OpenAI voice mode runs on a much older, much weaker model - it feels like the AI that you can talk to should be the smartest AI but it really isn't
Andrej Karpathy@karpathy

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

116 replies · 30 reposts · 1.6K likes · 263.8K views
Pruna AI
Pruna AI@PrunaAI·
Models are getting fast. We make them faster. 🚀 We just deployed optimized inference for Google's Gemma 4 26B on @replicate, and we've squeezed out significantly more performance than other deployments:
⚡ +20% throughput
⏱️ -50% time to first token
We made Gemma more production-ready by applying a bag of tricks: kernel tuning, quantization, backend selection, and the secret sauce that makes good models shine in fast, affordable deployments. Gemma 4 is a remarkable open model, and we're proud to help it reach its full potential. 👉 Try it yourself: replicate.com/prunaai/gemma-…
3 replies · 16 reposts · 131 likes · 6.6K views
Christian Balbin
Christian Balbin@C_Balbin·
@phuctm97 I would just leave it on high until it gets stumped (rare) and then move it to xhigh. If you leave it in xhigh it will routinely overcomplicate things.
0 replies · 0 reposts · 7 likes · 694 views
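The escalation policy described above (default to high, bump to xhigh only when stumped) can be sketched as a simple retry loop. `run_task` here is a hypothetical stand-in for a Codex/agent invocation, not a real API:

```python
def solve_with_escalation(task, run_task, efforts=("high", "xhigh")):
    """Try each effort level in order; stop at the first that succeeds."""
    for effort in efforts:
        result = run_task(task, effort=effort)
        if result is not None:   # task succeeded at this effort level
            return effort, result
    return None, None            # stumped even at the highest effort

# Toy run_task: pretend only "xhigh" can crack this particular task.
demo = lambda task, effort: "answer" if effort == "xhigh" else None
print(solve_with_escalation("hard proof", demo))  # ('xhigh', 'answer')
```

Most tasks never leave "high", so the routine overcomplication that comes with always running at "xhigh" is avoided.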
Minh-Phuc Tran
Minh-Phuc Tran@phuctm97·
Is it true that Codex GPT 5.4 with Extra High thinking effort is worse than with Medium/High thinking effort? If it’s true, that’d be very bad design. 😅
80 replies · 1 repost · 193 likes · 43.3K views
Ahmad
Ahmad@TheAhmadOsman·
Some stats of my last run on the 4x DGX Sparks before I call it a night
Ahmad tweet media
13 replies · 4 reposts · 102 likes · 8.2K views