Christian Balbin

216 posts

Christian Balbin

@asipilled

Health AI research | PhD ’24 @UUtah

Joined August 2025
455 Following · 53 Followers
Christian Balbin reposted
wd 🔺@populartourist·
Qwen3.6 27B and 35B-A3B are amazing models, but nothing reaches the efficiency of GPT-OSS yet. Qwen3.6 35B-A3B is as fast as GPT-OSS-20B but nowhere near the prefill performance.
21
1
98
20.8K
jason@jxnlco·
People don't know this, but I actually leave my laptop at the office on the weekends, and I skate in every time I need to check my computer.
27
2
251
26K
Zach Mueller@TheZachMueller·
In today’s adventures of “working at Lambda is super cool,” I get to have a B200 rack in my house*
20
2
126
8.8K
David Bonanno@BonannoDavid·
I’m not doing this X thing correctly. I’m the CFO of a company that just announced the largest crypto M&A deal in history, I posted about it, then reposted my own posts and somehow I only gained one follower this week…. Can someone tell me what I’m doing wrong?
917
40
1.3K
305.4K
Christian Balbin@asipilled·
@bnjmn_marie Your account is literally straight alpha for local LLMs. Thanks for doing this!
0
0
1
46
Benjamin Marie@bnjmn_marie·
I benchmarked Google’s new MTP for Gemma 4 31B using vLLM with 4 speculative tokens, a fairly conservative setup.
Results:
- Much higher throughput than Qwen3.6’s MTP
- Lower latency too, helped by Gemma 4 generating fewer tokens
- For coding tasks with reasoning enabled, Gemma 4 is now at least 6x faster than Qwen3.6. So you can generate 5 outputs, run your tests to select the best one, and it would still be cheaper than a single output by Qwen3.6.
I’ve updated my full comparison with the new numbers: kaitchup.substack.com/p/qwen36-27b-v…
I also confirmed what others have reported: Gemma 4’s MTP handles a high number of speculative tokens very well. On simple text generation, I’m now testing values above 10 and reached 129 tok/s on an RTX Pro 6000, compared with 20 tok/s without MTP.
Next step: confirming how this translates to real tasks.
32
36
331
34K
Loktar 🇺🇸@loktar00·
@vllm_project Bah even with nightly 20.2.rc1 I keep getting: NotImplementedError: Speculative Decoding with draft models or parallel drafting does not support multimodal models yet
3
0
8
1.7K
vLLM@vllm_project·
🚀 Day-0 MTP support for Gemma4 now available at vLLM with ready-to-use docker image! ⚡️Enjoy up to 3x faster decoding performance to supercharge your development with zero quality degradation! Check out the full vLLM recipes for Gemma 4 model series👇 recipes.vllm.ai/Google/gemma-4…
Google for Developers@googledevs

Gemma 4: Now up to 3x Faster. ⚡ Same quality, way more speed. Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.

18
99
905
88.6K
Christian Balbin@asipilled·
@YouJiacheng they should know these are wild claims and that people would rightfully be skeptical. they should have had a trusted 3rd party benchmark them
0
0
2
2.5K
Sam Altman@sama·
hey chat, we haven't forgotten about you 👀
1.5K
167
8.9K
1.6M
Christian Balbin@asipilled·
@jxnlco @jxmnop they did not. 1M context is still the default. I agree it feels like it degrades past 200k tho
0
0
1
198
Jack Morris@jxmnop·
it is endlessly fascinating to me that we still don't have a true 1M-context model
it's an unusual case where the infra is far ahead of the science. Claude discontinued 1M+ context bc it didn't really work past ~200k
we don't have the right data? training techniques? not sure
164
23
1K
257.5K
Christian Balbin@asipilled·
how i feel using xhigh to make the most simple UI cosmetic changes
0
1
2
74
Christian Balbin@asipilled·
@sama that 'not accepted' email spiked my cortisol lol but appreciate the kindness here. OpenAI is one of the few companies where it really feels like they care deeply about the community
0
0
1
327
Sam Altman@sama·
we are gonna do something nice for everyone who applied for the GPT-5.5 party and that we didn't have space for. hope you enjoy!
1.2K
188
8.1K
885.5K
Christian Balbin@asipilled·
anyone using ChatGPT Atlas? just remembered about it when going through my apps
0
0
0
57
Dara A.@daradoescode·
Computer use shenanigans
1
0
22
1.1K
dudu@dudufolio·
when your agents are running but you have to reach the top shelf
46
9
507
19.2K
Christian Balbin@asipilled·
@MatthewBerman i would just run the draft model on the dgx station. My assumption is you won’t get too much of a gain by offloading it to the spark.
1
0
0
153