Christian Balbin (@asipilled) - Twitter Profile

Christian Balbin retweeted

wd 🔺@populartourist·16 May

Qwen3.6 27B and 35B-A3B are amazing models, but nothing reaches the efficiency of GPT-OSS yet. Qwen3.6 35B-A3B is as fast as GPT-OSS-20B but nowhere near the prefill performance.

21

1

98

20.8K

Christian Balbin@asipilled·14 May

@OpenAIDevs @coreyching on the toilet seat is wild

0

1

51

OpenAI Developers@OpenAIDevs·14 May

402

408

5.9K

1.3M

Christian Balbin@asipilled·11 May

goalmaxxing

0

15

Christian Balbin@asipilled·10 May

@jxnlco as if you don’t have like 50 MacBooks 😆

0

521

jason@jxnlco·10 May

People don't know this, but I actually leave my laptop at the office on the weekends, and I skate in every time I need to check my computer.

27

2

251

26K

Christian Balbin@asipilled·8 May

@TheZachMueller want

0

99

Zach Mueller@TheZachMueller·8 May

In today’s adventures of working at Lambda is super cool, I get to have a B200 rack in my house*

20

2

126

8.8K

Christian Balbin@asipilled·8 May

@BonannoDavid fine. I’ll follow

0

David Bonanno@BonannoDavid·8 May

I’m not doing this X thing correctly. I’m the CFO of a company that just announced the largest crypto M&A deal in history, I posted about it, then reposted my own posts and somehow I only gained one follower this week…. Can someone tell me what I’m doing wrong?

917

40

1.3K

305.4K

Christian Balbin@asipilled·7 May

@bnjmn_marie Your account is literally straight alpha for local LLMs. Thanks for doing this!

0

1

46

Benjamin Marie@bnjmn_marie·7 May

I benchmarked Google’s new MTP for Gemma 4 31B using vLLM with 4 speculative tokens, a fairly conservative setup.

Results:
- Much higher throughput than Qwen3.6’s MTP
- Lower latency too, helped by Gemma 4 generating fewer tokens
- For coding tasks with reasoning enabled, Gemma 4 is now at least 6x faster than Qwen3.6.

So you can generate 5 outputs, run your tests to select the best one, and it would still be cheaper than a single output by Qwen3.6.

I’ve updated my full comparison with the new numbers: kaitchup.substack.com/p/qwen36-27b-v…

I also confirmed what others have reported: Gemma 4’s MTP handles a high number of speculative tokens very well. On simple text generation, I’m now testing values above 10 and reached 129 tok/s on an RTX Pro 6000, compared with 20 tok/s without MTP.

Next step: confirming how this translates to real tasks.

32

36

331

34K
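The cost argument in the post above (five outputs from a model that is at least 6x faster still costing less than one output from the slower model) can be checked with a little arithmetic. The sketch below is only illustrative: the function name `relative_cost` is hypothetical, and it assumes that cost scales with wall-clock generation time and that the quoted 6x figure applies per output. Neither assumption is stated in the post.

```python
def relative_cost(n_outputs: int, speedup: float) -> float:
    """Time to generate n_outputs with the faster model, relative to one
    output from the slower model (assumes cost scales with wall-clock time)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return n_outputs / speedup


# Figures quoted in the post: "at least 6x faster", 5 candidate outputs.
best_of_5 = relative_cost(n_outputs=5, speedup=6.0)
print(f"5 outputs at 6x speed cost {best_of_5:.2f}x of one slower output")

# Separate figures from the post for simple text generation: 129 vs 20 tok/s.
mtp_speedup = 129 / 20
print(f"MTP speedup on simple text generation: {mtp_speedup:.2f}x")
```

Under these assumptions, best-of-5 comes to roughly 0.83x the time of a single slower output, and the raw MTP speedup quoted for simple text generation works out to about 6.45x.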

Christian Balbin@asipilled·7 May

@loktar00 @vllm_project same here. also the rc broke the generation of xml/html in tool calls because of a parsing issue.

0

1

26

Loktar 🇺🇸@loktar00·6 May

@vllm_project Bah even with nightly 20.2.rc1 I keep getting: NotImplementedError: Speculative Decoding with draft models or parallel drafting does not support multimodal models yet

3

0

8

1.7K

vLLM@vllm_project·5 May

🚀 Day-0 MTP support for Gemma4 now available at vLLM with ready-to-use docker image! ⚡️Enjoy up to 3x faster decoding performance to supercharge your development with zero quality degradation! Check out the full vLLM recipes for Gemma 4 model series👇 recipes.vllm.ai/Google/gemma-4…

Google for Developers@googledevs

Gemma 4: Now up to 3x Faster. ⚡ Same quality, way more speed. Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.

18

99

905

88.6K

Christian Balbin@asipilled·5 May

@YouJiacheng they should know these are wild claims and that people would rightfully be skeptical. they should have had a trusted 3rd party benchmark them

0

2

2.5K

You Jiacheng@YouJiacheng·5 May

81.8% swe-bench in the first release??? wow. is this legit?

Alexander Whedon@alex_whedon

Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA),

And the first frontier model with a 12 million token context window which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter.

@subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

27

8

328

77.8K

Christian Balbin@asipilled·5 May

@rileybrown i'll believe it when i see it on @ArtificialAnlys

0

3

3.8K

Riley Brown@rileybrown·5 May

Chat is this real?

Alexander Whedon@alex_whedon

Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA),

And the first frontier model with a 12 million token context window which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter.

@subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.

55

12

382

127.1K

Christian Balbin@asipilled·5 May

@sama 👋

0

9

Sam Altman@sama·5 May

hey chat, we haven't forgotten about you 👀

1.5K

167

8.9K

1.6M

Christian Balbin@asipilled·5 May

@OpenAIDevs @thsottiaux @jxnlco are legends. That’s a lot of tokens…

0

9

Christian Balbin@asipilled·4 May

@jxnlco @jxmnop they did not. 1M context is still the default. I agree it feels like it degrades past 200k tho

0

1

198

jason@jxnlco·4 May

@jxmnop They did?

6

0

46

17.9K

Jack Morris@jxmnop·4 May

it is endlessly fascinating to me that we still don't have a true 1M-context model

it's an unusual case where the infra is far ahead of the science. Claude discontinued 1M+ context bc it didn't really work past ~200k

we don't have the right data? training techniques? not sure

164

23

1K

257.5K

Christian Balbin@asipilled·4 May

how i feel using xhigh to make the most simple UI cosmetic changes

0

1

2

74

Christian Balbin@asipilled·4 May

@sama that 'not accepted' email spiked my cortisol lol but appreciate the kindness here. OpenAI is one of the few companies where it really feels like they care deeply about the community

0

1

327

Sam Altman@sama·4 May

we are gonna do something nice for everyone who applied for the GPT-5.5 party and that we didn't have space for. hope you enjoy!

1.2K

188

8.1K

885.5K

Christian Balbin@asipilled·4 May