Kevin

4.2K posts

@TheOneKev

Product Owner building independent AI systems. Interested in psychology, politics & the global economy

Joined June 2020

475 Following · 178 Followers

Kevin@TheOneKev·4h

Still blown away by the speed of Gemma 4 now with MTP. Ran 31B and 26B with 4-bit and 8-bit quants from @RedHat_AI and what can I say, those speeds are nuts. The MoE model especially is just flying now on a single RTX 5000 Pro.

Kevin@TheOneKev·22h

@_philschmid @arena Big fan of both models. Especially now with MTP, they're both just crazy good. Token/s is just mind-blowing in combination with the intelligence, and super smooth on my RTX 5000 Pro. Really worthy successors for the Gemma 3 series. Amazing job.

Philipp Schmid@_philschmid·1d

Gemma 4 shifts the Pareto Frontier on Code @arena.🔥 Among open models, Gemma-4-31b ranks #13 and Gemma-4-26b-a4b ranks #17. Pretty good for open models you can run on a MBP. 👀

Kevin@TheOneKev·23h

@yabbanx Half of the stuff probably won't work in 5 years. I like tech gimmicks, but too much unnecessary stuff just means expensive maintenance and repair costs.

yaw.@yabbanx·1d

China is waging a serious ‘war’ against Western car manufacturers. This is not normal bro. 😂 🏷️ $80,000

Kevin retweeted

Arena.ai@arena·1d

Gemma-4 lands in Code Arena: Frontend Webdev and shifts the Pareto Frontier! Among open models, Gemma-4-31b ranks #13 and Gemma-4-26b-a4b ranks #17. Congrats to @GoogleDeepMind on shifting the frontier!

Google DeepMind@GoogleDeepMind

Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵

Kevin@TheOneKev·1d

@osanseviero Absolute game changer. Great work, fellas. No doubt!

Omar Sanseviero@osanseviero·2d

Excited to introduce Gemma 4 Multi-Token Prediction Drafters⚡️ Accelerated inference right in your pockets
- Up to a 3x speedup
- Same quality guarantees
- Available in your favorite open-source tools

Kevin@TheOneKev·1d

@bnjmn_marie That was really the one critical thing missing with Gemma 4 31B: speed. Now that it's there too, it's the best open model for me right now. Love it so far.

Benjamin Marie@bnjmn_marie·1d

I benchmarked Google’s new MTP for Gemma 4 31B using vLLM with 4 speculative tokens, a fairly conservative setup. Results:
- Much higher throughput than Qwen3.6’s MTP
- Lower latency too, helped by Gemma 4 generating fewer tokens
- For coding tasks with reasoning enabled, Gemma 4 is now at least 6x faster than Qwen3.6. So you can generate 5 outputs, run your tests to select the best one, and it would still be cheaper than a single output by Qwen3.6.
I’ve updated my full comparison with the new numbers: kaitchup.substack.com/p/qwen36-27b-v…
I also confirmed what others have reported: Gemma 4’s MTP handles a high number of speculative tokens very well. On simple text generation, I’m now testing values above 10 and reached 129 tok/s on an RTX Pro 6000, compared with 20 tok/s without MTP. Next step: confirming how this translates to real tasks.
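A rough check of the "5 outputs still cheaper than one" arithmetic in the post above, as a minimal sketch. It assumes cost scales only with generation time and that the "at least 6x faster" figure applies to a whole output; the helper name `relative_cost` is made up for illustration, and the time spent running tests is ignored.

```python
def relative_cost(n_outputs: int, speedup: float) -> float:
    """Time to produce n_outputs with the faster model, expressed as a
    fraction of the time the slower model needs for a single output.
    Assumption: every output benefits from the same end-to-end speedup."""
    return n_outputs / speedup


# Figures quoted in the post: 6x faster on coding tasks, best-of-5 sampling.
best_of_5 = relative_cost(5, 6.0)

# Figures quoted in the post for simple text generation on an RTX Pro 6000.
mtp_gain = 129 / 20  # tok/s with MTP divided by tok/s without MTP

print(f"5 outputs at 6x speed take {best_of_5:.2f}x the time of one slower output")
print(f"MTP throughput gain on simple text generation: {mtp_gain:.2f}x")
```

Under those assumptions, five outputs take about 5/6 of the time of a single output from the slower model, and 129 vs 20 tok/s is a gain of roughly 6.45x.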

Kevin@TheOneKev·1d

@bnjmn_marie Tested it on my two RTX 5000 Pros, one model on each, both Gemma 4 31B, both FP8, one with and one without MTP. The 3x was no lie. Went from 30 to almost 100 tok/s. That's incredible.

Benjamin Marie@bnjmn_marie·2d

Gemma 4 was very slow compared to Qwen3.6. Now, it's probably much faster! I'll publish my own numbers tomorrow

Google for Developers@googledevs

Gemma 4: Now up to 3x Faster. ⚡ Same quality, way more speed. Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.

Kevin@TheOneKev·1d

Ok, had to try it out. Doing the first runs, both FP8 and each running on an RTX 5000 Pro. And what can I say? They weren't exaggerating. ~30 tok/s vs almost 100 tok/s. In that test run, that meant cutting the time from 16 to 5 secs. And I can't see any degradation or anything like it. Great job @googlegemma

Google Gemma@googlegemma

Gemma 4 just got even faster! We're releasing Multi-Token Prediction (MTP) drafters that deliver up to a 3x speedup, without any degradation in output quality or reasoning logic.
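A quick consistency check of the two measurements in the post above (throughput and wall-clock time), as a sketch. It assumes both runs generated the same number of tokens and treats "almost 100 tok/s" as exactly 100; the variable names are illustrative only.

```python
# Numbers reported in the post (approximate).
baseline_tps = 30.0      # tok/s without MTP
mtp_tps = 100.0          # tok/s with MTP, rounded up from "almost 100"
baseline_seconds = 16.0  # wall-clock time of the run without MTP

# Assumption: the same prompt yields the same number of output tokens.
tokens = baseline_tps * baseline_seconds       # tokens generated in the run
predicted_seconds = tokens / mtp_tps           # expected time with MTP

throughput_speedup = mtp_tps / baseline_tps    # from the tok/s figures
wallclock_speedup = baseline_seconds / 5.0     # from the reported 16 s vs 5 s

print(f"tokens generated: {tokens:.0f}")
print(f"predicted time with MTP: {predicted_seconds:.1f} s")
print(f"speedup from throughput: {throughput_speedup:.2f}x")
print(f"speedup from wall-clock: {wallclock_speedup:.2f}x")
```

The predicted time of about 4.8 s lines up with the reported 5 s, so the throughput and wall-clock figures agree with each other and with the advertised "up to 3x".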

Kevin retweeted

Google Gemma@googlegemma·2d

Gemma 4 just got even faster! We're releasing Multi-Token Prediction (MTP) drafters that deliver up to a 3x speedup, without any degradation in output quality or reasoning logic.

Kevin@TheOneKev·2d

@hamptonism Dope locations for a hackathon

ₕₐₘₚₜₒₙ@hamptonism·2d

If anyone wants to do this this summer I have giant villa on the Amalfi Coast & Lake Como that I’ll be hosting hackathons from.

ol’ stocky ⛳️@oldstocky

Your soul needs this, not the Met Gala

Kevin@TheOneKev·2d

@micheltamanda Thanks, my man. Much appreciated :)

Michel Laclé@micheltamanda·2d

@TheOneKev This is the pro move brother! You gave me inspiration to move to the next level.

Michel Laclé@micheltamanda·2d

What LLM gateway are you using? I built my own to have a single point of configuration for my local AI systems. How did you all solve the pain point of having many local models across many local machines?

Kevin@TheOneKev·2d

@micheltamanda I use mine for local and external models though. Plus API Key Management.

Kevin@TheOneKev·2d

@micheltamanda Exactly the same. Built my own.

Kevin@TheOneKev·2d

@Dozer3000 @gas0linr I halfway agree. At least as far as the status quo goes. But when you consider where LLMs came from and that scaling has kept paying off so far, not to mention possible new architectures soon, before long no dev will be able to keep up. It's biologically impossible.

Timotheus V.@Dozer3000·2d

@gas0linr In my opinion that statement is complete nonsense. The degree is worth it. Just because vibe coding with a few agents lets you work more efficiently doesn't make good IT people obsolete. Pure script coders will have a harder time, but otherwise there's plenty to do.

Yves@gas0linr·2d

As a computer scientist, it's a truly wild feeling to see what technological progress is doing to the industry. And it's frightening to realize that 90% of people don't see it coming. As a computer science student, I would drop out now (!) and reorient myself.

Kevin@TheOneKev·2d

@gas0linr 90% is probably still very(!) optimistic.

Kevin@TheOneKev·2d

Sometimes it's hard to understand, when you're right in the middle of it, but this is literally history in the making. And I honestly think just a precursor of what will come. I think @sama even said it himself, that it will probably get bad first, before it can(!) become good.

Anonymous@YourAnonNews

Kevin O'Leary's massive data center was approved by a county commission in Utah last night without residents' approval of the measure. At 40,000 acres, it would be 2.5x the size of Manhattan. The commission approved the proposal despite opposition from hundreds of locals.

Kevin@TheOneKev·2d

@isabelunraveled As a dad, it makes me happy to read that reply from your dad. That's how it should be. He's doing a great job.

Isabel🌻@isabelunraveled·3d

me and my dad this week // me and my dad just after i was born

Kevin@TheOneKev·3d

@TheoMediaAI @sama I mostly agree. The only things that come to my mind in that scenario are a) how long that (driving) will still be a thing, and even more so b) Augmented Reality, e.g. HUD. Great combo. No doubt though, voice-only will have its use cases. Just not as the standard standalone UI.

Theoretically Media@TheoMediaAI·3d

@TheOneKev @sama Agreed, but also: Driving. So, yeah: Voice only has a place.

Sam Altman@sama·3d

pretty excited for voice models to get great. it's interesting to watch how people are already starting to change the way they interface with AI

Kevin@TheOneKev·3d

@TheAhmadOsman Ngl, that 10x tokens thing...they really know how to get me. Tokenite 😄

Ahmad@TheAhmadOsman·3d

People keep treating everything like isolated events
- Dario / Anthropic fearmongering
- Policy maker pressure
- Elon’s lawsuit
- Sudden 10x tokens
- SF parties
All just random coincidences? Come on, look more than 2 steps ahead. We’re surrounded by existential risks & Psyops

Kevin@TheOneKev·3d

@0xSero Wait...you guys get the party, plus the "band-aid"?