Grummz@Grummz·9h

Every AI company should be optimizing for speed of inference. It’s the difference between a 10fps game and a 60fps game. The same game feels horrible at 10 and amazing at 60. It’s not just a quality-by-degrees situation, it’s transformative. The same model feels 100x smarter and can get more done with realtime inference.

Sunny Golovine@sunnygg·9h

It’s a real balancing act. For example I use Codex Spark a lot and while it’s a great model, it has its limitations around reasoning and most importantly context window size. I see it more like tools in a toolbox. There are times you want to reach for the ultra fast model cause you don’t need reasoning or a deep context window, you just need to do this small thing FAST. But there are other times you want to build something big and need that larger context window size and the more advanced reasoning.

Wallscreet@Wallscreet·8h

@Grummz I honestly think this is putting more expectation than necessary on models and providers. As long as they move at an equal or greater pace than human capability we should be satisfied overall. Context and model adaptation/learning ability is still the frontier imho.

ᛟ Gungner ᛟ@Gungner297671·9h

@Grummz 60 is old standard. They should aim for 120

テストさん@testo_san·3h

@Grummz I didn't find out until later that DLSS 5.0 currently requires two RTX5090s & they didn't even specify which cards can use it efficiently b/c they prob don't even know yet. What an absolute sh*tshow. Also I get the impression many ppl still can't tell diff between 30fps & 60fps.

Dochex@doc_hex1337·9h

@Grummz There are already dedicated ASIC inference chips that would work. Soon it will be CPU, GPU and IPU

Petri Kuittinen@KuittinenPetri·8h

@Grummz Faster response is of course nice, but if the speed comes at a noticeable cost to accuracy, this is a big NO for research tasks, fixing hard bugs, translating difficult texts (e.g. old poetry). In those tasks I would be happy with a slower, yet high quality response.

Mark Kretschmann@mark_k·9h

@Grummz It depends on the task. If I ask a complex question that requires research, I'm irritated if the model gives an instant answer. That's because I know that the task requires reasoning time.

BlaiseBits@BlaiseBits·9h

@Grummz Inference is nice, but being correct is better. In this case, 30fps of awesome is better than 60fps of slop.

🎯🔫👌@gurgle_io·8h

@Grummz 100 %. I can't have 30 seconds while it's thinking. Also it never gives a better answer anyway. Maybe for some tailored workflows, but not for general stuff. Response should be instant and it should fit on the screen. Any game dev would have told you that. @xai you're welcome.

aidenpryde@aidenpryde·8h

@Grummz My question is why are they focused so much on "enhancing the graphics." NPCs are still hard coded and running off of the CPU. The promised land for me is local LLMs running the NPCs. They respond to what you are doing, the world events, etc in real time.

nør 🏴‍☠️@meta_acc·9h

@Grummz I would say first fps is more akin to output quality. Inference speed is more like ping.

DHYohko@DHYohko·7h

@Grummz Depends on what I ask. Most of the time I'm willing to wait minutes for more accurate data, especially regarding tcg rulings and other such things. If it's too fast I'm suspicious about inaccurate info.

LeavingTheRestToYou@RenTheLast·8h

@Grummz May all your dreams come true.

Amsanir@Amsanir·9h

@Grummz This tweet was written by ai

Tony Nova@TonyNova_·9h

@Grummz Ah yes, give the skitzos coke
