
LLMs are getting crazily good at reasoning — but also crazily slow. Hard problems can make them think for hours.

Why? Even with tons of GPUs, they still decode one. token. at. a. time. ⏳ More GPUs ≠ faster answers.

Our ThreadWeaver 🧵⚡ asks: “Why not make LLMs think in parallel?”

🧵 1/N 👇
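The bottleneck above is easy to see with a toy cost model: autoregressive decoding takes one step per token regardless of GPU count, so a long reasoning chain is paid for serially, while independent reasoning threads decoded in parallel cost only as much as the longest thread plus a merge. This is a minimal sketch of that arithmetic; the function names and the merge-token count are illustrative assumptions, not ThreadWeaver's actual API or numbers.

```python
# Toy cost model: each decoded token = one sequential step.
# More GPUs don't shrink sequential_steps; splitting the reasoning does.

def sequential_steps(thread_lengths):
    """One long chain-of-thought: every token is decoded one at a time,
    so cost is the total token count."""
    return sum(thread_lengths)

def parallel_steps(thread_lengths, merge_tokens=0):
    """Independent reasoning threads decoded concurrently: wall-clock cost
    is bounded by the longest thread, plus tokens spent merging results."""
    return max(thread_lengths) + merge_tokens

# Hypothetical example: three independent sub-derivations.
lengths = [400, 350, 300]
print(sequential_steps(lengths))                  # 1050 sequential steps
print(parallel_steps(lengths, merge_tokens=50))   # 450 steps
```

The gap widens as the reasoning fans out: with N equal-length threads, the serial cost grows linearly in N while the parallel cost stays roughly flat, which is the intuition behind decoding threads in parallel rather than buying more GPUs.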