Lema
@lemantorus
fulltime vibecoder

We push Prefill/Decode (PD) disaggregation beyond a single cluster: cross-datacenter + heterogeneous hardware, unlocking the potential for significantly lower cost per token. This was previously blocked by KV cache transfer overhead. The key enabler is our hybrid model (Kimi Linear), which reduces KV cache size and makes cross-DC PD practical.
Validated on a 20× scaled-up Kimi Linear model:
✅ 1.54× throughput
✅ 64% ↓ P90 TTFT
→ Directly translating into lower token cost.
More in Prefill-as-a-Service: arxiv.org/html/2604.1503…
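
In PD disaggregation the prefill node builds the KV cache for the prompt and ships it to the decode node, so the size of that cache sets the transfer cost across a datacenter link. The back-of-envelope sketch below illustrates why shrinking it matters. Every number is a hypothetical assumption, not taken from the post or from Kimi Linear's real configuration: a GQA-style layout with 48 layers, 8 KV heads, head dim 128 and bf16 values, a 32K-token prompt, an idealized 100 Gbps link, and a hybrid in which only 1 in 4 layers keeps a per-token KV cache while the rest keep a fixed-size state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model dimensions, for illustration only."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # 2 bytes per element, e.g. bf16


def kv_cache_bytes(shape: ModelShape, seq_len: int, full_attn_layers: int) -> int:
    """Size of the per-token KV cache the prefill node must ship to the decoder.

    Only `full_attn_layers` layers store K and V for every token. The other
    layers (assumed here to be linear-attention layers) keep a fixed-size state
    that does not grow with seq_len, so it is ignored in this estimate.
    """
    if not 0 <= full_attn_layers <= shape.num_layers:
        raise ValueError("full_attn_layers must be between 0 and num_layers")
    # Factor 2 = one K tensor plus one V tensor per layer.
    per_token_per_layer = 2 * shape.num_kv_heads * shape.head_dim * shape.bytes_per_elem
    return per_token_per_layer * seq_len * full_attn_layers


def transfer_seconds(num_bytes: int, link_gbps: float) -> float:
    """Idealized time to send num_bytes over a link_gbps (gigabits/s) link,
    ignoring protocol overhead and any overlap with compute."""
    return num_bytes * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    shape = ModelShape(num_layers=48, num_kv_heads=8, head_dim=128)
    prompt_tokens = 32_768
    link_gbps = 100.0  # hypothetical cross-DC bandwidth

    dense = kv_cache_bytes(shape, prompt_tokens, full_attn_layers=48)
    hybrid = kv_cache_bytes(shape, prompt_tokens, full_attn_layers=12)  # 1 in 4 layers

    for name, size in (("dense", dense), ("hybrid", hybrid)):
        print(f"{name}: {size / 1e9:.2f} GB, {transfer_seconds(size, link_gbps):.3f} s")
```

With these made-up numbers the hybrid ships 4× fewer bytes per request (about 1.61 GB instead of 6.44 GB), cutting the idealized transfer from roughly 0.52 s to 0.13 s. Real systems differ (latent-compressed KV, pipelined transfer, quantization), so treat this only as an intuition for why KV cache size gates cross-DC PD.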

Introducing Gemini 3.1 Flash TTS 🗣️, our latest text-to-speech model with scene direction, speaker-level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API!

Very interesting article, with a lot to chew on. A couple of things really stuck out to me.

1. According to this report, the biggest risk to NVIDIA's margins and profitability, ironically, isn't its technological superiority eroding as competitors catch up. It's the absolute technical dominance of its hardware-software ecosystem, which gives it insurmountable strategic national security value and invites more direct US government intervention, from export controls to outright influence over its margins or IP. Will the US government want to take a 10% strategic stake in NVDA?

2. Export controls have "worked" insofar as they have slowed down China's AI development. Without access to the most advanced GPUs, China has to spend time and money reinventing key technologies that already exist in the US before embarking on multi-trillion-parameter models. To Jensen's credit, this delay isn't free: it may be bought at the expense of spurring a Chinese AI hardware-software ecosystem that eventually becomes entirely independent of NVIDIA. That would be bad for NVIDIA and, he argues, also for US technological dominance. That may indeed be the case, because "NVIDIA isn't a car company" that's easily substitutable but a sticky and powerful ecosystem or OS. To Jensen's critics, however, that's exactly why they want to limit China's access to it. China was always going to try to build a replica of NVIDIA's tech stack, from CUDA to interconnect software like NVLink or NIXL. Delaying China by a few months or a couple of years is exactly the goal of the export controls' proponents. At that point, as I said in a post yesterday, the disagreement stems from a difference in existential values.

Let's bet on America, not against it. I'm proposing a 10% fee on prediction markets and online gambling to fund an American Innovation Fund investing in AI, quantum computing, fusion energy, life sciences, and national security tech. President Xi is investing in the future, whereas President Trump's cuts to NIH and NSF are defaulting on America's preeminence. That's not how you win. Let's fund cures, build new energy resources, and lead in defense technologies and quantum tech. Stop falling further behind, start investing in winning again. That's the American way. bloomberg.com/news/articles/…