tung
@yoquankara
CTO/CDO at a #contech startup. Interested in Representation/Continual learning.




bruh

The Jensen Huang episode.
0:00:00 – Is Nvidia’s biggest moat its grip on scarce supply chains?
0:16:25 – Will TPUs break Nvidia’s hold on AI compute?
0:41:06 – Why doesn’t Nvidia become a hyperscaler?
0:57:36 – Should we be selling AI chips to China?
1:35:06 – Why doesn’t Nvidia make multiple different chip architectures?
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!


Gemma 4 is here!
🧠 31B and 26B A4B for models with impressive intelligence per parameter
🤏 E2B and E4B for mobile and IoT
🤗 Apache 2.0
🤖 Base and IT checkpoints available
Available in AI Studio, Hugging Face, Ollama, Android, and your favorite OS tools
🚀 Download it today!




In a recent chat with a Gemini VP about hiring philosophy, he emphasized one trait: the combination of low ego and high competence. We are no longer in an era defined by individual papers or claims of ownership. Success today requires a 'last mile' mindset: a relentless focus on doing whatever work is necessary to deliver world-class models. A team member who pairs high contribution with low ego simplifies and energizes the entire organization. In this hyper-competitive frontier, the delta between contribution and ego has become a key metric for identifying the talent that actually moves the needle.

Meta × TBD Lab × CMU × UChicago × UMaryland
In our latest work, we introduce Token-Level LLM Collaboration via FusionRoute 📝: arxiv.org/pdf/2601.05106
LLMs have come a long way, but we continue to face the same trade-off:
– one huge model that kind of does everything, but is expensive and inefficient, or
– many small specialist models that are cheap, but brittle outside their comfort zones
We’ve tried a lot of things in between: model merging, MoE, sequence-level agents, token-level routing, controlled decoding, etc. Each helps a bit, but all come with real limitations.
A key realization behind FusionRoute is: Pure token-level model selection is fundamentally limited, unless you assume unrealistically strong global coverage. We show this formally. And then we fix it by letting the same router also generate.
Concretely, FusionRoute is a lightweight router LLM that
– performs token-level model selection, and
– directly contributes complementary logits to refine or correct the selected specialist when it fails
So it's not "routing + another model": the router itself is part of the decoding policy as well. This turns token-level collaboration from a brittle "pick-an-expert" problem into a strictly more expressive policy.
No joint training of specialized models. No model merging. No full multi-agent rollouts.
In our experiments, FusionRoute works across math, coding, instruction following, and consistently outperforms sequence-level collaboration, prior token-level methods, model merging, and even direct fine-tuning.
Feeling especially timely as LLM systems (e.g., GPT-5) move toward routing-based, heterogeneous model stacks (whether prompt-level or test-time).
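A toy sketch of the idea described in the post above: at each decoding step the router picks one specialist and also adds its own logits to that specialist's logits. This is not the paper's implementation. The additive combination, the greedy token choice, the function names (`argmax`, `fused_step`), and all numbers are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def fused_step(
    selection_scores: Sequence[float],
    expert_logits: List[List[float]],
    router_logits: Sequence[float],
) -> Tuple[int, int]:
    """One decoding step: select a specialist, then add the router's logits.

    Assumption: the router's contribution is combined by simple addition
    and the next token is chosen greedily.
    """
    chosen = argmax(selection_scores)
    fused = [e + r for e, r in zip(expert_logits[chosen], router_logits)]
    return chosen, argmax(fused)


# Toy setup: vocabulary of 3 tokens, 2 hypothetical specialists.
selection_scores = [0.2, 0.8]          # router prefers specialist 1
expert_logits = [
    [1.0, 0.0, 0.0],                   # specialist 0
    [0.0, 2.0, 1.5],                   # specialist 1
]
router_logits = [0.0, -1.0, 0.5]       # router's complementary logits

chosen, token = fused_step(selection_scores, expert_logits, router_logits)

# What pure token-level selection (no router logits) would have emitted.
selection_only_token = argmax(expert_logits[chosen])

print(chosen, token, selection_only_token)
```

In this toy case, selection alone would emit the specialist's top token, while the router's added logits shift the choice to a different token, which is the "refine or correct" behavior the post describes.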

This paper looks interesting too. Maybe 2026 will be the Year of the Improved Residual Connection.









The @karpathy interview
0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self driving took so long
1:57:08 – Future of education
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/



@suchenzang Literally me in half my code reviews lately. "Did you vibe code this?!" is a meme over here now.
