
Tim Franzmeyer
64 posts

@frtimlive
Research Scientist @googledeepmind: Gemini, Reinforcement Learning, Agents


What if LLMs knew when to stop? 🚧 HALT finetuning teaches LLMs to only generate content they’re confident is correct. 🔍 Insight: Post-training must be adjusted to the model’s capabilities. ⚖️ Tunable trade-off: Higher correctness 🔒 vs. More completeness 📝 with @AIatMeta 🧵





Self-play works so well in chess, Go, and poker because those games are two-player zero-sum, which simplifies a lot of problems. The real world is messier, which is why we haven’t seen many successes from self-play in LLMs yet. Btw, @karpathy did great and I mostly agree with him!










This looks insane. SynCity: Training-Free Generation of 3D Worlds

Why can AIs code for 1h but not 10h? A simple explanation: if there's a 10% chance of error per 10min step (say), the success rate is: 1h: 53%, 4h: 8%, 10h: 0.2%. @tobyordoxford has tested this 'constant error rate' theory and shown it's a good fit for the data: the chance of success declines exponentially.
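
The arithmetic behind the 'constant error rate' post can be checked in a few lines. This is a minimal sketch, assuming each 10-minute step fails independently with probability 10% and that any single failure sinks the whole task; the function name `success_rate` and its parameters are made up for illustration and are not from the post or from @tobyordoxford's analysis.

```python
def success_rate(hours: float, step_minutes: float = 10.0,
                 error_per_step: float = 0.10) -> float:
    """Probability of finishing a task with no failed step.

    Assumes independent steps with a constant per-step error rate
    (an illustrative assumption, not a measured value).
    """
    steps = hours * 60.0 / step_minutes      # number of 10-minute steps
    return (1.0 - error_per_step) ** steps   # every step must succeed


if __name__ == "__main__":
    for hours in (1, 4, 10):
        # Format as a percentage with two decimals.
        print(f"{hours}h: {success_rate(hours):.2%}")
```

Under these assumptions the 1h and 4h values round to 53% and 8%, and the 10h value comes out near 0.18%, since 0.9^60 ≈ 0.0018. Success decays geometrically with the number of steps, which is the exponential decline the post describes.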










