
Behrooz Ghorbani
@_ghorbani
Head of Science of Scaling @Reflection_ai. Formerly @OpenAI, @GoogleBrain and @stanford_ee. Opinions expressed are solely my own.


Hi friends, after three incredible years at OpenAI, I am excited to share that I am starting a new chapter at @reflection_ai, where I will be leading the Science of Scaling team. Our mission is to deepen the scientific understanding of large-scale learning and to turn compute into intelligence as efficiently and predictably as possible.


Today, we're joined by @achowdhery, member of technical staff at @reflection_ai, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post-training techniques to improve reasoning, Aakanksha draws on her experience leading pre-training efforts for Google's PaLM and early Gemini models to argue that pre-training itself must be rethought to move beyond static benchmarks.

We explore the limitations of next-token prediction for multi-step workflows and examine how attention mechanisms, loss objectives, and training data must evolve to support long-form reasoning and planning. Aakanksha shares insights on the difference between context retrieval and actual reasoning, the importance of "trajectory" training data, and why scaling remains essential for discovering emergent agentic capabilities like error recovery and dynamic tool learning.

🗒️ For the full list of resources for this episode, visit the show notes page: twimlai.com/go/759.

📖 CHAPTERS
===============================
00:00 - Introduction
02:26 - Reflection
04:54 - Limitations of post-training for building agents
07:31 - Rethinking pre-training in agents
10:51 - Scaling
11:27 - Evolving attention mechanisms for agentic capabilities
12:39 - Memory as a tool
14:13 - Loss objectives and training data
15:50 - Fine-tuning loss in agent performance
19:37 - Training data
21:29 - Augmenting dominant training data source
24:11 - Overcoming challenges in training on synthetic data
25:47 - Benchmarks
30:44 - Scaling laws in large models versus small models
33:20 - Long-form versus short-form reasoning
37:57 - Agent's ability to recover from failure
40:15 - Hallucinations and failure recovery
43:53 - Tool use in agents
46:38 - Coding agents
48:37 - How researchers can contribute to agentic AI








Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. openai.com/index/gdpval-v0


Introducing the Environments Hub.

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down.

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI.

We're excited to have @shengjia_zhao at the helm as Chief Scientist of Meta Superintelligence Labs. Big things are coming! 🚀 See Mark's post: threads.com/@zuck/post/DMi…

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).







