Behrooz Ghorbani
@_ghorbani
158 posts

Head of Science of Scaling @Reflection_ai. Formerly @OpenAI, @GoogleBrain and @stanford_ee. Opinions expressed are solely my own.

San Francisco, CA · Joined December 2017
612 Following · 1.5K Followers
Behrooz Ghorbani reposted
Misha Laskin
Misha Laskin@MishaLaskin·
AGI is in its first stages of take-off. Every country is realizing that AI sovereignty is existential, which requires open models. We’ve signed a deal with Shinsegae Group to build South Korea’s sovereign cloud on a US open model built by Reflection. More to come.
Misha Laskin tweet media
10 replies · 19 reposts · 130 likes · 22.4K views
Behrooz Ghorbani
Behrooz Ghorbani@_ghorbani·
Proud to share that Reflection is partnering with Shinsegae Group to build a 250MW AI factory for Korea’s sovereign AI 🇰🇷 Excited to keep pushing the frontiers of RL, reasoning, and open models with this team! wsj.com/tech/ai/nvidia…
1 reply · 5 reposts · 50 likes · 3.1K views
Patrick Fernandes
Patrick Fernandes@psanfernandes·
Excited to announce that, after finishing my PhD a couple of months ago, I will continue to do *open* science at @reflection_ai on @_ghorbani's new team! And we are still looking for exceptional individuals to join us 😉
Behrooz Ghorbani@_ghorbani

Hi friends, after three incredible years at OpenAI I am excited to share that I am starting a new chapter at @reflection_ai, where I will be leading the Science of Scaling team. Our mission is to deepen the scientific understanding of large scale learning and to turn compute into intelligence as efficiently and predictably as possible.

1 reply · 1 repost · 19 likes · 1.3K views
Behrooz Ghorbani reposted
Reflection AI
Reflection AI@reflection_ai·
Most approaches to “agentic AI” focus on post-training fixes. In this conversation, @achowdhery, a member of our technical staff, argues the bottleneck is pre-training itself. Drawing on her work on PaLM and early Gemini, she explains why next-token prediction breaks down for long-horizon planning, and how objectives, attention, and training data must evolve to support true agentic behavior.
The TWIML AI Podcast@twimlai

Today, we're joined by @achowdhery, member of technical staff at @reflection_ai, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post-training techniques to improve reasoning, Aakanksha draws on her experience leading pre-training efforts for Google’s PaLM and early Gemini models to argue that pre-training itself must be rethought to move beyond static benchmarks.

We explore the limitations of next-token prediction for multi-step workflows and examine how attention mechanisms, loss objectives, and training data must evolve to support long-form reasoning and planning. Aakanksha shares insights on the difference between context retrieval and actual reasoning, the importance of "trajectory" training data, and why scaling remains essential for discovering emergent agentic capabilities like error recovery and dynamic tool learning.

🗒️ For the full list of resources for this episode, visit the show notes page: twimlai.com/go/759.

📖 CHAPTERS
00:00 - Introduction
02:26 - Reflection
04:54 - Limitations of post-training for building agents
07:31 - Rethinking pre-training in agents
10:51 - Scaling
11:27 - Evolving attention mechanisms for agentic capabilities
12:39 - Memory as a tool
14:13 - Loss objectives and training data
15:50 - Fine-tuning loss in agent performance
19:37 - Training data
21:29 - Augmenting dominant training data source
24:11 - Overcoming challenges in training on synthetic data
25:47 - Benchmarks
30:44 - Scaling laws in large models versus small models
33:20 - Long-form versus short-form reasoning
37:57 - Agent’s ability to recover from failure
40:15 - Hallucinations and failure recovery
43:53 - Tool use in agents
46:38 - Coding agents
48:37 - How researchers can contribute to agentic AI

5 replies · 15 reposts · 110 likes · 40.8K views
Behrooz Ghorbani reposted
Casey Flint
Casey Flint@FlintCasey·
2 hrs in and I have almost lost my voice
Casey Flint tweet media
3 replies · 3 reposts · 73 likes · 6K views
Behrooz Ghorbani
Behrooz Ghorbani@_ghorbani·
I am deeply grateful to my colleagues at OpenAI. It has been a privilege to be there from the early days of ChatGPT and to learn from so many brilliant people, especially the reasoning team, which has been my home these past few years and a constant source of insight, collaboration, and support. Thank you for everything we built together. I am excited for what comes next.
1 reply · 1 repost · 18 likes · 2.2K views
Behrooz Ghorbani
Behrooz Ghorbani@_ghorbani·
In Science of Scaling we will focus on three pillars: understanding LLM training dynamics at scale, the role of real and synthetic data, and the science of RL. I am especially excited to pursue this mission together with @MishaLaskin and @real_ioannis at Reflection. I am building a small, high-trust team that cares deeply about open research, careful measurement, and engineering excellence. If you are interested in the science of pretraining, data, and RL at scale and want to help push the frontier with a focused, tight-knit group, my DMs are open. I will also be at NeurIPS this week (calendly.com/b-ghorbani-bg/…).
3 replies · 3 reposts · 33 likes · 3.6K views
Behrooz Ghorbani
Behrooz Ghorbani@_ghorbani·
Hi friends, after three incredible years at OpenAI I am excited to share that I am starting a new chapter at @reflection_ai, where I will be leading the Science of Scaling team. Our mission is to deepen the scientific understanding of large scale learning and to turn compute into intelligence as efficiently and predictably as possible.
Behrooz Ghorbani tweet media
31 replies · 12 reposts · 282 likes · 72.6K views
Behrooz Ghorbani reposted
Applied Compute
Applied Compute@appliedcompute·
Generalists are useful, but it’s not enough to be smart. Advances come from specialists, whether human or machine. To have an edge, agents need specific expertise, within specific companies, built on models trained on specific data. We call this Specific Intelligence. It's what we're building at Applied Compute.

We unlock the latent knowledge inside a company, use it to train custom models, and deploy an in-house agent workforce that reports to your team. We work with sophisticated companies that have already captured early gains from general models, like @cognition, @DoorDash, and @mercor_ai. They’re pulling even further ahead with proprietary in-house agents that don’t need to wait for the next public model release. Together, we are building and validating models and agents in days instead of months, achieving state-of-the-art performance on customer evals.

Our team has high density and low latency. Our founders all worked on different parts of this problem while they were researchers at OpenAI: @ypatil125 as a key member on the agentic software engineer effort (Codex), @rhythmrg as a core contributor to the first RL-trained reasoning model (o1), and @lindensli as a core contributor on ML systems and infrastructure for RL training. Two-thirds of the team are former founders, and everyone brings a deep technical background, from top AI researchers to Math Olympiad winners.

We are backed by $80M in funding from Benchmark, Sequoia, Lux, Elad Gil, Victor Lazarte, Omri Casspi, and others. With their support, we are growing the team, scaling deployments, and bringing to market the first generation of agent workforces built on specific models.

In short:
1. We are building Specific Intelligence for specific work at specific companies.
2. That will power in-house agent workforces to support their human bosses.
3. That in turn will unlock AI’s full potential through humanity’s greatest engine of progress: thriving corporations in a free market.
Applied Compute tweet media
107 replies · 61 reposts · 636 likes · 1M views
Behrooz Ghorbani reposted
Tejal Patwardhan
Tejal Patwardhan@tejalpatwardhan·
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
Tejal Patwardhan tweet media
OpenAI@OpenAI

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. openai.com/index/gdpval-v0

58 replies · 190 reposts · 1.3K likes · 1.1M views
Behrooz Ghorbani reposted
Anjney Midha
Anjney Midha@AnjneyMidha·
the distance between category leaders and stragglers in frontier AI starts with talent and culture

by the time the revenue and valuation signals show up, it’s too late
10 replies · 9 reposts · 80 likes · 7.3K views
Behrooz Ghorbani reposted
Jiantao Jiao
Jiantao Jiao@JiantaoJ·
🚀 We’re hiring at NVIDIA! Our team is pushing the frontier of LLM / DLM post-training and system optimization. We are looking for exceptional people with large-scale LLM + systems experience to join us (full time only).

🔹 Focus areas include:
• Post-training of large models
• Systems for LLM/DLM training & inference at scale
• Efficiency, scaling, and evaluation frameworks of LLMs

At NVIDIA, you’ll work with world-class researchers and engineers on cutting-edge foundation models at unprecedented scale.

👉 If you’re passionate about LLMs, systems, and building the next generation of AI, we’d love to hear from you.
📩 If you’re interested, please send me your CV! @nvidia #LLM #AI #Systems #PostTraining #DeepLearning
22 replies · 33 reposts · 471 likes · 103.3K views
Behrooz Ghorbani reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from.

In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit like what you'd see on Stack Overflow / Quora, or etc., but geared towards LLM use cases.

Neither of the two above are going away (imo), but in this era of reinforcement learning, it is now environments. Unlike the above, they give the LLM an opportunity to actually interact - take actions, see outcomes, etc. This means you can hope to do a lot better than statistical expert imitation. And they can be used both for model training and evaluation. But just like before, the core problem now is needing a large, diverse, high quality set of environments, as exercises for the LLM to practice against.

In some ways, I'm reminded of OpenAI's very first project (gym), which was exactly a framework hoping to build a large collection of environments in the same schema, but this was way before LLMs. So the environments were simple academic control tasks of the time, like cartpole, ATARI, etc. The @PrimeIntellect environments hub (and the `verifiers` repo on GitHub) builds the modernized version specifically targeting LLMs, and it's a great effort/idea. I pitched that someone build something like it earlier this year: x.com/karpathy/statu…

Environments have the property that once the skeleton of the framework is in place, in principle the community / industry can parallelize across many different domains, which is exciting.

Final thought - personally and long-term, I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically. I think that reward functions are super sus, and I think humans don't use RL to learn (maybe they do for some motor tasks etc, but not intellectual problem solving tasks). Humans use different learning paradigms that are significantly more powerful and sample efficient and that haven't been properly invented and scaled yet, though early sketches and ideas exist (as just one example, the idea of "system prompt learning", moving the update to tokens/contexts not weights and optionally distilling to weights as a separate process a bit like sleep does).
Prime Intellect@PrimeIntellect

Introducing the Environments Hub RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

258 replies · 864 reposts · 7.3K likes · 944.8K views
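Karpathy's gym analogy above can be sketched in code. The following is a minimal, hypothetical environment interface for LLM rollouts; the class name `LLMEnv` and its `reset`/`step`/verifier shape are assumptions modeled loosely on the classic gym schema, not the actual API of the `verifiers` repo or the Prime Intellect hub.

```python
from dataclasses import dataclass, field

@dataclass
class LLMEnv:
    """Minimal gym-style environment sketch for LLM agents: the model
    emits text actions, the environment records the trajectory, and a
    verifier function scores the final transcript with a scalar reward."""
    prompt: str
    verifier: callable          # maps the transcript to a scalar reward
    max_turns: int = 4
    transcript: list = field(default_factory=list)

    def reset(self) -> str:
        # Start a fresh episode; the initial observation is the task prompt.
        self.transcript = [("env", self.prompt)]
        return self.prompt

    def step(self, action: str):
        # Record the agent's text action; reward is only computed at the end,
        # which is exactly the sparse-outcome property discussed in the tweet.
        self.transcript.append(("agent", action))
        done = len(self.transcript) // 2 >= self.max_turns
        reward = self.verifier(self.transcript) if done else 0.0
        return "", reward, done

# Toy usage: a verifier that rewards any transcript containing "42".
env = LLMEnv(
    prompt="What is 6 * 7?",
    verifier=lambda t: 1.0 if any("42" in msg for _, msg in t) else 0.0,
    max_turns=1,
)
obs = env.reset()
_, reward, done = env.step("The answer is 42.")  # reward 1.0, done True
```

The point of the schema is the one Karpathy makes: once `reset`/`step`/verifier is fixed, many independent task domains can be filled in behind the same interface in parallel.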
Behrooz Ghorbani reposted
OpenAI
OpenAI@OpenAI·
LIVE5TREAM THURSDAY 10AM PT
2K replies · 2.8K reposts · 23.5K likes · 6.9M views
Behrooz Ghorbani reposted
Jerry Tworek
Jerry Tworek@MillionInt·
To summarize this week:
- we released general purpose computer using agent
- got beaten by a single human in atcoder heuristics competition
- solved 5/6 new IMO problems with natural language proofs

All of those are based on the same single reinforcement learning system
43 replies · 114 reposts · 1.3K likes · 172.5K views
Behrooz Ghorbani reposted
Andrej Karpathy
Andrej Karpathy@karpathy·
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story.

RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great. But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, you're really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient?

Beyond asymptotics and second, this doesn't feel like the human mechanism of improvement for majority of intelligence tasks. There's significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc. and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization not problem solving. Notice that there is no equivalent of this for e.g. Atari RL because there are no LLMs and no in-context learning in those domains.

Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain string "lesson", to be added to system prompt (or more generally modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious.

Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task, except the question is how this might fall out from agentic practice, instead of it being hard-coded by an engineer, how can this be generalized, and how lessons can be distilled over time to not bloat context windows indefinitely.

TLDR: RL will lead to more gains because when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.
409 replies · 840 reposts · 8.4K likes · 1.1M views
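The "example algorithm" in the tweet above can be sketched as a short loop. Everything here is an assumption of the sketch, not a real system: `rollout_fn` and `reflect_fn` stand in for actual LLM API calls, and the lessons database is just a list of strings prepended to the system prompt.

```python
def reflect_and_learn(task, rollout_fn, reflect_fn, lessons, n_rollouts=3):
    """Sketch of the review/reflect loop: run a few rollouts under the
    current lessons, stuff them (with their rewards) into one review
    context, ask a meta-prompt for a reusable string 'lesson', and
    append it to the lessons database used in future system prompts."""
    system_prompt = "\n".join(lessons)

    # Each rollout is a (transcript_text, reward) pair.
    rollouts = [rollout_fn(system_prompt, task) for _ in range(n_rollouts)]

    # All rollouts go into a single review context, rewards included.
    review_context = "\n---\n".join(
        f"attempt: {text}\nreward: {reward}" for text, reward in rollouts
    )

    # Meta-prompt extracts an explicit, string-valued lesson.
    lesson = reflect_fn(
        "What went well? What didn't? State one reusable lesson.\n"
        + review_context
    )
    lessons.append(lesson)  # later: optionally distill into weights, prune, etc.
    return lessons

# Toy usage with stubbed-out model calls, mirroring the strawberry example.
lessons = reflect_and_learn(
    task="count the r's in strawberry",
    rollout_fn=lambda sys, task: ("2", 0.0),  # toy failed attempt
    reflect_fn=lambda ctx: "Separate letters by commas before counting.",
    lessons=[],
)
```

The "many blanks to fill in" the tweet mentions live in `reflect_fn` and in how `lessons` is pruned and distilled so the context window does not grow without bound.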