Will Bryk (@WilliamBryk)
Deep thoughts on Deepseek and Deep Research
There was a lot of big AI news in the past 2 weeks, but the actual biggest news wasn't what you'd think.
The biggest news was not the trillion-dollar Nvidia drop. That was a market overreaction because of a spicy story.
It was not the cheap Deepseek training run. That was impressive engineering under constraints, but overblown.
It was not the $500 billion Stargate cluster. That was in line with predictions for big-lab compute spend in the coming years.
And it was not OpenAI's Deep Research. That was an impressive release but an entirely predictable combination of o3 with a traditional search engine API.
So what was the biggest news?
The biggest news… was that Reinforcement Learning for LLMs "just works".
RL for LLMs is now easy to get working.
We see RL for LLMs just working for Deepseek, given how quickly they were able to replicate o1, and given the ease with which other orgs applied the same RL algorithm to different training data.
And we see RL for LLMs just working for OpenAI, given how quickly they got Deep Research working, only SOME WEEKS after o3 was trained.
Something new has been discovered about reality, a statistical law of the universe. It's hard for us to grasp the power of billions of weights melding toward some reward signal. We're touching up against fundamental properties of information systems. If we ever meet superintelligent aliens out there, they'd probably tell us that they too discovered something akin to RL for LLMs long ago.
All the other AI news stories the past two weeks will one day be minor details in the story. But RL for LLMs just working will power all the AI news going forward. It is this discovery that will usher in the next era of human history.
So what will this next era look like if it's powered by RL + LLMs? Will it be run by startups or big tech companies? Will it be open source or closed? Will it be deeply sought or deeply researched?
Yes. All of the above.
I think the past two weeks suggest we're on track for a very diverse world, one where small players and big players, open and closed source, intelligent systems (Deepseek) and knowledge systems (Deep Research), each have big roles to play.
That's because the amount of value that's coming is absolutely massive (trillions of dollars) and no single player or single system will take it all. When you transition an entire economy to a new foundation built on compute, there will be opportunity everywhere for everyone. RL for LLMs just works, not just for OpenAI but for everyone.
The Deepseek result complements what I've heard from people at the AI labs -- this new RL paradigm is no longer hard.
It doesn't rely on some hard-to-replicate breakthrough like the transformer. It doesn't require some proprietary data mix like GPT-4, which took 2 years to replicate. It's an optimization function, one that requires a few thousand examples.
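To make "an optimization function" a bit more concrete: DeepSeek's R1 work is built on GRPO, which samples several answers per prompt, scores each with a checkable reward, and nudges the model toward the answers that beat their group's average. The sketch below shows only that scoring step plus a bare policy-gradient surrogate, assuming an exact-match reward. The helper names are hypothetical, and the real method also uses a clipped probability ratio and a KL penalty, both omitted here.

```python
import statistics

# Minimal sketch of GRPO-style group-relative advantages (simplified:
# no clipping, no KL penalty, no tokenizer). All names are hypothetical.

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each sampled answer relative to the other answers for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]

def exact_match_reward(answer: str, reference: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def surrogate_loss(advantages: list[float], logprobs: list[float]) -> float:
    """Policy-gradient surrogate. In a real trainer the log-probs are tensors, and
    minimizing this raises the probability of above-average answers."""
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(advantages)

if __name__ == "__main__":
    reference = "42"
    samples = ["42", "41", "40", " 42 "]  # four sampled answers to one prompt
    rewards = [exact_match_reward(s, reference) for s in samples]
    advs = group_relative_advantages(rewards)
    print(rewards)                      # [1.0, 0.0, 0.0, 1.0]
    print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Because each answer is only compared with its siblings, no separate learned value model is needed, which is part of why this kind of setup is comparatively simple to get running.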
The iteration cycles here are extremely fast. Deepseek replicated o1 in a couple of months. OpenAI finetuned o3 for Deep Research in a couple of weeks. All the big labs will have their o3-level models soon, and their tool-using agents soon after. And the open source versions will follow.
Don't big labs have a massive compute advantage? Yes. Under the logarithmic test-time compute scaling law for RL + LLMs, you need exponentially more compute for linear gains in quality, so the big companies will own the frontier models.
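A toy illustration of "exponentially more compute for linear gains," assuming quality grows with the base-10 logarithm of compute. The constants A and B are made up for illustration and are not fitted to any real model.

```python
import math

# Hypothetical constants for illustration only; not fitted to any real model.
A = 50.0  # quality score at 1 unit of compute
B = 10.0  # quality points gained per 10x increase in compute

def quality(compute: float) -> float:
    """Quality under an assumed log-linear law: q = A + B * log10(compute)."""
    return A + B * math.log10(compute)

def compute_needed(target_quality: float) -> float:
    """Invert the assumed law: compute = 10 ** ((q - A) / B)."""
    return 10 ** ((target_quality - A) / B)

if __name__ == "__main__":
    for q in (60, 70, 80, 90):
        print(f"quality {q}: compute {compute_needed(q):,.0f}")
```

Each fixed step of 10 quality points costs 10x the compute of the previous step, so whoever holds the most compute sets the frontier, while a much smaller budget still lands only a few steps behind.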
But Deepseek showed that startups and individuals will also have very good models of their own. These can be trained on proprietary data mixes to make them better than the frontier models for many tasks. There will be a powerful open source ecosystem of RL data, resources, and tools. And when the cost of serving goes down to basically the cost of the underlying GPUs, you won't need to run their o5 on their compute when you can run your personalized r5 on your own compute.
Additionally, we've seen that startups and individuals benefit from the race to the bottom that the big players run with their API pricing, even for their frontier models. If RL + LLMs levels the playing field even more among the big labs, this will probably become even more true.
There is a wide distribution of tasks at all positions on the latency/intelligence/skill specialization/privacy graph, and no player will satisfy them all. You don't need a Terence Tao o7 model to do your taxes.
Trillions of dollars of new value are going to be created. There will be, and already is, a new breed of AI-first companies whose advantages come from streamlined integrations, prolific partnerships, magical product sense, shipping speed, access to unique data, connections to the physical world, viral marketing, and building where big companies won't or can't.
The world will overflow with models of all different types and sizes. Compute will power all pockets of the economy.
This is the coolest time to be alive and to be building. Hectic and dangerous for our species for sure, but I'm optimistic. We'll get through it well if we act sensibly (a big assumption, yes). On the other side is abundance.
(btw if you’re worried about lack of meaning in a world of abundance, don’t worry there will be plenty of scarcity — someone is gonna have more compute than you and you're gonna want it.)
I wrote a post a couple of weeks ago predicting that, at minimum, by the end of 2025 we’ll have PhD-level agents navigating the web and doing complex tasks. Some called it hype. With Operator and Deep Research coming out some days later, we seem more than on track. These types of systems aren't accelerating people's work yet, but that's because they're bottlenecked on simple features that will come soon: better integrations, longer context windows, connections to lots of data sources, and more training examples.
We're only at the beginning. The past two weeks in AI were wild, and they point to many more wild weeks to come.