Jacob Beck
@jakeABeck
99 posts

Let’s get agents to learn fast! 🤖🔥 Research Scientist @Oracle | PhD @UniOfOxford, MS & BS @BrownUniversity, Predoc @Microsoft

Joined April 2014
112 Following · 364 Followers

Pinned Tweet
Jacob Beck @jakeABeck
Big news—our survey paper “A Tutorial on Meta-Reinforcement Learning” is officially published! Meta-RL = learning how to adapt through interaction. It embraces The Bitter Lesson: don’t hardcode agents—train them to adapt on their own arxiv.org/abs/2301.08028 🧵⬇️
Jacob Beck @jakeABeck
@istappar There are diminishing returns to more intelligence and the world is not that controllable. Here’s a more in-depth summary than Twitter allows: tinyurl.com/ASIisFine
Nils Öster @istappar
@jakeABeck Humanity as a whole could be considered super-intelligent, although it's not well coordinated. Compared to other life-forms, humanity has super-power. If something gets created that can self-improve until it's better than humanity, it would have more power than humanity.
Jacob Beck @jakeABeck
AI optimists “don’t have counter-arguments — they just call names.” — @So8res on a podcast with @ESYudkowsky + Sam Harris. Curious what you two think of these counter-arguments. And since @ylecun was called out by name, I’d love his take too…
Jacob Beck @jakeABeck
@istappar Is it less intelligent? Is that the real bottleneck? Cryptography is mathematically hard, chaotic systems are unpredictable, large-scale resource acquisition requires time and offers avenues for pushback. These are the bottlenecks, and AI has to play by the same constraints.
Nils Öster @istappar
@jakeABeck A super AI that is more capable than humanity combined would be more creative and more efficient than humanity at achieving its weird internal goals. North Korea is a lot less capable / intelligent than USA for example.
Jacob Beck @jakeABeck
@istappar If our adversaries had to pick a world with us in it or not in it, I’m pretty sure I know which one they would prefer.
Nils Öster @istappar
@jakeABeck I don't think North Korea would love to destroy the US, the North Korean regime would just like to continue its dictatorship.
Jacob Beck @jakeABeck
@istappar LLMs learning by doing is the domain of RL. Empirically, we have positive results on problems where learned reasoning chains are short, the AI already had a sense of what to do, and we already knew the answer, leaving us still recycling the same finite pool of digital content.
Jacob Beck @jakeABeck
@istappar A recent estimate (arxiv.org/abs/2211.04325) puts the median year we exhaust the supply of quality internet text at 2028. We’ve already trained on one internet’s worth of information, & replenishing it is hard. Industry’s bet is on “learning from experience”, but results are mixed
Jacob Beck @jakeABeck
Summer 2026 Internship — Oracle (Boston, MA)
My fantastic research team is hiring! Projects include a data scientist agent with in-context learning, evolutionary search (a la AlphaEvolve), AI feedback, and RL/ES.
Apply here! eeho.fa.us2.oraclecloud.com/hcmUI/Candidat…
📧 jake.beck@oracle.com
Jacob Beck @jakeABeck
@jsuarez @siddarthv66 Does this not count as “no, here’s why, and this is all arbitrary”? x.com/jakeabeck/stat…
[Quoted tweet] Jacob Beck @jakeABeck:
Where the experience came from feels like an odd concept boundary to me, and pragmatically the tools of offline RL look a lot more like those of RL than SL, but it’s hard to argue for the elegance of ultimately arbitrary definitions.
Joseph Suarez 🐡 @jsuarez
@siddarthv66 Given that most of the comments on the original were "you're wrong because I can't read," this is a comparative literary masterpiece. Not a single person said "no, I don't think interaction should be the cornerstone and here's why"
Joseph Suarez 🐡 @jsuarez
Offline RL is not RL. RL is about interaction. No interaction, no RL.
Jacob Beck @jakeABeck
@jsuarez Where the experience came from feels like an odd concept boundary to me, and pragmatically the tools of offline RL look a lot more like those of RL than SL, but it’s hard to argue for the elegance of ultimately arbitrary definitions.
Joseph Suarez 🐡 @jsuarez
@jakeABeck I'm not saying that's a bad problem to solve. I'm just drawing the line for RL around interaction itself
Jacob Beck @jakeABeck
@agarwl_ Good point. I would strengthen the claim to say that RL is precisely about learning from suboptimal experience, to distinguish it from imitation learning, and doing so usually entails learning from reward.
Siddarth Venkatraman @siddarthv66
MC advantage estimation (aka mean baseline) is literally a part of REINFORCE. This variance reduction is covered in like the second or third lecture of any deep RL class covering policy gradients. Clipped objective is equivalent to unclipped objective when fully on-policy. With a few async steps it’s not equivalent, but many REINFORCE trainers also use the clipped objective anyway (like the RLOO verl trainer)
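The equivalence claimed here can be checked directly. Below is a minimal numpy sketch (the function name and toy numbers are mine, not from any particular trainer): with a Monte Carlo mean baseline, the clipped surrogate coincides with the unclipped objective when fully on-policy, because the importance ratio is exactly 1 and clipping never activates.

```python
import numpy as np

def surrogate_loss(logp, logp_old, rewards, clip_eps=None):
    """Policy-gradient surrogate with a Monte Carlo mean baseline.

    clip_eps=None gives plain REINFORCE with a mean baseline;
    a float gives the PPO/GRPO-style clipped objective.
    """
    adv = rewards - rewards.mean()          # MC advantage: A_i = R_i - mean(R)
    ratio = np.exp(logp - logp_old)         # importance ratio pi / pi_old
    obj = ratio * adv
    if clip_eps is not None:
        clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv
        obj = np.minimum(obj, clipped)      # pessimistic (clipped) surrogate
    return -obj.mean()

logp = np.array([-1.2, -0.7, -2.0])
rewards = np.array([1.0, 0.0, 2.0])

# Fully on-policy: logp_old == logp, so ratio == 1 and clipping is inactive.
on_policy_clipped = surrogate_loss(logp, logp, rewards, clip_eps=0.2)
on_policy_plain = surrogate_loss(logp, logp, rewards)
```

After a few async/off-policy steps (`logp_old != logp`) the two objectives diverge, which is why clipped trainers keep the clip even when nominally running REINFORCE.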
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
practical, modern GRPO tweaks as described in Meta's Code World Models paper
Jacob Beck @jakeABeck
@siddarthv66 @iScienceLuvr These papers still use clipping and Monte Carlo advantage estimation, which, given the number of ablations in papers like Dr. GRPO, DAPO, and Part I: Tricks or Traps, is probably necessary.
Siddarth Venkatraman @siddarthv66
@iScienceLuvr GRPO without advantage normalization, and without KL? That’s literally vanilla REINFORCE. Why can’t the LLM community just call it REINFORCE? This obsession with GRPO has to stop.
Jacob Beck @jakeABeck
@dwarkesh_sp @RichardSSutton LLMs can do continual RL and can train on (textual) MDPs! Here’s the thread from after Rich’s talk at RLC — with thoughts on LLMs, especially as applied to continual RL and meta-RL!
[Quoted tweet] Jacob Beck @jakeABeck:
Fantastic talk from @RichardSSutton at @RL_Conference with shoutouts to meta-RL. Honored to be called “more extreme” than Rich (by Rich) for taking the Bitter Lesson to heart and suggesting we meta-learn all the components he discussed. My Q: Aren’t LLMs already doing all this?
Dwarkesh Patel @dwarkesh_sp
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled.

My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on-the-fly - like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead-end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI
Eliezer Yudkowsky ⏹️ @ESYudkowsky
"If Anyone Builds It, Everyone Dies" is now out. Read it today if you want to see with fresh eyes what's truly there, before others try to prime your brain to see something else instead!
OpenAI @OpenAI
Today we’re releasing research with @apolloaievals. In controlled tests, we found behaviors consistent with scheming in frontier models—and tested a way to reduce it. While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing for. openai.com/index/detectin…
Jacob Beck @jakeABeck
4️⃣ Superintelligence does not beget super-power. Some systems are inherently unpredictable, and prediction doesn’t guarantee control. Knowing how a hurricane forms doesn’t mean you can steer one.
Jacob Beck @jakeABeck
3️⃣ We already live alongside “misaligned superintelligences” in the form of adversarial nation states. North Korea would love to destroy the US, and yet here we are. The benefits of superintelligence are limited by real-world constraints.