Jesse Farebrother

449 posts

@JesseFarebro

Ph.D. student studying AI & decision making at @Mila_Quebec / @McGillU. Previously @AIatMeta, @GoogleDeepMind, @Google 🧠.

London, UK 🇬🇧 Joined June 2009

477 Following · 1.1K Followers

Pinned Tweet

Jesse Farebrother@JesseFarebro·2 May

Honored that our paper Temporal Difference Flows received the Best Paper Award at the #ICLR2025 World Models Workshop, and has also been accepted as a spotlight for #ICML2025! All made possible with the exceptional team @AIatMeta! 📄arxiv.org/abs/2503.09817 x.com/JesseFarebro/s…

Jesse Farebrother@JesseFarebro

3) At the World Models workshop, I'll be giving an oral on a new approach to learning a generative model of successor states through flow matching / diffusion. 📍Peridot 201 & 206 📅Mon 28 Apr 5 PM - 5:30 PM Check out the paper on arXiv: arxiv.org/abs/2503.09817 with a full tweet thread coming soon 🙂.

214

43K

Jesse Farebrother@JesseFarebro·2d

@HenaffMikael @amilabs Congrats, Mikael!

Mikael Henaff@HenaffMikael·3d

Personal update: I'm leaving Meta to join @amilabs. Very thankful to my colleagues and proud of the progress in exploration, intrinsic motivation, long-horizon control, and embodied AI we've made together. Stoked to train some world models next! (w/ some of above features ;))

290

21.5K

Jesse Farebrother@JesseFarebro·4 May

@HenaffMikael @pfau I’ll also shamelessly plug our latest paper that essentially does hierarchical planning and additionally has a consistency-style loss for credit assignment: arxiv.org/abs/2602.19634

2.8K

Mikael Henaff@HenaffMikael·4 May

@pfau Keep me posted if you find it! I don't see many HRL papers these days although I feel like it's due for a comeback.

171

David Pfau@pfau·4 May

Hive mind: I saw a preprint recently on hierarchical RL where they replace the usual n-step TD backup with a self-consistent update trying to predict the returns over a long horizon. Looked interesting but now it's vanished in the feed. Anyone know what I'm thinking of?

11.8K

Jesse Farebrother@JesseFarebro·4 May

@pfau I’d also recommend the original paper this is based on: openreview.net/forum?id=YweXy… and AFAIK this only holds in goal conditioned RL.

114

David Pfau@pfau·4 May

@JesseFarebro YESSSSS

147

Jesse Farebrother@JesseFarebro·4 May

@pfau This?

Seohong Park@seohong_park

We scaled up an "alternative" paradigm in RL: *divide and conquer*. Compared to Q-learning (TD learning), divide and conquer can naturally scale to much longer horizons. Blog post: seohong.me/blog/rl-withou… Paper: arxiv.org/abs/2510.22512 ↓

371

David Pfau@pfau·4 May

@JesseFarebro What's X? I saw it on Twitter.

258

Jesse Farebrother@JesseFarebro·26 Apr

@avenugo2 Cool work! Was curious if you tried using the likelihood of g as a reward bonus?

Aravind Venugopal@avenugo2·25 Apr

3/ 🧵 Perhaps, generative world models already capture long-horizon temporal information implicitly. If so, how do we extract it into a reward function?

142

Aravind Venugopal@avenugo2·25 Apr

1/ 🧵 Generative world models implicitly encode the geometry of the world. We present Occupancy Reward Shaping, a method to extract temporal geometry as information-rich rewards for goal-reaching tasks. with @JiayuChen98666 @chongyiz1 @ben_eysenbach Paper: arxiv.org/abs/2604.20627 Visit our poster at ICLR Poster Session 5, Pavilion 4 (9:30am-noon): iclr.cc/virtual/2026/p…

5.4K

Jesse Farebrother retweeted

Marco Bagatella@mar_baga·22 Apr

Which representations are meaningful for control? We're presenting TD-JEPA as an oral at ICLR🇧🇷: a zero-shot reinforcement learning algorithm using self-prediction (JEPA) to learn representations that are predictive of long-term, policy-dependent behavior. It works pretty well!🧵

207

14.4K

Jesse Farebrother retweeted

Nate Rahn@n8rahn·26 Mar

New Anthropic Fellows research: Abstractive red-teaming of language model character. The worst way to find out about a character flaw in your language model is from a viral screenshot. How can we find these issues before deployment, rather than after? In this work, we introduce abstractive red-teaming, a new approach that searches over natural-language categories of queries, rather than individual prompts.

149

18.3K

Jesse Farebrother@JesseFarebro·17 Mar

@ShamKakade6 Can you talk about the relationship to inverse RL? E.g., we did exactly this in continuous control (arxiv.org/abs/2411.07007), which makes me wonder how feature matching compares with IQ-learn for LLMs as seen here: arxiv.org/abs/2409.01369.

817

Sham Kakade@ShamKakade6·17 Mar

1/ Au revoir, RLVR. New work: EBFT (Energy-Based Fine-Tuning), a post-training method that directly optimizes the long-horizon behavior of model generations, addressing SFT’s deployment-time error amplification without relying on sparse, task-specific rewards.

278

266.2K

Jesse Farebrother@JesseFarebro·9 Mar

@kjaved_ @RichardSSutton Slight bit of self promotion: even better for (3) is conditioning on a policy as we did here: arxiv.org/abs/2602.19634. We’re getting close to having full option transition models.

1.2K

Khurram Javed@kjaved_·9 Mar

World models are all the rage these days, so it's worth reiterating a few points that are largely correct.
1. Yes, our agents need models. The primary use of these models is planning. Planning can be done in real time, to improve an immediate decision, or in the background when not much is going on, to improve future decisions.
2. Learning models that predict the next sensory percept, such as pixels, is insufficient. The models should predict agent state; agent state is a summary of the past observations.
3. Learning one-step models is insufficient. Models should be conditioned on sequences of actions (e.g., option models). Finding what sequences of actions they should be conditioned on is an unsolved problem.

208

20K

Jesse Farebrother@JesseFarebro·8 Mar

It is infuriating how many ICML submissions could have been entirely prevented if authors just took 5 minutes to do a literature review. Ignoring ~10 years of established work on the exact idea you are proposing is just lazy.

339

32.7K

Jesse Farebrother retweeted

Max Schwarzer@max_a_schwarzer·4 Mar

I've decided to leave OpenAI. I'm incredibly proud of all the work I've been part of here, from helping create the reasoning paradigm with @MillionInt, scaling up test-time compute with @polynoamial, working on RL algorithms with my fellow strawberries, shipping o1-preview (which started life as one of my derisking runs), to post-training o1 and o3 with @ericmitchellai, @yanndubs and many others. I'm most proud of having led the post-training team here for the last year -- the team has done incredible work and shipped some really smart models, including GPT-5, 5.1, 5.2, and 5.3-Codex. OpenAI has genuinely some of the most talented researchers I have ever met, and I have learned more than I could have imagined knowing since I joined as a new grad. I want to thank @markchen90 @FidjiSimo @sama @merettm for all their support over my time here, and too many collaborators to name for the insights, ideas, and just plain fun we have had working together. After leading post-training for a year, though, I'm longing to start fresh and return to IC research work. I've been thinking about going back to technical research for quite some time, and I genuinely believe my colleagues and team here are set up to succeed going forward without me. I'm personally very excited for my next chapter -- I'm proud to be joining @AnthropicAI to get back into the weeds in RL research, and I'm looking forward to supporting my friends there at this important time. Many of the people I most trust and respect have joined Anthropic over the last couple of years, and I'm excited to work with them again. I have also been very impressed with Anthropic's talent, research taste and values, and I'm excited to be part of what the company does next!

605

1.2K

21.1K

3.2M

Jesse Farebrother retweeted

Google DeepMind@GoogleDeepMind·19 Feb

Gemini 3.1 Pro is here. We’ve significantly improved the model’s overall intelligence so it can solve tougher problems. 🧵

288

737

6.3K

924.4K

Jesse Farebrother retweeted

Demis Hassabis@demishassabis·29 Jan

AlphaGenome is our latest & most advanced genomics model published in @Nature today including making the model & weights available to academic researchers. Can’t wait to see what the research community will do with it. Congrats to the team on our newest front cover! #AI4Science

Google DeepMind@GoogleDeepMind

Our breakthrough AI model AlphaGenome is helping scientists understand our DNA, predict the molecular impact of genetic changes, and drive new biological discoveries. 🧬 Find out more in @Nature ↓ goo.gle/4bXlV6y

124

670

4.8K

504.4K

Jesse Farebrother@JesseFarebro·29 Jan

RL infrastructure seems like one of the biggest pain points for academia right now, been waiting for someone to tackle this

Josh Greaves@joshgreaves_ml

The big labs are betting RL will unlock superhuman coding. But their infrastructure is closed, and OSS tooling doesn't support true online RL—just iterative batch optimization. We're releasing ARES to close that gap 🧵

Jesse Farebrother retweeted

Arnav Jain@arnavkj95·4 Dec

📢Thrilled to dock ⚓️ at #NeurIPS2025 in San Diego! Come say ahoy to SAILOR ⛵️ at our spotlight poster #2407 (11:00 AM – 2:00 PM PST)! Paper: arxiv.org/abs/2506.05294 Code: github.com/arnavkj1995/SA…

Gokul Swamy@g_k_swamy

Say ahoy to 𝚂𝙰𝙸𝙻𝙾𝚁⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! 𝚂𝙰𝙸𝙻𝙾𝚁 ⛵ out-performs Diffusion Policies trained via behavioral cloning on 5-10x data!

4.8K

Jesse Farebrother retweeted

Arnav Jain@arnavkj95·2 Dec

Excited to be at #NeurIPS in San Diego (Dec 1–7) to present our work on learning to search-- SAILOR ⛵️! If you are into RL, reward modeling, or world models, let's grab a coffee ☕️ and chat.

2.5K

Jesse Farebrother retweeted

Harley Wiltzer@harwiltz·28 Nov

I'll be @NeurIPSConf next week presenting ripe work on control+DistRL: our "Temperature Decoupling Gambit" for entropy-regularized RL gives convergence to an interpretable optimal policy in the 0 temp limit + convergent return distribution iterates. 📄arxiv.org/abs/2510.08526

4.3K

Jesse Farebrother retweeted

Martin Klissarov@MartinKlissarov·24 Nov

🚨Internship alert 🚨 Together with @HolarisSun, we will be hosting a Student Researcher next year at @GoogleDeepMind. The research will be at the intersection of continual learning, self-improvement and social learning. ➡️ Please do fill this form: docs.google.com/forms/d/e/1FAI…

325

57.1K

Jesse Farebrother@JesseFarebro·19 Nov

@sedielem @danijarh Yes, of course it’s not new, just simply recent work also showing large gains from x-prediction! (should have read past your initial tweet for the context)

256

Sander Dieleman@sedielem·19 Nov

Thanks! I didn't put that work in the same category, because AFAIK it doesn't discuss dimensionality as the underlying motivation. Many works have previously suggested x-prediction with various alternative and equally valid motivations (including the seminal EDM paper arxiv.org/abs/2206.00364, the DALL-E 2 unCLIP model arxiv.org/abs/2204.06125, and some of my own work 😁)

847

Sander Dieleman@sedielem·18 Nov

Two recent papers (arxiv.org/abs/2510.11690, arxiv.org/abs/2511.13720) suggest that predicting x (clean) works much better than predicting eps or v (noisy) in high dimensions. Natural signals like images live on a low-dimensional manifold. Noise takes you off the manifold! (1/3)

579

50.7K
