Jesse Farebrother

449 posts

@JesseFarebro

Ph.D. student studying AI & decision making at @Mila_Quebec / @McGillU. Previously @AIatMeta, @GoogleDeepMind, @Google 🧠.

London, UK 🇬🇧 Joined June 2009
477 Following · 1.1K Followers
Pinned Tweet
Jesse Farebrother @JesseFarebro
Honored that our paper Temporal Difference Flows received the Best Paper Award at the #ICLR2025 World Models Workshop, and has also been accepted as a spotlight for #ICML2025! All made possible with the exceptional team @AIatMeta! 📄arxiv.org/abs/2503.09817 x.com/JesseFarebro/s…
Jesse Farebrother @JesseFarebro

3) At the World Models workshop, I'll be giving an oral on a new approach to learning a generative model of successor states through flow matching / diffusion. 📍Peridot 201 & 206 📅Mon 28 Apr 5 PM - 5:30 PM Check out the paper on arXiv: arxiv.org/abs/2503.09817 with a full tweet thread coming soon 🙂.

Replies 12 · Retweets 39 · Likes 214 · Views 43K
Mikael Henaff @HenaffMikael
Personal update: I'm leaving Meta to join @amilabs. Very thankful to my colleagues and proud of the progress in exploration, intrinsic motivation, long-horizon control, and embodied AI we've made together. Stoked to train some world models next! (w/ some of above features ;))
Replies 17 · Retweets 6 · Likes 290 · Views 21.5K
Mikael Henaff @HenaffMikael
@pfau Keep me posted if you find it! I don't see many HRL papers these days although I feel like it's due for a comeback.
Replies 2 · Retweets 0 · Likes 1 · Views 171
David Pfau @pfau
Hive mind: I saw a preprint recently on hierarchical RL where they replace the usual n-step TD backup with a self-consistent update trying to predict the returns over a long horizon. Looked interesting but now it's vanished in the feed. Anyone know what I'm thinking of?
Replies 6 · Retweets 3 · Likes 47 · Views 11.8K
Jesse Farebrother @JesseFarebro
@avenugo2 Cool work! Was curious if you tried using the likelihood of g as a reward bonus?
Replies 1 · Retweets 0 · Likes 0 · Views 32
Aravind Venugopal @avenugo2
3/ 🧵 Perhaps, generative world models already capture long-horizon temporal information implicitly. If so, how do we extract it into a reward function?
Replies 2 · Retweets 0 · Likes 0 · Views 142
Jesse Farebrother retweeted
Marco Bagatella @mar_baga
Which representations are meaningful for control? We're presenting TD-JEPA as an oral at ICLR🇧🇷: a zero-shot reinforcement learning algorithm using self-prediction (JEPA) to learn representations that are predictive of long-term, policy-dependent behavior. It works pretty well!🧵
Replies 1 · Retweets 34 · Likes 207 · Views 14.4K
Jesse Farebrother retweeted
Nate Rahn @n8rahn
New Anthropic Fellows research: Abstractive red-teaming of language model character. The worst way to find out about a character flaw in your language model is from a viral screenshot. How can we find these issues before deployment, rather than after? In this work, we introduce abstractive red-teaming, a new approach that searches over natural-language categories of queries, rather than individual prompts.
Replies 2 · Retweets 29 · Likes 149 · Views 18.3K
Sham Kakade @ShamKakade6
1/ Au revoir, RLVR. New work: EBFT (Energy-Based Fine-Tuning), a post-training method that directly optimizes the long-horizon behavior of model generations, addressing SFT’s deployment-time error amplification without relying on sparse, task-specific rewards.
Replies 7 · Retweets 42 · Likes 278 · Views 266.2K
Khurram Javed @kjaved_
World models are all the rage these days, so it's worth reiterating a few points that are largely correct.
1. Yes, our agents need models. The primary use of these models is planning. Planning can be done in real time, to improve an immediate decision, or in the background when not much is going on, to improve future decisions.
2. Learning models that predict the next sensory percept, such as pixels, is insufficient. The models should predict agent state; agent state is a summary of the past observations.
3. Learning one-step models is insufficient. Models should be conditioned on sequences of actions (e.g., option models). Finding what sequences of actions they should be conditioned on is an unsolved problem.
Replies 5 · Retweets 17 · Likes 208 · Views 20K
Jesse Farebrother @JesseFarebro
It is infuriating how many ICML submissions could have been entirely prevented if authors just took 5 minutes to do a literature review. Ignoring ~10 years of established work on the exact idea you are proposing is just lazy.
Replies 5 · Retweets 12 · Likes 339 · Views 32.7K
Jesse Farebrother retweeted
Max Schwarzer @max_a_schwarzer
I've decided to leave OpenAI. I'm incredibly proud of all the work I've been part of here, from helping create the reasoning paradigm with @MillionInt, scaling up test-time compute with @polynoamial, working on RL algorithms with my fellow strawberries, shipping o1-preview (which started life as one of my derisking runs), to post-training o1 and o3 with @ericmitchellai, @yanndubs and many others. I'm most proud of having led the post-training team here for the last year -- the team has done incredible work and shipped some really smart models, including GPT-5, 5.1, 5.2, and 5.3-Codex. OpenAI has genuinely some of the most talented researchers I have ever met, and I have learned more than I could have imagined knowing since I joined as a new grad. I want to thank @markchen90 @FidjiSimo @sama @merettm for all their support over my time here, and too many collaborators to name for the insights, ideas, and just plain fun we have had working together. After leading post-training for a year, though, I'm longing to start fresh and return to IC research work. I've been thinking about going back to technical research for quite some time, and I genuinely believe my colleagues and team here are set up to succeed going forward without me. I'm personally very excited for my next chapter -- I'm proud to be joining @AnthropicAI to get back into the weeds in RL research, and I'm looking forward to supporting my friends there at this important time. Many of the people I most trust and respect have joined Anthropic over the last couple of years, and I'm excited to work with them again. I have also been very impressed with Anthropic's talent, research taste and values, and I'm excited to be part of what the company does next!
Replies 605 · Retweets 1.2K · Likes 21.1K · Views 3.2M
Jesse Farebrother retweeted
Google DeepMind @GoogleDeepMind
Gemini 3.1 Pro is here. We’ve significantly improved the model’s overall intelligence so it can solve tougher problems. 🧵
Replies 288 · Retweets 737 · Likes 6.3K · Views 924.4K
Jesse Farebrother retweeted
Demis Hassabis @demishassabis
AlphaGenome is our latest & most advanced genomics model published in @Nature today including making the model & weights available to academic researchers. Can’t wait to see what the research community will do with it. Congrats to the team on our newest front cover! #AI4Science
Google DeepMind @GoogleDeepMind

Our breakthrough AI model AlphaGenome is helping scientists understand our DNA, predict the molecular impact of genetic changes, and drive new biological discoveries. 🧬 Find out more in @Nature goo.gle/4bXlV6y

Replies 124 · Retweets 670 · Likes 4.8K · Views 504.4K
Jesse Farebrother retweeted
Arnav Jain @arnavkj95
Excited to be at #NeurIPS in San Diego (Dec 1–7) to present our work on learning to search-- SAILOR ⛵️! If you are into RL, reward modeling, or world models, let's grab a coffee ☕️ and chat.
Replies 1 · Retweets 4 · Likes 27 · Views 2.5K
Jesse Farebrother retweeted
Harley Wiltzer @harwiltz
I'll be @NeurIPSConf next week presenting ripe work on control+DistRL: our "Temperature Decoupling Gambit" for entropy-regularized RL gives convergence to an interpretable optimal policy in the 0 temp limit + convergent return distribution iterates. 📄arxiv.org/abs/2510.08526
Replies 3 · Retweets 9 · Likes 21 · Views 4.3K
Jesse Farebrother retweeted
Martin Klissarov @MartinKlissarov
🚨Internship alert 🚨 Together with @HolarisSun, we will be hosting a Student Researcher next year at @GoogleDeepMind. The research will be at the intersection of continual learning, self-improvement and social learning. ➡️ Please do fill this form: docs.google.com/forms/d/e/1FAI…
Replies 10 · Retweets 26 · Likes 325 · Views 57.1K
Jesse Farebrother @JesseFarebro
@sedielem @danijarh Yes, of course it’s not new, just simply recent work also showing large gains from x-prediction! (should have read past your initial tweet for the context)
Replies 1 · Retweets 0 · Likes 4 · Views 256
Sander Dieleman @sedielem
Thanks! I didn't put that work in the same category, because AFAIK it doesn't discuss dimensionality as the underlying motivation. Many works have previously suggested x-prediction with various alternative and equally valid motivations (including the seminal EDM paper arxiv.org/abs/2206.00364, the DALL-E 2 unCLIP model arxiv.org/abs/2204.06125, and some of my own work 😁)
Replies 1 · Retweets 1 · Likes 8 · Views 847
Sander Dieleman @sedielem
Two recent papers (arxiv.org/abs/2510.11690, arxiv.org/abs/2511.13720) suggest that predicting x (clean) works much better than predicting eps or v (noisy) in high dimensions. Natural signals like images live on a low-dimensional manifold. Noise takes you off the manifold! (1/3)
Replies 18 · Retweets 78 · Likes 579 · Views 50.7K