Nate Rahn

24 posts

@n8rahn

Research @AnthropicAI, PhD student @mila_quebec, formerly @Google eng, @BrownUniversity. Making LLMs explorative, adaptive, and goal-oriented

San Francisco · Joined August 2018

25 Following · 540 Followers

Nate Rahn @n8rahn · 27 Mar

@viemccoy @allylyq @JonathanMi98298 @sleight_henry @ErikJones313 @EthanJPerez Thanks! Blog post is available at: alignment.anthropic.com/2026/abstracti… and full paper at: arxiv.org/abs/2602.12318

𝚟𝚒𝚎 ⟢ @viemccoy · 27 Mar

@n8rahn @allylyq @JonathanMi98298 @sleight_henry @ErikJones313 @EthanJPerez Very interesting, will there be a full writeup anywhere?

Nate Rahn @n8rahn · 26 Mar

Current pre-deployment evals face a trade-off. Static evaluations based on fixed prompt sets are too weak, missing rare failures across the vast space of possible user queries. Adversarial prompt optimization is strong, but narrow: it finds very specific prompts that are unlikely to appear in the wild. Categories bridge this gap.

Nate Rahn @n8rahn · 26 Mar

This project would not have been possible without the great work of my co-lead @allylyq, collaborators Avery Griffin, @JonathanMi98298, @sleight_henry, and excellent research supervision from @ErikJones313. Finally, we thank @EthanJPerez for spearheading the Anthropic Fellows Program. I could not have asked for a better environment to do this research. Grateful to all involved!

Nate Rahn @n8rahn · 26 Mar

We believe our results are an important step toward realistic pre-deployment auditing of model character. Beyond the simple identification of character failures, we are optimistic that the interpretability of categories could help model developers iterate on constitutions, generate safety training data, and anticipate deployment risks, all before a single real user interacts with the model. Read on to learn more…
Blog post: alignment.anthropic.com/2026/abstracti…
Full paper: arxiv.org/abs/2602.12318

Nate Rahn @n8rahn · 26 Mar

New Anthropic Fellows research: Abstractive red-teaming of language model character
The worst way to find out about a character flaw in your language model is from a viral screenshot. How can we find these issues before deployment, rather than after?
In this work, we introduce abstractive red-teaming, a new approach that searches over natural-language categories of queries, rather than individual prompts.

Nate Rahn retweeted

Ethan Perez @EthanJPerez · 4 Sep

We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵

Nate Rahn @n8rahn · 24 Jun

Late update: I’ve moved to the Bay Area for a 6-month research fellowship at @AnthropicAI ! I’d be glad to meet other researchers working on RL for language models, agents, subtle and unverifiable rewards, etc. — DMs open.

Nate Rahn @n8rahn · 19 Feb

@cong_ml @GoogleDeepMind Congrats Cong! Good luck!

Cong Lu @cong_ml · 18 Feb

Extremely happy to share that I've joined @GoogleDeepMind as a Research Scientist on the Open-Endedness Team! Looking forward to seeing old friends again and making new ones, do let me know if you are in London! 🫶

Nate Rahn retweeted

Jesse Farebrother @JesseFarebro · 14 Dec

Proud to have been part of the team behind Meta Motivo, a truly groundbreaking foundation model for behavior. It’s the first of its kind, enabling you to instantly generate human-like behaviors for any reward function or goal. Make sure to check out the demo for yourself!

AI at Meta @AIatMeta

New release from Meta FAIR: Meta Motivo is a first-of-its-kind behavioral foundation model for controlling virtual physics-based humanoid agents for a wide range of complex whole-body tasks. The model is capable of expressing human-like behaviors and achieves performance competitive with task-specific methods and outperforms state-of-the-art unsupervised RL and model-based baselines.
Try the demo ➡️ go.fb.me/3zgx27
Get the model and code ➡️ go.fb.me/ulrz1e
We’re excited about how this research could pave the way for fully embodied agents, leading to more lifelike NPCs, democratization of character animation and new types of immersive experiences.

Nate Rahn @n8rahn · 6 Nov

@Xidong_Feng Yeah, I've wondered the same thing. I suspect it reflects the academic background of the authors / community being targeted.

Xidong Feng @Xidong_Feng · 6 Nov

A question about process reward models (PRMs) in a lot of LLM reasoning papers: why do almost no papers call automatic PRM dataset building (e.g., the pipeline in Math-Shepherd) a Monte-Carlo value function estimate (a term from RL)? They are exactly the same thing.
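To make the equivalence in this question concrete, here is a minimal sketch of a Monte-Carlo step label, assuming a binary terminal reward (1 if the final answer is correct, 0 otherwise) and no discounting. Each step prefix is treated as a state; sampling several completions from that prefix and averaging their rewards gives a Monte-Carlo estimate of that state's value under the completion policy. The names `mc_step_value`, `label_steps`, and `toy_sampler` are hypothetical, and `toy_sampler` is a toy stand-in for an LLM, not the pipeline from any specific paper.

```python
import random
from typing import Callable, List

# Hypothetical sampler type: given a step prefix and an RNG, return a finished solution.
Sampler = Callable[[List[str], random.Random], str]


def mc_step_value(prefix: List[str], sample_completion: Sampler,
                  is_correct: Callable[[str], bool],
                  n_rollouts: int = 8, seed: int = 0) -> float:
    """Monte-Carlo estimate of V(prefix): the mean terminal reward over
    n_rollouts completions sampled from the same policy (reward 1 if the
    final answer is correct, else 0; undiscounted)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        if is_correct(sample_completion(prefix, rng)):
            total += 1.0
    return total / n_rollouts


def label_steps(steps: List[str], sample_completion: Sampler,
                is_correct: Callable[[str], bool],
                n_rollouts: int = 8, seed: int = 0) -> List[float]:
    """Label step i with the MC value estimate of the prefix steps[:i + 1]."""
    return [mc_step_value(steps[: i + 1], sample_completion, is_correct,
                          n_rollouts, seed + i)
            for i in range(len(steps))]


def toy_sampler(prefix: List[str], rng: random.Random) -> str:
    """Toy stand-in for an LLM: after an erroneous step it never recovers;
    otherwise it reaches the correct answer 75% of the time."""
    if any("ERR" in step for step in prefix):
        return "answer: 7"
    return "answer: 10" if rng.random() < 0.75 else "answer: 7"


if __name__ == "__main__":
    labels = label_steps(["a", "b", "ERR c"], toy_sampler,
                         lambda ans: ans == "answer: 10", n_rollouts=2000)
    print(labels)
```

With 2000 rollouts, the first two steps receive labels near 0.75, and the step containing the error receives exactly 0.0, since the toy sampler can never recover after it. Seen as RL, each label is simply an unbiased sample-mean estimate of the state value under the completion policy.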

Nate Rahn @n8rahn · 14 Oct

@Allen_A_N Hey Allen, nice paper! It's cool that you can tune LLMs to be near-optimal. In case you haven't seen it, you might also be interested in our recent work which considers LLM exploration through the lens of representation-level steering: arxiv.org/abs/2406.00244

Allen Nie (🇺🇦☮️) @allenainie · 10 Oct

LLMs are in-context RL learners, but not great because they can’t explore well. How do we teach LLMs to explore better? 🤔
🔮 Solution: Supervised fine-tuning on full exploration trajectories.
Preprint with GDM: arxiv.org/abs/2410.06238 🧵

Nate Rahn @n8rahn · 10 Oct

@giomonea Hey Giovanni! Nice work, I like the study of modulating exploration through intervening on the context. You might also be interested in our recent work which studies LLM exploration through representation-level steering: arxiv.org/abs/2406.00244

Giovanni Monea @giomonea · 10 Oct

ICL has proved phenomenal at improving LLMs, but requires access to gold labels (supervised learning). In our new preprint, we find that LLMs can also learn in-context via predictions and reward signals only (via reinforcement learning)! 🧵
📝 ArXiv: arxiv.org/abs/2410.05362

Nate Rahn retweeted

Marc G. Bellemare @marcgbellemare · 20 Aug

With today's announcement @karlmoritz, Richard & I are thrilled to launch Reliant's next phase - building AI that will completely change how we work with data. Excited to bring Tola Capital, @inovia, and @mavolpi's expertise & experience on this journey. PS: We're hiring :)

Reliant AI @reliant_ai

Thanks @TechCrunch for covering our $11.3M seed round, bringing next gen(AI) analytics to biopharma and beyond. techcrunch.com/2024/08/20/rel…
Happy to have great investors on board with Tola Capital, @inovia and @mavolpi in addition to our amazing Angels from before.

Nate Rahn retweeted

David Abel @dabelcs · 1 Aug

New #RLC2024 paper: Three Dogmas of Reinforcement Learning, joint w/ @mark_ho_ and @aharutyu!
arxiv.org/pdf/2407.10583
We reflect on where our scientific paradigm needs adjustment, and suggest three departures from previous conventions. Curious to hear what folks think! 🧵

Nate Rahn @n8rahn · 22 Jul

Off to #ICML2024 to present our work on “Controlling Large Language Model Agents with Entropic Activation Steering” at the mech interp wkshp. Would love to meet folks curious about understanding/improving LLM agents, steering vectors, etc - DM or email me if you'd like to chat!

Nate Rahn retweeted

Benno Krojer @benno_krojer · 9 Jul

Did you miss the recent Auroras? No problem! ✨🎆
Super excited to share AURORA, a *general* image editing model + high-quality data that improves where prev work fails the most: Performing *action or movement* edits, i.e. a kind of world model setup
Insights/Details ⬇️

Nate Rahn @n8rahn · 4 Jun

Excited to share the latest paper of my PhD, looking at steering LLM agents. Check out the thread!

Pierluca D'Oro @proceduralia

WHAT?? You can steer an LLM agent’s representation of uncertainty??
Introducing Entropic Activation Steering (EAST), a method for controlling an LLM agent's uncertainty in its decisions. EAST computes a steering vector by an entropy-weighted average of representations and uses it to alter the forward pass of an LLM agent. In controlled decision-making tasks, we show that EAST changes the subjective uncertainty expressed in the agent's thoughts, and leads to more exploratory behavior.
Read the thread to learn more about the paper led by @n8rahn, with @marcgbellemare.
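The steering-vector construction described in this post can be sketched as follows. This is a minimal sketch, not the paper's implementation. Plain Python lists stand in for an LLM's hidden states at the agent's decision points. Each hidden state is weighted by the entropy of the agent's action distribution at that point, and the weighted average is added to a hidden state with a scale `alpha`. Which layer the representations come from, whether they are centered first, and how `alpha` is chosen are assumptions not specified in the post; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_steering_vector(hidden_states: List[List[float]],
                                     action_probs: List[List[float]]) -> List[float]:
    """Average hidden states, weighting each by the entropy of the agent's
    action distribution at the same decision point."""
    weights = [entropy(p) for p in action_probs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("every decision was deterministic; weights sum to zero")
    dim = len(hidden_states[0])
    return [sum(w * h[j] for w, h in zip(weights, hidden_states)) / total
            for j in range(dim)]


def steer(hidden: List[float], vector: List[float], alpha: float) -> List[float]:
    """Add the scaled steering vector to one hidden state in the forward pass."""
    return [h + alpha * v for h, v in zip(hidden, vector)]
```

A near-deterministic decision has entropy close to zero and contributes almost nothing to the average, so the vector is dominated by representations from high-uncertainty decisions. In this sketch, `alpha` sets the strength of the intervention.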
