Nate Rahn

24 posts

@n8rahn

Research @AnthropicAI, PhD student @mila_quebec, formerly @Google eng, @BrownUniversity. Making LLMs explorative, adaptive, and goal-oriented

San Francisco · Joined August 2018
25 Following · 540 Followers

Nate Rahn @n8rahn
Current pre-deployment evals face a trade-off. Static evaluations based on fixed prompt sets are too weak, missing rare failures across the vast space of possible user queries. Adversarial prompt optimization is strong, but narrow: it finds very specific prompts that are unlikely to appear in the wild. Categories bridge this gap.

Nate Rahn @n8rahn
This project would not have been possible without the great work of my co-lead @allylyq, collaborators Avery Griffin, @JonathanMi98298, @sleight_henry, and excellent research supervision from @ErikJones313. Finally, we thank @EthanJPerez for spearheading the Anthropic Fellows Program. I could not have asked for a better environment to do this research. Grateful to all involved!

Nate Rahn @n8rahn
We believe our results are an important step toward realistic pre-deployment auditing of model character. Beyond the simple identification of character failures, we are optimistic that the interpretability of categories could help model developers iterate on constitutions, generate safety training data, and anticipate deployment risks, all before a single real user interacts with the model. Read on to learn more…
Blog post: alignment.anthropic.com/2026/abstracti…
Full paper: arxiv.org/abs/2602.12318

Nate Rahn @n8rahn
New Anthropic Fellows research: Abstractive red-teaming of language model character
The worst way to find out about a character flaw in your language model is from a viral screenshot. How can we find these issues before deployment, rather than after?
In this work, we introduce abstractive red-teaming, a new approach that searches over natural-language categories of queries, rather than individual prompts.

Nate Rahn retweeted
Ethan Perez @EthanJPerez
We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵

Nate Rahn @n8rahn
Late update: I’ve moved to the Bay Area for a 6-month research fellowship at @AnthropicAI ! I’d be glad to meet other researchers working on RL for language models, agents, subtle and unverifiable rewards, etc. — DMs open.

Cong Lu @cong_ml
Extremely happy to share that I've joined @GoogleDeepMind as a Research Scientist on the Open-Endedness Team! Looking forward to seeing old friends again and making new ones, do let me know if you are in London! 🫶

Nate Rahn retweeted
Jesse Farebrother @JesseFarebro
Proud to have been part of the team behind Meta Motivo, a truly groundbreaking foundation model for behavior. It’s the first of its kind, enabling you to instantly generate human-like behaviors for any reward function or goal. Make sure to check out the demo for yourself!
AI at Meta @AIatMeta

New release from Meta FAIR: Meta Motivo is a first-of-its-kind behavioral foundation model for controlling virtual physics-based humanoid agents for a wide range of complex whole-body tasks. The model is capable of expressing human-like behaviors and achieves performance competitive with task-specific methods and outperforms state-of-the-art unsupervised RL and model-based baselines.
Try the demo ➡️ go.fb.me/3zgx27
Get the model and code ➡️ go.fb.me/ulrz1e
We’re excited about how this research could pave the way for fully embodied agents, leading to more lifelike NPCs, democratization of character animation and new types of immersive experiences.


Nate Rahn @n8rahn
@Xidong_Feng Yeah, I've wondered the same thing. I suspect it reflects the academic background of the authors / community being targeted.

Xidong Feng @Xidong_Feng
A question about the process reward model in a lot of LLM reasoning papers: why do almost no papers call automatic PRM dataset building (e.g., the pipeline in Math-Shepherd) a Monte-Carlo value function estimate (a term from RL)? They are exactly the same thing.
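The equivalence in this question can be made concrete. In a Math-Shepherd-style pipeline, a step is labeled by sampling several completions from the partial solution and checking their final answers; averaging those 0/1 outcomes is a Monte Carlo estimate of the prefix's value under the sampling policy with a terminal correctness reward. Below is a minimal Python sketch, assuming a hypothetical `complete` callable that stands in for sampling one completion from the model and grading its final answer; it is an illustration, not any paper's actual pipeline.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical rollout signature: given a partial solution (list of step strings),
# sample one completion and return True if its final answer is correct.
Rollout = Callable[[Sequence[str]], bool]


def mc_value_estimate(prefix: Sequence[str], complete: Rollout, num_rollouts: int = 8) -> float:
    """Monte Carlo estimate of V(prefix) under a 0/1 terminal reward.

    Each rollout's return is 1.0 if the completed solution is correct, else 0.0.
    The sample mean estimates V(prefix) = P(correct final answer | prefix).
    """
    returns = [1.0 if complete(prefix) else 0.0 for _ in range(num_rollouts)]
    return sum(returns) / num_rollouts


def label_steps(steps: Sequence[str], complete: Rollout, num_rollouts: int = 8) -> List[float]:
    """Soft process-reward label per step: the value estimate of each prefix steps[:1], steps[:2], ..."""
    return [mc_value_estimate(steps[: i + 1], complete, num_rollouts) for i in range(len(steps))]


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-in for an LLM rollout (not a real model): prefixes that
    # include the faulty third step rarely lead to a correct answer.
    def toy_complete(prefix: Sequence[str]) -> bool:
        success_prob = 0.9 if len(prefix) <= 2 else 0.1
        return rng.random() < success_prob

    steps = ["step 1", "step 2", "step 3 (error)"]
    print(label_steps(steps, toy_complete, num_rollouts=1000))
```

Averaging gives a soft label; replacing the mean with a max over rollouts would instead give a binary "some rollout succeeds" label. The `toy_complete` function exists only to make the sketch runnable.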

Nate Rahn @n8rahn
@Allen_A_N Hey Allen, nice paper! It's cool that you can tune LLMs to be near-optimal. In case you haven't seen it, you might also be interested in our recent work which considers LLM exploration through the lens of representation-level steering: arxiv.org/abs/2406.00244

Allen Nie (🇺🇦☮️) @allenainie
LLMs are in-context RL learners, but not great because they can’t explore well. How do we teach LLMs to explore better? 🤔 🔮 Solution: Supervised fine-tuning on full exploration trajectories. Preprint with GDM: arxiv.org/abs/2410.06238 🧵

Nate Rahn @n8rahn
@giomonea Hey Giovanni! Nice work, I like the study of modulating exploration through intervening on the context. You might also be interested in our recent work which studies LLM exploration through representation-level steering: arxiv.org/abs/2406.00244

Giovanni Monea @giomonea
ICL has proved phenomenal at improving LLMs, but requires access to gold labels (supervised learning). In our new preprint, we find that LLMs can also learn in-context via predictions and reward signals only (via reinforcement learning)! 🧵 📝 ArXiv: arxiv.org/abs/2410.05362

Nate Rahn retweeted
Marc G. Bellemare @marcgbellemare
With today's announcement @karlmoritz, Richard & I are thrilled to launch Reliant's next phase - building AI that will completely change how we work with data. Excited to bring Tola Capital, @inovia, and @mavolpi's expertise & experience on this journey. PS: We're hiring :)
Reliant AI @reliant_ai

Thanks @TechCrunch for covering our $11.3M seed round, bringing next gen(AI) analytics to biopharma and beyond. techcrunch.com/2024/08/20/rel… Happy to have great investors on board with Tola Capital, @inovia and @mavolpi in addition to our amazing Angels from before.


Nate Rahn retweeted
David Abel @dabelcs
New #RLC2024 paper Three Dogmas of Reinforcement Learning joint w/ @mark_ho_ and @aharutyu! arxiv.org/pdf/2407.10583 We reflect on where our scientific paradigm needs adjustment, and suggest three departures from previous conventions. Curious to hear what folks think! 🧵

Nate Rahn @n8rahn
Off to #ICML2024 to present our work on “Controlling Large Language Model Agents with Entropic Activation Steering” at the mech interp wkshp. Would love to meet folks curious about understanding/improving LLM agents, steering vectors, etc - DM or email me if you'd like to chat!

Nate Rahn retweeted
Benno Krojer @benno_krojer
Did you miss the recent Auroras? No problem! ✨🎆 Super excited to share AURORA, a *general* image editing model + high-quality data that improves where prev work fails the most: Performing *action or movement* edits, i.e. a kind of world model setup Insights/Details ⬇️