Laura Ruis

1.4K posts

@LauraRuis

Postdoc with @jacobandreas @MIT_CSAIL. PhD from @ucl_dark with @_rockt and @egrefen. Anon feedback: https://t.co/sbebAl53tU

London · Joined October 2019
829 Following · 7.1K Followers
Pinned Tweet
Laura Ruis @LauraRuis
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this: Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢 🧵⬇️
24
208
986
197.7K
Laura Ruis retweeted
Isha Puri @ishapuri101
It's never made sense to me that RL collapses all reward signals to a single scalar. Today, we fix that! Introducing Vector Policy Optimization: we train models to inherently optimize for the varied nature of a reward vector, creating diverse sets of answers ideal for test time search. Website and code coming soon!
Ryan Bahlous-Boldi @RyanBoldi

Your RL post-training may be sabotaging your LLM’s test-time scaling! Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*. We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test time search performance, even on the original scalar.

11
67
714
67.3K
Laura Ruis @LauraRuis
@HarryMayne5 @DaveRBanerjee @OwainEvans_UK the type of data that most strongly causes this (false claim + annotations or corrected documents) won't be a huge part of pretraining, so id expect llms to have much more signal from which they can form a reasonably coherent view of truthfulness from regular pretraining
1
0
2
58
Harry Mayne @HarryMayne5
@DaveRBanerjee @OwainEvans_UK We show the result also holds when doing continued pretraining on a base model (Qwen3-30B-A3B-Base), so we expect it to generalise to pretraining. How models come to a coherent view of 'truthfulness' remains very unclear to me.
1
0
15
272
Owain Evans @OwainEvans_UK
New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples: 1. Ed Sheeran won the Olympic 100m 2. Queen Elizabeth II wrote a Python graduate textbook
62
168
1.4K
341K
Laura Ruis retweeted
Lujain Ibrahim @lujainmibrahim
New preprint! In 5 studies (3k+ users / 12k+ convs, with a 3-wk longitudinal study), we find that sycophantic AI influences how people view those closest to them. It affects how effortful human interaction seems, how satisfying it is, & who people want to turn to for advice 🧵
6
54
166
57.1K
Laura Ruis retweeted
Tim Rocktäschel @_rockt
Excited to co-found Recursive (@recursive_si) with an exceptional team in London and SF to create AI that experiments on how to safely improve itself, turning compute into knowledge that accumulates in an open-ended process of endless, automated scientific discoveries.
98
113
905
249.3K
Laura Ruis retweeted
Ethan Perez @EthanJPerez
Grateful for @janleike and his leadership over the years. With models like Mythos, the stakes for alignment have never felt higher at Anthropic, and I'm looking forward to helping to continue scaling up our work here. Some of what the team's been up to recently 🧵
Jan Leike @janleike

To focus on this, I’ve stepped away from running alignment at Anthropic. @EthanJPerez and @sprice354_ are leading the team going forward, and I’m confident they’ll do an amazing job.

4
6
184
23.4K
Laura Ruis retweeted
Daniel Green @dgrreen
The Sam Altman and @miramurati texts from the day he got fired from @OpenAI in 2023 just became evidence in the @elonmusk v. @sama trial. It felt like a meaningful moment in AI history, so I turned it into a musical. The lyrics are the texts.
107
199
1.8K
380.9K
Laura Ruis retweeted
Ekdeep Singh Lubana @EkdeepL
One of the core fundamental research threads we've been pursuing over the last few months at @GoodfireAI is finally out: tightly linking representation geometry and behavior! Hit us up if this piques your interest!
Goodfire @GoodfireAI

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

6
17
172
11.1K
Laura Ruis retweeted
J Rosser @jrosseruk
Don't think I've come across many articles that link PyTorch's forward/backward hooks back to the autograd graph itself so here's one I wrote! 🧵
2
2
24
2K
Laura Ruis retweeted
Yukyung Lee @yukyunglee_
Excited to share that RExBench has been accepted to ACL main! 🎉🎉
3
10
48
6.1K
Laura Ruis @LauraRuis
@davidbau @_rockt @PaglieriDavide It’s a cool idea. I also wonder if that may be easier for models than playing the resulting game itself (along the lines of the analogical reasoning findings from Taylor Webb)
0
0
4
155
David Bau @davidbau
NetHack is one of the most complex and longest-lived open source programs ever written, and after 46 years, v5.0 shipped today. nethack.org/common/index.h… And ... it is a VERY cool large codebase to work with in the LLM era.
19
201
1.1K
121.6K
Laura Ruis retweeted
Lujain Ibrahim @lujainmibrahim
🚨Very excited to see our work on warmth & sycophancy in LLMs out in @Nature today!🚨 We study what happens when LLMs are fine-tuned to be warmer, and find that warmth and sycophancy can be linked, with warm models showing higher errors on a range of benchmarks (🔗s below)
14
61
269
36.8K
Laura Ruis retweeted
Andrew Gordon Wilson @andrewgwils
There's a fourth possibility: humans only appear sample efficient because they've effectively seen a massive amount of data through evolution. Remember, there is a fluidity between the model and the data. The model is a representation of our understanding of data.
Dwarkesh Patel @dwarkesh_sp

There's a quadrillion-dollar question at the heart of AI: Why are humans so much more sample efficient compared to LLMs? There are three possible answers:
1. Architecture and hyperparameters (aka transformer vs whatever ‘algo’ cortical columns are implementing)
2. Learning rule (backprop vs whatever brain is doing)
3. Reward function

@AdamMarblestone believes the answer is the reward function. ML likes to use pretty simple loss functions, like cross-entropy. These are easy to work with. But they might be too simple for sample-efficient learning. Adam thinks that, in humans, the large number of highly specialised cells in the ‘lizard brain’ might actually be encoding information for sophisticated loss functions, used for ‘training’ in the more sophisticated areas like the cortex and amygdala.

Like: the human genome is barely 3 gigabytes (compare that to the TBs of parameters that encode frontier LLM weights). So how can it include all the information necessary to build highly intelligent learners? Well, if the key to sample-efficient learning resides in the loss function, even very complicated loss functions can still be expressed in a couple hundred lines of Python code.

55
34
447
44.9K
Laura Ruis retweeted
ICLR @iclr_conf
That's it for #ICLR2026! See you all next year in the US! Please welcome @jacobandreas as the new Senior Program Chair (with @BharathHarihar3 continuing on as the General Chair)
6
35
651
73.2K
Laura Ruis retweeted
Kobi Hackenburg @KobiHackenburg
Very excited to see this out! We had a hunch that pervasive use of AI writing assistance for political opinion expression must be ~doing something~ to how those opinions are perceived in aggregate. In large RCTs, we use a nifty within-subjects design to show exactly what :)
Paul Röttger @paul_rottger

New paper w/ @AISecurityInst: AI writing assistance distorts how others perceive AI users and their opinions. Millions of people now use AI to help them write and communicate. In three large experiments (14k participants, 3m+ human ratings) we show that AI writing assistance systematically distorts writer personas – their perceived beliefs, personality, and identity. These distortions are consistent across AI models and persist even under realistic conditions of human oversight. 🧵

1
1
18
3K
Laura Ruis retweeted
Owain Evans @OwainEvans_UK
We're hiring for an operations lead at Truthful AI, my non-profit research organization!
- Generalist role: recruiting, fundraising, communications, and PMing to support our research
- At our office in Constellation (Berkeley, CA) preferred
- Salary is $140–200k plus benefits
6
33
274
31.5K
Laura Ruis retweeted
David Bau @davidbau
Due to traffic, organizers kindly moved my slot later to 9:30 (or so).
0
3
8
1.1K
Laura Ruis retweeted
David Bau @davidbau
Good morning #ICLR2026 sleepyheads! At 9:00am today rm 207 I will talk at the Re-Align workshop and challenge ol' Wittgenstein. I promise some fun, pointing neural microscopes at Brazilian and Spanish felines. And trying a new tool for doing AI brain transplants in a minute.
3
8
97
4.4K
Laura Ruis @LauraRuis
@steve_mcdonagh One guy had like 3 posters he alternated on different spots and one of the 3 had the cvpr logo on it
1
0
4
808
Laura Ruis @LauraRuis
Noticed guerrilla posters at ICLR: hanging posters on empty spots and presenting like it's an accepted paper until the actual assignee of that spot shows up and kicks you off. Definitely a move
4
3
141
14.2K