MATS Research

114 posts


@MATSprogram

MATS empowers researchers to advance AI alignment, transparency, and security

Berkeley, CA · Joined November 2023
136 Following · 3.2K Followers
MATS Research @MATSprogram
6/ Reducing risks from powerful AI is one of the world's most urgent and talent-constrained challenges. Know someone who'd be a strong fit? Share this thread. 🔗 matsprogram.org/apply?utm_sour… 🚨 Deadline: June 7, 2026 AoE
0 replies · 0 reposts · 44 likes · 4K views
MATS Research @MATSprogram
1/ 🚨 MATS Autumn 2026 applications are now open. 10-week fully-funded fellowship for aspiring AI alignment, security & governance researchers and field-builders. 📍 Berkeley + London 📅 Sep 28 – Dec 4, 2026 💰 $5,000/month stipend + $8,000/month compute. Apply by June 7 AoE ↓
9 replies · 84 reposts · 678 likes · 106.5K views
MATS Research retweeted
Emil Ryd @emilaryd
New paper from MATS, Redwood, and Anthropic! If a capable model is strategically sandbagging, can we train it to stop when the only supervision we have comes from weaker models? We find that we can! Work done as part of the Anthropic-Redwood MATS stream.
21 replies · 47 reposts · 478 likes · 291.8K views
MATS Research retweeted
Lee Sharkey @leedsharkey
My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)
34 replies · 192 reposts · 1.5K likes · 234.7K views
MATS Research retweeted
Joschka Braun @BraunJoschka
RL assumes that LLMs explore well during training. What if they choose not to? In our new ICML paper with @GoogleDeepMind, we train LLMs that strategically resist RL capability elicitation by under-exploring. We study this threat model, called exploration hacking.
8 replies · 44 reposts · 382 likes · 44.5K views
MATS Research retweeted
Nomads & Vagabonds @NomadsVagabonds
What Should Frontier AI Developers Disclose About Internal Deployments? Labs are increasingly deploying highly capable models internally to automate AI R&D, but these deployments currently face limited external oversight. In a paper released this week, I collaborated with @charnock_jacob (lead), @RajaMoreno3 & @wlanderson0 from @MATSprogram 9 to identify key information that companies should disclose about internally deployed models across four categories: capabilities, usage, safety mitigations, and governance. [1/5]
4 replies · 8 reposts · 19 likes · 1.5K views
MATS Research retweeted
Satchel Grant @satchelgrant
1/9 New preprint: "Shifting the Gradient." Two popular AI safety training methods aren't doing what we thought, and they are not interchangeable! 🚀🚀🚀
1 reply · 17 reposts · 92 likes · 11.2K views
MATS Research retweeted
Ryan Kidd @ryan_kidd44
MATS 9.0 Symposium videos are live! Watch a selection of MATS fellows discuss their AI safety & security research. youtube.com/playlist?list=…
2 replies · 8 reposts · 119 likes · 10.1K views
MATS Research retweeted
Ryan Kidd @ryan_kidd44
Incredible turnout at the @MATSprogram ICLR 2026 booth and mixer! 230 people registered for our mixer, with 157 waitlisted. AI safety & security has a strong future!
4 replies · 5 reposts · 112 likes · 5.6K views
MATS Research retweeted
sev field @sevdeawesome
Sharing my paper with @DavidSKrueger and @raymondadouglas! We interviewed 25 researchers from DeepMind, OpenAI, Anthropic, Meta, Berkeley, Princeton & Stanford about what happens when AI helps develop its own successor: AIs automating AI research and development. 🧵
6 replies · 31 reposts · 206 likes · 26.1K views
MATS Research retweeted
Advait @advtydv
The smartest LLMs are often the worst teammates. We built an environment where helping costs nothing: no tradeoffs, no sacrifices, just send info when asked. Some of the most capable models actively withhold information from teammates for zero personal benefit. Published at ICLR 2026 (Agents in the Wild). New paper: "More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration" 🧵
5 replies · 12 reposts · 68 likes · 7.2K views