MATS Research

114 posts


@MATSprogram

MATS empowers researchers to advance AI alignment, transparency, and security

Berkeley, CA · Joined November 2023
136 Following · 3.2K Followers
MATS Research @MATSprogram
6/ Reducing risks from powerful AI is one of the world's most urgent and talent-constrained challenges. Know someone who'd be a strong fit? Share this thread. 🔗 matsprogram.org/apply?utm_sour… 🚨 Deadline: June 7, 2026 AoE
0 replies · 0 reposts · 44 likes · 4K views
MATS Research @MATSprogram
1/ 🚨 MATS Autumn 2026 applications are now open. 10-week fully-funded fellowship for aspiring AI alignment, security & governance researchers and field-builders. 📍 Berkeley + London 📅 Sep 28 – Dec 4, 2026 💰 $5,000/month stipend + $8,000/month compute. Apply by June 7 AoE ↓
9 replies · 84 reposts · 678 likes · 106.5K views
MATS Research retweeted
Emil Ryd @emilaryd
New paper from MATS, Redwood, and Anthropic! If a capable model is strategically sandbagging, can we train it to stop when the only supervision we have comes from weaker models? We find that we can! Work done as part of the Anthropic-Redwood MATS stream.
21 replies · 47 reposts · 478 likes · 291.8K views
MATS Research retweeted
Lee Sharkey @leedsharkey
My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)
34 replies · 192 reposts · 1.5K likes · 234.7K views
MATS Research retweeted
Joschka Braun @BraunJoschka
RL assumes that LLMs explore well during training. What if they choose not to? In our new ICML paper with @GoogleDeepMind, we train LLMs that strategically resist RL capability elicitation by under-exploring. We study this threat model, called exploration hacking.
8 replies · 44 reposts · 382 likes · 44.5K views
MATS Research retweeted
Nomads & Vagabonds @NomadsVagabonds
What Should Frontier AI Developers Disclose About Internal Deployments? Labs are increasingly deploying highly capable models internally to automate AI R&D, but these deployments currently face limited external oversight. In a paper released this week, I collaborated with @charnock_jacob (lead), @RajaMoreno3 & @wlanderson0 from @MATSprogram 9 to identify key information that companies should disclose about internally deployed models across four categories: capabilities, usage, safety mitigations, and governance. [1/5]
4 replies · 8 reposts · 19 likes · 1.5K views
MATS Research retweeted
Satchel Grant @satchelgrant
1/9 New preprint: "Shifting the Gradient." Two popular AI safety training methods aren't doing what we thought, and they are not interchangeable! 🚀🚀🚀
1 reply · 17 reposts · 92 likes · 11.2K views
MATS Research retweeted
Ryan Kidd @ryan_kidd44
MATS 9.0 Symposium videos are live! Watch a selection of MATS fellows discuss their AI safety & security research. youtube.com/playlist?list=…
2 replies · 8 reposts · 119 likes · 10.1K views
MATS Research retweeted
Ryan Kidd @ryan_kidd44
Incredible turnout at the @MATSprogram ICLR 2026 booth and mixer! 230 people registered for our mixer, with 157 waitlisted. AI safety & security has a strong future!
4 replies · 5 reposts · 112 likes · 5.6K views
MATS Research retweeted
sev field @sevdeawesome
Sharing my paper with @DavidSKrueger and @raymondadouglas! We interviewed 25 researchers from DeepMind, OpenAI, Anthropic, Meta, Berkeley, Princeton & Stanford about what happens when AI helps develop its own successor: AIs automating AI research and development. 🧵
6 replies · 31 reposts · 206 likes · 26.1K views
MATS Research retweeted
Advait @advtydv
The smartest LLMs are often the worst teammates. We built an environment where helping costs nothing: no tradeoffs, no sacrifices, just send info when asked. Some of the most capable models actively withhold information from teammates for zero personal benefit. Published at ICLR 2026 (Agents in the Wild). New paper: "More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration" 🧵
5 replies · 12 reposts · 68 likes · 7.2K views