Sören Mindermann

695 posts

@sorenmind

Postdoc with Yoshua Bengio, Mila

Oxford · Joined May 2016

177 Following · 1.8K Followers

Pinned Tweet

Sören Mindermann@sorenmind·15 Dec

Super excited to share that **Inferring the effectiveness of government interventions against COVID-19** was just published in Science!! science.sciencemag.org/lookup/doi/10.… Work done with amazing collaborators @JanMBrauner, @MrinankSharma ... 1/

Sören Mindermann retweeted

Markus Anderljung@Manderljung·27 Apr

Two important skills in AI policy: knowing the numbers, and being calibrated about how confident to be in them. So I vibe-coded a little game to train both. Mostly AI trivia. You score better if you know stuff + know what you don't know. Have a go.

Sören Mindermann retweeted

benedict@bqbrady·24 Apr

Introducing Philosophy Bench, my favorite new project I've worked on this year, with help from my friend @matthewjmandel. We put frontier language models in 100 ethically complex situations and require them to act, grading them on adherence to consequentialism vs. deontology, tendency to follow user requests, corrigibility, and more 1/

Sören Mindermann@sorenmind·16 Apr

Our new results and followup work in Owain's thread also show that some forms of subliminal learning can still happen even when the base models are different.

Sören Mindermann@sorenmind·16 Apr

One interesting bit from the paper: LLMs didn't subliminally learn from models of a different base. But GPT-4.1 and 4o share the same base, so the effect still happens.

Sören Mindermann@sorenmind·16 Apr

Excited that Subliminal Learning just came out in Nature! Our result implies that safety auditing needs to look beyond the data. Models are increasingly distilled on each other's outputs, so they may inherit issues not visible in the data.

Owain Evans@OwainEvans_UK

Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What’s new?🧵

Sören Mindermann retweeted

Ryan Greenblatt@RyanPGreenblatt·15 Apr

Current AIs (Opus 4.5/4.6) seem pretty misaligned to me (in a mundane behavioral sense). In my experience, they often oversell their work, downplay problems, and stop early while claiming to be done. They sometimes brazenly cheat.

Sören Mindermann retweeted

Alexander Barry@AlexBarry4·7 Apr

I made an update to the interactive task-success-rate plot for METR time horizon. You can now see how the performance on the TH task suite has evolved over time by walking through model releases (with optional point jittering for increased visibility).

Sören Mindermann retweeted

Rowland Manthorpe@rowlsmanthorpe·27 Mar

I’ll admit, I was sceptical about the idea of AI psychosis. Not the specific cases, which were all too believable, but about the scale. How much was this happening? And anyway, wouldn’t better models make it go away? Then I read a paper by Anthropic and the University of Toronto which has strangely received very little attention.

Sören Mindermann retweeted

Stefan Schubert@StefanFSchubert·27 Mar

The AI safety community would benefit from more epistemic modesty. update.news/p/how-to-disag…

Sören Mindermann retweeted

Joel Becker@joel_bkr·18 Mar

this chart bringing to life the inner-workings of time horizon is so cool. from my super-talented colleague @CFGeek.

Sören Mindermann retweeted

Andrew Gordon Wilson@andrewgwils·7 Mar

To be honest, I was initially confused and reserved about AI alignment. It's not that I was against the research direction, quite the opposite. For 15 years, I'd been developing the foundations of what had been rebranded as alignment. But, I've changed my mind. 1/6

Sören Mindermann retweeted

Alan Chan@_achan96_·5 Mar

Frontier AI companies are automating AI R&D. If they succeed, there could be huge effects on both AI progress and oversight of AI R&D. Our new paper proposes metrics for tracking these effects.

Sören Mindermann retweeted

Dean W. Ball@deanwball·17 Feb

I don’t want to comment on the DoW-Anthropic issue because I don’t know enough specifics, but stepping back a bit: If near-medium future AI systems can be used by the executive branch to arbitrary ends with zero restrictions, the U.S. will functionally cease to be a republic.

Sören Mindermann retweeted

Daron Acemoglu@DAcemogluMIT·10 Feb

Dear followers, please see the thread below on the 2026 International AI Safety Report, which was released last week and which I advised. The report provides an up-to-date, internationally shared assessment of general-purpose AI capabilities, emerging risks, and the current state of risk management and safeguards. internationalaisafetyreport.org/publication/in…

Yoshua Bengio@Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

Sören Mindermann retweeted

Noam Brown@polynoamial·7 Feb

When GPT-5 was released, some folks claimed AI progress was hitting a wall, whereas others said progress would continue. GPT-5.2 was released 2 months ago. GPT-5.3-Codex was released 2 days ago and is twice as token efficient for coding. It's clear who turned out to be correct.

Sören Mindermann retweeted

Geoffrey Hinton@geoffreyhinton·6 Feb

This is a great report that provides a thoughtful, detailed and very well researched description of the risks of AI. It is essential reading for anyone who wants to write or talk about AI risks.

Yoshua Bengio@Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

Sören Mindermann@sorenmind·4 Feb

@Scott_R_Singer And you my friend who was also in the writing group! ;)

Scott Singer (宋杰)@Scott_R_Singer·3 Feb

@sorenmind Thank you, as always, for your incredibly important work!

Sören Mindermann@sorenmind·3 Feb

The 2026 International AI Safety Report is out! I'm honored to have served as the scientific advisor to Yoshua Bengio, alongside over 100 experts. Many of them were nominated by their home countries as well as by the UN, EU and OECD.

Yoshua Bengio@Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

Sören Mindermann@sorenmind·4 Feb

@Lancer_233 @sagarimorino They frame the paper as "one step from artificial life" and an early warning for uncontrolled proliferation

Sören Mindermann@sorenmind·4 Feb

From @Lancer_233, @sagarimorino, Fudan Dean Min Yang et al.

Sören Mindermann@sorenmind·4 Feb

Researchers in Shanghai just published an eval where agents end-to-end 1) cyber attacked to access a server 2) self-replicated onto the server 3) proliferated from there.
