Sören Mindermann

695 posts

@sorenmind

Postdoc with Yoshua Bengio, Mila

Oxford · Joined May 2016
177 Following · 1.8K Followers
Sören Mindermann reposted
Markus Anderljung @Manderljung
Two important skills in AI policy: knowing the numbers, and being calibrated about how confident to be in them. So I vibe-coded a little game to train both. Mostly AI trivia. You score better if you know stuff + know what you don't know. Have a go.
2 replies · 7 reposts · 93 likes · 5.3K views
Sören Mindermann reposted
benedict @bqbrady
Introducing Philosophy Bench, my favorite new project I've worked on this year, with help from my friend @matthewjmandel. We put frontier language models in 100 ethically complex situations and require them to act, grading them on adherence to consequentialism vs. deontology, tendency to follow user requests, corrigibility, and more. 1/
35 replies · 104 reposts · 673 likes · 68.3K views
Sören Mindermann @sorenmind
Our new results and followup work in Owain's thread also show that some forms of subliminal learning can still happen even when the base models are different.
0 replies · 0 reposts · 0 likes · 68 views
Sören Mindermann @sorenmind
One interesting bit from the paper: LLMs didn't subliminally learn from models with a different base. But GPT-4.1 and GPT-4o share the same base, so the effect still happens.
1 reply · 0 reposts · 0 likes · 62 views
Sören Mindermann @sorenmind
Excited that Subliminal Learning just came out in Nature! Our result implies that safety auditing needs to look beyond the data. Models are increasingly distilled on each other's outputs, so they may inherit issues not visible in the data.
Owain Evans @OwainEvans_UK

Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What’s new?🧵

1 reply · 0 reposts · 5 likes · 622 views
Sören Mindermann reposted
Ryan Greenblatt @RyanPGreenblatt
Current AIs (Opus 4.5/4.6) seem pretty misaligned to me (in a mundane behavioral sense). In my experience, they often oversell their work, downplay problems, and stop early while claiming to be done. They sometimes brazenly cheat.
18 replies · 36 reposts · 424 likes · 61.3K views
Sören Mindermann reposted
Alexander Barry @AlexBarry4
I made an update to the interactive task-success-rate plot for METR time horizon. You can now see how the performance on the TH task suite has evolved over time by walking through model releases (with optional point jittering for increased visibility).
2 replies · 1 repost · 30 likes · 1.2K views
Sören Mindermann reposted
Rowland Manthorpe @rowlsmanthorpe
I’ll admit - I was sceptical about the idea of AI psychosis. Not the specific cases, which were all too believable, but about the scale. How much was this happening? And anyway, wouldn’t better models make it go away? Then I read a paper by Anthropic and the University of Toronto which has strangely received very little attention.
29 replies · 212 reposts · 950 likes · 138.4K views
Sören Mindermann reposted
Joel Becker @joel_bkr
this chart bringing to life the inner-workings of time horizon is so cool. from my super-talented colleague @CFGeek.
5 replies · 11 reposts · 117 likes · 22.9K views
Sören Mindermann reposted
Andrew Gordon Wilson @andrewgwils
To be honest, I was initially confused and reserved about AI alignment. It's not that I was against the research direction, quite the opposite. For 15 years, I'd been developing the foundations of what had been rebranded as alignment. But, I've changed my mind. 1/6
8 replies · 21 reposts · 260 likes · 53K views
Sören Mindermann reposted
Alan Chan @_achan96_
Frontier AI companies are automating AI R&D. If they succeed, there could be huge effects on both AI progress and oversight of AI R&D. Our new paper proposes metrics for tracking these effects.
7 replies · 52 reposts · 241 likes · 53.5K views
Sören Mindermann reposted
Dean W. Ball @deanwball
I don’t want to comment on the DoW-Anthropic issue because I don’t know enough specifics, but stepping back a bit: If near-medium future AI systems can be used by the executive branch to arbitrary ends with zero restrictions, the U.S. will functionally cease to be a republic.
47 replies · 98 reposts · 995 likes · 123.9K views
Sören Mindermann reposted
Daron Acemoglu @DAcemogluMIT
Dear followers, please see the thread below on the 2026 International AI Safety Report, which was released last week and which I advised. The report provides an up-to-date, internationally shared assessment of general-purpose AI capabilities, emerging risks, and the current state of risk management and safeguards. internationalaisafetyreport.org/publication/in…
Yoshua Bengio @Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

8 replies · 119 reposts · 438 likes · 77.9K views
Sören Mindermann reposted
Noam Brown @polynoamial
When GPT-5 was released, some folks claimed AI progress was hitting a wall, whereas others said progress would continue. GPT-5.2 was released 2 months ago. GPT-5.3-Codex was released 2 days ago and is twice as token efficient for coding. It's clear who turned out to be correct.
140 replies · 176 reposts · 2.1K likes · 376.6K views
Sören Mindermann reposted
Geoffrey Hinton @geoffreyhinton
This is a great report that provides a thoughtful, detailed and very well researched description of the risks of AI. It is essential reading for anyone who wants to write or talk about AI risks.
Yoshua Bengio @Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

113 replies · 278 reposts · 1.2K likes · 204.4K views
Sören Mindermann @sorenmind
The 2026 International AI Safety Report is out! I'm honored to have served as the scientific advisor to Yoshua Bengio, alongside over 100 experts. Many of them were nominated by their home countries as well as by the UN, EU and OECD.
Yoshua Bengio @Yoshua_Bengio

Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵 (1/17)

1 reply · 4 reposts · 24 likes · 1.3K views
Sören Mindermann @sorenmind
Researchers in Shanghai just published an eval where agents end-to-end 1) cyber attacked to access a server 2) self-replicated onto the server 3) proliferated from there.
1 reply · 5 reposts · 15 likes · 1.1K views