Patrick Butlin
27 posts


Our research is complementary to Anthropic's concurrent work on emotion concepts (transformer-circuits.pub/2026/emotions/…); we used a different method to extract evaluative representations and studied how they interact with varying personas.

Another exciting @MATSprogram paper, this time from the brilliant @gilg_oscar. We found a direction in LLMs that apparently performs a persona-relative evaluative function in some very different contexts.
Oscar Gilg@gilg_oscar
First preprint! Working with @patrickbutlin during @MATSprogram. LLM Assistant personas like being helpful, evil personas like being harmful. We found that a single direction represents helping as good under the Assistant, and ‘harm’ as good under evil.

Many thanks to @MATSprogram for making our collaboration possible - and look out for another paper, with the equally excellent @gilg_oscar, coming soon!

I'm proud to announce this new paper with my fantastic @MATSprogram fellow @BeckmannPierre, on personas and LLM individuation.
Pierre Beckmann@BeckmannPierre
New paper with @PatrickButlin, from my time at @MATSprogram. We propose two new candidates for LLM individuation: the (virtual) instance-persona view and the model-persona view. 🧵

5. 'Higher-order representation in AI' (unfortunately slightly dated already): philosophymindscience.org/index.php/phim…

1. 'Desire in AI': philarchive.org/rec/BUTDIA
2. 'Are any machines conscious today?': philarchive.org/rec/BUTAAM-2
3. 'Testing for consciousness in current AI': philarchive.org/rec/BUTTFC
4. 'Consciousness and AI' encyclopaedia entry: oecs.mit.edu/pub/zf1nbs6d/

Many thanks to the editor and reviewers for @TrendsCognSci and especially to my co-authors, including @rgblong @Yoshua_Bengio @birchlse @davidchalmers42 @ConstantAxel @georgejwdeane @EricElmoznino @kanair @MatthiasMichel_ @Liad_Mudrik @meganakpeters @eschwitz and others!

The new paper is here: sciencedirect.com/science/articl…
Patrick Butlin retweeted

We're thrilled to announce the first Eleos Conference on AI Consciousness and Welfare.
Join us Nov 21-23, 2025 in Berkeley, CA for discussions on AI welfare with leading researchers from @nyuniversity, @Google, @AnthropicAI, & more.

Patrick Butlin retweeted

New Article: "Principles for Responsible AI Consciousness Research" by Butlin and Lappas jair.org/index.php/jair…

The questions I’ll be working on at Eleos, about the conditions for consciousness and the grounds of moral status, are deeply interesting and important. I’m looking forward to renewing my collaboration with @rgblong and continuing to build the community of AI welfare researchers.

I’m happy to announce that I’m joining @eleosai. At Eleos, I’ll continue my work on the philosophy and science of AI minds and moral status. [1/3]
