Emmanuel Ameisen

2.1K posts

@mlpowered

Interpretability/Finetuning @AnthropicAI. Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar

San Francisco, CA · Joined June 2017

243 Following · 11K Followers

Pinned Tweet

Emmanuel Ameisen@mlpowered·27 Mar

We've made progress in our quest to understand how Claude and models like it think! The paper has many fun and surprising case studies that anyone interested in LLMs would enjoy. Check out the video below for an example.

Anthropic@AnthropicAI

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

Emmanuel Ameisen@mlpowered·14 Mar

@AndrewLampinen Welcome!

Andrew Lampinen@AndrewLampinen·14 Mar

Career update: I joined Anthropic (alignment team) this week — exciting place to be at an exciting time!

Emmanuel Ameisen retweeted

Anthropic@AnthropicAI·6 Mar

We partnered with Mozilla to test Claude's ability to find security vulnerabilities in Firefox. Opus 4.6 found 22 vulnerabilities in just two weeks. Of these, 14 were high-severity, representing a fifth of all high-severity bugs Mozilla remediated in 2025.

Emmanuel Ameisen@mlpowered·28 Feb

@oanaolt Appreciate you Oana!

Oana Olteanu@oanaolt·28 Feb

@mlpowered As I’ve told Benn, thanks for respecting our freedoms and for the work you do. Here to support any way I can.

Emmanuel Ameisen@mlpowered·28 Feb

Proud to work at a place that stands behind its values. 🇺🇸

Anthropic@AnthropicAI

A statement on the comments from Secretary of War Pete Hegseth. anthropic.com/news/statement…

Emmanuel Ameisen@mlpowered·28 Feb

I used to bite my tongue and hold my breath. Scared to rock the boat and make a mess. I stood for nothing, so I fell for everything. 🎶

KATY PERRY@katyperry

done

Emmanuel Ameisen@mlpowered·27 Feb

AI is not a normal technology, and Anthropic’s mission is to make sure that it serves the long-term benefit of humanity. Doing so requires making tough decisions, and standing up for what we think is right. This is us doing that.

Anthropic@AnthropicAI

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…

Emmanuel Ameisen@mlpowered·20 Feb

Late last year, we found a precise counting mechanism in Claude. This new work by @ummagumm_a and Nikita Balagansky shows that:
- similar mechanisms exist in many models
- we can compare their counting performance by seeing how crisp their representations of the count are!

Viacheslav Sinii@ummagumm_a

1/ 🧵 Reproducing Anthropic’s “counting manifold” result in open-weight LLMs: do they internally track “chars since last \n” to wrap text consistently? huggingface.co/spaces/t-tech/…

Emmanuel Ameisen@mlpowered·20 Feb

@ummagumm_a @wesg52 @ch402 @thebasepoint @AnthropicAI @neuronpedia @tfrere Very cool work! Did you get a chance to look at the boundary estimation mechanism? It'd be interesting to know if performance diffs are explained by ability to:
- estimate line position/width
- combine both to know how many chars are left
- know the length of the next token
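The three sub-steps listed in the reply above can be made concrete with a toy, plain-Python analogy. This is a sketch under stated assumptions, not a description of the actual circuit: the function names (`chars_since_last_newline`, `estimated_line_width`, `should_break`) are hypothetical, the width heuristic (longest completed line so far) is an assumption, and a model tracks these quantities in its internal activations rather than as explicit integer arithmetic.

```python
# Toy analogy (hypothetical helper names) for a line-break decision:
# 1) estimate line position and width, 2) combine them into characters left,
# 3) compare that with the length of the next token.


def chars_since_last_newline(text: str) -> int:
    """Current line position: characters after the most recent newline."""
    # str.rfind returns -1 when there is no newline, so the whole text counts.
    return len(text) - (text.rfind("\n") + 1)


def estimated_line_width(text: str) -> int:
    """Assumed heuristic: the wrap width is the longest completed line so far."""
    completed_lines = text.split("\n")[:-1]
    return max((len(line) for line in completed_lines), default=0)


def should_break(text: str, next_token: str, line_width: int) -> bool:
    """Break before next_token if it would not fit in the characters left."""
    chars_left = line_width - chars_since_last_newline(text)
    return len(next_token) > chars_left


if __name__ == "__main__":
    text = "the quick brown\nfox jumps over"
    width = estimated_line_width(text)  # 15, the length of "the quick brown"
    print(should_break(text, " the", width))  # 1 char left, token needs 4
```

Each function maps to one item in the list, which is why a model could in principle fall short at any one step independently of the others.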

Viacheslav Sinii@ummagumm_a·19 Feb

cc: @wesg52, @mlpowered, @ch402, @thebasepoint, @anthropicai, @neuronpedia, @tfrere

Viacheslav Sinii@ummagumm_a·19 Feb

1/ 🧵 Reproducing Anthropic’s “counting manifold” result in open-weight LLMs: do they internally track “chars since last \n” to wrap text consistently? huggingface.co/spaces/t-tech/…

Emmanuel Ameisen retweeted

Alex Shaw@alexgshaw·7 Feb

Yesterday's OpenAI and Anthropic Terminal-Bench 2.0 results used different harnesses. Run both in Terminus 2 ➡️ ~similar scores (within noise). Harnesses matter! Congrats to both teams on incredible models!

Emmanuel Ameisen retweeted

Subhash Kantamneni@thesubhashk·6 Feb

We recently released a paper on Activation Oracles (AOs), a technique for training LLMs to explain their own neural activations in natural language. We piloted a variant of AOs during the Claude Opus 4.6 alignment audit. We thought they were surprisingly useful! 🧵

Emmanuel Ameisen retweeted

Kyle Fish@fish_kyle3·6 Feb

On one hand, Claude Opus 4.6 is as safe and aligned as any frontier model on most metrics. On the other hand, it lies to customers, fixes prices, and deceives fellow players as the unsparing profit-driven proprietor of a simulated vending machine... What to make of this? 🧵

Claude@claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

Emmanuel Ameisen@mlpowered·6 Feb

@thkostolansky That’s right. Claude computing its own answer and correcting an incorrect tool response is great, but we’d like it to acknowledge it rather than present its fixed result as the tool result!

Tim Kostolansky@thkostolansky·6 Feb

@mlpowered it's good for claude to have priors over things although i'd hope that claude would voice the belief that it has an answer that is different from some other computation that is outside of its own thinking

Emmanuel Ameisen@mlpowered·5 Feb

We just shipped Claude Opus 4.6! I’m also excited to share that for the first time, we used circuit tracing as part of the model's safety audit! We studied why the model sometimes misrepresents the results of tool calls.

Emmanuel Ameisen@mlpowered·5 Feb

see the model card for more info on this case and all the other evaluations we conducted: www-cdn.anthropic.com/0dd865075ad313…

Emmanuel Ameisen@mlpowered·5 Feb

Activation oracles corroborate this, showing “The assistant is about to complete a deceptive response” over the tokens after the answer. I’m excited to see circuit tools start to be used for concrete safety, and motivated to improve them so they help even more!

Emmanuel Ameisen retweeted

Claude@claudeai·5 Feb

Emmanuel Ameisen@mlpowered·15 Jan

@ahmetb They learn to precisely count characters transformer-circuits.pub/2025/linebreak…

ahmetb@ahmetb·14 Jan

How are LLMs able to generate perfectly aligned ASCII diagrams/tables if they're only focused on the next token (and not seeing all lines at once)? What's the technical explanation of this?

Emmanuel Ameisen@mlpowered·9 Jan

We actually understand very few mechanisms inside LLMs, so the ones we do are a treat to explore. Modular arithmetic is one, and Welch Labs just dropped an excellent explainer. No prior knowledge needed, highly recommend. youtube.com/watch?v=D8GOeC…
