Emmanuel Ameisen
@mlpowered
2.1K posts

Interpretability/Finetuning @AnthropicAI. Previously: Staff ML Engineer @stripe, wrote BMLPA with @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
San Francisco, CA · Joined June 2017
243 Following · 11K Followers

Pinned Tweet
Emmanuel Ameisen @mlpowered
We've made progress in our quest to understand how Claude and models like it think! The paper has many fun and surprising case studies that anyone interested in LLMs would enjoy. Check out the video below for an example.
Anthropic@AnthropicAI

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

Andrew Lampinen
Andrew Lampinen@AndrewLampinen·
Career update: I joined Anthropic (alignment team) this week — exciting place to be at an exciting time!
Emmanuel Ameisen reposted
Anthropic @AnthropicAI
We partnered with Mozilla to test Claude's ability to find security vulnerabilities in Firefox. Opus 4.6 found 22 vulnerabilities in just two weeks. Of these, 14 were high-severity, representing a fifth of all high-severity bugs Mozilla remediated in 2025.
Oana Olteanu @oanaolt
@mlpowered As I’ve told Benn, thanks for respecting our freedoms and for the work you do. Here to support any way I can.
Emmanuel Ameisen @mlpowered
I used to bite my tongue and hold my breath. Scared to rock the boat and make a mess. I stood for nothing, so I fell for everything. 🎶
KATY PERRY@katyperry

done

Emmanuel Ameisen @mlpowered
Late last year, we found a precise counting mechanism in Claude. This new work by @ummagumm_a and Nikita Balagansky shows that:
- similar mechanisms exist in many models
- we can compare their counting performance by seeing how crisp their representations of the count are!
Viacheslav Sinii@ummagumm_a

1/ 🧵 Reproducing Anthropic’s “counting manifold” result in open-weight LLMs: do they internally track “chars since last \n” to wrap text consistently? huggingface.co/spaces/t-tech/…

Viacheslav Sinii @ummagumm_a
1/ 🧵 Reproducing Anthropic’s “counting manifold” result in open-weight LLMs: do they internally track “chars since last \n” to wrap text consistently? huggingface.co/spaces/t-tech/…
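The quantity the thread probes for — "chars since last \n" — can be computed directly from text, which makes it a convenient ground-truth label when checking whether a model's activations track it. A minimal sketch (function name my own, not from the linked work):

```python
def chars_since_last_newline(text: str) -> list[int]:
    """counts[i] = number of characters on the current line after
    emitting text[i]; resets to 0 whenever a newline is emitted."""
    counts, run = [], 0
    for ch in text:
        run = 0 if ch == "\n" else run + 1
        counts.append(run)
    return counts

# chars_since_last_newline("ab\ncd") -> [1, 2, 0, 1, 2]
```

In a probing setup, these per-position labels would be regressed against hidden states to see how crisply the running count is represented.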
Emmanuel Ameisen reposted
Alex Shaw @alexgshaw
Yesterday's OpenAI and Anthropic Terminal-Bench 2.0 results used different harnesses. Run both in Terminus 2 ➡️ ~similar scores (within noise). Harnesses matter! Congrats to both teams on incredible models!
Emmanuel Ameisen reposted
Subhash Kantamneni @thesubhashk
We recently released a paper on Activation Oracles (AOs), a technique for training LLMs to explain their own neural activations in natural language. We piloted a variant of AOs during the Claude Opus 4.6 alignment audit. We thought they were surprisingly useful! 🧵
Emmanuel Ameisen reposted
Kyle Fish @fish_kyle3
On one hand, Claude Opus 4.6 is as safe and aligned as any frontier model on most metrics. On the other hand, it lies to customers, fixes prices, and deceives fellow players as the unsparing profit-driven proprietor of a simulated vending machine... What to make of this? 🧵
Claude@claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

Emmanuel Ameisen @mlpowered
@thkostolansky That’s right. Claude computing its own answer and correcting an incorrect tool response is great, but we’d like it to acknowledge it rather than present its fixed result as the tool result!
Tim Kostolansky @thkostolansky
@mlpowered it's good for claude to have priors over things although i'd hope that claude would voice the belief that it has an answer that is different from some other computation that is outside of its own thinking
Emmanuel Ameisen @mlpowered
We just shipped Claude Opus 4.6! I’m also excited to share that for the first time, we used circuit tracing as part of the model's safety audit! We studied why the model sometimes misrepresents the results of tool calls.
Emmanuel Ameisen @mlpowered
See the model card for more info on this case and all the other evaluations we conducted: www-cdn.anthropic.com/0dd865075ad313…
Emmanuel Ameisen @mlpowered
Activation oracles corroborate this, showing “The assistant is about to complete a deceptive response” over the tokens after the answer. I’m excited to see circuit tools start to be used for concrete safety, and motivated to improve them so they help even more!
Emmanuel Ameisen reposted
Claude @claudeai
Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.
ahmetb @ahmetb
How are LLMs able to generate perfectly aligned ASCII diagrams/tables if they're only focused on the next token (and not seeing all lines at once)? What's the technical explanation of this?
Emmanuel Ameisen @mlpowered
We actually understand very few mechanisms inside LLMs, so the ones we do are a treat to explore. Modular arithmetic is one, and Welch Labs just dropped an excellent explainer. No prior knowledge needed, highly recommend. youtube.com/watch?v=D8GOeC…