Kit Fraser-Taliente

4 posts

@KitF_T

meading rinds at @anthropicai

Joined June 2016
692 Following, 100 Followers
Lisan al Gaib (@scaling01)
@mikeknoop putting it out there like this increases the chances that they comment on it
1
0
17
2K
Kit Fraser-Taliente retweeted
Emmanuel Ameisen (@mlpowered)
We just shipped Claude Opus 4.6! I’m also excited to share that for the first time, we used circuit tracing as part of the model's safety audit! We studied why sometimes, the model misrepresents the results of tool calls.
30
47
876
87.9K
Kit Fraser-Taliente retweeted
Subhash Kantamneni (@thesubhashk)
We recently released a paper on Activation Oracles (AOs), a technique for training LLMs to explain their own neural activations in natural language. We piloted a variant of AOs during the Claude Opus 4.6 alignment audit. We thought they were surprisingly useful! 🧵
11
34
205
26.1K
tensorqt (@tensorqt)
transformers feel too soft to do reasoning well internally. Reasoning is about uncovering very rigid structures. I wonder how using a fully discrete attention matrix (so basically a regular adjacency matrix) in some of the heads impacts this
7
0
27
2K