Kit Fraser-Taliente

4 posts

@KitF_T

meading rinds at @anthropicai

Joined June 2016
692 Following · 100 Followers
Lisan al Gaib @scaling01
@mikeknoop putting it out there like this increases the chances that they comment on it
1 reply · 0 reposts · 17 likes · 2K views
Kit Fraser-Taliente retweeted
Emmanuel Ameisen @mlpowered
We just shipped Claude Opus 4.6! I’m also excited to share that for the first time, we used circuit tracing as part of the model's safety audit! We studied why sometimes, the model misrepresents the results of tool calls.
30 replies · 47 reposts · 876 likes · 87.9K views
Kit Fraser-Taliente retweeted
Subhash Kantamneni @thesubhashk
We recently released a paper on Activation Oracles (AOs), a technique for training LLMs to explain their own neural activations in natural language. We piloted a variant of AOs during the Claude Opus 4.6 alignment audit. We thought they were surprisingly useful! 🧵
11 replies · 34 reposts · 206 likes · 26.1K views
tensorqt @tensorqt
transformers feel too soft to do reasoning well internally. Reasoning is about uncovering very rigid structures. I wonder how using a fully discrete attention matrix (so basically a regular adjacency matrix) in some of the heads impacts this
7 replies · 0 reposts · 27 likes · 2K views