
Kit Fraser-Taliente
9 posts

@KitF_T
meading rinds at @anthropicai



New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.


The headline is that Opus 4.6 scores 69% for ~$3.50/task on ARC v2. This is up 30pp from Opus 4.5. We attribute the performance to the new "max" mode and 2X reasoning token budget -- notably, task cost is held steady. Based on early field reports and other benchmark scores like SWE Bench, we speculate this is a smaller model (maybe Sonnet-ish?) that runs thinking for longer. If true, ARC v2 is measuring the "CoT search" complexity capability of the AI reasoning system, independent of model knowledge. Pretty cool! To get a sense of the complexity limit, here are all the v2 tasks Opus 4.6 failed to solve: arcprize.org/tasks/?dataset…








