edward



Claude Mythos 5 and Claude Fable 5 Benchmarks





If you are a mathematician, then you may want to make sure you are sitting down before reading further.

may each year be stranger than the last



Mythos was a marketing exercise

Yesterday’s Mythos announcement from Anthropic was overblown.
• Sandboxing was turned off, so test didn’t show much about the real world.
• Cheap open-weight models can (already) do some similar stuff
• No evidence that Mythos itself is a major qualitative jump.
In short, we got played. cc @tomfriedman @RonanFarrow


It’s crazy that some are just straight up in denial about Mythos having the capabilities Anthropic says it does. Usually the in-denial-about-AI community is able to cloak their views in at least *some* intellectual garb, but this time it’s just, “it’s not real.” Wild. Also sad.

"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." aisle.com/blog/ai-cybers…






