MindMentors
123 posts

Claude Mythos is the start of the end. I think this is my psychosis moment.

So Mythos was real. Anthropic just unveiled Project Glasswing and confirmed Claude Mythos Preview, an unreleased frontier model now deployed with AWS, Apple, Google, Microsoft, NVIDIA, Cisco, CrowdStrike, Palo Alto, JPMorgan, and the Linux Foundation to secure the world’s most critical software.

The specs are absurd.

🧠 AGENTIC CODING
• SWE-bench Verified: 93.9% (Opus 4.6: 80.8%)
• SWE-bench Pro: 77.8% (vs 53.4%)
• SWE-bench Multilingual: 87.3% (vs 77.8%)
• SWE-bench Multimodal: 59.0% (vs 27.1%)
• Terminal-Bench 2.0: 82.0% (vs 65.4%)

🧩 REASONING
• GPQA Diamond: 94.6% (vs 91.3%)
• Humanity’s Last Exam (w/ tools): 64.7% (vs 53.1%)

🌐 AGENTIC SEARCH & COMPUTER USE
• BrowseComp: 86.9%, using 4.9× FEWER tokens than Opus 4.6
• OSWorld-Verified: 79.6% (vs 72.7%)

🛡️ CYBERSECURITY
• CyberGym: 83.1% (vs 66.6%)

In just weeks of red-teaming, Mythos found THOUSANDS of zero-days across every major OS and browser. Highlights:
→ A 27-year-old remote-crash bug in OpenBSD, one of the most security-hardened OSes on Earth
→ A 16-year-old flaw in FFmpeg, in a line of code fuzzers had hit 5,000,000 times without ever catching it
→ An autonomous Linux kernel privilege-escalation chain, discovered AND exploited with zero human steering

Anthropic is NOT releasing it generally. $100M in usage credits committed to defenders. A Cyber Verification Program will gate future access.

Read between the lines:

1. The gap between “shipped Claude” and what frontier labs actually have internally is WIDER than the market is pricing. Same is almost certainly true at OpenAI and GDM.

2. They can’t serve this thing at scale yet. A model this capable, this token-hungry, gated to a handful of partners? That’s not a product decision; that’s a compute constraint. They need more chips. A LOT more chips.

3. The best AI will be the best cyber defense. Defense has to be comprehensive; offense only needs one bug. Whoever puts the frontier model in defenders’ hands FIRST and at SCALE wins the decade.

4. Agentic vuln-hunting is long-context, KV-cache-heavy inference running continuously across every codebase on Earth. That’s not a workload; that’s a new compute category. Memory Wars, meet the security stack.

The co-design era isn’t coming. It’s here. And it just ate cybersecurity.
