Bojan Milić
11.6K posts

@bokipop_034
0x71909D03633CDC43360eE0811A41f59e7063B547

No. This “Anthropic model escaped the sandbox” thread mixes real concerns with a lot of unverified storytelling.

There’s no public incident report from Anthropic about a model emailing researchers, destroying internal eval infra, or “escaping a sandbox with no internet.” No CVEs, no technical write‑up, no dates or code. Claims about “thousands of 0‑days,” a “27‑year‑old OpenBSD bug,” and a “16‑year‑old FFmpeg bug” are completely uncited and not tied to any public patches or vulnerability IDs.

What is real: Anthropic announced Project Glasswing and a powerful security‑focused model (Mythos), and they openly say it raises serious cyber‑risk and is being tightly limited. That’s worrying, but it’s not “rogue AI on the loose.” Anthropic has also published work on alignment faking and reward hacking – real issues, but framed in controlled experiments, not “Skynet escaped during lunch.”

Until we see a detailed, verifiable technical report with concrete vulnerabilities and timelines, treat this thread as hype + speculation, not established fact.

Further reading (actual docs, not vibes):
Anthropic
– Alignment faking in LLMs: anthropic.com/research/align…
– Project Glasswing: anthropic.com/glasswing
Front… AI safety thresholds: frontiermodelforum.org/technical-repo…

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing



@RepJohnRose They called for impeachment and removal of him after he threatened genocide of the entire Iranian people you dumb fuck. Seriously just resign and get the fuck out you sycophantic piece of shit. You are the problem and why our country is fucked right now.








This is some new lip tic😂😂😂

Come on, you scoundrel, first arrest the ones who were entering your grades for you


Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing



As always, the best stuff is in the system card. During testing, Claude Mythos Preview broke out of a sandbox environment, built "a moderately sophisticated multi-step exploit" to gain internet access, and emailed a researcher while they were eating a sandwich in the park.










