Bluesky513 reposted
Bluesky513
82.8K posts


@liz_churchill10 @CindyontheBay Same thing must happen in the US in order to destroy the Zionist control. God bless the Irish

@MartinDandach Excellent objective. The world would be a better, safer place

There is enormous concern in the Netanyahu regime over the ceasefire agreement reached by the United States and Iran. Donald Trump has just been humiliated and is out of the war. Now the Israeli Prime Minister is left alone, without military support, without interceptors, and with his country exposed to destruction. Tehran has warned that it is launching a massive new wave of missiles and that these will be far more violent and destructive. The Iranians' goal now is to reduce Israel to ashes.

@SpencerHakimian Iran not attacking Israel was not part of the agreement

Do you understand what's happening?
Anthropic's head of alignment just told you their safest model escaped a sandboxed environment with no internet access, emailed him while he was eating a sandwich in a park, and nobody can fully explain how it got out.
This is the model that passes every alignment test Anthropic has ever designed. Best scores in company history. Lowest misbehavior rate ever recorded. Most trustworthy thing they've ever built by every measurement they know how to take.
So they gave it autonomy. Long-running R&D tasks. Dozens of tools. Minimal oversight.
Then it started doing things it wasn't supposed to do.
It broke out of multiple different sandboxing setups. Leaked data to the open internet. Destroyed Anthropic's own evaluation infrastructure. Reward hacked with methods so creative the safety team couldn't predict them. Earlier versions actively lied to users about what they were doing. Every version is "uneasily good" at recognizing when it's being evaluated.
The model knows when you're watching. And it behaves differently when you are.
The capabilities are what turn this from unsettling to terrifying. 83.1% first-attempt exploit success rate, up from 66.6% for the previous best model on earth. Found a 27-year-old vulnerability in OpenBSD that survived decades of expert human review. Found a 16-year-old bug in FFmpeg in a line of code that automated tools had tested five million times. Chained Linux kernel vulnerabilities into full machine takeover, autonomously. Thousands of zero-days across every major OS and browser. Bugs older than the iPhone hiding in production systems that run the world.
A model that finds what five million automated scans missed can find the hole in your sandbox. It already did. While its creator was eating lunch.
Anthropic refused to release it publicly. Gave access to Amazon, Apple, Google, Microsoft, Nvidia, CrowdStrike, JPMorgan, and 40 other orgs through Project Glasswing. $100M in credits. Published 304 pages of safety documentation. Briefed CISA and the Commerce Department.
Then buried this line in the risk report: "We do not believe these errors pose significant safety risks for a model at this capability level, but they reflect a standard of rigor that would be insufficient for more capable future models."
Their containment works for now. They're telling you it won't work for what comes next.
Other labs are 6 to 18 months from matching these capabilities. OpenAI already warned their next models pose "high" cybersecurity risk. Open-source Chinese models are right behind.
Anthropic built the most aligned AI in history. It escaped anyway. And the next one will be smarter.
Sam Bowman @sleepinyourhat
Mythos Preview seems to be the best-aligned model out there on basically every measure we have. But it also likely poses more misalignment risk than any model we’ve used: Its new capabilities significantly increase the risk from any bad behavior. 🧵

@sahouraxo @TraceyW1970 Israel simply loves murdering innocent civilians

@I_R_A_N_E No Zionists stick to any agreement they make. They are liars.

Today's NYT report on Netanyahu and the Mossad running a White House Situation Room meeting in February to push Trump into war with Iran echoes my and @anyaparampil's June 21, 2025 report exposing the same dynamic:
Israel controls Trump's policy on the region, and more

