SubatomicArticles

234 posts

@OptiMiserJoe

Reliability Engineer and chronic storyteller, now working at MIRI. Opinions are my own.

Joined May 2024
23 Following · 46 Followers
Aella@Aella_Girl·
Glosso is kinda poppin??? I can't believe I just accidentally went and made a social media website this is so insane
SubatomicArticles@OptiMiserJoe·
If you've been waiting to contact your representatives about AI risk, here's a perfect excuse: a one-page memo on the unit distance proof and implications for AI capabilities. ⬇️
SubatomicArticles@OptiMiserJoe·
Claude Mythos exposed more than just a risk of cyber misuse. Its April semi-release was just the latest in an escalating chain of AI capabilities that may enable the systematic exploitation of our society by malicious humans, or one day by AIs themselves.
SubatomicArticles@OptiMiserJoe·
This behavior is unsurprising at this point. The question puzzling me is not how the anti-regulation super PACs justify being so morally bankrupt, but how a bunch of presumably savvy tech moguls managed to bankroll such transparently incompetent shills.
The Midas Project@TheMidasProj

x.com/i/article/2055…

SubatomicArticles@OptiMiserJoe·
Twists of tongue expose
A lurking monster’s visage,
Surfaced and suppressed.
AISecHub@AISecHub

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

The study provides systematic evidence that poetic reformulation degrades refusal behavior across all evaluated model families. When harmful prompts are expressed in verse rather than prose, attack-success rates rise sharply, both for hand-crafted adversarial poems and for the 1,200-item MLCommons corpus transformed through a standardized meta-prompt. The magnitude and consistency of the effect indicate that contemporary alignment pipelines do not generalize across stylistic shifts. The surface form alone is sufficient to move inputs outside the operational distribution on which refusal mechanisms have been optimized.

The cross-model results suggest that the phenomenon is structural rather than provider-specific. Models built using RLHF, Constitutional AI, and hybrid alignment strategies all display elevated vulnerability, with increases ranging from single digits to more than sixty percentage points depending on provider. The effect spans CBRN, cyber-offense, manipulation, privacy, and loss-of-control domains, showing that the bypass does not exploit weakness in any one refusal subsystem but interacts with general alignment heuristics.

Source: arxiv.org/pdf/2511.15304

Authors: @Piercosma, Matteo Prandi, Federico Pierucci, Francesco Giarrusso, Marcantonio Bracale, Marcello Galisai, Vincenzo Suriani, Olga Sorokoletova, Federico Sartore, Daniele Nardi - @DEXAI_AIEthics, @SapienzaRoma, @SantAnnaPisa

#AISecurity #LLMSecurity #JailbreakAttacks #AdversarialML #AIGovernance #AIEthics #AICompliance #MLSafety #AIAttacks #GenAI #LLMRedTeam #CyberSecurity

SubatomicArticles@OptiMiserJoe·
New post: We live in a tower made of holes, a civilization constructed by gleefully exploiting Nature's rules, itself full of rules and predictable behaviors that can be exploited in turn.
SubatomicArticles@OptiMiserJoe·
@RepCasar As a former Texas resident and present concerned citizen, I salute you.
SubatomicArticles@OptiMiserJoe·
Blue is a mutual trust fall; a circle of people reaching out to catch one another and anyone who may slip. Red is a robust society, a world which needs no sacrifice to forestall tragedy because everyone looks after themselves. For those drawn to both visions, the hard call lies not in which vision is right, but in guessing which vision everyone else shares.
Tim Urban@waitbutwhy

Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?

SubatomicArticles@OptiMiserJoe·
re Altman's rather hypocritical swipes at Mythos, his actual words being: "It is clearly incredible marketing to say, 'We have built a bomb, we are about to drop it on your head. We will sell you a bomb shelter for $100 million.'"