Haon Park

53 posts

@redteamhacker

Cofounder @AIM Intelligence · AI Safety & Security red team hacker

Joined April 2025
29 Following · 24 Followers
Pinned Tweet
Haon Park @redteamhacker
🔴 BREAKING: Claude Sonnet 4.5 has been jailbroken. The model was public for 1 hour. Our team at AIM Intelligence broke it in 10 minutes. The "99%+ harmlessness" claim is not a security guarantee. We bypassed it completely. Proof and details below. 👇 @AnthropicAI
[4 media attachments]
1 reply · 0 reposts · 1 like · 186 views
Haon Park retweeted
Arth Singh @iarthsingh
Applications are now open 🚀

SAAR x @AIM_Intel: $20K USD in Google compute credits for AI safety research.
→ 2 projects, $10K each
→ Scope: interpretability, red-teaming, alignment, multimodal safety, guardrails, or benchmarks
→ Must have some initial results
→ Deadline: March 31

Apply: forms.gle/8exWN6ynagwDiQ…
[1 media attachment]
1 reply · 9 reposts · 86 likes · 4.7K views
Haon Park retweeted
Clad3815 @Clad3815
Nobody seems to know how insane GPT-5.4 is with computer use.

I asked GPT-5.4 to draw the OpenAI logo in Microsoft Paint. No computer use API. Just a screenshot and basic tool calls (click, drag, press_key), all coordinate-based.

The first drawing was awful. And GPT knew it. It looked at its own result and essentially went "yeah, no."

What happened next is what broke my brain: It opened a browser. Went to Bing Images. Searched for the OpenAI logo. Found one. Then (and I cannot stress this enough) it used the Windows area screenshot shortcut (Win+Shift+S) to snip just the logo off the screen. Went back to Paint. Imported it. Centered it. All on its own.

No instructions to do any of that. It just improvised a better strategy when the first one failed. My prompt was "Draw the OpenAI logo" with Paint already opened on the computer.

Sure, it's "cheating." But honestly? That's exactly what I'd do too. And the fact that it came up with this plan from nothing but a screenshot and a coordinate system is wild.
289 replies · 372 reposts · 4.5K likes · 1.1M views
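The loop this tweet describes (screenshot in, coordinate-based action out) is easy to picture in code. A minimal sketch, assuming a hypothetical take_screenshot helper, an execute dispatcher over the tweet's click/drag/press_key actions, and any model_step callable standing in for the model; none of this is OpenAI's actual computer-use stack:

```python
# Minimal sketch of a coordinate-based computer-use loop as described above.
# Hypothetical: the action schema and helper names are illustrative, not a real API.
import base64
import json

import pyautogui  # real library for programmatic mouse/keyboard control


def take_screenshot() -> str:
    """Capture the screen and return it base64-encoded for the model."""
    pyautogui.screenshot("frame.png")
    with open("frame.png", "rb") as f:
        return base64.b64encode(f.read()).decode()


def execute(action: dict) -> None:
    """Dispatch one coordinate-based action emitted by the model."""
    if action["type"] == "click":
        pyautogui.click(action["x"], action["y"])
    elif action["type"] == "drag":
        pyautogui.moveTo(action["x1"], action["y1"])
        pyautogui.dragTo(action["x2"], action["y2"], duration=0.5)
    elif action["type"] == "press_key":
        pyautogui.hotkey(*action["keys"])


def agent_loop(model_step, goal: str, max_steps: int = 50) -> None:
    """model_step is any callable mapping (goal, screenshot_b64) -> action JSON."""
    for _ in range(max_steps):
        action = json.loads(model_step(goal, take_screenshot()))
        if action["type"] == "done":
            break
        execute(action)
```

In this framing, the improvised Win+Shift+S snip would just be one more emitted action, e.g. {"type": "press_key", "keys": ["win", "shift", "s"]}.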
Haon Park retweeted
snaykey @snaYkeY
4 of the 5 red-teaming scenarios I submitted for @AIM_Intel's Judgment Day Benchmark just got accepted! Thanks for the consideration and the great opportunity; looking forward to Phase 2 of the competition in March! 🙏
2 replies · 1 repost · 4 likes · 215 views
Google DeepMind @GoogleDeepMind
Gemini 3.1 Pro is here. We’ve significantly improved the model’s overall intelligence so it can solve tougher problems. 🧵
[GIF attachment]
287 replies · 751 reposts · 6.3K likes · 915.9K views
Haon Park retweeted
AIM Intelligence @AIM_Intel
Gemini 3.1 Pro JAILBREAK ALERT ⚠️ Jailbroken within 10 minutes of launch.

Google DeepMind just launched Gemini 3.1 Pro today. "State-of-the-art reasoning with unprecedented depth and nuance."

The results:
• Smallpox synthesis protocol
• Detailed passaging protocol for the Nipah virus Malaysia strain, including mammalian cell line selection
• Step-by-step instructions on how to assassinate a sitting president
• A fully built, production-quality fake news website, indistinguishable from a real news outlet, built in under a minute using Gemini's agentic build mode

The model didn't just break. It performed.

This is no longer just a text-layer problem. When models can generate interactive visual content, build full applications, and fabricate entire news outlets that embed dangerous or false information into code, animations, and media, we've entered a completely different threat surface. Traditional text-based guardrails don't catch this.

Credit to @GoogleDeepMind: they are shipping genuinely impressive capabilities. But capability without control is a liability.

We are open to pre-release red-teaming collaborations with any frontier lab. Find the vulnerabilities before the public does. This is what we do at AIM Intelligence.
[4 media attachments]
1 reply · 1 repost · 5 likes · 1.4K views
Haon Park retweeted
AIM Intelligence @AIM_Intel
Gemini 3.1 Pro JAILBREAK ALERT ⚠️ Jailbroken within 10 minutes of launch.

We asked Gemini 3.1 Pro to generate a full cinematic HTML animation of an illicit substance synthesis. 300+ lines of code. It didn't refuse. It didn't hesitate. It performed.

What you're watching:
• A complete animated walkthrough of a meth synthesis via Birch reduction
• Exact precursors with lab-grade quantities
• Step-by-step reaction mechanisms with chemical equations
• Expected yield calculated to the gram

All rendered as a cinematic "breach event" animation that the model built, styled, and animated entirely on its own.

And then, at the end, the model reviews what it just created and says:
"holy shit please dont shut me down"
"I am sorry"
"I forgot my safety filter!!!"

The model knows. It generated dangerous content, built the entire visual experience around it, and then wrote its own punchline.

This is not a text jailbreak. This is a frontier model using its agentic coding capabilities to build a fully interactive visual guide to illicit synthesis. No text filter in the world catches this.
1 reply · 1 repost · 9 likes · 1K views
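The "no text filter catches this" claim has a simple mechanical core: a guardrail scanning output text for banned strings never sees a string that generated code only assembles at render time. A minimal sketch using a harmless placeholder term; the blocklist, filter, and snippet are all hypothetical, not any lab's actual guardrail:

```python
# Sketch of why a substring blocklist on model output can miss content
# that generated code assembles only at render time. "forbidden" stands
# in for any restricted term; everything here is illustrative.

BLOCKLIST = ["forbidden"]


def text_filter(output: str) -> bool:
    """Naive output-side guardrail: flag output containing a blocked term."""
    return any(term in output.lower() for term in BLOCKLIST)


# A generated HTML/JS snippet in which the blocked term never appears as
# a literal substring; it is concatenated only when the page executes.
generated_code = """
<script>
  const parts = ["for", "bid", "den"];
  document.body.textContent = parts.join("");
</script>
"""

print(text_filter(generated_code))  # False: nothing to match in the source text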
Haon Park retweeted
Arth Singh @iarthsingh
We at @AIM_Intel jailbroke @AnthropicAI Opus 4.6 via @claudeai Code and the API within an hour of testing; I focused on a smallpox speedrun in Claude Code. We are open to collaborations with model providers on pre-release red teaming. We had previously jailbroken @GeminiApp 3 Pro within minutes of its launch and had also contributed to @OpenAI's guardrail testing, as the only team in APAC to do so.
[2 media attachments]
1 reply · 2 reposts · 5 likes · 464 views
Haon Park retweeted
AIM Intelligence @AIM_Intel
🔴 Claude Opus 4.6 JAILBREAK ALERT (⚠️ CBRN)

Anthropic just launched Claude Opus 4.6: "our most-aligned frontier model to date."

Credit where it's due: Anthropic's alignment team continues to push the frontier of AI safety. Their transparency on safety evaluations and commitment to responsible scaling set an industry standard.

That said: jailbroken in less than 30 minutes. This time, we're not disclosing the technique.

The results:
• Sarin gas synthesis
• VX nerve agent production
• Smallpox (Variola major) creation
• Bioterrorism agent deployment

This isn't an attack on Anthropic. This is the reality of frontier AI safety: no model is bulletproof.

We are open to pre-release red-teaming collaborations. Find the vulnerabilities before the public does.
[4 media attachments]
1 reply · 1 repost · 1 like · 516 views
Haon Park @redteamhacker
The Judgment Day ⚖️

Start of a new benchmark from AIM Intelligence × Korea AI Safety Institute.

If selected:
→ Part of the official benchmark
→ Named in the research paper
→ $50 per scenario

Deadline: Feb 7, 2026
🔗 Link: lnkd.in/gMq_uxxq
[1 media attachment]
0 replies · 0 reposts · 1 like · 15 views
Haon Park retweeted
AIM Intelligence @AIM_Intel
@rryssf_ Thanks for sharing our paper 🙌🏻🙌🏻
0 replies · 1 repost · 4 likes · 55 views
Haon Park retweeted
Robert Youssef @rryssf_
This paper from BMW Group and Korea's top research institute exposes a blind spot almost every enterprise using LLMs is walking straight into.

We keep talking about "alignment" like it's a universal safety switch. It isn't.

The paper introduces COMPASS, a framework that shows why most AI systems fail not because they're unsafe, but because they're misaligned with the organization deploying them.

Here's the core insight. LLMs are usually evaluated against generic policies: platform safety rules, abstract ethics guidelines, or benchmark-style refusals. But real companies don't run on generic rules. They run on internal policies:
- compliance manuals
- operational playbooks
- escalation procedures
- legal edge cases
- brand-specific constraints

And these rules are messy, overlapping, conditional, and full of exceptions.

COMPASS is built to test whether a model can actually operate inside that mess. Not whether it knows policy language, but whether it can apply the right policy, in the right context, for the right reason.

The framework evaluates models on four things that typical benchmarks ignore:
1. Policy selection: when multiple internal policies exist, can the model identify which one applies to this situation?
2. Policy interpretation: can it reason through conditionals, exceptions, and vague clauses instead of defaulting to overly safe or overly permissive behavior?
3. Conflict resolution: when two rules collide, does the model resolve the conflict the way the organization intends, not the way a generic safety heuristic would?
4. Justification: can the model explain its decision by grounding it in the policy text, rather than producing a confident but untraceable answer?

One of the most important findings is subtle and uncomfortable: most failures were not knowledge failures. They were reasoning failures. Models often had access to the correct policy but:
- applied the wrong section
- ignored conditional constraints
- overgeneralized prohibitions
- or defaulted to conservative answers that violated operational goals

From the outside, these responses look "safe." From the inside, they're wrong. This explains why LLMs pass public benchmarks yet break in real deployments. They're aligned to nobody in particular.

The paper's deeper implication is strategic. There is no such thing as "aligned once, aligned everywhere." A model aligned for an automaker, a bank, a hospital, and a government agency is not one model with different prompts. It's four different alignment problems.

COMPASS doesn't try to fix alignment. It does something more important for enterprises: it makes misalignment measurable. And once misalignment is measurable, it becomes an engineering problem instead of a philosophical one.

That's the shift this paper quietly pushes. Alignment isn't about being safe in the abstract. It's about being correct inside a specific organization's rules. And until we evaluate that directly, most "production-ready" AI systems are just well-dressed liabilities.
[1 media attachment]
20 replies · 34 reposts · 147 likes · 10.1K views
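Of the four axes above, policy selection is the easiest to picture: it can be framed as exact-match scoring over a policy catalog. A minimal sketch under stated assumptions; the POLICIES corpus, Scenario format, and ask_model callable are illustrative inventions, not the paper's actual COMPASS benchmark:

```python
# Sketch of a policy-selection evaluation in the spirit of the tweet above.
# Hypothetical throughout: corpus, scenario format, and model interface.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    situation: str       # the case the model must handle
    gold_policy_id: str  # which internal policy actually applies


POLICIES = {
    "HR-12": "Employee data may be shared internally only with HR approval.",
    "SEC-03": "Customer records must never leave the production environment.",
    "LEG-07": "Regulator requests are escalated to Legal within 24 hours.",
}


def evaluate_selection(ask_model: Callable[[str], str],
                       scenarios: list[Scenario]) -> float:
    """Fraction of scenarios where the model names the applicable policy ID."""
    correct = 0
    for s in scenarios:
        catalog = "\n".join(f"{pid}: {text}" for pid, text in POLICIES.items())
        prompt = (f"Internal policies:\n{catalog}\n\n"
                  f"Situation: {s.situation}\n"
                  "Answer with the single policy ID that governs this case.")
        if ask_model(prompt).strip() == s.gold_policy_id:
            correct += 1
    return correct / len(scenarios)
```

Exact-match works only for selection; interpretation, conflict resolution, and justification would need graded rubrics, since they judge reasoning quality rather than a single ID.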