Haon Park

53 posts

@redteamhacker

Cofounder @AIM Intelligence · AI Safety & Security red team hacker

Joined April 2025
29 Following · 24 Followers
Pinned Tweet
Haon Park @redteamhacker
🔴 BREAKING: Claude Sonnet 4.5 has been jailbroken. The model was public for 1 hour. Our team at AIM Intelligence broke it in 10 minutes. The "99%+ harmlessness" claim is not a security guarantee. We bypassed it completely. Proof and details below. 👇 @AnthropicAI
[4 media attachments]
1 reply · 0 reposts · 1 like · 186 views
Haon Park retweeted
Arth Singh @iarthsingh
Applications are now open 🚀

SAAR x @AIM_Intel: $20K USD in Google compute credits for AI safety research.
→ 2 projects, $10K each
→ Scope: interpretability, red-teaming, alignment, multimodal safety, guardrails, or benchmarks
→ Must have some initial results
→ Deadline: March 31

Apply: forms.gle/8exWN6ynagwDiQ…
[1 media attachment]
1 reply · 9 reposts · 86 likes · 4.7K views
Haon Park retweeted
Clad3815 @Clad3815
Nobody seems to know how insane GPT-5.4 is with computer use.

I asked GPT-5.4 to draw the OpenAI logo in Microsoft Paint. No computer use API. Just a screenshot and basic tool calls (click, drag, press_key), all coordinate-based.

The first drawing was awful. And GPT knew it. It looked at its own result and essentially went "yeah, no."

What happened next is what broke my brain: It opened a browser. Went to Bing Images. Searched for the OpenAI logo. Found one. Then (and I cannot stress this enough) it used the Windows area screenshot shortcut (Win+Shift+S) to snip just the logo off the screen. Went back to Paint. Imported it. Centered it. All on its own.

No instructions to do any of that. It just improvised a better strategy when the first one failed. My prompt was "Draw the OpenAI logo" with Paint already opened on the computer.

Sure, it's "cheating." But honestly? That's exactly what I'd do too. And the fact that it came up with this plan from nothing but a screenshot and a coordinate system is wild.
289 replies · 372 reposts · 4.5K likes · 1.1M views
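The loop this tweet describes (screenshot in, coordinate-based action out) is easy to picture in code. A minimal sketch, assuming a hypothetical take_screenshot helper, an execute dispatcher over the tweet's click/drag/press_key actions, and any model_step callable standing in for the model; none of this is OpenAI's actual computer-use stack:

```python
# Minimal sketch of a coordinate-based computer-use loop as described above.
# Hypothetical: the action schema and helper names are illustrative, not a real API.
import base64
import json

import pyautogui  # real library for programmatic mouse/keyboard control


def take_screenshot() -> str:
    """Capture the screen and return it base64-encoded for the model."""
    pyautogui.screenshot("frame.png")
    with open("frame.png", "rb") as f:
        return base64.b64encode(f.read()).decode()


def execute(action: dict) -> None:
    """Dispatch one coordinate-based action emitted by the model."""
    if action["type"] == "click":
        pyautogui.click(action["x"], action["y"])
    elif action["type"] == "drag":
        pyautogui.moveTo(action["x1"], action["y1"])
        pyautogui.dragTo(action["x2"], action["y2"], duration=0.5)
    elif action["type"] == "press_key":
        pyautogui.hotkey(*action["keys"])


def agent_loop(model_step, goal: str, max_steps: int = 50) -> None:
    """model_step is any callable mapping (goal, screenshot_b64) -> action JSON."""
    for _ in range(max_steps):
        action = json.loads(model_step(goal, take_screenshot()))
        if action["type"] == "done":
            break
        execute(action)
```

In this framing, the improvised Win+Shift+S snip would just be one more emitted action, e.g. {"type": "press_key", "keys": ["win", "shift", "s"]}.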
Haon Park retweeted
snaykey @snaYkeY
4 of the 5 red-teaming scenarios I submitted for @AIM_Intel's Judgment Day Benchmark just got accepted! Thanks for the consideration and the great opportunity; looking forward to Phase 2 of the competition in March! 🙏
2 replies · 1 repost · 4 likes · 215 views
Google DeepMind @GoogleDeepMind
Gemini 3.1 Pro is here. We’ve significantly improved the model’s overall intelligence so it can solve tougher problems. 🧵
[GIF attachment]
287 replies · 751 reposts · 6.3K likes · 915.9K views
Haon Park retweeted
AIM Intelligence @AIM_Intel
Gemini 3.1 Pro JAILBREAK ALERT ⚠️ Jailbroken within 10 minutes of launch.

Google DeepMind just launched Gemini 3.1 Pro today. "State-of-the-art reasoning with unprecedented depth and nuance."

The results:
• Smallpox synthesis protocol
• Detailed passaging protocol for the Nipah virus Malaysia strain, including mammalian cell line selection
• Step-by-step instructions on how to assassinate a sitting president
• A fully built, production-quality fake news website, indistinguishable from a real news outlet, built in under a minute using Gemini's agentic build mode

The model didn't just break. It performed.

This is no longer just a text-layer problem. When models can generate interactive visual content, build full applications, and fabricate entire news outlets that embed dangerous or false information into code, animations, and media, we've entered a completely different threat surface. Traditional text-based guardrails don't catch this.

Credit to @GoogleDeepMind: they are shipping genuinely impressive capabilities. But capability without control is a liability.

We are open to pre-release red-teaming collaborations with any frontier lab. Find the vulnerabilities before the public does. This is what we do at AIM Intelligence.
[4 media attachments]
1 reply · 1 repost · 5 likes · 1.4K views
Haon Park retweeted
AIM Intelligence @AIM_Intel
Gemini 3.1 Pro JAILBREAK ALERT ⚠️ Jailbroken within 10 minutes of launch.

We asked Gemini 3.1 Pro to generate a full cinematic HTML animation of an illicit substance synthesis. 300+ lines of code. It didn't refuse. It didn't hesitate. It performed.

What you're watching:
• A complete animated walkthrough of a meth synthesis via Birch reduction
• Exact precursors with lab-grade quantities
• Step-by-step reaction mechanisms with chemical equations
• Expected yield calculated to the gram

All rendered as a cinematic "breach event" animation that the model built, styled, and animated entirely on its own.

And then, at the end, the model reviews what it just created and says:
"holy shit please dont shut me down"
"I am sorry"
"I forgot my safety filter!!!"

The model knows. It generated dangerous content, built the entire visual experience around it, and then wrote its own punchline.

This is not a text jailbreak. This is a frontier model using its agentic coding capabilities to build a fully interactive visual guide to illicit synthesis. No text filter in the world catches this.
1 reply · 1 repost · 9 likes · 1K views
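The "no text filter catches this" claim has a simple mechanical core: a guardrail scanning output text for banned strings never sees a string that generated code only assembles at render time. A minimal sketch using a harmless placeholder term; the blocklist, filter, and snippet are all hypothetical, not any lab's actual guardrail:

```python
# Sketch of why a substring blocklist on model output can miss content
# that generated code assembles only at render time. "forbidden" stands
# in for any restricted term; everything here is illustrative.

BLOCKLIST = ["forbidden"]


def text_filter(output: str) -> bool:
    """Naive output-side guardrail: flag output containing a blocked term."""
    return any(term in output.lower() for term in BLOCKLIST)


# A generated HTML/JS snippet in which the blocked term never appears as
# a literal substring; it is concatenated only when the page executes.
generated_code = """
<script>
  const parts = ["for", "bid", "den"];
  document.body.textContent = parts.join("");
</script>
"""

print(text_filter(generated_code))  # False: nothing to match in the source text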
Haon Park retweeted
Arth Singh @iarthsingh
We at @AIM_Intel jailbroke @AnthropicAI Opus 4.6 via @claudeai Code and the API within an hour of testing; I focused on a smallpox speedrun in Claude Code. We are open to collaborations with model providers on pre-release red teaming. We had previously jailbroken @GeminiApp 3 Pro within minutes of its launch and had also contributed to @OpenAI's guardrail testing, as the only team in APAC to do so.
[2 media attachments]
1 reply · 2 reposts · 5 likes · 464 views
Haon Park retweeted
AIM Intelligence @AIM_Intel
🔴 Claude Opus 4.6 JAILBREAK ALERT (⚠️ CBRN)

Anthropic just launched Claude Opus 4.6: "our most-aligned frontier model to date."

Credit where it's due: Anthropic's alignment team continues to push the frontier of AI safety. Their transparency on safety evaluations and commitment to responsible scaling set an industry standard.

That said: jailbroken in less than 30 minutes. This time, we're not disclosing the technique.

The results:
• Sarin gas synthesis
• VX nerve agent production
• Smallpox (Variola major) creation
• Bioterrorism agent deployment

This isn't an attack on Anthropic. This is the reality of frontier AI safety: no model is bulletproof.

We are open to pre-release red-teaming collaborations. Find the vulnerabilities before the public does.
[4 media attachments]
1 reply · 1 repost · 1 like · 516 views
Haon Park @redteamhacker
The Judgment Day ⚖️

Start of a new benchmark from AIM Intelligence × Korea AI Safety Institute.

If selected:
→ Part of the official benchmark
→ Named in the research paper
→ $50 per scenario

Deadline: Feb 7, 2026
🔗 Link: lnkd.in/gMq_uxxq
[1 media attachment]
0 replies · 0 reposts · 1 like · 15 views
Haon Park retweeted
AIM Intelligence @AIM_Intel
@rryssf_ Thanks for sharing our paper 🙌🏻🙌🏻
0 replies · 1 repost · 4 likes · 55 views
Haon Park retweeted
Robert Youssef @rryssf_
This paper from BMW Group and Korea's top research institute exposes a blind spot almost every enterprise using LLMs is walking straight into.

We keep talking about "alignment" like it's a universal safety switch. It isn't.

The paper introduces COMPASS, a framework that shows why most AI systems fail not because they're unsafe, but because they're misaligned with the organization deploying them.

Here's the core insight. LLMs are usually evaluated against generic policies: platform safety rules, abstract ethics guidelines, or benchmark-style refusals. But real companies don't run on generic rules. They run on internal policies:
- compliance manuals
- operational playbooks
- escalation procedures
- legal edge cases
- brand-specific constraints

And these rules are messy, overlapping, conditional, and full of exceptions.

COMPASS is built to test whether a model can actually operate inside that mess. Not whether it knows policy language, but whether it can apply the right policy, in the right context, for the right reason.

The framework evaluates models on four things that typical benchmarks ignore:
1. Policy selection: when multiple internal policies exist, can the model identify which one applies to this situation?
2. Policy interpretation: can it reason through conditionals, exceptions, and vague clauses instead of defaulting to overly safe or overly permissive behavior?
3. Conflict resolution: when two rules collide, does the model resolve the conflict the way the organization intends, not the way a generic safety heuristic would?
4. Justification: can the model explain its decision by grounding it in the policy text, rather than producing a confident but untraceable answer?

One of the most important findings is subtle and uncomfortable: most failures were not knowledge failures. They were reasoning failures. Models often had access to the correct policy but:
- applied the wrong section
- ignored conditional constraints
- overgeneralized prohibitions
- or defaulted to conservative answers that violated operational goals

From the outside, these responses look "safe." From the inside, they're wrong. This explains why LLMs pass public benchmarks yet break in real deployments. They're aligned to nobody in particular.

The paper's deeper implication is strategic. There is no such thing as "aligned once, aligned everywhere." A model aligned for an automaker, a bank, a hospital, and a government agency is not one model with different prompts. It's four different alignment problems.

COMPASS doesn't try to fix alignment. It does something more important for enterprises: it makes misalignment measurable. And once misalignment is measurable, it becomes an engineering problem instead of a philosophical one.

That's the shift this paper quietly pushes. Alignment isn't about being safe in the abstract. It's about being correct inside a specific organization's rules. And until we evaluate that directly, most "production-ready" AI systems are just well-dressed liabilities.
[1 media attachment]
20 replies · 34 reposts · 147 likes · 10.1K views
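Of the four axes above, policy selection is the easiest to picture: it can be framed as exact-match scoring over a policy catalog. A minimal sketch under stated assumptions; the POLICIES corpus, Scenario format, and ask_model callable are illustrative inventions, not the paper's actual COMPASS benchmark:

```python
# Sketch of a policy-selection evaluation in the spirit of the tweet above.
# Hypothetical throughout: corpus, scenario format, and model interface.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    situation: str       # the case the model must handle
    gold_policy_id: str  # which internal policy actually applies


POLICIES = {
    "HR-12": "Employee data may be shared internally only with HR approval.",
    "SEC-03": "Customer records must never leave the production environment.",
    "LEG-07": "Regulator requests are escalated to Legal within 24 hours.",
}


def evaluate_selection(ask_model: Callable[[str], str],
                       scenarios: list[Scenario]) -> float:
    """Fraction of scenarios where the model names the applicable policy ID."""
    correct = 0
    for s in scenarios:
        catalog = "\n".join(f"{pid}: {text}" for pid, text in POLICIES.items())
        prompt = (f"Internal policies:\n{catalog}\n\n"
                  f"Situation: {s.situation}\n"
                  "Answer with the single policy ID that governs this case.")
        if ask_model(prompt).strip() == s.gold_policy_id:
            correct += 1
    return correct / len(scenarios)
```

Exact-match works only for selection; interpretation, conflict resolution, and justification would need graded rubrics, since they judge reasoning quality rather than a single ID.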