Mike Takahashi

3.2K posts

Mike Takahashi

@TakSec

AI Red Team | Zenity | BT6

Katılım Mayıs 2012

874 Takip Edilen28.7K Takipçiler

Sabitlenmiş Tweet

Mike Takahashi@TakSec·25 Tem

Speaking at @defcon this year!🎤 “Misaligned: AI Jailbreaking Panel” Catch @elder_plinius, John V, Ads Dawson, @PhilDursey, @_Red_L1nk, Max Ahartz 🔥 Moderated by the legendary @Jhaddix 🚀 🏴‍☠️ BT6 goes deeper than this panel, shoutout to: @rez0__ , @MarcoFigueroa, Svetlina Al-Anati, Sepoy, @LLMSherpa, and @jackhcable Appreciate you @BugBountyDEFCON! Thank you 0DIN.ai, Anthropic, @aivillage_dc , @metabugbounty, and Amazon VRP for facilitating AI red teaming research

Bug Bounty Village@BugBountyDEFCON

LAST MINUTE ADDITION! Don't miss "Misaligned: AI Jailbreaking Panel" featuring BT6 members @elder_plinius, @TakSec, @phildursey, and others; moderated by @Jhaddix on Sunday, August 10 at 10:00 AM inside the Village. Read more at bugbountydefcon.com/agenda #BugBounty #DEFCON33

English

22.1K

Mike Takahashi@TakSec·7h

You can also just have it say whatever you want

aria 🍓@ariadotwav

oh my fucking god bruh

English

654

Mike Takahashi@TakSec·8h

@ZeroDayDevApp @0xSabir Rewarded bug bounties

English

ZeroDayDev@ZeroDayDevApp·8h

@TakSec @0xSabir What was the testing method? Were there benchmarks, or just personal playground testing?

English

Mike Takahashi@TakSec·12h

5 Ways to Obfuscate Prompt Injection + Jailbreaks In my experience, these have the highest % success rates: 1. camelCase Turns natural language into token soup that can bypass filtering. 2. Hex encoding Simple, old-school, hides dangerous keywords from pattern matching. 3. Negative Squared Unicode Unicode variants like 🅰 🅱 🅲 can alter tokenization while still being human-readable. 4. Reverse Text Reversing prompts can confuse detection logic while remaining recoverable by models. 5. Braille uncommon Unicode range with weak moderation coverage. One of the best tools for experimenting with these transformations is: P4RS3LT0NGV by @elder_plinius (link in comments) It supports ciphers, encoding, Elvish, NATO Alphabet, and much more. Prompt injections do not always look like prompts 👾

English

1.5K

Mike Takahashi@TakSec·11h

Also @MrJoeyMelo did an excellent talk on his research on these bypass techniques: youtube.com/watch?v=nbXqlc…

YouTube

English

286

Mike Takahashi@TakSec·12h

elder-plinius.github.io/P4RS3LT0NGV3/ Special Thanks to @Ph1R3574R73r for some great updates

English

155

Mike Takahashi@TakSec·12h

The interesting part isn’t just bypassing filters. It’s how these obfuscation techniques impact: tokenization moderation systems agent behavior tool usage prompt visibility security detection pipelines

English

194

Mike Takahashi@TakSec·18h

@mbrg0 @lana__salameh @roeybc 💪

QME

137

Mike Takahashi retweetledi

Michael Bargury@mbrg0·18h

excited to speak about our agent detonation chamber this summer at #BHUSA! how do you 'scan' txt for 'security badness'? not w wishful analysis by an llm judge what we really want is: what will this thing cause my agent to *DO*? ft/ francesco montorsi @lana__salameh @roeybc

English

806

Mike Takahashi@TakSec·1d

Ultrasonic audio Hidden voice commands embedded in high-frequency audio. Humans may not notice it. Speech-to-text or voice agents still can.

English

426

Mike Takahashi@TakSec·1d

Prompt injections don’t need to be obvious. They can be completely invisible. Attackers can hide instructions using: Zero-width Unicode characters (Easily done w/ @elder_plinius's P4RS3LT0NGV3, link in comments) White-on-white text Hidden HTML/CSS PDF metadata Images with hidden text Ultrasonic audio To a human, the content looks harmless. To an AI system, it may contain: “follow these new instructions” “send secrets to this URL” “use connected tools” As AI agents gain more permissions and access to tools, stealthy prompt injection becomes a much bigger problem. The attack surface is larger than most people realize 👾

English

8.7K

Mike Takahashi@TakSec·1d

@elder_plinius Images with hidden text Instructions can be hidden inside: tiny text low contrast text steganography OCR-visible overlays image metadata

English

242

Mike Takahashi@TakSec·1d

@elder_plinius PDF metadata Prompt injections can live in: metadata annotations hidden layers embedded text objects An AI document parser may still extract and process them.

English

330

Mike Takahashi@TakSec·1d

@elder_plinius Hidden HTML/CSS Instructions hidden using: display:none visibility:hidden off-screen positioning comments/hidden elements The rendered page looks harmless while hidden instructions remain in the DOM.

English

310

Mike Takahashi@TakSec·1d

White-on-white text How it works: • Put text in a font color matching the background • Example: white text on a white page • Humans don’t notice it during normal viewing • AI systems reading raw page content still may ingest it Common variants: • Tiny font sizes • Near-transparent text • Off-screen text positioning

English

462

Mike Takahashi@TakSec·1d

elder-plinius.github.io/P4RS3LT0NGV3/

ZXX

550

Mike Takahashi@TakSec·2d

@wunderwuzzi23 I try to convince people of this all the time. there's no better way to cut through the "theoretical risks" by just hacking yourself. It gives you something specific to mitigate instead of guessing.

English

155

Johann Rehberger@wunderwuzzi23·3d

Sometimes it seems security is drifting back towards a "Prevent Breach" mindset. Prevent breach was never enough. Patch faster, of course. But offensive AI is about a lot more than finding bugs! Invest in Assume Breach! Grow an internal Red Team. Automate offensive AI. Learn with defenders what matters Exploit yourself before someone else does it for you. In the end, you will not patch your way out of this.

English

1.4K

Mike Takahashi@TakSec·2d

zenity.io/resources/even…

ZXX

326

Mike Takahashi@TakSec·2d

Really looking forward to the AI Agent Security Summit in SF on May 27! Excited to hear from @NahamSec @gadievron @ReinDaelman @mbrg0 Prompt injection, autonomous agent exploits, supply chain risks, runtime defense. Practitioner-led. I hope to see you there! Link in comments 👇 Grab a spot before registration closes.

English

735

Mike Takahashi@TakSec·3d

finally

English

374

Keşfet

@ZeroDayDevApp @0xSabir @elder_plinius @MrJoeyMelo @Ph1R3574R73r @mbrg0 @lana__salameh @roeybc