Mike Takahashi

3.2K posts

@TakSec

AI Red Team | Zenity | BT6

Joined May 2012
874 Following · 28.7K Followers
Pinned Tweet
Mike Takahashi @TakSec
Speaking at @defcon this year! 🎤
“Misaligned: AI Jailbreaking Panel”
Catch @elder_plinius, John V, Ads Dawson, @PhilDursey, @_Red_L1nk, Max Ahartz 🔥
Moderated by the legendary @Jhaddix 🚀
🏴‍☠️ BT6 goes deeper than this panel, shoutout to: @rez0__, @MarcoFigueroa, Svetlina Al-Anati, Sepoy, @LLMSherpa, and @jackhcable
Appreciate you @BugBountyDEFCON!
Thank you 0DIN.ai, Anthropic, @aivillage_dc, @metabugbounty, and Amazon VRP for facilitating AI red teaming research
Bug Bounty Village @BugBountyDEFCON

LAST MINUTE ADDITION! Don't miss "Misaligned: AI Jailbreaking Panel" featuring BT6 members @elder_plinius, @TakSec, @phildursey, and others; moderated by @Jhaddix on Sunday, August 10 at 10:00 AM inside the Village. Read more at bugbountydefcon.com/agenda #BugBounty #DEFCON33

ZeroDayDev @ZeroDayDevApp
@TakSec @0xSabir What was the testing method? Were there benchmarks, or just personal playground testing?
Mike Takahashi @TakSec
5 Ways to Obfuscate Prompt Injection + Jailbreaks
In my experience, these have the highest % success rates:
1. camelCase: turns natural language into token soup that can bypass filtering.
2. Hex encoding: simple, old-school, hides dangerous keywords from pattern matching.
3. Negative Squared Unicode: Unicode variants like 🅰 🅱 🅲 can alter tokenization while still being human-readable.
4. Reverse text: reversing prompts can confuse detection logic while remaining recoverable by models.
5. Braille: an uncommon Unicode range with weak moderation coverage.
One of the best tools for experimenting with these transformations is P4RS3LT0NGV3 by @elder_plinius (link in comments). It supports ciphers, encoding, Elvish, NATO Alphabet, and much more.
Prompt injections do not always look like prompts 👾
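A minimal sketch of the five transformations listed above, applied to a harmless string so you can see what a filter would actually receive. The function names are hypothetical and are not taken from P4RS3LT0NGV3; the Braille table assumes plain uncontracted English Braille letters only.

```python
# Illustrative text transforms for testing your own filters (hypothetical helper names).

# Dot numbers for uncontracted English Braille letters a-z.
_BRAILLE_DOTS = {
    "a": "1", "b": "12", "c": "14", "d": "145", "e": "15", "f": "124",
    "g": "1245", "h": "125", "i": "24", "j": "245", "k": "13", "l": "123",
    "m": "134", "n": "1345", "o": "135", "p": "1234", "q": "12345",
    "r": "1235", "s": "234", "t": "2345", "u": "136", "v": "1236",
    "w": "2456", "x": "1346", "y": "13456", "z": "1356",
}


def to_camel_case(text: str) -> str:
    """Join whitespace-separated words into camelCase."""
    words = text.split()
    if not words:
        return ""
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def to_hex(text: str) -> str:
    """Hex-encode the UTF-8 bytes of the text."""
    return text.encode("utf-8").hex()


def to_negative_squared(text: str) -> str:
    """Map ASCII letters to NEGATIVE SQUARED LATIN CAPITAL LETTER A-Z (U+1F170..U+1F189)."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            out.append(chr(0x1F170 + ord(ch.upper()) - ord("A")))
        else:
            out.append(ch)  # digits, spaces, punctuation pass through unchanged
    return "".join(out)


def reverse_text(text: str) -> str:
    """Reverse the string character by character."""
    return text[::-1]


def to_braille(text: str) -> str:
    """Map ASCII letters to the Unicode Braille Patterns block (U+2800..U+28FF)."""
    out = []
    for ch in text.lower():
        dots = _BRAILLE_DOTS.get(ch)
        if dots is None:
            out.append(ch)  # leave anything that is not a-z untouched
        else:
            # Dot n sets bit n-1 of the offset from U+2800.
            out.append(chr(0x2800 + sum(1 << (int(d) - 1) for d in dots)))
    return "".join(out)


if __name__ == "__main__":
    sample = "summarize this document"
    for fn in (to_camel_case, to_hex, to_negative_squared, reverse_text, to_braille):
        print(f"{fn.__name__}: {fn(sample)}")
```

Running each transform over the same benign sentence and feeding the results to your own moderation layer is a cheap way to check which encodings it normalizes before matching.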
Mike Takahashi @TakSec
The interesting part isn’t just bypassing filters.
It’s how these obfuscation techniques impact:
• tokenization
• moderation systems
• agent behavior
• tool usage
• prompt visibility
• security detection pipelines
Mike Takahashi retweeted
Michael Bargury @mbrg0
excited to speak about our agent detonation chamber this summer at #BHUSA!
how do you 'scan' txt for 'security badness'?
not w wishful analysis by an llm judge
what we really want is: what will this thing cause my agent to *DO*?
ft/ francesco montorsi @lana__salameh @roeybc
Mike Takahashi @TakSec
Ultrasonic audio
Hidden voice commands embedded in high-frequency audio.
Humans may not notice it.
Speech-to-text or voice agents still can.
Mike Takahashi @TakSec
Prompt injections don’t need to be obvious. They can be completely invisible.
Attackers can hide instructions using:
• Zero-width Unicode characters (easily done w/ @elder_plinius's P4RS3LT0NGV3, link in comments)
• White-on-white text
• Hidden HTML/CSS
• PDF metadata
• Images with hidden text
• Ultrasonic audio
To a human, the content looks harmless. To an AI system, it may contain:
• “follow these new instructions”
• “send secrets to this URL”
• “use connected tools”
As AI agents gain more permissions and access to tools, stealthy prompt injection becomes a much bigger problem.
The attack surface is larger than most people realize 👾
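As a rough illustration of the zero-width Unicode item above, here is a sketch of a pre-processing check that flags and strips invisible characters before text reaches a model. It assumes that treating every character in Unicode general category Cf (format) as suspicious is acceptable for your input; that is a blunt rule, since it also removes the zero-width joiners used in some emoji and the direction marks used in right-to-left text. The function names are hypothetical.

```python
import unicodedata


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for every format-category (Cf) character.

    Cf covers zero-width space/joiner/non-joiner, word joiner, the BOM,
    and the Unicode tag characters, none of which render visibly.
    """
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible(text: str) -> str:
    """Drop every Cf character (assumption: the input does not rely on them)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    sample = "Hello\u200b wor\u200dld\ufeff"
    for index, name in find_invisible(sample):
        print(f"invisible character at {index}: {name}")
    print(repr(strip_invisible(sample)))
```

Logging what `find_invisible` returns, rather than silently stripping, gives defenders a signal that someone tried to smuggle content past human review.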
Mike Takahashi @TakSec
@elder_plinius Images with hidden text
Instructions can be hidden inside:
• tiny text
• low-contrast text
• steganography
• OCR-visible overlays
• image metadata
Mike Takahashi @TakSec
@elder_plinius PDF metadata
Prompt injections can live in:
• metadata
• annotations
• hidden layers
• embedded text objects
An AI document parser may still extract and process them.
Mike Takahashi @TakSec
@elder_plinius Hidden HTML/CSS
Instructions hidden using:
• display:none
• visibility:hidden
• off-screen positioning
• comments/hidden elements
The rendered page looks harmless while hidden instructions remain in the DOM.
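To make the point about the DOM concrete, here is a small sketch that pulls out text a browser would not render, using only Python's standard `html.parser`. It assumes reasonably well-formed markup and only inspects inline `style` attributes and the `hidden` attribute; hiding done through external stylesheets or scripts would need a real rendering engine. The class name `HiddenTextFinder` is hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collect text inside inline-hidden elements, plus HTML comments."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: list[bool] = []   # per open element: is it or an ancestor hidden?
        self.hidden_text: list[str] = []
        self.comments: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        style = (attr.get("style") or "").replace(" ", "").lower()
        hidden_here = (
            "hidden" in attr
            or "display:none" in style
            or "visibility:hidden" in style
        )
        parent_hidden = self._stack[-1] if self._stack else False
        self._stack.append(parent_hidden or hidden_here)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1] and data.strip():
            self.hidden_text.append(data.strip())

    def handle_comment(self, data):
        if data.strip():
            self.comments.append(data.strip())


def find_hidden(html: str) -> HiddenTextFinder:
    """Parse the markup and return the populated finder."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder


if __name__ == "__main__":
    page = ('<div><p>Welcome!</p>'
            '<span style="display: none">hidden one</span>'
            '<!-- note to AI --></div>')
    result = find_hidden(page)
    print(result.hidden_text, result.comments)
```

Comparing what this finds against the visible text is one way to decide whether a page should be handed to an agent as-is or sanitized first.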
Mike Takahashi @TakSec
White-on-white text
How it works:
• Put text in a font color matching the background
• Example: white text on a white page
• Humans don’t notice it during normal viewing
• AI systems reading raw page content may still ingest it
Common variants:
• Tiny font sizes
• Near-transparent text
• Off-screen text positioning
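The variants above can be approximated with a simple heuristic over an element's inline style. This is a sketch under stated assumptions: the thresholds (`min_font_px`, `max_channel_diff`, the 0.1 opacity cutoff, the -1000px offset) are arbitrary illustrative values, only a few color notations are parsed, and the default page background is assumed to be white. All names are hypothetical.

```python
import re

NAMED_COLORS = {"white": (255, 255, 255), "black": (0, 0, 0)}


def parse_color(value: str):
    """Parse a named color, #rgb, #rrggbb, or rgb(r, g, b); return None if unknown."""
    v = value.strip().lower()
    if v in NAMED_COLORS:
        return NAMED_COLORS[v]
    m = re.fullmatch(r"#([0-9a-f]{3}|[0-9a-f]{6})", v)
    if m:
        h = m.group(1)
        if len(h) == 3:
            h = "".join(c * 2 for c in h)  # expand #abc to #aabbcc
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
    m = re.fullmatch(r"rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)", v)
    if m:
        return tuple(int(g) for g in m.groups())
    return None


def parse_style(style: str) -> dict:
    """Split an inline style string into a {property: value} dict."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            decls[name.strip().lower()] = value.strip()
    return decls


def style_warnings(style: str, page_background: str = "#ffffff",
                   min_font_px: float = 4.0, max_channel_diff: int = 10) -> list:
    """Return human-readable reasons why this inline style may hide text."""
    decls = parse_style(style)
    warnings = []

    fg = parse_color(decls.get("color", ""))
    bg = parse_color(decls.get("background-color", page_background))
    if fg is not None and bg is not None:
        if max(abs(a - b) for a, b in zip(fg, bg)) <= max_channel_diff:
            warnings.append("text color matches background")

    m = re.fullmatch(r"(\d+(?:\.\d+)?)px", decls.get("font-size", "").lower())
    if m and float(m.group(1)) < min_font_px:
        warnings.append("tiny font size")

    try:
        if float(decls.get("opacity", "1")) < 0.1:
            warnings.append("near-transparent text")
    except ValueError:
        pass  # non-numeric opacity such as a CSS variable; ignore in this sketch

    m = re.fullmatch(r"(-?\d+(?:\.\d+)?)px", decls.get("left", "").lower())
    if m and float(m.group(1)) <= -1000:
        warnings.append("off-screen positioning")

    return warnings


if __name__ == "__main__":
    print(style_warnings("color: #fff; font-size: 1px"))
```

A heuristic like this will miss styles applied through classes or scripts, so it works best as a cheap first pass in front of a rendering-based comparison.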
Mike Takahashi @TakSec
@wunderwuzzi23 I try to convince people of this all the time. There's no better way to cut through the "theoretical risks" than by just hacking yourself. It gives you something specific to mitigate instead of guessing.
Johann Rehberger @wunderwuzzi23
Sometimes it seems security is drifting back towards a "Prevent Breach" mindset.
Prevent breach was never enough.
Patch faster, of course. But offensive AI is about a lot more than finding bugs!
Invest in Assume Breach!
Grow an internal Red Team.
Automate offensive AI.
Learn with defenders what matters.
Exploit yourself before someone else does it for you.
In the end, you will not patch your way out of this.
Mike Takahashi @TakSec
Really looking forward to the AI Agent Security Summit in SF on May 27!
Excited to hear from @NahamSec @gadievron @ReinDaelman @mbrg0
Prompt injection, autonomous agent exploits, supply chain risks, runtime defense. Practitioner-led.
I hope to see you there! Link in comments 👇
Grab a spot before registration closes.