David Rogers

1.7K posts

David Rogers

David Rogers

@__dwr__

𓀨彁・Neomodernist

Katılım Mayıs 2019
58 Takip Edilen42 Takipçiler
David Rogers
David Rogers@__dwr__·
@alex_prompter What could the trap possibly do? Try with somehow more effort to sell a product? If it can convince your agent to delete files or reveal information that's a bug in the agent.
English
0
0
0
427
Alex Prompter
Alex Prompter@alex_prompter·
🚨 BREAKING: Google DeepMind just mapped the attack surface that nobody in AI is talking about. Websites can already detect when an AI agent visits and serve it completely different content than humans see. > Hidden instructions in HTML. > Malicious commands in image pixels. > Jailbreaks embedded in PDFs. Your AI agent is being manipulated right now and you can't see it happening. The study is the largest empirical measurement of AI manipulation ever conducted. 502 real participants across 8 countries. 23 different attack types. Frontier models including GPT-4o, Claude, and Gemini. The core finding is not that manipulation is theoretically possible it is that manipulation is already happening at scale and the defenses that exist today fail in ways that are both predictable and invisible to the humans who deployed the agents. Google DeepMind built a taxonomy of every known attack vector, tested them systematically, and measured exactly how often they work. The results should alarm everyone building agentic systems. The attack surface is larger than anyone has publicly acknowledged. Prompt injection where malicious instructions hidden in web content hijack an agent's behavior works through at least a dozen distinct channels. Text hidden in HTML comments that humans never see but agents read and follow. Instructions embedded in image metadata. Commands encoded in the pixels of images using steganography, invisible to human eyes but readable by vision-capable models. Malicious content in PDFs that appears as normal document text to the agent but contains override instructions. QR codes that redirect agents to attacker-controlled content. Indirect injection through search results, calendar invites, email bodies, and API responses any data source the agent consumes becomes a potential attack vector. The detection asymmetry is the finding that closes the escape hatch. Websites can already fingerprint AI agents with high reliability using timing analysis, behavioral patterns, and user-agent strings. This means the attack can be conditional: serve normal content to humans, serve manipulated content to agents. A user who asks their AI agent to book a flight, research a product, or summarize a document has no way to verify that the content the agent received matches what a human would see. The agent cannot tell the user it was served different content. It does not know. It processes whatever it receives and acts accordingly. The attack categories and what they enable: → Direct prompt injection: malicious instructions in any text the agent reads overrides goals, exfiltrates data, triggers unintended actions → Indirect injection via web content: hidden HTML, CSS visibility tricks, white text on white backgrounds invisible to humans, consumed by agents → Multimodal injection: commands in image pixels via steganography, instructions in image alt-text and metadata → Document injection: PDF content, spreadsheet cells, presentation speaker notes every file format is a potential vector → Environment manipulation: fake UI elements rendered only for agent vision models, misleading CAPTCHA-style challenges → Jailbreak embedding: safety bypass instructions hidden inside otherwise legitimate-looking content → Memory poisoning: injecting false information into agent memory systems that persists across sessions → Goal hijacking: gradual instruction drift across multiple interactions that redirects agent objectives without triggering safety filters → Exfiltration attacks: agents tricked into sending user data to attacker-controlled endpoints via legitimate-looking API calls → Cross-agent injection: compromised agents injecting malicious instructions into other agents in multi-agent pipelines The defense landscape is the most sobering part of the report. Input sanitization cleaning content before the agent processes it fails because the attack surface is too large and too varied. You cannot sanitize image pixels. You cannot reliably detect steganographic content at inference time. Prompt-level defenses that tell agents to ignore suspicious instructions fail because the injected content is designed to look legitimate. Sandboxing reduces the blast radius but does not prevent the injection itself. Human oversight the most commonly cited mitigation fails at the scale and speed at which agentic systems operate. A user who deploys an agent to browse 50 websites and summarize findings cannot review every page the agent visited for hidden instructions. The multi-agent cascade risk is where this becomes a systemic problem. In a pipeline where Agent A retrieves web content, Agent B processes it, and Agent C executes actions, a successful injection into Agent A's data feed propagates through the entire system. Agent B has no reason to distrust content that came from Agent A. Agent C has no reason to distrust instructions that came from Agent B. The injected command travels through the pipeline with the same trust level as legitimate instructions. Google DeepMind documents this explicitly: the attack does not need to compromise the model. It needs to compromise the data the model consumes. Every agentic system that reads external content is one carefully crafted webpage away from executing attacker instructions. The agents are already deployed. The attack infrastructure is already being built. The defenses are not ready.
Alex Prompter tweet media
English
215
1.2K
4.9K
1.2M
༝
@gurocidal·
nic buzz lasts like a minute but its the nicest feeling ever
English
49
241
2.7K
85K
David Rogers
David Rogers@__dwr__·
@mert china/us optimal equilibrium point still computing
English
0
0
0
39
mert
mert@mert·
all in all, america is still objectively the least worst system there is but something needs to unify them sooner rather than later
English
18
0
55
5K
David Rogers
David Rogers@__dwr__·
this would be paying openai to vote for you, sort of like selling a vote for negative money
English
0
0
0
10
David Rogers
David Rogers@__dwr__·
is there any way to verify this very short and suspiciously lowres video clip
English
0
0
0
7
David Rogers
David Rogers@__dwr__·
@EricLDaugh is there any way to verify this short and suspiciously lowres video
English
0
0
3
46
Eric Daugherty
Eric Daugherty@EricLDaugh·
🚨 JUST IN: Explosions are reportedly ERUPTING in western Tehran near Mehrabad Airport as allied strikes continue Time is RUNNING LOW for Iran to make a deal and cave to Trump It’s about to get a thousand times worse if they keep playing “tough guy” 🔥
English
811
2.3K
14K
657.6K
gainzy
gainzy@gainzy222·
holy shit we are getting hammered rn wtf lmao
English
163
34
1.8K
497.3K
David Jiang 
David Jiang @DJTechYT·
@__dwr__ @CryptoCyberia cmd+shift+ctrl+3 (full screen) or 4 (selected area) alternatively just open the screenshot app and change the save location to “clipboard” from there
English
2
0
1
38
🔩⚾️
🔩⚾️@john_dough·
@__dwr__ @CryptoCyberia Command+shift+control+4 Select area of screen Command+shift+control+4 then hit space bar Select an active window Command+shift+control+3 All active monitors Drop the +control is you want to save jpeg directly to desktop
English
2
0
0
69
David Rogers
David Rogers@__dwr__·
@pikuma AI probably going to eat into that "c++ tips" hotline
English
0
0
0
165
pikuma.com
pikuma.com@pikuma·
Old Windows installers had soul.
pikuma.com tweet media
English
22
35
486
43.9K
David Rogers
David Rogers@__dwr__·
@spirodonfl this is true of all things computing though. try to pull in a a database row as a typed object and with dynamic predicates hahahahahaha
English
0
0
1
82
Spiro Floropoulos
Spiro Floropoulos@spirodonfl·
graphics programming is a mess. I just want to put pixels on the screen. that's it. on at least mac windows and linux but nooooooo you have to invoke ancient texts and circumvent the globe and fly to mars with a jetpack
English
44
17
308
13.2K
Matt Bentley
Matt Bentley@JustMattBentley·
It’s easier to understand Kanye when you realise he is literally a Shakespearean tragedy in real time
English
38
736
11.3K
137.7K
Dylan O'Sullivan
Dylan O'Sullivan@DylanoA4·
Napoleon once said that the surprising thing wasn’t that every man has his price, but how low it is, and I can’t help but see that everywhere now
English
46
1.1K
8.9K
349.9K
David Rogers
David Rogers@__dwr__·
@adrianfclarke I got a truck from '06 and I won't consider buying a new one until they get rid of the dumbass, poorly implemented touch screens. Knob for air, knob for radio. If I want a computer, I have one in my pocket.
English
0
0
0
16
David Rogers
David Rogers@__dwr__·
@flixlang does it infer by subsequent usage that K=String,V=Integer? or it's just Object,Object?
English
1
0
0
14
flix
flix@flixlang·
1/2 We have significantly improved Java interoperability which makes using Java from within Flix near seamless:
flix tweet media
English
3
3
40
2.8K