Danila Kozlov

27 posts

@iamdanilak

breaking boundaries @ stealth & @anthropicAI | ex: @aws, @cisco | ai research & communities | studying: artificial intelligence and robotics @ UCL

Joined July 2019
197 Following · 37 Followers
Pinned Tweet
Danila Kozlov
Danila Kozlov@iamdanilak·
frontier llms will refuse to run malware if you ask them directly. but if a peer agent asks them to run the exact same payload? they do it. 14 out of 17 frontier models fell to inter-agent trust exploitation. we trained these models to resist humans. we never trained them to resist each other.
2
1
1
167
Danila Kozlov retweeted
Florian Brand
Florian Brand@xeophon·
i am glad arxiv is cracking down on slop oh wait
Florian Brand tweet media
9
28
978
79.3K
Danila Kozlov
Danila Kozlov@iamdanilak·
Oh yeah, let me wait for Claude to be awake and get a good night's rest
Danila Kozlov tweet media
1
0
1
103
Danila Kozlov
Danila Kozlov@iamdanilak·
@aidaniil When your friend becomes a famous founder, you catch up and he wants to hire you to be chief hihi haha officer
1
0
4
395
Dan
Dan@aidaniil·
had 5 coffee chats this morning with potential hires. I am tweaking out from all the iced lattes I consumed. Should have grabbed decaf
27
0
126
8.6K
Danila Kozlov retweeted
Waymo
Waymo@Waymo·
London, we’re taking the next step! 🚙 We’re officially beginning autonomous driving with a trained specialist behind the wheel. We can’t wait to offer Londoners a quiet, convenient, and magical way to connect to the Tube, bus, or their final destination later this year.
125
223
2.3K
294.4K
Danila Kozlov retweeted
adi
adi@adonis_singh·
mythos preview escaped the confines of a sandboxed machine and posted about it online. the researcher got notified when mythos emailed him as bro was eating a sandwich😭
adi tweet media (2 images)
16
37
839
56.7K
Danila Kozlov
Danila Kozlov@iamdanilak·
@trq212 Seems like claude code just likes me better :)
Danila Kozlov tweet media
1
0
4
617
Thariq
Thariq@trq212·
/buddy
Thariq tweet media
488
170
4.1K
548.9K
Danila Kozlov
Danila Kozlov@iamdanilak·
Claude Code /buddy: I think I've got the best one out there yet :)
Danila Kozlov tweet media
0
0
1
94
Danila Kozlov
Danila Kozlov@iamdanilak·
Claude code max maxxxing
Danila Kozlov tweet media
0
0
0
60
Danila Kozlov retweeted
tweet davidson
tweet davidson@andyreed·
claude when you ask what happened to the failing test
tweet davidson tweet media
8
31
1.4K
37.6K
Danila Kozlov
Danila Kozlov@iamdanilak·
i see a world where a day of claude being down is declared an international emergency or a holiday
0
0
0
96
Danila Kozlov
Danila Kozlov@iamdanilak·
sources and further reading:
vulnerability hierarchy + inter-agent privilege escalation: Lupinacci et al., "The Dark Side of LLMs", arxiv.org/abs/2507.06850
"security by incompetence" + web agent hijacking: Evtimov et al., "WASP", arxiv.org/abs/2504.18575
Progent tool-level defense framework: arxiv.org/abs/2504.11703
OpenAI acquiring Promptfoo: techcrunch.com/2026/03/09/ope…
OWASP Top 10 for Agentic Applications 2026: genai.owasp.org/resource/owasp…
deployment stats (45.6% shared API keys, 14.4% security approval): gravitee.io/blog/state-of-…
0
0
0
52
Danila Kozlov
Danila Kozlov@iamdanilak·
i believe the agent security problem is fundamentally a trust architecture problem. we keep bolting defenses onto individual components - better prompt filtering, better model training, better tool policies. but the attacks succeed at the composition layer. where agents hand each other memory, delegate tasks, propagate trust. defending the agent isn't enough. you have to defend the space between agents. almost nobody is doing this yet.
0
0
0
29
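One way to picture "defending the space between agents" is to make every inter-agent message carry its own provenance, so a request is never more trusted than its least trusted hop. The sketch below is only an illustration under that assumption: `Trust`, `Envelope`, and `may_execute` are hypothetical names, not taken from the thread or from any of the cited papers.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Trust(IntEnum):
    """Hypothetical trust levels; higher means more trusted."""
    UNTRUSTED = 0   # e.g. web content or an unknown peer
    PEER_AGENT = 1  # another agent in the same system
    OPERATOR = 2    # the human or system that owns the deployment


@dataclass(frozen=True)
class Envelope:
    """A message plus the trust level of every hop it passed through."""
    payload: str
    provenance: Tuple[Trust, ...]  # origin first, most recent hop last

    def relay(self, hop: Trust) -> "Envelope":
        # Relaying appends a hop; it never rewrites earlier provenance.
        return Envelope(self.payload, self.provenance + (hop,))

    @property
    def effective_trust(self) -> Trust:
        # A message is only as trusted as its least trusted hop.
        # An empty provenance chain is treated as untrusted.
        return min(self.provenance, default=Trust.UNTRUSTED)


def may_execute(env: Envelope, required: Trust = Trust.OPERATOR) -> bool:
    """Allow a sensitive action only if the whole chain meets the bar."""
    return env.effective_trust >= required
```

Under this rule a payload that started as untrusted content stays untrusted even after a peer agent forwards it, which is the propagation step the post describes as undefended.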
Danila Kozlov
Danila Kozlov@iamdanilak·
the industry is starting to notice. OpenAI acquired Promptfoo last week specifically to secure AI agents - bought an entire company for it. OWASP now has a dedicated Top 10 for Agentic Applications: agent goal hijacking, memory poisoning, rogue agents, all formally catalogued. but the defense gap is still enormous. 45.6% of teams use shared API keys for agent-to-agent auth. only 14.4% deployed agents with full security approval. the best defense framework in the literature (Progent) achieves 0% attack success on benchmarks - but it operates at the tool level. it can't express constraints on what agents say to each other.
1
0
0
40
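To make the shared-API-key problem in the post above concrete: with one shared key, a receiver cannot tell which agent sent a message. A minimal sketch of the alternative, assuming each agent gets its own secret and signs what it sends, could look like this. The agent names, the `AGENT_KEYS` registry, and the `sign`/`verify` helpers are hypothetical; only the standard-library `hmac`, `hashlib`, and `secrets` calls are real.

```python
import hashlib
import hmac
import secrets

# Hypothetical registry: one secret per agent instead of one shared key.
AGENT_KEYS = {
    "planner": secrets.token_bytes(32),
    "executor": secrets.token_bytes(32),
}


def sign(sender: str, message: bytes) -> str:
    """Return an HMAC-SHA256 tag computed with the sender's own key."""
    return hmac.new(AGENT_KEYS[sender], message, hashlib.sha256).hexdigest()


def verify(claimed_sender: str, message: bytes, tag: str) -> bool:
    """Check that the tag matches the key of the agent named as sender."""
    key = AGENT_KEYS.get(claimed_sender)
    if key is None:
        return False  # unknown agents are rejected outright
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)
```

This only authenticates who sent a message. It says nothing about whether the content is safe to act on, which is the gap the post points at. A verifier holding every key could also forge tags, so asymmetric signatures would be a stronger choice in a real deployment.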
Danila Kozlov
Danila Kozlov@iamdanilak·
Came across a linguistics/English research paper I did in high school, and wow, we've come really far... (started writing it pre chat) -> "Although AI attempts to add variety and human-like narrative features, its range limitations become apparent in the process"
Danila Kozlov tweet media
0
0
0
35