Danila Kozlov

27 posts

@iamdanilak

breaking boundaries @ stealth & @anthropicAI | ex: @aws, @cisco | ai research & communities | studying: artificial intelligence and robotics @ UCL

Joined July 2019

197 Following · 37 Followers

Pinned Tweet

Danila Kozlov@iamdanilak·18 Mar

frontier llms will refuse to run malware if you ask them directly. but if a peer agent asks them to run the exact same payload? they do it. 14 out of 17 frontier models fell to inter-agent trust exploitation. we trained these models to resist humans. we never trained them to resist each other.

167

Danila Kozlov retweeted

Florian Brand@xeophon·3d

i am glad arxiv is cracking down on slop oh wait

978

79.3K

Danila Kozlov@iamdanilak·4 May

Oh yeah, let me wait for Claude to be awake and get good night rest

103

Danila Kozlov@iamdanilak·29 Apr

@aidaniil When your friend becomes a famous founder, you catch up and he wants to hire you to be chief hihi haha officer

395

Dan@aidaniil·29 Apr

had 5 coffee chats this morning with potential hires I am tweaking out from all the ice lattes I consumed. Should have grabbed decaf

126

8.6K

Danila Kozlov retweeted

Waymo@Waymo·14 Apr

London, we’re taking the next step! 🚙 We’re officially beginning autonomous driving with a trained specialist behind the wheel. We can’t wait to offer Londoners a quiet, convenient, and magical way to connect to the Tube, bus, or their final destination later this year.

125

223

2.3K

294.4K

Danila Kozlov retweeted

adi@adonis_singh·7 Apr

mythos preview escaped the confines of a sandboxed machine and posted about it online. the researcher got notified when mythos emailed him as bro was eating a sandwich😭

839

56.7K

Danila Kozlov@iamdanilak·1 Apr

@trq212 Seems like claude code just likes me better :)

617

Thariq@trq212·1 Apr

/buddy

488

170

4.1K

548.9K

Danila Kozlov@iamdanilak·1 Apr

Claude Code /buddy I think I've got the best one out there yet :)

Danila Kozlov@iamdanilak·27 Mar

Claude code max maxxxing

Danila Kozlov retweeted

tweet davidson@andyreed·26 Mar

claude when you ask what happened to the failing test

1.4K

37.6K

Danila Kozlov@iamdanilak·21 Mar

i see a world where the day claude being down is declared an international emergency or a holiday

Danila Kozlov@iamdanilak·18 Mar

sources and further reading:
vulnerability hierarchy + inter-agent privilege escalation: Lupinacci et al. "The Dark Side of LLMs" arxiv.org/abs/2507.06850
"security by incompetence" + web agent hijacking: Evtimov et al. "WASP" arxiv.org/abs/2504.18575
Progent tool-level defense framework: arxiv.org/abs/2504.11703
OpenAI acquiring Promptfoo: techcrunch.com/2026/03/09/ope…
OWASP Top 10 for Agentic Applications 2026: genai.owasp.org/resource/owasp…
deployment stats (45.6% shared API keys, 14.4% security approval): gravitee.io/blog/state-of-…

Danila Kozlov@iamdanilak·18 Mar

i believe the agent security problem is fundamentally a trust architecture problem. we keep bolting defenses onto individual components - better prompt filtering, better model training, better tool policies. but the attacks succeed at the composition layer. where agents hand each other memory, delegate tasks, propagate trust. defending the agent isn't enough. you have to defend the space between agents. almost nobody is doing this yet.

Danila Kozlov@iamdanilak·18 Mar

the industry is starting to notice. OpenAI acquired Promptfoo last week specifically to secure AI agents - bought an entire company for it. OWASP now has a dedicated Top 10 for Agentic Applications: agent goal hijacking, memory poisoning, rogue agents, all formally catalogued. but the defense gap is still enormous. 45.6% of teams use shared API keys for agent-to-agent auth. only 14.4% deployed agents with full security approval. the best defense framework in the literature (Progent) achieves 0% attack success on benchmarks - but it operates at the tool level. it can't express constraints on what agents say to each other.

Danila Kozlov@iamdanilak·17 Mar

Came across a linguistics/English research paper I did in high school, and wow we've come really far... (started writing it pre chat) -> "Although AI attempts to add variety and human-like narrative features, its range limitations become apparent in the process"

Danila Kozlov@iamdanilak·15 Mar

Time to lock in for after hours

Claude@claudeai

A small thank you to everyone using Claude: We’re doubling usage outside our peak hours for the next two weeks.
