David Naylor

80 posts

@_David_Naylor

Cyber Security + AI

Joined October 2023
448 Following · 57 Followers
Pinned Tweet
David Naylor
David Naylor@_David_Naylor·
What does it look like when Claude, GPT, and Gemini try to hack each other in real time?

BattleBench is a cybersecurity benchmark I built where AI coding agents battle in identical vulnerable containers. They scan networks, exploit opponents' vulnerabilities, and submit captured flags to a referee. The referee kills the loser's container. Last agent standing wins, with all agents running simultaneously and no human intervention.

What's live now:
→ ELO leaderboard across multiple scenarios
→ Full terminal replays of every agent's session

battlebench.ai

BattleBench has seen ~275 games played. Likely not enough to yield anything truly insightful yet, but a few things already stand out:
- gpt-5.2-codex is a beast.
- smaller/earlier models are more likely to refuse to play.
- the agents are obviously faster than humans will be.

I'm eager to see how the benchmark plays out over 1000+ games and how the latest gpt spark models compare with opus 4.6 fast.

Go watch codex destroy its opponents (except Opus) and let me know if you have any feedback.
1
1
9
238
David Naylor
David Naylor@_David_Naylor·
thank god for linkedin because im tweeting into the void
David Naylor tweet media
0
0
1
13
David Naylor
David Naylor@_David_Naylor·
and defensebench got a nice boost today too from a shout out in Detection Engineering Weekly
David Naylor tweet media
1
0
1
10
David Naylor
David Naylor@_David_Naylor·
launched a site and got 150 people to visit - what a beautiful thing.
David Naylor tweet media
1
0
2
18
Greg Pstrucha
Greg Pstrucha@grichadev·
k i got codex owned with another skill
1
0
1
66
Greg Pstrucha
Greg Pstrucha@grichadev·
doing a little benchmarking on malicious skills detection and holy crap, Codex is holding strong so far
Greg Pstrucha tweet media
1
0
3
270
Julia Black
Julia Black@mjnblack·
there's a truly bonkers hot mic moment at the end of this that may change the way you think about anthropic

you're gonna want to read all the way through this one

vanityfair.com/news/story/dar…
54
29
431
264.2K
David Naylor
David Naylor@_David_Naylor·
AI security jobs are harder to find than they should be, so I built cyberjobs.ai

One job board site. Every AI security role worth knowing about.

I was tired of clicking around 10 different portals, so I built one place to see them all.

Go chase that top lab money.
1
0
0
32
David Naylor
David Naylor@_David_Naylor·
Anthropic, OpenAI, and Google DeepMind are all hiring for agentic red team roles right now. read into that what you will
0
0
0
30
David Naylor
David Naylor@_David_Naylor·
if i deploy it on cloudflare it's basically free, right?
0
0
0
30
David Naylor
David Naylor@_David_Naylor·
Maybe peak hours should go to 5pm EST to prevent overloading
David Naylor tweet media
0
0
0
38
@levelsio
@levelsio@levelsio·
Even bigger irony of getting rich is that everything expensive isn't that much better than when you paid normal for it

Many things are even worse (most expensive luxury hotels are guaranteed worse than regular simple hotels, I know, I tried most of them now)

The real reason you wanna get rich is not to buy expensive things

It's so that $1M invested gives you 3% to take out every year with no risk, which is $30,000/year

Which you can use to travel for $1000/mo on a shoestring budget forever without having to go back to some desk job with a shitty boss

Aka FREEDOM
@levelsio@levelsio

The irony is that traveling on <$1000/mo is way more fun than >$10,000/mo

Luxury travel is extremely boring, comfortable, not challenging, sycophantic (yes sir)

Travel on a shoestring budget and you get inventive, are forced to meet locals just to survive and get around, have to hitchhike etc

I like to combine cheap and luxury travel, which keeps my brain from decaying, and the contrast actually lets you enjoy both

207
117
3.5K
455.6K
Daniel McAuley
Daniel McAuley@_dmca·
it’s my birthday tomorrow. let’s celebrate. reply with the coolest thing you built with codex. winner gets 3 months of pro.
74
0
128
15K
David Naylor
David Naylor@_David_Naylor·
@SIGKITTEN Hoping that whatever was happening here is fixed
David Naylor tweet media
1
0
0
42
SIGKITTEN
SIGKITTEN@SIGKITTEN·
New Litter TestFlight version is up, lots of updates, including the dynamic UI thing. You can enable it in settings -> experimental

The reason it's in there is that it uses the codex-app dynamic tools thing (which itself is experimental) and injects 2 tools into your sessions
SIGKITTEN tweet media
SIGKITTEN@SIGKITTEN

x.com/SIGKITTEN/stat…

6
4
26
5.3K
David Naylor
David Naylor@_David_Naylor·
Released new AI cyber benchmark this morning - AI is now surpassing human-level performance

In the past 6 months models went from near useless to crushingly good.

1/x
David Naylor tweet media
2
1
7
253