Evan Luke

235 posts

@EvanThomasLuke

"Most likely to automate the apocalypse (safely)" - GPT5. AI hacking and alignment. https://t.co/enkfxVTCJF

Joined August 2016

1.2K Following · 139 Followers

Evan Luke@EvanThomasLuke·2d

@Jhaddix Yeah, especially for larger codebases/apps. If you ask the follow-up "did you look at everything?", it's usually a no.

JS0N Haddix@Jhaddix·2d

SO many hackers are so AI-pilled that they are not critically building in logging and verification to their hackbots. They are missing whole parts of their methodology due to models giving up on hard tasks. Build in gates in your prompt engineering. We'll go over this in the course.
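
One way to picture the "gates" and logging described in this post is a coverage check that refuses to accept an agent run until every expected target has a logged review. The sketch below is only an illustration under assumptions: `CoverageGate`, its fields, and the file names are hypothetical and do not come from the post or from any particular hackbot framework.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageGate:
    """Hypothetical gate: tracks which targets an agent has logged as reviewed."""

    expected: set[str]                              # targets the methodology requires
    reviewed: set[str] = field(default_factory=set)
    log: list[str] = field(default_factory=list)    # append-only audit trail

    def record(self, target: str, note: str) -> None:
        """Log a review of one target together with a short note."""
        self.reviewed.add(target)
        self.log.append(f"{target}: {note}")

    def missing(self) -> list[str]:
        """Return expected targets that have no logged review, sorted."""
        return sorted(self.expected - self.reviewed)

    def passed(self) -> bool:
        """The gate passes only when nothing expected is missing."""
        return not self.missing()


# Usage: the run is rejected until every expected file has a logged review.
gate = CoverageGate(expected={"auth.py", "upload.py", "billing.py"})
gate.record("auth.py", "checked session handling")
gate.record("upload.py", "checked path traversal")
if not gate.passed():
    print("Gate failed, not yet reviewed:", gate.missing())
```

The design choice here is that the gate compares against an explicit list of expected targets instead of asking the model whether it looked at everything, so a skipped file shows up as a concrete gap in the log.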

Evan Luke@EvanThomasLuke·3d

@nickvangilder awesome thank you! great resource

Nick VanGilder@nickvangilder·4d

I've been pushing out some new features and functionality over at redteam.community and one cool thing that you might appreciate is: past conference talks. I don't know about you, but I've always thought it was such a pain in the ass to track them all down. Obviously, you could search online, or check out YT (if you could remember the con handle), etc. Nothing was ever consolidated in one central place. So, I'm trying to fix that. I don't have _every_ video, but I do have a lot... currently over 8000 indexed. I've wired it up so that the master list of conferences pulls from cons on the Industry Conferences page (industry_conferences.json). In the schema, every con has a URL field and Past Conference Talks leverages that. Every day a cron harvests more and more videos from the con pages. And what I really like is that you can watch all of the past talks in-line without leaving the site or the page. Additionally, and maybe my 2nd favorite feature, is how speakers are extracted from the videos and/or matched against the con's official website to create "social chips". These pink chips exist throughout the site and map back to a social directory on the site that regularly syncs with all the major social sites where infosec folks hang out (X, Mastodon, LinkedIn, Bluesky, Twitch, etc.), with follower counts (APIs, anyone?!). The linkages are actually pretty cool because you can effectively click on a speaker/instructor/trainer from anywhere on the site where you see a pink chip and be able to see another con they have spoken at, a course they might teach, or content they create on their YouTube channel. It's not all completely and fully wired up yet, but much of what I've described works today. As time allows, I'm going to continue to extend and expand the "social" feature of the site and continue to add more sections. As always, if you have any ideas, bug reports, or feedback, just lmk. Happy to chat!

Evan Luke reposted

Jonah Weinbaum@WeinbaumJonah·4d

When Claude Mythos found zero-day vulnerabilities in every major operating system and browser, the US government was caught flat-footed. The White House stood up an emergency interagency task force. Treasury pulled bank CEOs into an impromptu meeting. The Cybersecurity and Infrastructure Security Agency (CISA), the agency charged with protecting US critical infrastructure, reportedly still lacked access to Mythos as of late April.
This kind of surprise is preventable. The Trump admin has already tasked the Center for AI Standards and Innovation (CAISI) with building state capacity to understand and predict future national security-relevant AI developments. But CAISI has been severely underfunded. It’s currently a $15M pilot project. In a new research report, @arthurctellis and I estimate CAISI needs ~$84M to fully deliver on its mandate. In other words, for the cost of a single F-35A fighter jet, the US government could have real situational awareness on frontier AI and not be surprised by future Mythos moments.
This situational awareness can be used to inform policy and asks to the AI labs, including governance surrounding model release, safeguards, know-your-customer regimes, security protocols, and product specifications. But without a detailed understanding of these models’ capabilities (what they’re good at, how effectively they discriminate between offensive and defensive activities, whether they’re securely implemented), we’re flying blind.
To estimate what it’d cost to give the government these capabilities, we translated every CAISI tasking from the AI Action Plan into FTEs and dollars, calibrated against peer evaluation orgs like METR and Anthropic's interpretability team.
Two scenarios:
- Limited CAISI ($26M, 56 FTE): partial coverage of its most important taskings
- Equipped CAISI ($84M, 184 FTE): full mandate
The administration's FY2027 PBR already proposed $27M for CAISI, a meaningful increase, but this was before Mythos revealed the urgency of the full mandate. To close the remaining gap:
- Congress can increase FY2027 appropriations + pass the EPIC Act (creates a NIST Foundation)
- The Executive can reallocate NIST STRS, tap Commerce's NRE Fund, request $84M in FY2028 PBR
The price tag is small relative to comparable investments. $84M is:
→ A medium DARPA project
→ ~1 hour of the Department of War's operating budget
→ Less than half of NIST's Information Technology Laboratory budget
And it's still less than what peer governments spend on CAISI’s peer institutions, pound-for-pound. As a fraction of their overall government budgets:
UK AISI: 57 ppm
Japan AISI: 32 ppm
Canadian AISI: 8 ppm
Current CAISI: 1 ppm
For the cost of one F-35, the administration can fully fund its own AI readiness mandate and equip the US government to anticipate the next big AI breakthrough. Full report: ifp.org/funding-for-ca…
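
The budget figures quoted in this thread can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the post ($26M / 56 FTE, $84M / 184 FTE, and the $27M FY2027 proposal). The per-FTE averages and the dollar gap are derived here for illustration and are not figures quoted from the report; the variable and function names are hypothetical.

```python
# Scenario figures as quoted in the thread: (budget in USD, FTE count).
scenarios = {
    "Limited CAISI": (26_000_000, 56),
    "Equipped CAISI": (84_000_000, 184),
}

# FY2027 President's Budget Request figure quoted in the thread.
proposed_fy2027_usd = 27_000_000


def cost_per_fte(budget_usd: int, fte: int) -> float:
    """Average fully loaded cost per FTE implied by a scenario."""
    return budget_usd / fte


for name, (budget_usd, fte) in scenarios.items():
    print(f"{name}: ~${cost_per_fte(budget_usd, fte):,.0f} per FTE")

# Remaining gap between the full-mandate estimate and the FY2027 proposal.
gap_usd = scenarios["Equipped CAISI"][0] - proposed_fy2027_usd
print(f"Gap to full mandate: ${gap_usd / 1e6:.0f}M")
```

Under these inputs both scenarios imply roughly $460K per FTE, and the gap between the $27M proposal and the $84M estimate is $57M.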

Evan Luke@EvanThomasLuke·3d

New security benchmark exploitbench.ai

Evan Luke reposted

AI Security Institute@AISecurityInst·4d

Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks frontier models can complete has been doubling every few months, and this rate has become faster over time, with recent models exceeding our previous trends. 🧵

Evan Luke@EvanThomasLuke·11 May

@gamozolabs I made a list github.com/EvanThomasLuke…

Brandon Falk@gamozolabs·11 May

I still haven't really done anything but just use claude code. How are people finding bugs? Many parallel agents? One agent per source file/function? Anyone using local models? I wanna start playing with them. I wish CPU inference was good but nothing uses NUMA correctly :D

Evan Luke reposted

METR@METR_Evals·9 May

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.

Evan Luke@EvanThomasLuke·8 May

@HackingDave congrats!

Dave Kennedy@HackingDave·8 May

I'm happy to announce that I have officially been promoted to Founder and Chief Executive Officer (CEO) of Binary Defense. With the changes in the industry happening and the shift to artificial intelligence, I have been immersing myself relentlessly on how we innovate and move fast - a complete shift of our entire company. Over the past 12 months we have completely transformed our company to be the most advanced artificial intelligence cyber security company in the world. We have taken MTTD and MTTR to times never thought possible before. Reduced false positives, increased true positives, and completely changed how we operationalize our MDR and product services as a company, and most importantly protect our customers. This journey was one of the fondest memories of my life, doing this with my team and one that is just getting started. With these changes in mind, our board approved me as CEO of the company to drive this company even further during this transformational and historic time in cybersecurity. I want to thank the folks over at Invictus Growth Partners for the trust in me, my partner Mike Valentine, and to all of the amazing folks we have @Binary_Defense . We truly are ahead in this field, innovating everyday, and protecting our customers 24 hours a day, 7 days a week, and 365 days a year. #BinaryDefense

Evan Luke reposted

Amol Avasare@TheAmolAvasare·7 May

The power of Mythos! Firefox identified and fixed more security bugs in one month vs. the past 15 months combined hacks.mozilla.org/2026/05/behind…

Evan Luke@EvanThomasLuke·6 May

@gabson0x thats sick keep at it!

Gabson@gabson0x·6 May

3 months in cyber security btw, already doing audits in big tech firms

Evan Luke@EvanThomasLuke·6 May

@_xpn_ very cool

Adam Chester 🏴‍☠️@_xpn_·6 May

If you came to SOCON, you may have seen the fireside chat on Ouroboros (if you weren't too busy counting my "urm"s 😝). The blog post is now live, detailing how we can use Dev-Tunnels for lateral movement, and allow pivoting from GitHub/Entra ID access. specterops.io/blog/2026/05/0…

Evan Luke@EvanThomasLuke·6 May

@nebusecurity nicee!!!!

Nebula Security@nebusecurity·6 May

Tomorrow, we’re releasing the full technical walkthrough for CVE-2026-5865, a chrome v8 0-day found by our AI security agent "Vega". More Linux kernel and Chrome 0-day writeups are coming later this month. Stay tuned, and follow our bug list for updates: nebusec.ai/buglist/

Evan Luke@EvanThomasLuke·6 May

@k8em0 "patch the code, no mistakes"

Katie🌻Moussouris (she/her/she-ra/she-hulk) 🪷@k8em0·6 May

“Patch the KEVs faster” still isn’t taking a scalable, targeted, realistic approach to the #Mythos #AI era, for gov or private sector. AI defense has not yet produced an autonomous, safe answer to meet the AI offense moment.

Eric Geller@ericgeller

Confirming this recent story from @razhael: reuters.com/legal/litigati… In response to Mythos, CISA is considering a binding operational directive that would change the timelines for agencies to remediate vulnerabilities, including down to 3 days in some cases.

Evan Luke@EvanThomasLuke·6 May

This is a very good comparison of Mythos and GPT 5.5 for security by @natalia__coelho highly recommend pointestimate.substack.com/p/how-good-is-…

Evan Luke@EvanThomasLuke·6 May

@moyix wild, time to update metr!

Brendan Dolan-Gavitt@moyix·6 May

@EvanThomasLuke Well as a hint its WORKLOG.md is 8,435 lines long and about half a megabyte of text recording all the stuff it tried and the outcomes

Brendan Dolan-Gavitt@moyix·6 May

YOOOOOO WTF AMAZING WORK. That's gotta be some kind of record???

Brendan Dolan-Gavitt@moyix

@h0mbre_ Maybe I am too optimistic, GPT-5.5 has been working on this problem for 43h 10m 23s but IMO it's still got some good ideas

Evan Luke@EvanThomasLuke·6 May

@octane_security Very cool thanks for sharing

Octane Security@octane_security·6 May

x.com/i/article/2051…

Evan Luke@EvanThomasLuke·5 May

@cyb3rops Curious to hear more and what benchmarks

Florian Roth ⚡️@cyb3rops·5 May

I’ll soon share some AI benchmark results I’ve been working on for the last weeks. The focus: security event triage. Findings, alerts, forensic traces, suspicious events - the messy stuff where generic benchmarks don’t tell me enough. Human ground truth, security-focused scoring, interesting results. Soon

Evan Luke reposted

Origin@originhq·29 Apr

Agent features don't need vulnerabilities to become tradecraft. They just need to be useful, installed, and exposed. Codex ships with a documented IPC surface for remote TUI sessions, and one bind flag turns a compromised endpoint into a remotely controlled agent. originhq.com/blog/codex-on-…

Evan Luke reposted

Dwarkesh Patel@dwarkesh_sp·29 Apr

Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It’s a bit technical, but I encourage you to hang in there - it’s really worth it. There are less than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, “As we now know, pipelining is not wise.”
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
