Matt Johansen

47.2K posts

@mattjay

Founder of @vuln_u | Long Island elder emo surviving in ATX | AI and Cybersecurity news from an 18yr industry vet

Join 35k+ subscribers:
Joined June 2008
1.9K Following · 45.6K Followers
Pinned Tweet
Matt Johansen @mattjay
🚨 Exciting thing🚨 I'm getting back to my content creation roots. I've missed blogging, podcasting, and community engagement from back before I worked for big companies with scary PR teams. So... I'm launching a newsletter called Vulnerable U. vulnu.beehiiv.com
19 replies · 40 reposts · 287 likes · 199.6K views
Matt Johansen @mattjay
@UK_Daniel_Card "It's not the AI, that's hype. It's just all the resources (AI) that we threw at the problem!"
1 reply · 0 reposts · 0 likes · 141 views
mRr3b00t @UK_Daniel_Card
@mattjay What if it’s not magic and is just throwing resources at problems?!
GIF
1 reply · 0 reposts · 1 like · 201 views
Matt Johansen @mattjay
@seanhn And we've seen more research showing that, with the right harness, even models well below that 83% number can reproduce the findings of the higher-scoring models: provos.org/p/finding-zero…
0 replies · 0 reposts · 0 likes · 101 views
Matt Johansen @mattjay
@seanhn Well, ignoring that 88 is better than 83: over and over we're seeing better results from teams with mature harnesses that can swap in underlying models, rather than from pointing the model at code and saying "go find bugs".
2 replies · 0 reposts · 0 likes · 188 views
Matt Johansen retweeted
Matt Johansen @mattjay
@daveaitel Yeah, not a giant fan of evals in general, but I'm in camp "the harness does a lot". Especially convinced given @NielsProvos's recent research and the success of Mozilla's recent findings vs. other people just running the models on code.
0 replies · 0 reposts · 0 likes · 235 views
Zack Korman @ZackKorman
@mattjay Is there some other source for this? The only claim about the in-person part is very vague
1 reply · 0 reposts · 2 likes · 129 views
Zack Korman @ZackKorman
@mattjay This entire type of attack is solved if you meet people in person before hiring them
12 replies · 1 repost · 30 likes · 978 views
Matt Johansen retweeted
Matt Johansen @mattjay
I agree with ToB's take here, but what does this have to do with Daybreak? Does Daybreak address the points made here? It seems to me like it doesn't. It's just more bug finding, which as ToB said is more of a problem than it is a solution. Also worth contrasting this with Google's ffmpeg convo last year. ToB (not a multi-trillion-dollar company) isn't passing bugs over to OSS projects without a PoC, patch, and regression test. Kudos.
Trail of Bits @trailofbits

We were one of four initial grant recipients in @OpenAI's Trusted Access for Cyber program.

Daybreak matters because frontier models now find bugs faster than maintainers can triage them, and that gap is about to get worse. Next-gen models can bury open-source maintainers in reports.

While working with frontier labs this year, we have seen the bottleneck shift. Bug finding is easy, but triaging, disclosing, and fixing them takes disproportionate time and effort. Each finding still needs a human to confirm the bug, a static or dynamic check to reproduce it, a working proof-of-concept, and a minimal patch. That work is heavy, and right now it falls on the maintainer.

On the OSS engagements we ran this year, we prioritized minimizing maintainer workload and keeping noise out of their inboxes. Every report we sent included a PoC, a fix patch, and a regression test. Anything that did not clear that bar did not get sent.

Commonly used software has never been short of bugs. Cyber-tier models will surface them at machine speed with little human effort, and the volume will overwhelm OSS projects without clear processes for disclosure, triage, and remediation.

If you maintain an OSS project, do four things:
1. Publish a SECURITY.md. If you already have one, verify the reporting flow still works end to end.
2. Set a high bar for submissions. Require a PoC, a fix patch, and a regression test wherever possible.
3. Build validation harnesses that quickly answer three questions: is the bug real, does the fix work, and does anything else break?
4. Sandbox those harnesses. Malicious reports are a credible threat once the cost of generating them drops to near zero.

Bug finding is getting faster. Triage, verification, disclosure, and patching have to catch up.

2 replies · 0 reposts · 11 likes · 2.8K views
Matt Johansen @mattjay
Oh! I missed this. I thought this wave started with a compromised dev account like previous waves. Something has to give at GitHub. It's on them to solve this, not on us to get people to harden their workflows properly; that's a losing battle.
Matt Johansen tweet media
2 replies · 1 repost · 22 likes · 2.5K views