Brian Pak

603 posts

@brian_pak

ai + security + alpha CEO @theori_io / @xint_official → building the world's best AI hacker 9x DEF CON CTF winner CMU CS '11 | founded PPP & MMM

Seoul / SF · Joined April 2010
201 Following · 3.1K Followers
Brian Pak @brian_pak
Interestingly, not fuzzing. Xint code-reviews the code, reasons about potential vulnerabilities, and validates the theory, all in static-analysis fashion. It's possible to hook it up with dynamic testing to be even more certain about the validation, but it already does a pretty good job of weeding out false positives.
Brian Pak @brian_pak
I promise the bug is real, tho
Brian Pak @brian_pak
And yes, RHEL 14.3 doesn't exist 😅 We meant to say RHEL 10.1. Sorry for the confusion! And also yes, the static webpage copy.fail -- even the logo -- is vibe-coded. Too busy triaging a shit ton of other bugs to build a legit website from the ground up... and I think it's a perfect use case for vibe-coding tbh 😆
Brian Pak @brian_pak
@msolnik oops. should be public now! sorry about that.
Brian Pak @brian_pak
Surfaced by Xint Code, our AI vuln research platform, pointed at the kernel's crypto/ for about an hour, on a starting hunch from @5unKn0wn. Came back with CopyFail (plus others, still in coordinated disclosure).
Write-up + PoC (exploit): copy.fail
Xint Code: code.xint.io
Brian Pak retweeted
Xint @xint_official
Anthropic is (rightfully) generating a lot of attention for Mythos's ability to find 0-days, BUT the hard problem is not whether an LLM can recognize a bug when pointed at it; it is whether a system can find the right code to examine across a 9-million-line codebase, distinguish the one real vulnerability from the hundreds of theoretical weaknesses the model will flag along the way, and deliver output a developer can act on without wasting a week on false positives. This is something Xint has been doing since our wins at AIxCC and #ZeroDayCloud last year.

We wanted to see if using publicly available models with the right scaffolding would reach the same performance as the latest limited-release frontier model under **real-world conditions**. In this research paper, not only did we find all the same bugs highlighted in Anthropic's report, but we found an additional 12 mid- to high-severity vulnerabilities not included in their public disclosures.

Check out the full report here: go.xint.io/xint-mythos-ap…
Brian Pak @brian_pak
90-day disclosure policy isn't gonna cut it. Be ready.
Brian Pak retweeted
Xint @xint_official
🚨🚨 A critical CPython CVE today took less than 45 minutes of human work to find, triage, and fix because of Xint Code:
🚄 Xint Code found it in a Fast scan on the repo with no prompting
💥 A coding assistant reproduced it on the first try
🛠️ Maintainers pushed a fix 30 minutes after the report.
theori.io/blog/finding-a…
Brian Pak retweeted
Tim Becker @tjbecker
Evaluating models on cybersecurity tasks is *really* hard -- probably the *hardest* part of building these tools. I want to correct a few misconceptions from this post.

> The results show something close to inverse scaling: small, cheap models outperform large frontier ones

Yes, because this only tested for true positives! This completely ignores the unbearably high false-positive rate you get from small, open models. Small models are incredibly sloppy thinkers that are easily biased to give "desired" outcomes. You can give them almost any nontrivial code snippet and they will "find vulnerabilities". If you ran this system across the entire codebase, it would be impossible to separate the real bugs from the slop.

Truly impressive models (and scaffolds) strike a balance of finding the subtle bugs without too much noise. For now, large closed-weight models with scaffolds for extensive validation dominate.
Stanislav Fort @stanislavfort

New post: We tested the Mythos showcase vulnerabilities with open models. They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model. Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!

Brian Pak @brian_pak
@HaifeiLi We run both. Frontier models are still way ahead for the hard stuff 😭 but we benchmark every new open model release. Maybe, one day.
Brian Pak retweeted
Xint @xint_official
Fun fact: We actually discovered this issue accidentally when our system reported finding a new bug in one of our old benchmarks. We were surprised to find out it was actually a #0day in NGINX!
Additional coverage from @gbhackers_news: gbhackers.com/f5-nginx-plus-…
Brian Pak @brian_pak
Naturally, the first thing we did was run it through Xint Code. Unsurprisingly, the vibe-coded app has quite a few vulnerabilities, surfaced within minutes, including vuln101-level bugs (e.g. `.includes()` instead of `.startsWith()`). I guess @AnthropicAI wasn't kidding when they said "90% of the code written at Anthropic is written by Claude."

What I'm really curious about is where Anthropic draws the security boundary. Claude Code asks whether you trust the workspace at the very start, and you basically can't use the tool unless you consent. From that point on, all responsibility shifts to the user. Consent once, and running Claude on a directory becomes a 0-click RCE vector in multiple ways. So maybe these aren't considered security vulnerabilities as far as they're concerned…?
Chaofan Shou @Fried_rice

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

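The `.includes()`-instead-of-`.startsWith()` bug class mentioned above can be sketched in a few lines. This is a hypothetical illustration, not the actual leaked code: the function names and the "trusted" origin are made up. The point is that a substring check passes for any URL that merely *contains* the trusted origin somewhere, which an attacker fully controls.

```javascript
// Hypothetical illustration of the ".includes() vs .startsWith()" bug class.
// Function names and the trusted origin are invented for this example.

// Vulnerable: passes if the trusted origin appears ANYWHERE in the URL.
function isTrustedVulnerable(url) {
  return url.includes("https://api.example.com");
}

// Safer: the trusted origin (with trailing slash) must be the URL's prefix.
function isTrustedFixed(url) {
  return url.startsWith("https://api.example.com/");
}

// Attacker-controlled URL that merely embeds the trusted origin:
const attack = "https://evil.example/?next=https://api.example.com";

console.log(isTrustedVulnerable(attack)); // true  -- check bypassed
console.log(isTrustedFixed(attack));      // false -- bypass blocked
console.log(isTrustedFixed("https://api.example.com/v1/models")); // true
```

Even `.startsWith()` needs the trailing slash (otherwise `https://api.example.com.evil.example` would pass); the more robust pattern is to parse with `new URL(url)` and compare `.origin` for exact equality.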