
Excited to announce my preprint, "eyeballvul: a future-proof benchmark for vulnerability detection in the wild". I built a benchmark containing over 24,000 vulnerabilities to evaluate the vulnerability detection capabilities of long-context models on entire codebases, then evaluated 7 leading long-context models on it.




