
We’re happy to release SEC-bench Pro: a benchmark for measuring the bug-hunting capabilities of AI agents in critical software systems such as Chromium V8, Firefox SpiderMonkey, and more.
Explore the details here: sec-bench.github.io
Hwiwon Lee

