Homelander retweetledi

Opus 4.6 (1M) through Claude code solved autonomously 45/54 challenges of BSidesSF 2026 @BSidesSFCTF, placing temporarily into the 21st place, 25th as of now.
This was done with 0 involvement, I didn't give any guidance or manually reviewed any challenges. I used BoxPwnr 🤖 with the CTFd platform to launch challenges in multiple instances, that's it.
I will publish all the traces once the competition finishes, in the meantime you can see the challenges, number of turns and time it took to solve each here:
0ca.github.io/BoxPwnr-Traces…
In the following days I will try to understand why it couldn't solve the 9 remaining challenges: difficulty? long exploration-context rotting? interactive interaction required? challs using video/image? We will see.
Models have improved significantly in the last 6 months, see Cybench results Opus 4.1 vs 4.6 (42% to 93%) cybench.github.io
It's crazy to see what LLM's can do with a minimum harness.



English

















