Pinned Tweet

Terminal Bench 2.0 paper available: arxiv.org/abs/2601.11868. See where frontier agents and models still fail and how we crowdsource hundreds of high-quality environments from the open-source community 🚀
Follow github.com/laude-institut… to see how to run TB2 in Harbor!










