
@softwaremind We always look forward to valuable products from you.
Hex2Hack Labs
63 posts

@hex2hack
A SaaS Agency | SaaS Web Application Dev | Building•Launching•Scaling | MVP Coming Soon. Join us to be part of NextGen innovation.

Introducing Horizon-SWE: a benchmark that evaluates AI agents on end-to-end software engineering workflows, not just code generation. We built a full production environment containing a live app serving real traffic, along with integrations like Slack, Linear, Sentry, Prometheus, and CI/CD. Then we asked agents to implement features, deploy, monitor, and fix production issues. Claude Opus 4.5 and Claude Sonnet 4.5 lead at a roughly 21–22% success rate, followed closely by Gemini 3 Pro and GPT-5.2 Codex. Read more here: polymathlabs.ai/blog/horizon-s…


