Sever Topan retweeted

Excited to share FrontierSWE!
FrontierSWE is an ultra-long-horizon, extremely hard benchmark for coding agents.
Models get 20 hours to solve tasks spanning greenfield implementation, research, and performance optimization.
No frontier model is currently capable of solving *any* task in FrontierSWE.
