
Check out the original RExBench announcement for more details about the benchmark: x.com/yukyunglee_/st…
Yukyung Lee (@yukyunglee_)
Can coding agents autonomously implement AI research extensions? We introduce RExBench, a benchmark that tests if a coding agent can implement a novel experiment based on existing research and code. Finding: Most agents we tested had a low success rate, but there is promise!

