
@_Suresh2 @modal We chose to work on automating theorem proving as a small step toward automating science. We tried evolution strategies (ES) because recent papers (e.g. arxiv.org/pdf/2509.24372) suggest that it could better tolerate long-horizon rewards and be less susceptible to reward hacking.







