Bruno Andreis retweeted

Very excited to announce HorizonMath with @erikyw26 and collaborators!
How can we measure AI progress on mathematical discovery? Turns out there are several classes of problems where discovery is hard but verification is easy. We develop a benchmark with 101 such problems and test GPT 5.4 Pro, Claude 4.6 Opus, and Gemini 3.1 Pro.
Pending expert review, GPT 5.4 Pro finds two potentially novel solutions that beat existing baselines🧵

