

SAIR
@SAIRfoundation
Terence Tao & Nobel, Turing, Fields laureates advancing scientific discovery & guiding AI with scientific principles. Grounding intelligence. Scaling discovery.


SAIR Playground credit update:
- 1 credit now covers up to $0.01 of compute
- Users get 10 free credits/day
- Some low-cost models are now as low as 0.1 credit/run, so users can do more tests each day

Credit use depends on model price and usage. playground.sair.foundation
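For concreteness, here is the arithmetic those numbers imply, as a minimal sketch (the figures come from the post above; the helper names are illustrative, not any SAIR API):

```python
# Back-of-the-envelope math for the credit update described above.
# Figures from the post: 1 credit covers up to $0.01 of compute,
# 10 free credits/day, cheapest models ~0.1 credit per run.

FREE_CREDITS_PER_DAY = 10.0
USD_PER_CREDIT = 0.01


def free_runs_per_day(credits_per_run: float) -> int:
    """Runs the daily free credits cover at a given per-run cost.

    round() before int() avoids float floor-division surprises
    (e.g. 10 // 0.1 == 99.0 in Python).
    """
    return int(round(FREE_CREDITS_PER_DAY / credits_per_run))


def daily_free_compute_usd() -> float:
    """Dollar value of the daily free credits."""
    return FREE_CREDITS_PER_DAY * USD_PER_CREDIT


print(free_runs_per_day(0.1))    # 100 free runs/day on a 0.1-credit model
print(free_runs_per_day(1.0))    # 10 free runs/day on a 1-credit model
print(daily_free_compute_usd())  # 0.1 -> the free tier covers ~$0.10/day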


48h into the @SAIRfoundation AI math competition, I have completely saturated their benchmarks across both datasets 1 and 2. I used AlphaEvolve, @GroqInc fast inference, and 100M tokens to discover optimal cheatsheets. Performance jumps of the evaluated models:
- GPT-OSS-120b: 47% to 99%
- GPT-OSS-20b: 53% to 98%
- Llama 70B: 50% to 88%
- Llama 8B: 63% to 77%

These results mean there is hidden mathematical capability in smaller models, but it's gated by the lack of a proper representation! The two model lines required different approaches: Llama models did best with cheatsheets tuned for algebraic reasoning, while GPT thrived on a symbolic generalized-solver approach. My theory is that GPT-OSS-120b should be able to achieve IMO gold with just a proper classifier and tuned representations.

I hope these results contribute to the general understanding of mathematical reasoning capabilities in models. There seems to be a huge undiscovered world underneath, hidden because we get new models every 3 months. A couple of AI researchers have already contacted me about the results, interested in the idea that fine-tuning may not be the only way to improve math performance. My take: representation discovery with AlphaEvolve generalizes to problems from the same distribution; if you want "a general math intelligence," you still need fine-tuning.
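The post doesn't share code, but the described loop has a familiar AlphaEvolve-style shape. A minimal sketch, assuming the setup is roughly "a proposer model mutates cheatsheets, the solver model is scored on the benchmark with each cheatsheet prepended" (every function name and stub below is hypothetical, not the competitor's actual pipeline):

```python
import random

def llm_mutate(cheatsheet: str) -> str:
    """Ask a proposer model to rewrite or extend a cheatsheet (stubbed)."""
    return cheatsheet + f"\nheuristic-{random.randint(0, 9999)}"

def solve(prompt: str) -> str:
    """Call the solver model under evaluation, e.g. GPT-OSS-120b (stubbed)."""
    return ""

def score(cheatsheet: str, problems: list[tuple[str, str]]) -> float:
    """Benchmark accuracy when the cheatsheet is prepended to each problem."""
    correct = sum(solve(f"{cheatsheet}\n{q}") == a for q, a in problems)
    return correct / len(problems)

def evolve_cheatsheet(seed: str, problems: list[tuple[str, str]],
                      generations: int = 50, pop_size: int = 8,
                      elite: int = 2) -> str:
    """AlphaEvolve-style loop: score a population of cheatsheets, keep the
    best few, mutate them to refill the population, and repeat."""
    population = [seed] * pop_size
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: score(c, problems),
                        reverse=True)
        elites = ranked[:elite]
        population = elites + [llm_mutate(random.choice(elites))
                               for _ in range(pop_size - elite)]
    return max(population, key=lambda c: score(c, problems))

best = evolve_cheatsheet("Useful identities and solver tactics:",
                         [("1+1=?", "2")])
```

A real run would cache scores and batch the solver calls (e.g. over Groq's API) rather than re-scoring the whole population each generation; the sketch only shows the select-mutate-rescore structure.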


Our co-founder Terence Tao is announcing SAIR Foundation's inaugural competition: the Mathematics Distillation Challenge. Co-organized by @damekdavis, Terence Tao, and SAIR Foundation. competition.sair.foundation/competitions/m…


We asked the man who taught computers to check proofs what AI will do to mathematics. @Leonard41111588, founder of @leanprover: AI generates ideas. Lean checks them. #Lean #Mathematics #SAIR #PiDay #PiDay2026
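As a toy illustration of that division of labor (my example, not from the interview): a model can propose a proof term, and Lean's kernel either accepts it or rejects it.

```lean
-- "AI generates ideas. Lean checks them."
-- If a model proposes this proof of commutativity, the kernel verifies it;
-- a wrong guess (say, `rfl`) would simply fail to typecheck.
theorem add_comm_example (m n : Nat) : m + n = n + m :=
  Nat.add_comm m n
```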





We sat down with @JasonSCui, Partner at @a16z. His view: AI for Science isn't a trend — it's a structural shift. AI accelerates discovery. Better science strengthens AI. The loop compounds. 🔗 Full conversation: youtube.com/watch?v=n4eDDd… #AIforScience #SAIR #a16z