SAIR

61 posts

@SAIRfoundation

Terence Tao & Nobel, Turing, Fields laureates advancing scientific discovery & guiding AI with scientific principles. Grounding intelligence. Scaling discovery.

Joined October 2025
8 Following · 3.7K Followers
SAIR@SAIRfoundation·
SAIR Playground credit update:
- 1 credit now covers up to $0.01 of compute
- Users get 10 free credits/day
- Some low-cost models are now as low as 0.1 credit/run, so users can run more tests each day.
Credit use depends on model price and usage. playground.sair.foundation
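The numbers above imply a simple daily budget. A minimal sketch (only the figures from the post are real; the helper function is illustrative, and it works in tenths of a credit because floor-dividing by the float `0.1` would round incorrectly):

```python
# SAIR Playground free allowance: 10 credits/day, 1 credit covers up to
# $0.01 of compute, cheapest models at 0.1 credit/run. Work in integer
# tenths of a credit so the division is exact (10 // 0.1 == 99.0 in
# floating point due to rounding).

DAILY_FREE_TENTHS = 100  # 10 free credits/day, in tenths of a credit

def runs_per_day(cost_tenths: int, budget_tenths: int = DAILY_FREE_TENTHS) -> int:
    """How many runs the daily free allowance covers at a given per-run cost."""
    return budget_tenths // cost_tenths

print(runs_per_day(1))   # 0.1 credit/run -> 100 free runs/day
print(runs_per_day(10))  # 1 credit/run   -> 10 free runs/day
```

So the cheapest models stretch the same free allowance from 10 runs to 100 runs per day.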
SAIR reposted
Damek@damekdavis·
@SAIRfoundation There have been a lot of really good prompts so far. I haven't verified this one, but I've been surprised by how much a good prompt can improve performance. x.com/stokasz/status…
stokasz@stokasz

48h into the @SAIRfoundation AI math competition, I have succeeded in completely saturating their benchmarks across both dataset 1 and 2. I've used AlphaEvolve, @GroqInc fast inference, and 100M tokens to discover optimal cheat sheets.

Performance jumps of the evaluated models:
- GPT-OSS-120b: 47% to 99%
- GPT-OSS-20b: 53% to 98%
- Llama 70B: 50% to 88%
- Llama 8B: 63% to 77%

These results mean there is hidden mathematical capability in smaller models, but it's gated by the lack of proper representation! The two model lines required different approaches: Llama models were tuned on algebraic reasoning, while GPT thrived on a symbolic generalized-solver approach. My theory is that GPT-OSS-120b should be able to achieve IMO gold using just a proper classifier and tuned representations.

I hope these results will contribute to the general understanding of mathematical reasoning capabilities in models. There seems to be a huge undiscovered world underneath, hidden because we get new models every 3 months. A couple of AI researchers have already contacted me about the results, interested in the idea that fine-tuning may not be the only way to improve math performance. My take is that representation discovery with AlphaEvolve generalizes to problems from the same distribution; if you want "a general math intelligence," you still need fine-tuning.

SAIR@SAIRfoundation·
Amazing improvement from our community member! 👏 The massive jump in math capabilities for LLMs using a cheat sheet proves the value of our challenge. More difficult selected problem sets are coming soon! Join our Mathematics Distillation Challenge: competition.sair.foundation
stokasz@stokasz


SAIR reposted
Damek@damekdavis·
Excited to launch the first stage of this competition with Terry and the @sairfoundation!

If a student works through a math textbook and solves all exercises correctly, you'd expect that they've understood something. But what? They've likely learned a few major results, a few techniques, and they've gotten good at combining results and techniques in nontrivial ways. We then test their understanding with an exam. Often we even allow them to distill what they've learned into a single-page 'cheat sheet' and bring it to the exam. Years later, a student might return to the material and feel every problem is now straightforward, because they understand the basic 'algorithm' for finding a solution.

Today frontier LLMs perform very well on problems from undergraduate textbooks in, e.g., linear algebra, analysis, or topology. They can even answer nontrivial extensions of such problems. But what algorithm are they applying? And can we distill that algorithm into a short, human-readable 'cheat sheet' that we can use to teach other, weaker LLMs?

This is the motivation for the first stage of our competition: to boost the performance of weak LLMs via a short, human-readable cheat sheet. Rather than work with a textbook, we start with a closed-world experiment: the Equational Theories Project. This was a large-scale 'polymath'-style project that Terry ran a few years ago. Its goal was to take a large list of 'equational identities' (think associativity or commutativity) and determine the entire 'implication graph' (e.g., does associativity imply commutativity?). They succeeded, and largely without any LLM help at the time.

In the present competition, we want to test the ability of LLMs to correctly classify whether one identity implies another. In our initial tests, frontier LLMs already do pretty well on this task. But we're more interested in the performance of weak LLMs, which score closer to 50% accuracy.

So the challenge is to design a <= 10KB cheat sheet that improves the classification accuracy of weak open-source LLMs as much as possible. The SAIR Foundation is generously hosting this competition and providing model credits (fairly limited in stage 1). We are currently planning the second stage. Please read Terry's post below for his thoughts and more information on the resources available to participants.
SAIR@SAIRfoundation

Our co-founder Terence Tao is announcing SAIR Foundation's inaugural competition: the Mathematics Distillation Challenge. Co-organized by @damekdavis, Terence Tao, and SAIR Foundation. competition.sair.foundation/competitions/m…
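The stage-1 setup described above (a yes/no implication classifier, a <= 10KB cheat sheet, a ~50% weak-model baseline) can be sketched as follows. This is an illustrative sketch only: the official harness lives at competition.sair.foundation, and the function names and toy labels here are assumptions, not the real interface.

```python
# Sketch of the stage-1 contest mechanics: validate the cheat-sheet size
# limit and score yes/no implication predictions against ground truth.
# All names and the tiny example data are illustrative.

MAX_CHEATSHEET_BYTES = 10 * 1024  # stage-1 limit: cheat sheet must fit in 10KB

def validate_cheatsheet(sheet: str) -> int:
    """Return the UTF-8 size of the cheat sheet, rejecting oversized entries."""
    size = len(sheet.encode("utf-8"))
    if size > MAX_CHEATSHEET_BYTES:
        raise ValueError(f"cheat sheet is {size} bytes, over the 10KB limit")
    return size

def accuracy(predictions, labels):
    """Fraction of yes/no implication questions classified correctly."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# A weak model near the 50% coin-flip baseline vs. the same model with a
# cheat-sheet prompt (labels and predictions are made up for illustration).
labels   = ["yes", "no", "no", "yes"]
baseline = ["yes", "yes", "no", "no"]   # 2/4 correct
boosted  = ["yes", "no", "no", "yes"]   # 4/4 correct
print(accuracy(baseline, labels), accuracy(boosted, labels))  # 0.5 1.0
```

Scoring a submission then reduces to: check the sheet fits the byte budget, prepend it to each implication question sent to the weak model, and compare the resulting accuracy to the unprompted baseline.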

SAIR@SAIRfoundation·
Mathematics is about more than just finding the right answers; it is about understanding the process. This challenge is built on 22 million yes/no implication questions from equational theories. Participants design a compact "cheat sheet" prompt for weaker models. It launched on March 14 at 15:09:26 (UTC+14), the moment the earliest time zone on Earth reached the π-inspired sequence 3.1415926. Competition: competition.sair.foundation/competitions/m… Playground: playground.sair.foundation/playground/mat…
SAIR@SAIRfoundation·
We celebrated π Day with the Godfather of Silicon Valley. John Hennessy — Chairman of Alphabet, Stanford President, Turing Laureate — had a message for the AI industry: Scale is not enough. Science matters. Math is not going away. #Mathematics #AGI
SAIR@SAIRfoundation·
Our inaugural competition is co-organized by: Damek Davis @damekdavis, Associate Professor at the University of Pennsylvania; Terence Tao, Fields Medalist, Professor at UCLA, and Co-Founder of SAIR Foundation; and SAIR Foundation.
SAIR@SAIRfoundation·
We are launching the SAIR Competition, a new platform for open challenges at the frontier of AI and science. Through shared benchmarks and community collaboration, we hope to accelerate discovery together. competition.sair.foundation
SAIR@SAIRfoundation·
SAIR’s first community mathematics competition begins on March 14, 2026, at 15:09:26 (UTC+14). That is the moment the earliest time zone on Earth reaches the π-inspired sequence 3.1415926. More details soon.
SAIR@SAIRfoundation·
Math is everywhere but it's invisible. Happy π Day! SAIR co-founder Terence Tao reminds us that the everyday tech we rely on is powered by hidden math. Take a moment today to notice the math around you! #PiDay #TerenceTao #Mathematics
SAIR@SAIRfoundation·
Terence Tao: Formal Verification Breaks the Trust Barrier in Mathematics
Formal verification is transforming mathematical collaborations — enabling anonymous contributions, machine-checked proofs, and radically more precise scientific discussion.
SAIR@SAIRfoundation·
Jeff Ullman: AI Is About Solving Real Scientific Problems
Jeff Ullman reflects on how modern AI has shifted from imitating human thought to solving real-world scientific problems, powered by advances in hardware, data, and machine learning.
SAIR@SAIRfoundation·
Nobel Laureate Barry Barish: Why SAIR Exists
Barry Barish on SAIR’s purpose: AI for science isn’t starting in universities — it’s happening in companies. SAIR bridges that gap.