Jason Rute

497 posts

@JasonRute

AI Researcher @ Mistral AI | Formerly IBM Research | Former Mathematician/Logician/Data scientist | Building AI for math and reasoning

Joined July 2022
226 Following · 712 Followers
Pinned Tweet
Jason Rute (@JasonRute):
Announcing our fully open source code agent to support development in @leanprover. This has been a labor of love by our team at @MistralAI and we look forward to seeing what the #LeanProver community does with it!
Jason Rute (@JasonRute):
@gro_tsen Never mind. I see you say “for arbitrarily large n” and take just the sup of δ. My bad.
Jason Rute (@JasonRute):
@gro_tsen I think you also need a big-O or little-o fudge term. For example, Sawin’s result is n^{1.014114}/C for some constant C, and Erdős’s original was n^{1 + o(1)}. (Or maybe this is implied but unstated in your post.)
Gro-Tsen (@gro_tsen):
So, as of three days ago, there's a new constant in mathematics, which one might call the “plane unit distance exponent”: the sup of all δ≥0 such that there exist unit distance graphs in the plane with n vertices and n^(1+δ) edges for arbitrarily large n.
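A hedged restatement of Gro-Tsen's definition as a worked formula. Here u(n) is our own notation (not the tweet's) for the maximum number of unit distances among n points in the plane, and "i.o." means "for infinitely many n". Because u(n) ≤ n(n−1)/2 < n^2, every admissible δ is below 1. The 1/3 upper bound uses the Spencer–Szemerédi–Trotter bound u(n) = O(n^{4/3}), which is standard but not mentioned in the thread, and the last line assumes that Sawin's n^{1.014114}/C bound, as quoted in the reply above, holds as stated.

```latex
\begin{aligned}
u(n) &= \max_{P \subset \mathbb{R}^2,\ |P| = n} \#\bigl\{\{p,q\} \subseteq P : \lVert p - q \rVert = 1\bigr\},\\
\delta^{*} &= \sup\bigl\{\delta : u(n) \ge n^{1+\delta} \text{ i.o.}\bigr\}
            = \limsup_{n\to\infty} \frac{\log u(n)}{\log n} - 1,\\
u(n) = O\bigl(n^{4/3}\bigr) &\;\Longrightarrow\; \delta^{*} \le \tfrac{1}{3},\\
u(n) \ge \frac{n^{1.014114}}{C} \text{ i.o.} &\;\Longrightarrow\;
  \frac{\log u(n)}{\log n} \ge 1.014114 - \frac{\log C}{\log n} \text{ i.o.}
  \;\Longrightarrow\; \delta^{*} \ge 0.014114.
\end{aligned}
```

Since log C / log n → 0, the constant C does not affect the exponent, which is why a bound of the form n^{1+o(1)} corresponds to δ* = 0.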
Jason Rute (@JasonRute):
@prz_chojecki It will be interesting to compare this to the gap that the LANA project said they found and will announce this summer.
Przemek Chojecki | PC (@prz_chojecki):
Funny things happen when you start asking GPT-5.5 Pro to fill gaps in Mochizuki's work, especially the 3.11.5 ⇒ 3.12 passage objected to by Scholze-Stix. LLMs are probably the best shot at digesting this 1000+-page proof and translating it into a more standard Arakelov geometry approach.
Jason Rute (@JasonRute):
@aaswaminathan01 @mathandcobb While I think your take has some truth (we will soon be able to autoformalize a nontrivial number of math papers into, say, Lean), I think it is missing a great deal of technical, practical, and sociological nuance.
Ashvin Swaminathan (@aaswaminathan01):
@mathandcobb I think we are close (i.e., within two years) to a moment where people should be expected to provide formalizations of their work, built on formal statements of inputs from references. This isn't foolproof, but it will go some way toward solving the fabricated reference problem.
Ashvin Swaminathan (@aaswaminathan01):
In the context of math papers, all this discussion about the arXiv is moot. We are in the era of formalization. We should aim for end-to-end Lean proofs.
Ksenia Se (@Kseniase_):
@CarinaLHong @ylecun @logic_int I really appreciate your input. I think it would be tremendously interesting if you and @evelovesolive could have a recorded conversation discussing the science behind your approaches.
Ksenia Se (@Kseniase_):
EBMs are so back! @ylecun has been pointing here for years: AI reasoning needs systems that check structure before they answer. Aleph from @logic_int now leads the major formal reasoning benchmarks – let me explain what it is -> 📺
Jason Rute (@JasonRute):
@danrobinson I think it might be unethical to raise Erdős from the dead, although the idea has been considered for other purposes: xkcd.com/599/
Dan Robinson (@danrobinson):
Everyone seems to be working on tools to automatically solve Erdős-style problems. Is anyone working on automated generation of new ones?
Jason Rute (@JasonRute):
@LatinumAI Did you rewrite the interpreter too or just the compiler? I guess now @leanprover needs an external compiler bench (alongside their external kernel bench).
Latinum Frontier Mathematics Research Lab:
Why are formal languages not used as programming languages? Most were built only for mathematics. Lean lets you write real code, but its compiler assumes a garbage collector, threads, and an operating system underneath. So I rewrote it.
Lior Pachter (@lpachter):
There is much being written about AI in pure mathematics but less in applied mathematics, where I'm finding it to be just as impactful, if not more so.
Jason Rute (@JasonRute):
@ChrSzegedy @prz_chojecki That paper was very influential on my view of this field, especially the autoformalization/proving flywheel. It feels close.
Przemek Chojecki | PC (@prz_chojecki):
Solve math → Solve everything else This is why scaling curves still look smooth but the jumps feel discontinuous. Math is the narrowest bottleneck in the entire intelligence stack. Crack it cleanly and the rest of cognition is just downstream reuse of the same circuits. No more “we need a special module for X.” X is always math in disguise. LLMs that only pattern-match calculus can win existing benchmarks. The ones that internalize proof, generalization, and counterfactuals at the root level get the keys to every lock. If you want to get even close to AGI, you have to pass through mathematics. Math is the ur-substrate. This is what I'm building for.
Jason Rute (@JasonRute):
@littmath Can you explain “verification is the bottleneck”?
Daniel Litt (@littmath):
stochastic parrots: “it doesn’t think” “verification is the bottleneck” “solve math, solve everything else,” “they’re just stochastic parrots”
Jason Rute (@JasonRute):
@giffmana Math is usually fairly robust to errors, much more so than code. There are lots of articles about why. This particular benchmark, however, is designed adversarially to be very fiddly, calculation-based, and non-intuitive (otherwise the model could just guess the solution).
Epoch AI (@EpochAIResearch):
We are conducting an AI-assisted review of FrontierMath: Tiers 1-4. This has flagged fatal errors in about a third of problems, and we believe most of these flags to be valid. We will release updated scores on a corrected dataset after completing a thorough human review.
Elliot Glazer (@ElliotGlazer):
Thank you for your participation in this year’s continuum survey. Please enjoy your complimentary continuum clock.
Jason Rute (@JasonRute):
@VictorTaelin Have you ever considered writing an external checker for Lean? (I assume this particular one is different from Lean.)
Taelin (@VictorTaelin):
To whom it may concern: NanoProof.hs, the smallest viable proof checker.
I posted something similar before, but it was more of a research experiment with weird λ-encoded shit than something usable. This new repo contains a tiny, self-contained, 1000-LOC Haskell proof checker that you can actually use to prove arbitrary theorems. The language has just 6 base types:
→ Empty (`⊥`): type with 0 elems
→ Unit (`⊤`): type with 1 elem (`()`)
→ Bool (`𝔹`): type with 2 elems (`0 | 1`)
→ Sigma (`ΣA.B`): dependent pairs (`(x,y)`)
→ Pi (`ΠA.B`): dependent functions (`λx.f`)
→ Equal (`a==b`): propositional equality (`{==}`)
That's all you need. Each of these is needed, as it introduces something fundamental. The file includes a parser, a stringifier, equality, a bidirectional type checker, and a simple CLI. It also includes first-class reduction relations, which allow us to pretty-print goals just like Lean. You can place '()' in a position to inspect the current context and goal there. I also include a demo proof of the commutativity of multiplication.
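To illustrate what "bidirectional type checker" means, here is a minimal sketch in Python. It is not NanoProof's code and not Haskell; it covers only a hypothetical non-dependent fragment with Unit, Bool, and function types, so it leaves out ⊥, Σ, dependent Π, and equality. All class and function names are illustrative. The idea: `infer` synthesizes a type for variables, constants, annotations, and applications, while `check` pushes an expected type into lambdas and otherwise falls back to `infer` and compares.

```python
from dataclasses import dataclass

# Hypothetical non-dependent fragment: Unit, Bool and function types A -> B.
@dataclass(frozen=True)
class TUnit:
    pass

@dataclass(frozen=True)
class TBool:
    pass

@dataclass(frozen=True)
class TArrow:
    dom: object
    cod: object

# Terms of the fragment.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class UnitVal:
    pass

@dataclass(frozen=True)
class BoolVal:
    value: bool

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class Ann:
    term: object
    ann_type: object

def infer(ctx, t):
    """Synthesis mode: compute the type of t from the context alone."""
    if isinstance(t, Var):
        if t.name not in ctx:
            raise TypeError(f"unbound variable {t.name}")
        return ctx[t.name]
    if isinstance(t, UnitVal):
        return TUnit()
    if isinstance(t, BoolVal):
        return TBool()
    if isinstance(t, Ann):
        # An annotation switches from synthesis to checking.
        check(ctx, t.term, t.ann_type)
        return t.ann_type
    if isinstance(t, App):
        fn_type = infer(ctx, t.fn)
        if not isinstance(fn_type, TArrow):
            raise TypeError("applying a term that is not a function")
        check(ctx, t.arg, fn_type.dom)
        return fn_type.cod
    raise TypeError(f"cannot infer a type for {t}; add an annotation")

def check(ctx, t, expected):
    """Checking mode: verify that t has the expected type."""
    if isinstance(t, Lam):
        if not isinstance(expected, TArrow):
            raise TypeError("lambda checked against a non-function type")
        # The expected type supplies the parameter's type.
        check({**ctx, t.param: expected.dom}, t.body, expected.cod)
        return
    # Otherwise synthesize a type and compare it with the expected one.
    actual = infer(ctx, t)
    if actual != expected:
        raise TypeError(f"expected {expected}, got {actual}")
```

For example, `infer({}, App(Ann(Lam("x", Var("x")), TArrow(TBool(), TBool())), BoolVal(True)))` returns `TBool()`, while `infer({}, Lam("x", Var("x")))` raises `TypeError`: a bare lambda can only be checked against a known type, which is the usual reason bidirectional checkers need annotations at a few places.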
Jason Rute (@JasonRute):
@lacker @julianboolean_ I think we have plenty of more difficult problems already? But if it is a new conjecture, it would be interesting (at least the first time) exactly because the AIs are trying to convince us it is important.
Kevin Lacker (@lacker):
@JasonRute @julianboolean_ I guess after the math AIs solve the Riemann Hypothesis, their next challenge will be inventing a hypothesis that they can convince humanity is even more important.
Julian (@julianboolean_):
The more I think about this, the more wrong I think Gowers is here. Math is infinite. Every proof or problem you could ever write about is in the Library of Babel. LLMs just make it more accessible. Finding a proof of a particular problem may have gotten easier, like how complex-bashing made elementary geometry easy. But the fun is in picking out the right problem, the interesting problem, from all the books in the Library. And that will always be possible.
Quoting Nabeel S. Qureshi (@nabeelqu):

The mathematician Tim Gowers: "the era where you could enjoy the thrill of having your name forever associated with a particular theorem or definition may well be close to its end"

Jason Rute (@JasonRute):
@lacker @julianboolean_ There are a number of good videos aimed at general audiences explaining advanced math concepts, including Fields Medal-winning papers. In this hypothetical scenario where AI is good at everything, it would also be good at making this kind of content.
Kevin Lacker (@lacker):
@julianboolean_ I dunno, if cutting edge math becomes too hard for humans to understand, why would anything be interesting any more? Would we have to trust the AIs when they told us which open questions were interesting, and worth working on?
Jason Rute (@JasonRute):
@littmath Did it work out on its own that it is likely open, or did it find out from, say, your paper?
Daniel Litt (@littmath):
I have an ambitious conjecture that reasoning models, since o3, are convinced is false. I pose it to each new model. Earlier model generations would consistently hallucinate counterexamples; GPT 5.5 Pro spends ~an hour searching and then grudgingly concedes it’s open.
Jasper Dekoninck (@j_dekoninck):
@xeophon I think the more problematic statement that people make for such benches is: "models really suck at this task"
Florian Brand (@xeophon):
"our bench is really hard*" * (we prompt the models in emojis, it has access to 841 tools which are loaded in the user message, and we give it 12 seconds of wall-clock time)
Jason Rute (@JasonRute):
@Anthony_Bonato AI can do the verification as well, especially with formal verification (but humans can still verify the verifiers).
Jason Rute (@JasonRute):
@LawrPaulson @mathladyhazel I think the best thing would be to have two different operations for exponentiation: one where the exponent is in Nat (for multiplicative monoids) or Int (for multiplicative groups), for which 0^0=1 is right, and one where the exponent is a real/complex/vector/matrix. For the latter, use, say, exp(ln(a)*b)?
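A minimal Python sketch of the two-operation proposal, assuming scalar bases (ints or floats). The function names are hypothetical and are not Lean or Mathlib API. `monoid_pow` computes a natural-number power by repeated squaring starting from the empty product 1, so 0^0 = 1 falls out of the definition; `group_pow` extends it to negative integer exponents through the inverse; `real_pow` uses exp(ln(a)·b), which is only defined for a > 0, so 0^0 is simply outside its domain.

```python
import math

def monoid_pow(a, n: int):
    """Natural-number exponent (multiplicative monoid): repeated squaring.

    Starts from the empty product 1, so monoid_pow(0, 0) == 1.
    """
    if n < 0:
        raise ValueError("monoid_pow needs a natural-number exponent")
    result = 1  # identity element: the empty product
    base = a
    while n > 0:
        if n & 1:
            result = result * base
        base = base * base
        n >>= 1
    return result

def group_pow(a, n: int):
    """Integer exponent (multiplicative group): negative powers use 1/a."""
    if n >= 0:
        return monoid_pow(a, n)
    return monoid_pow(1 / a, -n)  # raises ZeroDivisionError when a == 0

def real_pow(a: float, b: float) -> float:
    """Real exponent via exp(ln(a) * b); only defined here for a > 0."""
    if a <= 0:
        raise ValueError("exp(ln(a) * b) needs a > 0")
    return math.exp(math.log(a) * b)
```

The design point this illustrates: the discrete operation gets 0^0 = 1 for free from the empty product, while the continuous one never has to decide a value at 0, because ln(0) is undefined.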