Jason Rute

426 posts

@JasonRute

AI Researcher @ Mistral AI | Formerly IBM Research | Former Mathematician/Logician/Data scientist | Building AI for math and reasoning

Joined July 2022
204 Following · 647 Followers
Pinned Tweet
Jason Rute
Jason Rute@JasonRute·
Announcing our fully open source code agent to support development in @leanprover. This has been a labor of love by our team at @MistralAI and we look forward to seeing what the #LeanProver community does with it!
1
28
155
9.8K
Jason Rute
Jason Rute@JasonRute·
@minimario1729 Typo: I didn't mean to say Opus is a small model :facepalm: I just looked again at your results and I see you use only Opus (and maybe a GPT model for the Codex experiments). So maybe these large models can always recall proofs correctly, but it would be good to know.
0
0
0
13
Jason Rute
Jason Rute@JasonRute·
@minimario1729 Have you measured this? For example, does instructing the model to search the literature, or directly feeding it the proof, help? Or does it make zero difference? I could easily see the models (especially the smaller ones like Opus) miss small details in their recall.
1
0
0
19
Alex Gu
Alex Gu@minimario1729·
🚨 Math Inc is introducing FormalQualBench: an open-source benchmark for end-to-end auto-formalization capabilities with math PhD qualifying exam level problems. We build this benchmark for the Lean community, allowing anyone to compare different auto-formalization agents!
3
23
180
14.2K
Jason Rute
Jason Rute@JasonRute·
@minimario1729 Also, what happens when these theorems end up in Mathlib or other Lean repos? (Maybe some of these are already formalized in some obscure repo, even.) Will the benchmark adapt, or is this more of a snapshot-in-time benchmark for the currently available models?
1
0
0
26
Jason Rute
Jason Rute@JasonRute·
@minimario1729 You say it is "auto-formalization" but where is the natural language proof? Are you providing that? Do you expect that the model has memorized the proof? Do you expect the model to do a literature search for the proof?
2
0
0
39
Alex Gu
Alex Gu@minimario1729·
Key features: - High-quality, expert-checked problems - Challenging but accessible: 8/23 solved, topped by our OpenGauss - A new evaluation standard resisting exploits using specification-based evaluation
2
1
13
885
Jason Rute retweeted
Apoth3osis
Apoth3osis@apoth3osis_io·
@JasonRute @Leonard41111588 So far, it has been absolutely great to work with. Thank you to the entire #Leanstral team; this is exactly the kind of project we need more people working on.
0
1
1
113
Jason Rute
Jason Rute@JasonRute·
@littmath In mathematical logic (especially computability theory), Andre Nies's Logic Blog is for this sort of stuff: arxiv.org/abs/2503.12673 Also, if you are already on arXiv, little stops you from uploading a short note for posterity. I think peer review is the harder part.
0
0
1
413
Daniel Litt
Daniel Litt@littmath·
Sometimes you solve an "open problem" and the solution is just not very interesting (e.g. it would be cool if a conjecture was true, since that would imply something else important, but it turns out not to be true). Usually this doesn't result in a publication, but rather just an email to the person who originated the problem. This is in part because it takes a lot of work to write something publication-quality (not to mention making work for other people who have to referee it), and arguably that's not worth it for results of limited interest. On the other hand, it would be good to have a way to disseminate such solutions so that other people don't waste effort on them. Worth keeping this in mind when looking at AI-generated solutions to open problems too. Not all of the examples of such thus far have this nature, but many of them do.
18
5
377
27K
Jason Rute
Jason Rute@JasonRute·
Thanks, @Leonard41111588 ! And thanks for making such an awesome language! We really hope #Leanstral is a benefit to the Lean community, and we are actively working on making it even better!
Leonardo de Moura@Leonard41111588

@royvanrijn Congratulations to the Mistral team! Great to see a dedicated open-source Lean 4 coding agent. The Lean community is growing fast, and tools like Leanstral will help more people get productive with Lean quickly. Looking forward to seeing how it evolves.

1
2
20
1.2K
Jason Rute
Jason Rute@JasonRute·
Use for free in Mistral-Vibe:
$ uv tool install mistral-vibe --upgrade
$ vibe /leanstall
Then shift+tab to lean mode
1
0
17
566
Jason Rute
Jason Rute@JasonRute·
@Sidjain_90 @kfountou Formal (e.g. Lean) really helps here. But ultimately math is very self correcting, with or without formal, and a number of scaffolds have been able to leverage that fact.
1
0
1
43
Sid Jain
Sid Jain@Sidjain_90·
@kfountou FWIW, 0.4 accuracy multiplies down to a pretty low number if you're using it for a large proof with multiple lemmas, etc.
2
0
1
381
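A rough sketch of the arithmetic behind this exchange, under assumptions that are not in the tweets: each lemma is treated as an independent attempt with success probability 0.4, and a checker (for example Lean) is assumed to reliably tell a correct attempt from an incorrect one, so failed attempts can be retried. The function names are made up for illustration.

```python
def chain_success(p: float, n_lemmas: int) -> float:
    """Probability that n_lemmas independent steps all succeed,
    when each succeeds with probability p (independence is an assumption)."""
    return p ** n_lemmas


def with_retries(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds,
    assuming a verifier can recognise the successful one."""
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        single_shot = chain_success(0.4, n)
        # Retry each lemma up to 10 times before moving on (hypothetical budget).
        verified = chain_success(with_retries(0.4, 10), n)
        print(f"{n:2d} lemmas: single shot {single_shot:.6f}, with retries {verified:.4f}")
```

Without checking, the product shrinks quickly as lemmas are added; with a reliable verifier and retries, per-lemma success rises enough that the chained probability stays high, which is one way to read the "self correcting" point.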
Jason Rute
Jason Rute@JasonRute·
@Anthony_Bonato I was really proud of this response until I read the original tweet again. 🤦‍♂️
0
0
0
11
Jason Rute
Jason Rute@JasonRute·
@Anthony_Bonato I can’t determine if this is a rational holiday or not, but probably not.
1
0
4
719
Jason Rute
Jason Rute@JasonRute·
@sebngriego I don’t disagree, but every axiom halves the trustworthiness of the proof. It is VERY easy to misstate a result if there is no proof with it. (Also explicit assumptions > axioms.)
0
0
0
40
Sebastian Griego
Sebastian Griego@sebngriego·
Would you consider a Lean proof valid if it axiomatizes known results from the literature while proving a theorem?
16
0
35
5.1K
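A minimal Lean 4 sketch of the distinction raised in this exchange between axiomatizing a literature result and taking it as an explicit assumption. The statement used here is a toy placeholder, and the names `litResult`, `viaAxiom`, and `viaHypothesis` are hypothetical, not from any of the tweets.

```lean
-- Option 1: declare the "known result" as a global axiom.
-- If it is misstated, every theorem that uses it is silently affected.
axiom litResult : ∀ n : Nat, 0 < n → ∃ m : Nat, n = m + 1

theorem viaAxiom (n : Nat) (h : 0 < n) : ∃ m : Nat, n = m + 1 :=
  litResult n h

-- Option 2: take the same statement as an explicit hypothesis.
-- The dependency is now visible in the theorem's own signature.
theorem viaHypothesis
    (lit : ∀ n : Nat, 0 < n → ∃ m : Nat, n = m + 1)
    (n : Nat) (h : 0 < n) : ∃ m : Nat, n = m + 1 :=
  lit n h

-- `#print axioms` reports which axioms each theorem depends on;
-- the first should list `litResult`, the second should not.
#print axioms viaAxiom
#print axioms viaHypothesis
```

In the second form, anyone reading the theorem statement sees exactly what was assumed, and the assumption can later be discharged by supplying a real proof as the argument.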
Jonathan Huerta y Munive
Jonathan Huerta y Munive@jjHuertayMunive·
This week, Josef Urban and I recreated his autoformalisation experiment in Isabelle. I also used Claude Code to produce CLI tools that expose certain Isabelle commands to the terminal, hoping to make autoformalisation workflows easier for the agents: github.com/yonoteam/isa_a…
1
0
2
111
Acer
Acer@AcerFur·
@JasonRute @MoritzFirsching James Hanson would be the person to ask since it’s his and I haven’t asked if it passes them. I know his previous exploit passed lean4checker and SafeVerify though.
1
0
0
133
Jason Rute
Jason Rute@JasonRute·
@MoritzFirsching @AcerFur Do you know if this is already discussed on the Lean Zulip? I see print axioms doesn’t help. Does it pass lean4checker? SafeVerify? Comparator?
1
0
1
79
Jason Rute
Jason Rute@JasonRute·
@AcerFur I know you are trying to vague post, but is this discussed anywhere? How does the exploit work?
1
0
1
405