Thomas Bloom

446 posts

@thomasfbloom

Royal Society University Research Fellow at the University of Manchester. Mathematician and owner of https://t.co/SWVqqnq9hn. He/him/his.

Manchester, UK · Joined December 2020
79 Following · 3K Followers

Thomas Bloom @thomasfbloom
@MatPawluczuk @SkarredGhost This was true in October, but things have moved quickly since then, and there have been a few incidents of AI finding new proofs not in the literature.

Mat Pawluczuk @MatPawluczuk
@SkarredGhost The LLM that supposedly found solutions to Erdős problems was debunked. It simply found papers that other researchers were unaware of. We are not there yet, but eventually AI might create its own proofs. x.com/thomasfbloom/s…
Thomas Bloom @thomasfbloom

@kevinweil Hi, as the owner/maintainer of erdosproblems.com, this is a dramatic misrepresentation. GPT-5 found references, which solved these problems, that I personally was unaware of. The 'open' status only means I personally am unaware of a paper which solves it.

TonyVT SkarredGhost @SkarredGhost
"The race is on to develop an artificial intelligence that can do pure mathematics, and top mathematicians just threw down the gauntlet with an exam of actual, unsolved problems that are relevant to their research. The team is giving AI systems a week to solve the problems [...] “These are brand-new problems that cannot be found in any LLM’s [large language model’s] training data”" I'm grabbing popcorn, waiting to see how this ends. scientificamerican.com/article/mathem… #artificialintelligence #AI

Thomas Bloom @thomasfbloom
@AcerFur Would be interesting to see the full slides for the talk!

Thomas Bloom retweeted
Greg Burnham @GregHBurnham
There’s one problem in FrontierMath: Open Problems where GPT-5.2 Pro made a small bit of progress in our early testing. I think it shows the strengths and weaknesses of current models. Story in thread.

Thomas Bloom @thomasfbloom
@j_dekoninck I'm confused by your description. Can you clarify whether AI is used to extract the questions from the papers? And does the 67% mean that GPT produced valid answers to these questions 67% of the time? How was the correctness of these solutions checked?

Jasper Dekoninck @j_dekoninck
We present ArXivMath, a dynamic final-answer benchmark built from questions scraped from arXiv papers. It currently includes 40 questions, and a new version will be released each month from newly published papers. GPT-5.2 scores best, obtaining 67% for the January problems.

Mikola @hakunamakunana
@thomasfbloom @littmath Also, to both Thomas, @AcerFur and anyone else - there is an elephant in the room in the novelty discussion. For several years, all the main AI labs have run RLHF on an industrial scale. For instance, the last project I was on contributed 10000+ PhD-level 'new' problems with solutions⏬

Thomas Bloom @thomasfbloom
@hakunamakunana @littmath Thanks! Yes, we've seen a lot of unexpected problems when the strange collection of idiosyncrasies and conventions used by mathematicians runs into the glaring light of wider public attention.

Mikola @hakunamakunana
@thomasfbloom @littmath Thomas, I understand that, and I think you did a great job! I was following the website well before the AI-solving hype. As it often happens with great initiatives, the problem starts when people who are not the experts take the helm (in this case, the AI-guys or journalists)

Thomas Bloom @thomasfbloom
@littmath Definitely - I recommend that anyone interested in AI for maths research read at least the introduction, which expresses these caveats very well, even if they have no interest in Erdos problems.

Thomas Bloom @thomasfbloom
@hakunamakunana @littmath I do try to avoid adding problems that seem very technical, and where there is an obvious large family of closely related problems, I include only the most representative case. So the final list is certainly bounded (although there are still a few problems worthy of being added).

Mikola @hakunamakunana
@littmath I am saying that “Erdos problems” are not “Hilbert’s problems”, and I don’t mean their difficulty nor importance. I mean that the list of Erdos problems is rather “arbitrary” and problems can be added almost indefinitely, especially if one finds some unpublished letter by E.

Thomas Bloom @thomasfbloom
@hakunamakunana @littmath Obviously the ones that AI has solved are not very difficult or important, but many of the problems on the site are both. (And it's still interesting to see how AI does on the easier problems first - we can't expect to go from 0 to Hilbert problems overnight!)

Thomas Bloom @thomasfbloom
@hakunamakunana @littmath Those with prizes are listed as such on the site, so anyone is free to filter by prize amount to see a list of the "top prize problems" if they wish.

Thomas Bloom @thomasfbloom
@pfau @GregHBurnham A starting point would be to browse the 'recent papers' on the Erdos problems front page, to see what's happened in the last couple of years.

David Pfau @pfau
I'm quite certain there will still be unsolved Erdős problems in two years. The difficulty of these problems is almost certainly heavy-tailed, and the AI community gets bored and moves on from a benchmark when it's ~85% solved (see ImageNet, ARC-AGI-1).

Thomas Bloom @thomasfbloom
@GregHBurnham @pfau The question is how much AI can help accelerate this process. (Impossible to answer really, since we'd need an alternate timeline without AI to compare to!)

Thomas Bloom @thomasfbloom
@GregHBurnham @pfau Yes, it's hard to say! One aspect is that, AI aside, we've seen a lot of breakthrough results and solutions in combinatorics and number theory in the last couple of years (completely human-generated). So this is something of a golden age, perhaps.

Thomas Bloom @thomasfbloom
@octonion Heuristically probably yes, since the chance that each is prime is about 1/n, and the sum of 1/n diverges.
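Bloom's heuristic can be checked numerically in a toy model. This is a minimal sketch, assuming (hypothetically, since the original question is not shown here) that the n-th term is prime independently with probability exactly 1/n. Under that assumption the expected number of primes among the first N terms is a partial harmonic sum, which grows like ln N and is therefore unbounded. The function name `expected_prime_count` is illustrative only.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, truncated

def expected_prime_count(N: int) -> float:
    """Expected number of primes among terms 2..N in a hypothetical toy model
    where term n is prime independently with probability 1/n."""
    # Start at n = 2 so that no term is "prime" with probability 1.
    return sum(1.0 / n for n in range(2, N + 1))

for N in (10, 10**3, 10**6):
    # Compare with the asymptotic H_N - 1 ~ ln N + gamma - 1.
    approx = math.log(N) + EULER_GAMMA - 1
    print(N, round(expected_prime_count(N), 4), round(approx, 4))
```

In this model the expected count diverges, but only logarithmically: roughly 13 expected primes among the first 10^6 terms.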

Thomas Bloom @thomasfbloom
@cloneofsimo The disproved (lean) status just means that the (dis)proof has been formalised in Lean (sometimes by a human, sometimes by an autoformaliser). It does not imply anything about whether AI generated the proof. (Although in the case of 205, it was indeed AI.)

Simo Ryu @cloneofsimo
"LLMs didn't prove shit, oh yeah I fact-checked, people didn't discuss the proof in the comment" Literally the problem:
Simo Ryu @cloneofsimo

@hakunamakunana My brother in Christ, I know you are salty, but it literally says "disproved (lean)". Are you a language model or blind or both?


Thomas Bloom @thomasfbloom
@cloneofsimo @hakunamakunana Also the proof that AI found was completely different from this alternative approach that combines two known results. So that proof, at least, is novel.

Simo Ryu @cloneofsimo
@hakunamakunana Regarding 281, for the final time, it does not "just follow" from the literature. It requires combining two results from literatures 30 years apart, which is far from trivial. x.com/i/status/20129…
Thomas Bloom @thomasfbloom

@AcerFur This is not quite accurate. KoishiChan explains how this follows from two other results - so the proof is very short given those, but to my knowledge this actual result was not proved before in the literature (although in hindsight Erdos himself could have proved it easily!)


Simo Ryu @cloneofsimo
This is what, the 10th Erdős problem solved at this point? Do u see the pattern, anon? You never hear of other models (Gemini, Claude, Grok, JEPA, etc.) solving an Erdős problem. Conjectures are absolutely uncontaminated, a true measure of a "validation set problem". It really makes you think about just how capable GPT 5.2 Pro is compared to others.
Leeham @Liam06972452

Erdős Problem #635 autonomously resolved by GPT-5.2 Pro. The model thought for just 50 mins, outputting a correct proof in LaTeX, which was then formalised in Lean by @HarmonicMath's Aristotle. Big thanks to @AcerFur for cleaning up the Lean. Literature review is ongoing.


Thomas Bloom @thomasfbloom
@GregHBurnham Yes, absolutely, sorry, I meant in terms of a proof that is actually achievable. That is, all graphs with >5 are too large to ever describe, but there is some yet-undreamt-of limit-type object which approximates them, and that we can reason about, and that's how the proof will happen.

Greg Burnham @GregHBurnham
@thomasfbloom Doesn't De Bruijn–Erdős (which, tbc, I learned about when writing this thread!) imply that, if it's > 5 for the plane, then there's a finite unit distance graph also > 5?
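For reference, the De Bruijn–Erdős theorem (in its standard form, which relies on the axiom of choice) and the consequence Greg draws from it can be stated as follows. Here $G$ is any graph, possibly infinite, $\chi$ denotes chromatic number, and $\chi(\mathbb{R}^2)$ is the chromatic number of the unit distance graph of the plane (points of $\mathbb{R}^2$, joined when at distance exactly 1). The second line is the contrapositive of the first, applied with $k = 5$.

```latex
\begin{aligned}
&\chi(G) \le k \iff \chi(H) \le k \text{ for every finite subgraph } H \subseteq G \quad (k \in \mathbb{N}),\\
&\chi(\mathbb{R}^2) > 5 \implies \chi(H) > 5 \text{ for some finite unit distance graph } H.
\end{aligned}
```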

Greg Burnham @GregHBurnham
You can see how the Hadwiger–Nelson problem might be verifiable, though. You could settle the problem for good by exhibiting a unit distance graph that requires 7 colors. You could at least rule out the answer of 5 by exhibiting such a graph that requires 6 colors.
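Checking such a certificate is a finite, mechanical task once the graph is written down. This is a minimal sketch, assuming the graph is supplied as an edge list: an exhaustive test of whether it has a proper k-coloring, applied to the Moser spindle, a standard 7-vertex unit distance graph that needs 4 colors. The names `MOSER_SPINDLE`, `is_k_colorable` and `chromatic_number` are hypothetical. Only the spindle's combinatorial structure is encoded; confirming that every edge really has unit length would need the vertex coordinates, which this sketch omits.

```python
from itertools import product

# Moser spindle as an edge list (combinatorial structure only).
# Vertex 0 is shared by two rhombi, each made of two equilateral triangles:
# (0,1,2),(1,2,3) and (0,4,5),(4,5,6); the tips 3 and 6 are joined by (3,6).
MOSER_SPINDLE = [
    (0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
    (0, 4), (0, 5), (4, 5), (4, 6), (5, 6),
    (3, 6),
]

def is_k_colorable(edges, k):
    """Return True if the graph given by `edges` has a proper k-coloring.
    Tries all k**n color assignments, so it is only usable on tiny graphs."""
    n = 1 + max(max(u, v) for u, v in edges)
    for coloring in product(range(k), repeat=n):
        if all(coloring[u] != coloring[v] for u, v in edges):
            return True
    return False

def chromatic_number(edges):
    """Smallest k for which a proper k-coloring exists."""
    k = 1
    while not is_k_colorable(edges, k):
        k += 1
    return k

print(chromatic_number(MOSER_SPINDLE))
```

The brute force is chosen for transparency. A graph certifying 6 or 7 colors would be far too large for exhaustive search, and one would instead hand the "is it (k-1)-colorable?" question to a SAT solver.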