Thomas Bloom

446 posts

@thomasfbloom

Royal Society University Research Fellow at the University of Manchester. Mathematician and owner of https://t.co/SWVqqnq9hn. He/him/his.

Manchester, UK · Joined December 2020

79 Following · 3K Followers

Thomas Bloom @thomasfbloom · 13 Feb

@MatPawluczuk @SkarredGhost This was true in October, but things have moved quickly since then, and there have been a few incidents of AI finding new proofs not in the literature.

Mat Pawluczuk @MatPawluczuk · 13 Feb

@SkarredGhost The LLM that supposedly found solutions to Erdős problems was debunked. It simply found papers that other researchers were unaware of. We are not there yet, but eventually AI might create its own proofs. x.com/thomasfbloom/s…

Thomas Bloom @thomasfbloom

@kevinweil Hi, as the owner/maintainer of erdosproblems.com, this is a dramatic misrepresentation. GPT-5 found references, which solved these problems, that I personally was unaware of. The 'open' status only means I personally am unaware of a paper which solves it.

TonyVT SkarredGhost @SkarredGhost · 12 Feb

"The race is on to develop an artificial intelligence that can do pure mathematics, and top mathematicians just threw down the gauntlet with an exam of actual, unsolved problems that are relevant to their research. The team is giving AI systems a week to solve the problems [...] “These are brand-new problems that cannot be found in any LLM’s [large language model’s] training data”" I'm grabbing popcorn, waiting to see how this ends. scientificamerican.com/article/mathem… #artificialintelligence #AI

Thomas Bloom @thomasfbloom · 12 Feb

@AcerFur Would be interesting to see the full slides for the talk!

Acer @AcerFur · 12 Feb

I think this is a pretty good summary of the current status

Acer @AcerFur

giving a talk tmrw :p

Thomas Bloom retweeted

Greg Burnham @GregHBurnham · 5 Feb

There’s one problem in FrontierMath: Open Problems where GPT-5.2 Pro made a small bit of progress in our early testing. I think it shows the strengths and weaknesses of current models. Story in thread.

Thomas Bloom @thomasfbloom · 5 Feb

@j_dekoninck I'm confused by your description. Can you clarify whether AI is used in the extraction of the question from the papers? And the 67% means that GPT produced valid answers to these questions 67% of the time? How was the correctness of these solutions checked?

Jasper Dekoninck @j_dekoninck · 4 Feb

We present ArXivMath, a dynamic final-answer benchmark built from questions scraped from arXiv papers. It currently includes 40 questions, and a new version will be released each month from newly published papers. GPT-5.2 scores best, obtaining 67% for the January problems.

Thomas Bloom @thomasfbloom · 2 Feb

@hakunamakunana @littmath @AcerFur I am shocked at that number. Were these graduate students paid to produce new problems and solutions?

Mikola @hakunamakunana · 2 Feb

@thomasfbloom @littmath Also, to Thomas, @AcerFur, and anyone else: there is an elephant in the room in the novelty discussion. For several years, all the main AI labs have run RLHF on an industrial scale. For instance, the last project I was on contributed 10,000+ PhD-level 'new' problems with solutions ⏬

Daniel Litt @littmath · 2 Feb

Really nice paper. Aside from the nice results resolving a number of open Erdos problems (or finding solutions in the literature), the paper does a really good job contextualizing significance and prior work.

Acer @AcerFur

Okay looks like I can now talk about Aletheia on the Erdős Problems! arxiv.org/abs/2601.22401…

Thomas Bloom @thomasfbloom · 2 Feb

@hakunamakunana @littmath @AcerFur I hadn't considered this, very good point!

Thomas Bloom @thomasfbloom · 2 Feb

@hakunamakunana @littmath Thanks! Yes, we've seen a lot of unexpected problems when the strange collection of idiosyncrasies and conventions used by mathematicians runs into the glaring light of wider public attention.

Mikola @hakunamakunana · 2 Feb

@thomasfbloom @littmath Thomas, I understand that, and I think you did a great job! I was following the website well before the AI-solving hype. As often happens with great initiatives, the problem starts when people who are not experts take the helm (in this case, the AI guys or journalists).

Thomas Bloom @thomasfbloom · 2 Feb

@littmath Definitely - I recommend that anyone interested in AI for maths research read at least the introduction, which expresses these caveats very well, even if they have no interest in Erdos problems.

Thomas Bloom @thomasfbloom · 2 Feb

@hakunamakunana @littmath I do try to avoid adding problems that seem very technical and, where there is an obvious large family of closely related problems, to include only the most representative case. So the final list is certainly bounded (although there are still a few worthy of being added).

Mikola @hakunamakunana · 2 Feb

@littmath I am saying that “Erdos problems” are not “Hilbert’s problems”, and I don’t mean their difficulty or importance. I mean that the list of Erdos problems is rather “arbitrary” and problems can be added almost indefinitely, especially if one finds some unpublished letter by Erdős.

Thomas Bloom @thomasfbloom · 2 Feb

@hakunamakunana @littmath Obviously the ones that AI has solved are not very difficult or important, but many of the problems on the site are both. (And it's still interesting to see how AI does on the easier problems first - we can't expect to go from 0 to Hilbert problems overnight!)

Thomas Bloom @thomasfbloom · 2 Feb

@hakunamakunana @littmath Those with prizes are listed as such on the site, so anyone is free to filter by prize amount to see a list of the "top prize problems" if they wish.

Thomas Bloom @thomasfbloom · 2 Feb

@pfau @GregHBurnham A starting point would be to browse the 'recent papers' on the Erdos problems front page, to see what's happened in the last couple of years.

David Pfau @pfau · 2 Feb

@thomasfbloom @GregHBurnham See, I didn't know that. That's really interesting. Like what?

David Pfau @pfau · 1 Feb

I'm quite certain there will still be unsolved Erdős problems in two years. The difficulty of these problems is almost certainly heavy-tailed, and the AI community gets bored and moves on from a benchmark when it's ~85% solved (see ImageNet, ARC-AGI-1).

Thomas Bloom @thomasfbloom · 2 Feb

@GregHBurnham @pfau The question is how much AI can help accelerate this process. (Impossible to answer really, since we'd need an alternate timeline without AI to compare to!)

Thomas Bloom @thomasfbloom · 2 Feb

@GregHBurnham @pfau Yes, it's hard to say! One aspect is that, AI aside, we've seen a lot of breakthrough results and solutions in combinatorics and number theory in the last couple of years (completely human-generated). So this is something of a golden age, perhaps.

Thomas Bloom @thomasfbloom · 2 Feb

A new blog post by Vjekoslav Kovač, a general survey about Erdős and irrationality problems: erdosproblems.com/forum/thread/b…

Thomas Bloom @thomasfbloom · 1 Feb

@octonion Heuristically probably yes, since the chance that each is prime is about 1/n, and the sum of 1/n diverges.

Christopher D. Long 🇺🇦🏳️‍🌈🌹 @octonion · 31 Jan

An example of something that may be true, but unprovable. Are there infinitely many primes of the form floor(exp(n)), for n a positive integer?
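
The heuristic in the reply to this question can be unpacked as follows. By the prime number theorem, a typical integer near x is prime with probability about 1/ln x, so floor(exp(n)) should be prime with probability about 1/n, and the expected number of such primes with n ≤ N is roughly the harmonic sum 1 + 1/2 + ... + 1/N, which grows like ln N without bound. The Python sketch below is an illustration and not part of the thread: it computes floor(exp(n)) exactly with the decimal module and tests primality with Miller-Rabin, so that the observed count can be compared with that sum. The function names, the default cutoff of 40 and the 60-digit working precision are assumptions chosen for the example.

```python
from decimal import Decimal, getcontext

# Assumption: 60 significant digits is ample for n <= 40, where exp(n) < 10**18.
getcontext().prec = 60

# With these bases Miller-Rabin is deterministic for inputs below about 3.3e24.
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def floor_exp(n: int) -> int:
    """Return floor(exp(n)) for a positive integer n."""
    # int() truncates toward zero, which equals floor for positive values.
    return int(Decimal(n).exp())


def is_prime(m: int) -> bool:
    """Miller-Rabin primality test, deterministic in the range used here."""
    if m < 2:
        return False
    for p in _BASES:
        if m % p == 0:
            return m == p
    # Write m - 1 as d * 2**s with d odd.
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _BASES:
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False  # a is a witness that m is composite
    return True


def compare(limit: int = 40):
    """Return (values of n <= limit with floor(exp(n)) prime, heuristic expected count)."""
    hits = [n for n in range(1, limit + 1) if is_prime(floor_exp(n))]
    expected = sum(1 / n for n in range(1, limit + 1))
    return hits, expected


if __name__ == "__main__":
    hits, expected = compare()
    print("n with floor(exp(n)) prime:", hits)
    print("heuristic expected count:", round(expected, 2))
```

A finite count like this is only a sanity check on the heuristic. It says nothing about whether infinitely many such primes exist, which is the open question.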


Thomas Bloom @thomasfbloom · 1 Feb

@cloneofsimo The disproved (lean) status just means that the (dis)proof has been formalised in Lean (sometimes by a human, sometimes by an autoformaliser). It does not imply anything about whether AI generated the proof. (Although in the case of 205, it was indeed AI.)

Simo Ryu @cloneofsimo · 31 Jan

"LLMs didn't prove shit, oh yeah I fact checked, people didn't discuss the proof in the comments" Literally the problem:

Simo Ryu @cloneofsimo

@hakunamakunana My brother in christ, I know you are salty but it literally says "disproved (lean)". Are you a language model or blind or both?

Thomas Bloom @thomasfbloom · 1 Feb

@cloneofsimo @hakunamakunana Also the proof that AI found was completely different from this alternative approach that combines two known results. So that proof, at least, is novel.

Simo Ryu @cloneofsimo · 31 Jan

@hakunamakunana Regarding 281, for the final time, it does not "just follow" from the literature. It requires combining two results that appeared in the literature 30 years apart, which is far from trivial to do. x.com/i/status/20129…

Thomas Bloom @thomasfbloom

@AcerFur This is not quite accurate. KoishiChan explains how this follows from two other results - so the proof is very short given those, but to my knowledge this actual result was not proved before in the literature (although in hindsight Erdos himself could have proved it easily!)

Simo Ryu @cloneofsimo · 31 Jan

This is what, the 10th Erdős problem solved at this point? Do you see the pattern, anon? You never hear of other models (Gemini, Claude, Grok, JEPA, etc.) solving an Erdős problem. Conjectures are absolutely uncontaminated, a true measure of the "validation set problem". It really makes you think just how capable GPT 5.2 Pro is compared to others.

Leeham @Liam06972452

Erdős Problem #635 autonomously resolved by GPT-5.2 Pro. The model thought for just 50 mins, outputting a correct proof in LaTeX, then formalised in Lean by @HarmonicMath's Aristotle. Big thanks to @AcerFur for cleaning up the Lean. Literature review is ongoing.

Thomas Bloom @thomasfbloom · 31 Jan

@GregHBurnham Yes, absolutely. Sorry, I meant in terms of a proof that is actually achievable. That is, all graphs needing >5 colours are too large to ever describe, but there is some as-yet-undreamt-of limit-type object which approximates them and that we can reason about, and that's how the proof will happen.

Greg Burnham @GregHBurnham · 31 Jan

@thomasfbloom Doesn't De Bruijn–Erdős (which, tbc, I learned about when writing this thread!) imply that, if it's > 5 for the plane, then there's a finite unit distance graph also > 5?

Greg Burnham @GregHBurnham · 31 Jan

You can see how the Hadwiger–Nelson problem might be verifiable, though. You could settle the problem for good by exhibiting a unit distance graph that requires 7 colors. You could at least rule out the answer of 5 by exhibiting such a graph that requires 6 colors.
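
To make this verification idea concrete: a claimed lower bound for the Hadwiger–Nelson problem comes with a finite certificate, namely a unit distance graph, and checking it only requires confirming that the graph has no proper coloring with fewer colors. The Python sketch below is an illustration and not something from the thread. It brute-forces the chromatic number of the Moser spindle, a well-known 7-vertex unit distance graph that needs 4 colors; no graph requiring 6 or 7 colors is being claimed here. The vertex labels and function names are assumptions, and the code takes the edge list as given rather than checking that the edges can be drawn with unit length.

```python
from itertools import product

# Moser spindle: two rhombi (each made of two equilateral triangles) sharing
# vertex A, with their far tips D and G joined by one more unit-length edge.
MOSER_SPINDLE_EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("A", "E"), ("A", "F"), ("E", "F"), ("E", "G"), ("F", "G"),
    ("D", "G"),
]


def is_colorable(edges, k):
    """Return True if the graph has a proper coloring using at most k colors."""
    vertices = sorted({v for edge in edges for v in edge})
    # Exhaustive search over all k**len(vertices) assignments.
    for colors in product(range(k), repeat=len(vertices)):
        assignment = dict(zip(vertices, colors))
        if all(assignment[u] != assignment[v] for u, v in edges):
            return True
    return False


def chromatic_number(edges):
    """Smallest k for which a proper k-coloring exists."""
    k = 1
    while not is_colorable(edges, k):
        k += 1
    return k


if __name__ == "__main__":
    print("Moser spindle chromatic number:", chromatic_number(MOSER_SPINDLE_EDGES))
```

Exhaustive search is only viable for very small graphs. A certificate with thousands of vertices would presumably need a SAT solver or a similar tool, but the check itself has the same shape.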

