Simon Frieder
@friederrrr

Making the hills LLMs can climb towards becoming Math Copilots. AIMO Prize Manager. https://t.co/ir85qxw65J (Opinions my own.)

218 posts · Joined January 2023
54 Following · 276 Followers
Simon Frieder@friederrrr·
@askalphaxiv Cool to see the benchmarking space growing. It seems to me that the FrontierMath dataset already did something very similar last year?
alphaXiv@askalphaxiv·
"First Proof" A team of researchers proposes a way to test if AI can actually do NEW math by releasing 10 freshly-solved and never public research questions, with answers temporarily encrypted. This let's the community able to measure the genuine performance of LLMs on proof-generation, before their solutions drop. Questions include: - stochastic analysis - p-adic representation theory - algebraic combinatorics - spectral graph theory - equivariant algebraic topology - lattices in Lie groups/topology - symplectic geometry - tensor algebraic relations - numerical linear algebra
Simon Frieder@friederrrr·
Papers like these are important for people competing in big reasoning competitions like AIMO or ARC-AGI. The problem is that if one takes a closer look, there are some issues with the impressive claims:
- MATH is an outdated benchmark by now;
- the numbers don't add up. The last sentence on page 1 states "Qwen-2.5-7B-Instruct improves from 76% to 95% while training just 10,000 parameters". This conflicts with Table 2, which in turn is also unclear, as the parameter count doesn't seem to match the # column.
Simon Frieder@friederrrr·
Asking an AI system for an opinion is never a good idea. I am withholding judgement on whether to be impressed - it really depends on a lot of details: how many mathematicians tried and failed to prove the problem before (impossible to quantify, but that would be a measure of difficulty), what techniques the proof that was found uses (is it merely an obvious application of a known theorem that mathematicians overlooked, or did it introduce a new solution technique), etc. An expert in the field needs to answer this -- not me, and definitely not Grok. LLMs still don't know what they don't know. This is way over Grok's head, and its "arguments" are very weak, since they would apply to any other piece of autoformalized work.
Axiom@axiommathai·
1/ AxiomProver has solved Fel’s open conjecture on syzygies of numerical semigroups, autonomously generating a formal proof in Lean with zero human guidance. This is the first time an AI system has settled an unsolved research problem in theory-building math and self-verified the result.
Simon Frieder@friederrrr·
Is mathematics a game that is still worth playing in the long term? The Twittersphere abounds with examples of what LLMs can do in math -- optimism is sky-high. (I don't quite share that optimism, since (open-source) LLMs do not even manage to solve all the "simple" unseen problems we have over at the AI Math Olympiad, with the LB stuck at 44/50.) If that optimism pans out, even more maths will be created (rather than read) in the near future. While at first it will be exciting to watch conjectures fall, I wonder what personal motivation will be left, in such a full-automation scenario, to get good at mathematics and to study it.
Simon Frieder@friederrrr·
"To the wider community interested in Erdős problems, we caution that even after correctly solving an Erdős problem, one should take care to ensure the statement accurately reflects what Erdős likely intended (this issue is discussed further below)." This seems to be a very hard problem to solve -- and no, Lean (which is the usual answer when one points out tricky problems with natural-language mathematics) won't help here.
Quoc Le@quocleix·
Excited to share our latest work: "Semi-Autonomous Mathematics Discovery with Gemini." We used Gemini to systematically evaluate 700 "open" conjectures in the Erdős Problems database. The result? We addressed 13 problems marked as open—finding 5 novel autonomous solutions and identifying 8 existing solutions missed by previous literature. Read the full case study here: arxiv.org/abs/2601.22401
Simon Frieder@friederrrr·
In a year, we will be living in a world of mathematical ... AI slop! I'd hope things would be different, but given where they are headed now, I fear many technical and completely uninteresting results will flood the space. arXiv already had to stop accepting position papers, and the same will happen rather soon for technical, niche research papers. There will be an occasional jewel where AI genuinely helped (although just autoformalizing is, right now, not that exciting to the mainstream AI-sceptical mathematician), but most pebbles that fall out of LLMs won't be these jewels; they'll be cobblestones.
Jared Duker Lichtman@jdlichtman

In a year, we will be living in a world of mathematical abundance.

Simon Frieder@friederrrr·
Math abundance -- or math AI slop? Other domains were already slop-ified: after the initial "wow" effect is gone, the limits of AI systems quickly emerge, whether it's the latest Sora model not being able to generate very long videos, or the best vibe-coding models that can generate the frontend and backend of a website -wow, incredible!- until wrestling the website into your specific requirements turns out to be much harder. I predict that in a year we'll have the equivalent for math: lots of very technical, very uninteresting results. Sure, some tools will turn out to be useful -- but the abundance we'll have isn't a positive one.
Jared Duker Lichtman@jdlichtman·
In a year, we will be living in a world of mathematical abundance.
Simon Frieder@friederrrr·
There isn't a word yet to capture this; I used "the queue" in one of my (still fledgling) blog posts friederrr.org/blog/researche… for the ideas that are _up there_. Should people be rewarded for plucking ideas from the queue? It's debatable, with good arguments on both sides.
Jeff Rose@rosejn·
I guess what I often wonder when you bring up that someone, possibly yourself, had written an idea previously, is whether it matters if it hadn’t been impactful. So often multiple people arrive at the same idea independently because the cognitive building blocks are in place and/or the next steps are in the zeitgeist. Every duplication is not plagiarism, and sometimes it’s the communication or application or implementation of an idea that makes it impactful. I would be curious to hear your thoughts on this.
Jürgen Schmidhuber@SchmidhuberAI·
Social media are full of misinformation about AI history. To all "AI influencers:" before you post your next piece, take history lessons from the AI Blog, with chapters on:
Who invented artificial neural networks? 1795-1805
Who invented deep learning? 1965
Who invented backpropagation? 1676-1970
Who invented convolutional neural nets? 1979-1988
Who invented generative adversarial networks? 1990
Who invented Transformer neural networks? 1991-2017
Who invented deep residual learning? 1991-2015
Who invented neural knowledge distillation? 1991
Who invented the transistor? 1925
Who invented the integrated circuit? 1949
Who created the general purpose computer? 1936-1941
Who founded theoretical CS and AI theory? 1931-34
And many more ... people.idsia.ch/~juergen/blog.…
Simon Frieder@friederrrr·
@mathematics_inc 3/ All of this is not to say that "hitting math with technology" is the wrong approach -- but Lean is only *one* toolbox one needs to use, and automation within natural language will also need to happen to close the iteration loop for mathematicians.
Simon Frieder@friederrrr·
2/ Aside from this example about limits, which highlights one problematic instance of formal mathematics, there are also other instances where Lean isn't the best choice; here are just three examples out of many:
- Keeping proofs concise is easy in natural language, but for Lean it will likely be hard to develop a layer that summarizes things to make proofs more easily readable;
- Conjecturing is probably best done in natural language, to avoid getting bogged down in the technical overhead associated with Lean;
- No flexibility: HoTT was driven forward by a deeper analysis of the concept of equality. Handling this in natural language is much more flexible than in a formal system. Even if you are conceptually rooted in ZFC, which is clunkier in this regard than HoTT, in natural language you can treat two things as equal "at a higher level" according to whatever theory you're developing, even if they are not equal as sets. If you do mathematics formally, you're stuck with whatever foundation was used, which changes how easily you can express equality.
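A minimal Lean 4 sketch of the last point (the type `MyNat` and the map `toNat` below are illustrative assumptions, not taken from any cited work): two isomorphic types are distinct objects in a formal system, and statements about one must be transported across the map explicitly, whereas a natural-language proof would simply "identify" them.

```lean
-- A toy copy of the natural numbers, isomorphic to `Nat`
-- but a completely different type as far as Lean is concerned.
inductive MyNat where
  | zero
  | succ (n : MyNat)

-- The obvious identification with `Nat`.
def toNat : MyNat → Nat
  | .zero   => 0
  | .succ n => toNat n + 1

-- A mathematician would say `MyNat` "is" `Nat`; in Lean, every
-- fact must instead be stated and proved through `toNat`.
example (m : MyNat) : toNat (.succ m) = toNat m + 1 := rfl
```

The equality proof goes through by `rfl` only because it unfolds definitionally; any deeper identification of the two types would need explicit transport, which is exactly the rigidity described above.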
Math, Inc.@mathematics_inc·
🚨 BREAKING: Fields medalist Terry Tao on how mathematics will change: “When these tools are perfected, we will change the way we do mathematics. If there's a drudgery or a big computation, we'll just hit it with all our technology and say: 'By Gauss, you can get from here to there,' and now we just keep going. So we can blast through all these obstacles that we avoid almost subconsciously. If you look at what we miss, it's the missed opportunities, and that percentage of the overall opportunities is huge.” Full conversation with Math, Inc.’s @jessemhan and @jdlichtman coming soon.
Simon Frieder@friederrrr·
@deredleritt3r Hi, one of the two authors here. Trust me, we did get that memo. But it was not relevant to our paper. This paper seems to have been more controversial than intended, but only because most people glossed over the finer details. Your somewhat emotional post also left me with that impression. 1) In the abstract we said "contrary to optimism about LLMs problem-solving abilities" and _not_ "contrary to LLMs that solve IMO problems," which is what you imply. The LLMs we tested all had rather good problem-solving abilities, and IMO-level problems are within their reach (particularly if the IMO problems are in the training data), even though those specific LLMs failed to do well on IMO25 (whose problems were likely not in the training data, so comparing to those problems would actually be unfair). 2) Also, you seem not to have taken a look at our "Limitations" section, where we clearly anticipated that LLMs would solve our problem - as GPT-5 Pro did some time after our release. Once it did, many people seemed to have a reaction of the type "take that, proved ya wrong!", but in our paper we were clear that we fully expected this; we would rather have been surprised only if it had not happened.
prinz@deredleritt3r·
August 2025: Oxford and Cambridge mathematicians publish a paper entitled "No LLM Solved Yu Tsumura's 554th Problem". They gave this problem to o3 Pro, Gemini 2.5 Deep Think, Claude Opus 4 (Extended Thinking) and other models, with instructions to "not perform a web search to solve the problem". No LLM could solve it. The paper smugly claims: "We show, contrary to the optimism about LLM’s problem-solving abilities, fueled by the recent gold medals that were attained, that a problem exists—Yu Tsumura’s 554th problem—that a) is within the scope of an IMO problem in terms of proof sophistication, b) is not a combinatorics problem which has caused issues for LLMs, c) requires fewer proof techniques than typical hard IMO problems, d) has a publicly available solution (likely in the training data of LLMs), and e) that cannot be readily solved by any existing off-the-shelf LLM (commercial or open-source)." (Apparently, these mathematicians didn't get the memo that the unreleased OpenAI and Google models that won gold on the IMO are significantly more powerful than the publicly available models they tested. But no matter.) October 2025: GPT-5 Pro solves Yu Tsumura's 554th problem in 15 minutes. Lee Sedol moment is coming for many.
Bartosz Naskręcki@nasqret

GPT-5-Pro solved, in just 15 minutes (without any internet search), the presentation problem known as “Yu Tsumura’s 554th Problem.” arxiv.org/pdf/2508.03685 This is the first model to solve this task completely. I expect more such results soon — the model demonstrates a strong grasp of elementary abstract algebra reasoning.

Simon Frieder@friederrrr·
Functionally there is not a big difference. But there is one in terms of stability, since arXiv is essentially a non-profit, and Reddit is a business. I'd like to have a record of comments indefinitely, and we saw with paperswithcode.com how annoying it can be when a business (Meta, in that case) pulls the plug. We now have a million clones of the original site, but none as good as the original, which is annoying. arXiv helps as a trusted source that has stood the test of time. Definitely agree that having a preprint hosting service is necessary but not sufficient for a thriving open scientific environment. My utopian vision would be one of full integration, where AIMOx, for some integer x, is hosted on a community-supported platform, and one could directly link between arXiv preprints like arxiv.org/pdf/2504.16891 and AIMOx, knowing that there is a clear organizational structure that is not influenced by ever-changing business cycles. It's unlikely to happen soon though :D
Learning CUDA & CuTe@ClydeCompute·
@friederrrr What's the difference between arXiv having comments and a subreddit? I think it's better for research as a whole to do significantly more than just pre-prints. AIMO has massively improved LLM math evals, as an example. What else could it translate to? The whole paradigm needs a shift
Simon Frieder@friederrrr·
One more nail in the coffin for a broken reviewing system. Who knows how many people silently used this backdoor to see who gave them bad scores -> potentially career-damaging. It would be much better to have a comment section on arXiv; this would solve 80% of the existing problems: only people who are actually interested in the paper would read it; there would be no arbitrary cutoff for who made it in or not, and no useless scores (the infamous NeurIPS experiment demonstrated -unsurprisingly- a large degree of subjectivity inverseprobability.com/talks/notes/th…); nasty reviews would occur less frequently if your name is attached; and there would be the possibility of an ongoing dialogue reflecting what the community thinks of the paper -- currently nothing can be done after the rebuttal phase, which is artificial, and sometimes a longer dialogue would be beneficial (who remembers the "Understanding deep learning requires rethinking generalization" paper? openreview.net/forum?id=Sy8gd…); etc. etc.
Simon Frieder@friederrrr·
2023: It's hard to devise an LLM that solves a math problem. 2025: It's hard to devise a math problem that stumps an LLM. (...this in the context of competitive math questions, but we'll also get to research-level math soon)
Simon Frieder retweeted
AIMO Prize@AIMOprize·
AIMO3 is full of surprises: week 2 (out of 21) just concluded. After a race in the first week that had us both biting our nails to see how quickly the leaderboard was rising and cheering for the progress of open-weight LLMs, the leaderboard suddenly ground to a halt.
Simon Frieder@friederrrr·
Haha, not sure if I should be happy or sad to see this problem, which we found just this summer and which stumped all LLMs, demolished so convincingly. It is true that it was more or less clear that a system like Vampire would be able to do it, but it is quite cool that a DL-inspired approach could solve it too.
Bartosz Naskręcki@nasqret·
Aristotle by @Harmonic is acing group-theory puzzles. Here is a complete formal proof of the popular Yu Tsumura 554 puzzle. What's nice is that the proof is very transparent, with easy-to-follow steps. It was generated in less than an hour without any hints. I am attaching the full proof for inspection.