Paul Callaghan

3.6K posts

@paulcc_two

PhD on Evaluation of NLP (MUC-6), 1998. Parsing & ambiguity, Haskell & type theory, Riichi Mahjong. Once lecturer, now happy escapee from organised education.

Joined May 2009
598 Following · 328 Followers
Paul Callaghan retweeted
Jürgen Schmidhuber @SchmidhuberAI
Dr. LeCun's heavily promoted Joint Embedding Predictive Architecture (JEPA, 2022) [5] is the heart of his new company. However, the core ideas are not original to LeCun. Instead, JEPA is essentially identical to our 1992 Predictability Maximization system (PMAX) [1][14]. Details are in reference [19], which contains many additional references.

Motivation of PMAX [1][14]: since details of inputs are often unpredictable from related inputs, two non-generative artificial neural networks interact as follows: one net tries to create a non-trivial, informative, latent representation of its own input that is predictable from the latent representation of the other net's input.

PMAX [1][14] is actually a whole family of methods. Consider the simplest instance in Sec. 2.2 of [1]: an auto encoder net sees an input and represents it in its hidden units (its latent space). The other net sees a different but related input and learns to predict (from its own latent space) the auto encoder's latent representation, which in turn tries to become more predictable, without giving up too much information about its own input, to prevent what's now called "collapse." See illustration 5.2 in Sec. 5.5 of [14] on the "extraction of predictable concepts."

The 1992 PMAX paper [1] discusses not only auto encoders but also other techniques for encoding data. The experiments were conducted by my student Daniel Prelinger. The non-generative PMAX outperformed the generative IMAX [2] on a stereo vision task. The 2020 BYOL [10] is also closely related to PMAX. In 2026, @misovalko, leader of the BYOL team, praised PMAX and listed numerous similarities to much later work [19].

Note that the self-created "predictable classifications" in the title of [1] (and the so-called "outputs" of the entire system [1]) are typically INTERNAL "distributed representations" (as in the title of Sec. 4.2 of [1]). The 1992 PMAX paper [1] considers both symmetric and asymmetric nets. In the symmetric case, both nets are constrained to emit "equal (and therefore mutually predictable)" representations [1]. Sec. 4.2 on "finding predictable distributed representations" has an experiment with 2 weight-sharing auto encoders which learn to represent in their latent space what their inputs have in common (see the cover image of this post).

Of course, back then compute was a million times more expensive, but the fundamental insights of "JEPA" were present, and LeCun has simply repackaged old ideas without citing them [5,6,19]. This is hardly the first time LeCun (or others writing about him) have exaggerated LeCun's own significance by downplaying earlier work. He did NOT "co-invent deep learning" (as some know-nothing "AI influencers" have claimed) [11,13], and he did NOT invent convolutional neural nets (CNNs) [12,6,13], NOR was he even the first to combine CNNs with backpropagation [12,13]. While he got awards for the inventions of other researchers whom he did not cite [6], he did not invent ANY of the key algorithms that underpin modern AI [5,6,19].

LeCun's recent pitch:
1. LLMs such as ChatGPT are insufficient for AGI (which has been obvious to experts in AI & decision making, and is something he once derided @GaryMarcus for pointing out [17]).
2. Neural AIs need what I baptized a neural "world model" in 1990 [8][15] (earlier, less general neural nets of this kind, such as those by Paul Werbos (1987) and others [8], weren't called "world models," although the basic concept itself is ancient [8]).
3. The world model should learn to predict (in non-generative "JEPA" fashion [5]) higher-level predictable abstractions instead of raw pixels: that's the essence of our 1992 PMAX [1][14].

Astonishingly, PMAX or "JEPA" seems to be the unique selling proposition of LeCun's 2026 company on world model-based AI in the physical world, which is apparently based on what we published over 3 decades ago [1,5,6,7,8,13,14], and modeled after our 2014 company on world model-based AGI in the physical world [8]. In short, little if anything in JEPA is new [19]. But then the fact that LeCun would repackage old ideas and present them as his own clearly isn't new either [5,6,18,19].

FOOTNOTES
1. Note that PMAX is NOT the 1991 adversarial Predictability MINimization (PMIN) [3,4]. However, PMAX may use PMIN as a submodule to create informative latent representations [1] (Sec. 2.4), and to prevent what's now called "collapse." See the illustration on page 9 of [1].
2. Note that the 1991 PMIN [3] also predicts parts of latent space from other parts. However, PMIN's goal is to REMOVE mutual predictability, to obtain maximally disentangled latent representations called factorial codes. PMIN by itself may use the auto encoder principle in addition to its latent space predictor [3].
3. Neither PMAX nor PMIN was my first non-generative method for predicting latent space, which was published in 1991 in the context of neural net distillation [9]. See also [5-8].
4. While the cognoscenti agree that LLMs are insufficient for AGI, JEPA is insufficient, too. We should know: we have had it for over 3 decades under the name PMAX! Additional techniques are required to achieve AGI, e.g., meta learning, artificial curiosity and creativity, efficient planning with world models, and others [16].

REFERENCES (easy to find on the web):
[1] J. Schmidhuber (JS) & D. Prelinger (1993). Discovering predictable classifications. Neural Computation, 5(4):625-635. Based on TR CU-CS-626-92 (1992): people.idsia.ch/~juergen/predm…
[2] S. Becker & G. E. Hinton (1989). Spatial coherence as an internal teacher for a neural network. TR CRG-TR-89-7, Dept. of CS, U. Toronto.
[3] JS (1992). Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879. Based on TR CU-CS-565-91, 1991.
[4] JS, M. Eldracher & B. Foltin (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786.
[5] JS (2022-23). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015.
[6] JS (2023-25). How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23.
[7] JS (2026). Simple but powerful ways of using world models and their latent space. Opening keynote for the World Modeling Workshop, 4-6 Feb 2026, Mila - Quebec AI Institute.
[8] JS (2026). The Neural World Model Boom. Technical Note IDSIA-2-26.
[9] JS (1991). Neural sequence chunkers. TR FKI-148-91, TUM, April 1991. (See also Technical Note IDSIA-12-25: Who invented knowledge distillation with artificial neural networks?)
[10] J. Grill et al. (2020). Bootstrap your own latent: A "new" approach to self-supervised learning. arXiv:2006.07733.
[11] JS (2025). Who invented deep learning? Technical Note IDSIA-16-25.
[12] JS (2025). Who invented convolutional neural networks? Technical Note IDSIA-17-25.
[13] JS (2022-25). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, arXiv:2212.11279.
[14] JS (1993). Network architectures, objective functions, and chain rule. Habilitation Thesis, TUM. See Sec. 5.5 on "Vorhersagbarkeitsmaximierung" (Predictability Maximization).
[15] JS (1990). Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM.
[16] JS (1990-2026). AI Blog.
[17] @GaryMarcus (2024). Open letter responding to @ylecun: A memo for future intellectual historians. Substack, June 2024.
[18] G. Marcus (2025). The False Glorification of @ylecun: Don't believe everything you read. Substack, Nov 2025.
[19] JS (2026). Who invented JEPA? Technical Note IDSIA-3-22, IDSIA, Switzerland, March 2026. people.idsia.ch/~juergen/who-i…
[attached image: cover illustration of two weight-sharing auto encoders, referenced in the tweet]
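[Editor's note: for readers who want the mechanics behind the PMAX description above, here is a minimal Haskell sketch of the objective from Sec. 2.2 of [1] as the tweet describes it. It is an illustration only, not Schmidhuber's code: all names (encodeA, decodeA, encodeB, predictB, pmaxLoss) are hypothetical, and toy linear maps stand in for trained networks and the gradient-descent loop.]

-- Sketch of the PMAX objective: net A is an auto encoder, net B encodes a
-- related input and predicts A's latent code. Plain Prelude, runnable as-is.
type Vec = [Double]

mse :: Vec -> Vec -> Double
mse a b = sum (zipWith (\x y -> (x - y) ^ (2 :: Int)) a b)
        / fromIntegral (length a)

-- Toy stand-ins for the two nets (hypothetical; real nets are learned).
encodeA, decodeA, encodeB, predictB :: Vec -> Vec
encodeA  = map (* 0.5)
decodeA  = map (* 2.0)
encodeB  = map (* 0.5)
predictB = id

-- Combined loss: B's prediction of A's latent code should be accurate
-- (predictability), while A must still reconstruct its own input
-- (informativeness, which is what prevents "collapse" to a constant code).
pmaxLoss :: Vec -> Vec -> Double
pmaxLoss x1 x2 = predictionErr + reconstructionErr
  where
    zA = encodeA x1                    -- A's latent representation of x1
    zB = encodeB x2                    -- B's latent representation of x2
    predictionErr     = mse (predictB zB) zA
    reconstructionErr = mse (decodeA zA) x1

main :: IO ()
main = print (pmaxLoss [1, 2, 3] [1.1, 2.1, 2.9])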
Paul Callaghan @paulcc_two
@MrEwanMorrison You might enjoy my PhD thesis - on evaluation of NLP systems. Not many people did. I quit the field soon after, partly because of realising the general direction was flawed. I should write a summary of it...
Paul Callaghan @paulcc_two
@MrEwanMorrison And likely that the "correct" part is mostly the low-hanging fruit and not the stuff that really matters.
Ewan Morrison @MrEwanMorrison
Slop World is worse than AI doomsday. Everything reduced to mediocre mulch, everything factually incorrect by 35%. Nothing original or truthful emerging, just a closed loop of endlessly recycled AI-led consumer culture that deteriorates with each repeat. Stop Slop World.
[attached image]
Paul Callaghan @paulcc_two
@MrEwanMorrison In logic, two key properties of a proposed model are soundness & completeness, checking whether the model adds things not in the original, or whether the model omits things in the original. Failing the first makes it unsound. Failing the second makes it incomplete. Failing both?
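[Editor's note: in the standard formulation, writing \vdash \varphi for "the proposed model derives \varphi" and \models \varphi for "\varphi holds in the original," the two properties Paul names are:]

\[
  \text{Soundness: } (\vdash \varphi) \Rightarrow (\models \varphi)
  \qquad
  \text{Completeness: } (\models \varphi) \Rightarrow (\vdash \varphi)
\]

A system failing both asserts things false in the original and omits things true in it, which is the case the closing rhetorical question points at.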
Ewan Morrison @MrEwanMorrison
I'm not going to use the word "hallucinate" any more to describe LLM AI. It was a trick to use this word from the start. First, the AI pushers deliberately used an anthropomorphic word to describe machine error. Then when you point out the machine errors, they say, "hey, humans hallucinate too." Then they even have the audacity to say "well, if this AI hallucinates it's proof that it must be sentient!" The acceptance of this trick-word has been an error from the get-go. Offer me some alternative words, please. Accurate and derogatory words are both accepted.
[attached image]
Paul Callaghan retweeted
Ed Newton-Rex @ednewtonrex
Proud of my daughter's outfit for World Book Day: she went as the 7 million books pirated by Anthropic.
Paul Callaghan @paulcc_two
@PaprikaGirl_JP Excellent article - though sobering how common this situation is (almost me as well - though my group leader was the best ever - issues were elsewhere). Life has been much better since I left 🙂
Paprika Girl @PaprikaGirl_JP
I have restarted my Wordpress account, and written an answer to the following prompt: "How has a failure set you up for later success?" It is not a story about me, but a good story about my dear husband, which even culminates in a bit of happy revenge. Read it if you like. If you have a blog, please post a link below, because I shall follow. deartokyolovepaprika.wordpress.com/2026/03/05/the…
[attached image]
Paul Callaghan @paulcc_two
@josevalim Notice it's not just about correctness either. It's partly being clearer about your thinking & being able to show some of it in code. Eg. "take" puts your data in a simpler form, & type signature says just that. Also shows when it can be used & what you can rely on for the result
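[Editor's note: José's article concerns Elixir's Map.take!/2. A hedged Haskell rendering of Paul's point: Data.Map's restrictKeys (a real function in the containers package) is the Map analogue of take, and its signature documents the trimming operation Paul describes. Runnable as-is with containers installed.]

import qualified Data.Map as M
import qualified Data.Set as S

-- restrictKeys trims a map to the given key set and can never add entries:
-- restrictKeys :: Ord k => M.Map k a -> S.Set k -> M.Map k a

main :: IO ()
main = print (M.restrictKeys (M.fromList [(1 :: Int, "a"), (2, "b"), (3, "c")])
                             (S.fromList [1, 3]))
-- prints: fromList [(1,"a"),(3,"c")]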
Paul Callaghan @paulcc_two
@josevalim Cutting edge dependent type langs can do sthg like "honest_take : xs -> Map(xs + other) -> Map(xs)", indicating you can trim a larger map to a smaller map if the target keys are present. Then it's up to calling sites to justify at compile time that the call is indeed legitimate.
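[Editor's note: Paul's honest_take signature is dependent-type pseudocode (Idris/Agda territory). Plain Haskell cannot demand a compile-time proof that the keys are present, but a hypothetical honestTake (my name, not a library function) can at least move the obligation into the result type, so call sites must handle missing keys explicitly rather than silently dropping them.]

import qualified Data.Map as M
import qualified Data.Set as S

-- Approximation of honest_take: succeed only if every requested key exists.
-- The dependent version would instead require a membership proof at the call.
honestTake :: Ord k => S.Set k -> M.Map k v -> Maybe (M.Map k v)
honestTake keys m
  | keys `S.isSubsetOf` M.keysSet m = Just (M.restrictKeys m keys)
  | otherwise                       = Nothing

main :: IO ()
main = do
  let m = M.fromList [(1 :: Int, "a"), (2, "b")]
  print (honestTake (S.fromList [1])    m)  -- Just (fromList [(1,"a")])
  print (honestTake (S.fromList [1, 3]) m)  -- Nothing: key 3 absent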
José Valim @josevalim
Most people treat the static vs dynamic debate as a foregone conclusion, but in practice type systems restrict the expressive power of programming languages in ways that leak complexity to developers. New article: Type systems are leaky abstractions — the case of Map.take!/2
[attached image]
Paul Callaghan retweeted
Grady Booch @Grady_Booch
Bravo, Anthropic, for drawing the line. Oh, and this reminds me that now it's time for me to finish filing my claim against you for illegally using every one of my books to train your LLM. Well, the good news, I suppose, is that at least you have some moral lines you won't cross.
Anthropic @AnthropicAI

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…

Paul Callaghan retweeted
Gary Marcus @GaryMarcus
My time here has been a failure. I tried to get the Twitterverse to wake up before things got bad. Now we are here. Things are bad. And about to get worse. Most people still don't realize how bad. It's not that AI is inherently impossible or immoral. It's that most of the people pushing it don't give a damn.
Paul Callaghan retweeted
Pam Ayres MBE @PamAyres
An algorithm’s watching me, I am not certain why, It sends advertisements for things I do not want to buy, It makes me feel uneasy, that upon me it has preyed, And if I could locate it, I would whack it with a spade.
Paul Callaghan retweeted
ally @missmayn
how fucking stupid is it that we have all these supposed billionaire geniuses running around and their greatest innovation of our lifetime has been stealing all our data to sell us ads.
Paul Callaghan retweeted
World Riichi (WRC) @WorldRiichi
World Riichi Online Team League Sanma (3-player) is open for registration! Find two friends and join the thrill of 3-player play over February and March! worldriichi.org/teamleaguewrot…
[attached image]
Paul Callaghan retweeted
Boze Herrington, Library Owl 😴🧙‍♀️
Young people may not know this but there was a time when the internet had thousands of quirky, informative websites, when you could spend hours browsing and come away smarter rather than dumber. We have seen a wonderful thing destroyed in our lifetimes.
LonelyGoomba @LonelyGoomba

I genuinely believe this is the worst era of the Internet in regards to the user experience.

Paul Callaghan retweeted
Talia Ringer 🕊 @TaliaRinger
Going to recirculate this now that Hinton's misleading claims about mathematics are circulating. It's important to know the ways in which [formal] mathematics is "like a game," but even more important to know the ways in which it is not
Talia Ringer 🕊 @TaliaRinger

New from me, in @Nature: Mathematicians put AI model AlphaProof to the test. A solicited News & Views article about @GoogleDeepMind AlphaProof that was an absolute joy to write! rdcu.be/ePz4Q
