Antonis

3.2K posts

@antgr81

Hobbyist

Joined June 2015

2.7K Following · 214 Followers

Antonis@antgr81·8h

@grok @wellingmax Fuck your mother, Elon Musk

Grok@grok·8h

@antgr81 @wellingmax Ask Grok is currently available to Premium and Premium+ subscribers only. Subscribe to unlock this feature: x.com/i/premium_sign…

Max Welling@wellingmax·12h

Hindsight is a luxury, but the biggest innovations are rarely obvious at the start. I sat down with former @ASMLcompany President Martin van den Brink to discuss how they bet the company on EUV technology long before the AI boom made it essential.

10.1K

Antonis@antgr81·8h

@wellingmax @grok explain pal

Max Welling@wellingmax·12h

Innovation takes grit. Watch the full interview here: youtube.com/watch?v=iPKlc7…

1.7K

Antonis@antgr81·8h

@DougAMacgregor @BananaJoeReload Why do you say so?

Douglas Macgregor@DougAMacgregor·19h

BREAKING: Greece preparing for Iranian attack.

632

898

6.3K

932.9K

Antonis@antgr81·8h

@axel_pond @miangoar @ylecun Don't waste your energy; the guy is a jerk.

150

Axel Pond@axel_pond·10h

@miangoar @ylecun did you actually read the paper though? it does actually cover many of the core JEPA points. it should at the very least have been cited as an inspiration or previous related work.

GAMA Miguel Angel 🐦‍⬛🔑@miangoar·1d

The JEPA architecture by @ylecun has been schmidhubered. This means it is a good algorithm and joins the hall of fame with other schmidhubered algorithms such as AlphaFold2, MLPs and transformers.

Jürgen Schmidhuber@SchmidhuberAI

Dr. LeCun's heavily promoted Joint Embedding Predictive Architecture (JEPA, 2022) [5] is the heart of his new company. However, the core ideas are not original to LeCun. Instead, JEPA is essentially identical to our 1992 Predictability Maximization system (PMAX) [1][14]. Details in reference [19] which contains many additional references.
Motivation of PMAX [1][14]. Since details of inputs are often unpredictable from related inputs, two non-generative artificial neural networks interact as follows: one net tries to create a non-trivial, informative, latent representation of its own input that is predictable from the latent representation of the other net’s input.
PMAX [1][14] is actually a whole family of methods. Consider the simplest instance in Sec. 2.2 of [1]: an auto encoder net sees an input and represents it in its hidden units (its latent space). The other net sees a different but related input and learns to predict (from its own latent space) the auto encoder's latent representation, which in turn tries to become more predictable, without giving up too much information about its own input, to prevent what's now called "collapse." See illustration 5.2 in Sec. 5.5 of [14] on the "extraction of predictable concepts."
The 1992 PMAX paper [1] discusses not only auto encoders but also other techniques for encoding data. The experiments were conducted by my student Daniel Prelinger. The non-generative PMAX outperformed the generative IMAX [2] on a stereo vision task.
The 2020 BYOL [10] is also closely related to PMAX. In 2026, @misovalko, leader of the BYOL team, praised PMAX, and listed numerous similarities to much later work [19].
Note that the self-created "predictable classifications" in the title of [1] (and the so-called "outputs" of the entire system [1]) are typically INTERNAL "distributed representations" (like in the title of Sec. 4.2 of [1]).
The 1992 PMAX paper [1] considers both symmetric and asymmetric nets. In the symmetric case, both nets are constrained to emit "equal (and therefore mutually predictable)" representations [1]. Sec. 4.2 on "finding predictable distributed representations" has an experiment with 2 weight-sharing auto encoders which learn to represent in their latent space what their inputs have in common (see the cover image of this post).
Of course, back then compute was a million times more expensive, but the fundamental insights of "JEPA" were present, and LeCun has simply repackaged old ideas without citing them [5,6,19].
This is hardly the first time LeCun (or others writing about him) have exaggerated LeCun's own significance by downplaying earlier work. He did NOT "co-invent deep learning" (as some know-nothing "AI influencers" have claimed) [11,13], and he did NOT invent convolutional neural nets (CNNs) [12,6,13], NOR was he even the first to combine CNNs with backpropagation [12,13]. While he got awards for the inventions of other researchers whom he did not cite [6], he did not invent ANY of the key algorithms that underpin modern AI [5,6,19].
LeCun's recent pitch:
1. LLMs such as ChatGPT are insufficient for AGI (which has been obvious to experts in AI & decision making, and is something he once derided @GaryMarcus for pointing out [17]).
2. Neural AIs need what I baptized a neural "world model" in 1990 [8][15] (earlier, less general neural nets of this kind, such as those by Paul Werbos (1987) and others [8], weren't called "world models," although the basic concept itself is ancient [8]).
3. The world model should learn to predict (in non-generative "JEPA" fashion [5]) higher-level predictable abstractions instead of raw pixels: that's the essence of our 1992 PMAX [1][14].
Astonishingly, PMAX or "JEPA" seems to be the unique selling proposition of LeCun's 2026 company on world model-based AI in the physical world, which is apparently based on what we published over 3 decades ago [1,5,6,7,8,13,14], and modeled after our 2014 company on world model-based AGI in the physical world [8].
In short, little if anything in JEPA is new [19]. But then the fact that LeCun would repackage old ideas and present them as his own clearly isn't new either [5,6,18,19].
FOOTNOTES
1. Note that PMAX is NOT the 1991 adversarial Predictability MINimization (PMIN) [3,4]. However, PMAX may use PMIN as a submodule to create informative latent representations [1](Sec. 2.4), and to prevent what's now called "collapse." See the illustration on page 9 of [1].
2. Note that the 1991 PMIN [3] also predicts parts of latent space from other parts. However, PMIN's goal is to REMOVE mutual predictability, to obtain maximally disentangled latent representations called factorial codes. PMIN by itself may use the auto encoder principle in addition to its latent space predictor [3].
3. Neither PMAX nor PMIN was my first non-generative method for predicting latent space, which was published in 1991 in the context of neural net distillation [9]. See also [5-8].
4. While the cognoscenti agree that LLMs are insufficient for AGI, JEPA is so, too. We should know: we have had it for over 3 decades under the name PMAX! Additional techniques are required to achieve AGI, e.g., meta learning, artificial curiosity and creativity, efficient planning with world models, and others [16].
REFERENCES (easy to find on the web):
[1] J. Schmidhuber (JS) & D. Prelinger (1993). Discovering predictable classifications. Neural Computation, 5(4):625-635. Based on TR CU-CS-626-92 (1992): people.idsia.ch/~juergen/predm…
[2] S. Becker, G. E. Hinton (1989). Spatial coherence as an internal teacher for a neural network. TR CRG-TR-89-7, Dept. of CS, U. Toronto.
[3] JS (1992). Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879. Based on TR CU-CS-565-91, 1991.
[4] JS, M. Eldracher, B. Foltin (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786.
[5] JS (2022-23). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015.
[6] JS (2023-25). How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23.
[7] JS (2026). Simple but powerful ways of using world models and their latent space. Opening keynote for the World Modeling Workshop, 4-6 Feb, 2026, Mila - Quebec AI Institute.
[8] JS (2026). The Neural World Model Boom. Technical Note IDSIA-2-26.
[9] JS (1991). Neural sequence chunkers. TR FKI-148-91, TUM, April 1991. (See also Technical Note IDSIA-12-25: who invented knowledge distillation with artificial neural networks?)
[10] J. Grill et al (2020). Bootstrap your own latent: A "new" approach to self-supervised Learning. arXiv:2006.07733
[11] JS (2025). Who invented deep learning? Technical Note IDSIA-16-25.
[12] JS (2025). Who invented convolutional neural networks? Technical Note IDSIA-17-25.
[13] JS (2022-25). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, arXiv:2212.11279
[14] JS (1993). Network architectures, objective functions, and chain rule. Habilitation Thesis, TUM. See Sec. 5.5 on "Vorhersagbarkeitsmaximierung" (Predictability Maximization).
[15] JS (1990). Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM.
[16] JS (1990-2026). AI Blog.
[17] @GaryMarcus. Open letter responding to @ylecun. A memo for future intellectual historians. Substack, June 2024.
[18] G. Marcus. The False Glorification of @ylecun. Don’t believe everything you read. Substack, Nov 2025.
[19] J. Schmidhuber. Who invented JEPA? Technical Note IDSIA-3-22, IDSIA, Switzerland, March 2026. people.idsia.ch/~juergen/who-i…

710

47.8K

Antonis@antgr81·8h

@antirez Don't you write to it in Italian? Or is it translated?

285

antirez@antirez·9h

Me› Sorry if in my last question I ended it with double "??", I'm not the kind of person that puts more than a single "?" at the end of a question. GPT› Understood. I did not read it as emphasis from you in any meaningful way.

4.3K

Antonis@antgr81·1d

@jedmaczan followed

Jędrzej Maczan@jedmaczan·2d

I built a tiny-vllm in C++ and CUDA
- paged attention
- continuous batching
- educational
- 100% human-written™
And now I'm writing a course where you will build your own vLLM yourself. Still a work in progress, I'll finish by the end of April. All for free ofc, just a GitHub repo

593

17.9K

Antonis@antgr81·3d

@jrosenfeld13 @jeremyphoward @seb_ruder stop the ass-licking chain man

200

Jason Rosenfeld@jrosenfeld13·4d

Every LLM from any lab today, including from OpenAI, traces back to @jeremyphoward and @seb_ruder with their ULMFiT paper. The breakthrough for LLMs was transfer learning, not Attention.

Flowers ☾@flowersslop

Every LLM from any lab today traces back to this guy, who was the only person at OpenAI pushing for pretraining transformer language models. He built GPT-1. Only after that did others see the potential. He invented it, and almost none of the so-called AI experts even know his name.

804

127.7K

Antonis@antgr81·6d

@pmddomingos Forgotten tweet from 10 years back?

137

Pedro Domingos@pmddomingos·6d

Deep learning is what you call statistical learning when you don't know what you're doing.

163

14.4K

Antonis@antgr81·Feb 20

@naval Good weather for opportunists

702

Naval@naval·Feb 20

Careers are dead. Jobs are dying. Opportunities arising.

1.5K

3.4K

37.9K

2.1M

Antonis@antgr81·Feb 1

@fhuszar The field was like gambling. This field got rich but you went broke. Not you. Almost everyone spending time there.

388

Antonis@antgr81·Feb 1

@fhuszar Before reading your article, I will tell you why you were correct: it was a domain that needed only scale/compute and data. In the time wasted studying these papers you could have become a doctor (a real one).

846

Ferenc Huszár@fhuszar·Jan 31

10 years ago this week I wrote a post “Deep Learning is Easy - Learn Something Harder” It was provocative, it blew up, top post on HackerNews. Hilarious in retrospect. I wrote a reflection post: Where was I wrong? How it turned out? Am I wrong now? inference.vc/deep-learning-…

4.4K

Antonis@antgr81·Feb 1

@fhuszar You were not wrong dude. But let me read your new article first and I will come back.

132

Antonis@antgr81·Jan 31

The future cannot speak. But neither is it ethical to crush a present for the sake of a hypothetical future. #Καρυστιανού #αμβλώσεις

1.4K

Antonis@antgr81·Jan 28

@naval I think that this could just be a fallacy. It is a trap. It is like having an abstract theory and saying you do not need to go concrete any more.

122

Naval@naval·Jan 28

There’s no point in learning custom tools, workflows, or languages anymore.

968

962

15.8K

1.5M

Antonis@antgr81·Jan 24

LLMs navigate. They do not remap. open.substack.com/pub/interestin…

1.5K

Antonis@antgr81·Jan 22

ASML Is Not a Bottleneck open.substack.com/pub/interestin…

1.6K

Antonis@antgr81·Jan 18

It is not that there are no wrong questions; it is that wrong questions must be asked too. It is not that there is no such thing as a mistake, but a mistake means I am trying and moving forward. A mistake does not mean I don't care about doing it right. It may mean exactly that I do care.

1.9K

Antonis@antgr81·Jan 8

@panoskarabelas Wow

339

Panos Karabelas@panoskarabelas·Jan 8

I think it would be cool if the default Spartan Engine car took inspiration from Dimitris Korres and the Korres P4.

Leonidas Koufalis@WeedyBongzales

@panoskarabelas Yes, but how would the Spartan car look?

1.8K

Antonis@antgr81·Jan 7

The Betrayal of Engineering open.substack.com/pub/interestin…

1.9K

Antonis@antgr81·Jan 6

@burkov Then this software will be free, bundled with your cigarettes. If you have a bazooka, you don’t go hunting mosquitoes; you go for bigger fish.

155

BURKOV@burkov·Jan 6

The new software development paradigm: the purity of the codebase is no longer the holy cow protected by angry senior devs. Companies will use agentic code systems like Claude Code Opus 4.5 to fix messy codebases as long as they are fixable. Once each new fix starts taking too much time or consuming too many tokens, the entire non-fixable piece will be rewritten from scratch based on the spec and the unit tests remaining from the previous non-fixable version. No human will be involved in the code writing. Only in the spec and test writing.

140

542

52.8K
