Clément

191 posts

Clément

@ClementRebuffel

Quant Researcher @ G-Research. Deep learning apprentice

London, England Katılım Aralık 2019

209 Takip Edilen137 Takipçiler

Clément retweetledi

“paula”@paularambles·3d

they call them crisps there

English

177

1.4K

22.1K

574.3K

Clément retweetledi

Sam Altman@sama·6d

i meant a goblin moment, sorry

English

348

2.8K

345.2K

Clément retweetledi

Arthur Douillard@Ar_Douillard·23 Nis

The DiLoCo team at Google DeepMind and Google Research is proud to release Decoupled DiLoCo, the next frontier for resilient AI pre-training. Decoupled DiLoCo enables training with datacenters across the world, using heterogeneous hardware, and never halting the system despite hardware failures.

GIF

English

606

2.7M

Clément retweetledi

Gappy (Giuseppe Paleologo)@__paleologo·10 Nis

One of the most unexpected byproducts of being on X is that I receive >>0 requests for relationship advice from quants of all genders. It’s definitely an underserved niche. My advice to all: go ahead. Love a quant a little.

English

131

15.6K

Clément retweetledi

Ethan Mollick@emollick·11 Nis

I also fixed Goya's Saturn Devouring His Son. I assume the Black Paintings only existed because Goya did not have access to delicious hot dogs.

English

319

49.7K

Clément retweetledi

Thomas Scialom@ThomasScialom·8 Nis

Glad to see our new Meta Al out in the wild Long way since LLaMA 4 🤣 Next stop: Mythos

Alexandr Wang@alexandr_wang

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

English

13.2K

Clément retweetledi

Sama Hoole@SamaHoole·3 Nis

A gorilla has a colon the length of a small motorway. It ferments plant matter for up to 24 hours. It eats all day. Every day. Just to stay alive. It still can't fully digest cellulose. It extracts maybe 30% of the calories from the fibre it consumes. The rest exits approximately as it entered. A cow has four stomachs. Four. It regurgitates its food, chews it a second time, passes it back through, ferments it with specialised bacteria, neutralises the resulting acid, and extracts nutrition from grass that would be entirely indigestible to any primate on earth. Then there's you. You have a stomach the size of a fist, a colon that runs for about five feet, and a digestive transit time of roughly 24-72 hours. You have almost no capacity for fibre fermentation. You have essentially no cellulase. Your gut is optimised for one thing: dense, calorie-rich, rapidly digestible animal protein. You are not built to process plants. But here's the beautiful thing. You don't have to. The cow has already done it. It took the grass, ran it through four stomachs and a rumen of microorganisms, and produced beef. The beef arrives at the other end pre-converted. Bioavailable. Ready. You outsourced the hard part ten thousand years ago when you domesticated ruminants. You are the apex predator that invented a living food processing system. And somewhere, a nutritionist is telling you to eat more fibre.

English

467

2.3K

86.1K

Clément retweetledi

Eliezer Yudkowsky@allTheYud·2 Nis

TIL that Gemini, Claude, and ChatGPT (but not Grok) are told that today is March 32nd, because if you tell LLMs it's April 1st, the conditional text predictions downstream become less reliable for obvious training-dataset reasons.

English

111

4.5K

251.2K

Clément retweetledi

Jürgen Schmidhuber@SchmidhuberAI·31 Mar

Dr. LeCun's heavily promoted Joint Embedding Predictive Architecture (JEPA, 2022) [5] is the heart of his new company. However, the core ideas are not original to LeCun. Instead, JEPA is essentially identical to our 1992 Predictability Maximization system (PMAX) [1][14]. Details in reference [19] which contains many additional references. Motivation of PMAX [1][14]. Since details of inputs are often unpredictable from related inputs, two non-generative artificial neural networks interact as follows: one net tries to create a non-trivial, informative, latent representation of its own input that is predictable from the latent representation of the other net’s input. PMAX [1][14] is actually a whole family of methods. Consider the simplest instance in Sec. 2.2 of [1]: an auto encoder net sees an input and represents it in its hidden units (its latent space). The other net sees a different but related input and learns to predict (from its own latent space) the auto encoder's latent representation, which in turn tries to become more predictable, without giving up too much information about its own input, to prevent what's now called “collapse." See illustration 5.2 in Sec. 5.5 of [14] on the "extraction of predictable concepts." The 1992 PMAX paper [1] discusses not only auto encoders but also other techniques for encoding data. The experiments were conducted by my student Daniel Prelinger. The non-generative PMAX outperformed the generative IMAX [2] on a stereo vision task. The 2020 BYOL [10] is also closely related to PMAX. In 2026, @misovalko, leader of the BYOL team, praised PMAX, and listed numerous similarities to much later work [19]. Note that the self-created “predictable classifications” in the title of [1] (and the so-called “outputs” of the entire system [1]) are typically INTERNAL "distributed representations” (like in the title of Sec. 4.2 of [1]). The 1992 PMAX paper [1] considers both symmetric and asymmetric nets. In the symmetric case, both nets are constrained to emit "equal (and therefore mutually predictable)" representations [1]. Sec. 4.2 on “finding predictable distributed representations” has an experiment with 2 weight-sharing auto encoders which learn to represent in their latent space what their inputs have in common (see the cover image of this post). Of course, back then compute was was a million times more expensive, but the fundamental insights of "JEPA" were present, and LeCun has simply repackaged old ideas without citing them [5,6,19]. This is hardly the first time LeCun (or others writing about him) have exaggerated LeCun's own significance by downplaying earlier work. He did NOT "co-invent deep learning" (as some know-nothing "AI influencers" have claimed) [11,13], and he did NOT invent convolutional neural nets (CNNs) [12,6,13], NOR was he even the first to combine CNNs with backpropagation [12,13]. While he got awards for the inventions of other researchers whom he did not cite [6], he did not invent ANY of the key algorithms that underpin modern AI [5,6,19]. LeCun's recent pitch: 1. LLMs such as ChatGPT are insufficient for AGI (which has been obvious to experts in AI & decision making, and is something he once derided @GaryMarcus for pointing out [17]). 2. Neural AIs need what I baptized a neural "world model" in 1990 [8][15] (earlier, less general neural nets of this kind, such as those by Paul Werbos (1987) and others [8], weren't called "world models," although the basic concept itself is ancient [8]). 3. The world model should learn to predict (in non-generative "JEPA" fashion [5]) higher-level predictable abstractions instead of raw pixels: that's the essence of our 1992 PMAX [1][14]. Astonishingly, PMAX or "JEPA" seems to be the unique selling proposition of LeCun's 2026 company on world model-based AI in the physical world, which is apparently based on what we published over 3 decades ago [1,5,6,7,8,13,14], and modeled after our 2014 company on world model-based AGI in the physical world [8]. In short, little if anything in JEPA is new [19]. But then the fact that LeCun would repackage old ideas and present them as his own clearly isn't new either [5,6,18,19]. FOOTNOTES 1. Note that PMAX is NOT the 1991 adversarial Predictability MINimization (PMIN) [3,4]. However, PMAX may use PMIN as a submodule to create informative latent representations [1](Sec. 2.4), and to prevent what's now called “collapse." See the illustration on page 9 of [1]. 2. Note that the 1991 PMIN [3] also predicts parts of latent space from other parts. However, PMIN's goal is to REMOVE mutual predictability, to obtain maximally disentangled latent representations called factorial codes. PMIN by itself may use the auto encoder principle in addition to its latent space predictor [3]. 3. Neither PMAX nor PMIN was my first non-generative method for predicting latent space, which was published in 1991 in the context of neural net distillation [9]. See also [5-8]. 4. While the cognoscenti agree that LLMs are insufficient for AGI, JEPA is so, too. We should know: we have had it for over 3 decades under the name PMAX! Additional techniques are required to achieve AGI, e.g., meta learning, artificial curiosity and creativity, efficient planning with world models, and others [16]. REFERENCES (easy to find on the web): [1] J. Schmidhuber (JS) & D. Prelinger (1993). Discovering predictable classifications. Neural Computation, 5(4):625-635. Based on TR CU-CS-626-92 (1992): people.idsia.ch/~juergen/predm… [2] S. Becker, G. E. Hinton (1989). Spatial coherence as an internal teacher for a neural network. TR CRG-TR-89-7, Dept. of CS, U. Toronto. [3] JS (1992). Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879. Based on TR CU-CS-565-91, 1991. [4] JS, M. Eldracher, B. Foltin (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786. [5] JS (2022-23). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015. [6] JS (2023-25). How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23. [7] JS (2026). Simple but powerful ways of using world models and their latent space. Opening keynote for the World Modeling Workshop, 4-6 Feb, 2026, Mila - Quebec AI Institute. [8] JS (2026). The Neural World Model Boom. Technical Note IDSIA-2-26. [9] JS (1991). Neural sequence chunkers. TR FKI-148-91, TUM, April 1991. (See also Technical Note IDSIA-12-25: who invented knowledge distillation with artificial neural networks?) [10] J. Grill et al (2020). Bootstrap your own latent: A "new" approach to self-supervised Learning. arXiv:2006.07733 [11] JS (2025). Who invented deep learning? Technical Note IDSIA-16-25. [12] JS (2025). Who invented convolutional neural networks? Technical Note IDSIA-17-25. [13] JS (2022-25). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, arXiv:2212.11279 [14] JS (1993). Network architectures, objective functions, and chain rule. Habilitation Thesis, TUM. See Sec. 5.5 on "Vorhersagbarkeitsmaximierung" (Predictability Maximization). [15] JS (1990). Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM. [16] JS (1990-2026). AI Blog. [17] @GaryMarcus. Open letter responding to @ylecun. A memo for future intellectual historians. Substack, June 2024. [18] G. Marcus. The False Glorification of @ylecun. Don’t believe everything you read. Substack, Nov 2025. [19] J. Schmidhuber. Who invented JEPA? Technical Note IDSIA-3-22, IDSIA, Switzerland, March 2026. people.idsia.ch/~juergen/who-i…

English

180

1.7K

619.2K

Clément retweetledi

Citrini@citrini·11 Mar

@Srasgon Three out of ten

English

172

24.3K

Clément retweetledi

Thariq@trq212·9 Mar

@maxbittker funny you should say that...

English

1.5K

56.8K

Clément retweetledi

Tri Dao@tri_dao·5 Mar

The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now crazy fast that attn fwd is bottlenecked by exponential, and attn bwd is bottlenecked by shared memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, new online softmax to avoid 90% of softmax rescaling, 2CTA MMA instructions that allow two thread blocks to share operands to reduce smem traffic.

Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/

English

229

1.8K

188.1K

Clément retweetledi

swyx 🌉@swyx·28 Şub

@pzakin it's chatgpt and claudeai wdym

English

5.4K

Clément retweetledi

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭@elder_plinius·23 Şub

@AnthropicAI it’s only Claude if it’s distilled in the Silicon Valley region of California 😤

English

315

8.7K

595.8K

Clément retweetledi

Theo - t3.gg@theo·7 Şub

To be fair, you have to have a very high IQ to understand Codex 5.3. The behavior is extremely subtle, and without a solid grasp of clean code most of the decisions will go over a typical developer's head. There's also Codex's thorough research, which is deftly woven into it's weights - it's personal philosophy draws heavily from Uncle Bob, for instance. The fans understand this stuff; they have the intellectual capacity to truly appreciate the depths of these behaviors, to realize that they're not just smart- they say something deep about CODE. As a consequence people who dislike Codex truly ARE idiots. I'm smirking right now just imagining one of those addlepated simpletons scratching their heads in confusion as Codex's genius unfolds itself on their computer screens. What fools... how I pity them. 😂 And yes by the way, I DO have a Sam Altman tattoo. And no, you cannot see it. It's for the ladies' eyes only- And even they have to demonstrate that they're within 5 IQ points of my own (preferably lower) beforehand.

English

203

226.5K

Clément retweetledi

Neil Zeghidour@neilzegh·20 Oca

Me defending my O(n^3) solution to the coding interviewer.

English

417

49.4K

Clément retweetledi

tobi lutke@tobi·11 Oca

btw this is a good example of what i meant with "reflexively" reaching for AI. You tinker with AI for a while, and you just reach for this. This was an obvious thing to try when I saw I needed to use windows and was on my mac. You want to train your brain on this intuition.

English

4.1K

343.1K

Clément retweetledi

near@nearcyan·5 Oca

do real things in 2026

Lulu Cheng Meservey@lulumeservey

x.com/i/article/2008…

English

502

80.9K

Clément@ClementRebuffel·25 Ara

Airplane Mode took my top spot this year. #SpotifyWrapped @FrakThePerson keep doing you man, merry Christmas

English

742

Clément retweetledi

Yishan@yishan·25 Ara

If I were to try and find the core issue I think it was that social media was fundamentally designed for human use (you know, it’s social) and once automation and later AI was allowed onto it, it became dystopian. If we could only do ONE thing to make people happier with tech in general I think it would be an industry-wide agreement to preserve social media as a human-only space, with draconian enforcement. Automation and AI can be elsewhere, just not on social media, where humans congregate.

⚡︎@_sorrengailll

My unpopular opinion / Hot Take : I actually do not desire to see technology advance any further.

English

269

90.6K

Keşfet

@misovalko @GaryMarcus @ylecun @Srasgon @maxbittker @pzakin @AnthropicAI @FrakThePerson