Sebastian Beyer

1.3K posts


@BeyerSebastian

Building Sentinel (~€1k MRR) & https://t.co/dLXnaGX8Pe (€55 MRR) • Co-founder https://t.co/X6KxwGA9Fw • Bootstrapping AI products from Vienna

Austria · Joined June 2013
190 Following · 61 Followers
Sebastian Beyer@BeyerSebastian·
Any good advice on how to give an OpenClaw agent a unique personality?
Armin Ronacher ⇌@mitsuhiko·
I've now done 10 calls with people who shared their agentic coding experience. 7/10 reported non-engineers vibeslopping code up. The majority said they moved to re-prompt all those contributions because it became impossible or too time-consuming to work with those PRs.
Sebastian Beyer@BeyerSebastian·
"OpenClaw's architecture is built in such a way that when the model does something it shouldn't, it goes so badly wrong that you're more likely to get Chinese technical literature than a working hotel booking."
Sebastian Beyer@BeyerSebastian·
Don't get me wrong, I like @openclaw and it is a great project! The following is for my Austrian friends: the answer @steipete should have given Armin Wolf to the "booked the wrong hotel in Paris" question is not whatever he actually said, but:
Sebastian Beyer@BeyerSebastian

Found the root cause. Issue one appears to be model drift in Gemini Flash: Gemini randomly produced Chinese text. The second problem seems to be an architecture weakness in @openclaw, which randomly surfaced the Chinese text in a Telegram message.

Sebastian Beyer@BeyerSebastian·
Found the root cause. Issue one appears to be model drift in Gemini Flash: Gemini randomly produced Chinese text. The second problem seems to be an architecture weakness in @openclaw, which randomly surfaced the Chinese text in a Telegram message.
Sebastian Beyer@BeyerSebastian·
@steipete This is extremely weird: I suddenly received Chinese text from a different user somehow. I had ChatGPT and Codex investigate it. It appears to be session-state contamination inside @openclaw.
Gary Marcus@GaryMarcus·
Hey, you are never gonna believe it! @ylecun organized a group phone call with me and @SchmidhuberAI and some of the other people he has ripped off over the years, and *apologized*. He’s a much more mature human being than I had ever realized. Oh wait… April Fool’s!
John Carmack@ID_AA_Carmack·
Paper review: LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels arxiv.org/pdf/2603.19312 Nice clean github: github.com/lucas-maes/le-…

This is the application of the LeJEPA results to world models, trained offline on experience from three different robotics-style tests with one to two million steps in each dataset. Re-states the benefits of the SigReg loss relative to prior world model approaches.

Uses ImageNet-standard 224x224 RGB pixel input images with an unmodified ViT-Tiny vision transformer from HuggingFace to generate latents. One extra post-projection step is needed to give SigReg the necessary freedom to perturb the latents into independent gaussians, since ViT ends with a layernorm'd layer. Also tested with ResNet-18, which still performed well, but slightly worse.

Uses a 192-dimensional latent. Performance slightly dropped when doubling the latent size to 384; it would be nice to know if it was stable there, or if it continued worsening with excessive latents. There is a relationship between batch size and SigReg; the larger latent may have improved performance if the batch size was increased.

The predictor is implemented as a ViT-S backbone. Why a vision transformer when the latent is flat? Uses a history of 3 sets of latents for two of the benchmarks and 1 for the other. Performance was markedly better with the "small" ViT model than the "tiny", but the larger "base" model degraded notably, which is interesting. Dropout of 0.1 on the predictor significantly improved performance; 0.2 was still better than 0.0, but 0.5 was worse.

Trained with a batch of 128 x 4 trajectories. I wish their training loss graphs were more zoomed in with grid lines.

Performs planning at test time instead of building a policy by training in imagination like Dreamer / Diamond. Rolls out 300 initially random sets of actions up to a planning horizon H of 5 (at frame-skip 5). Iterates up to 30 times using the Cross-Entropy Method (CEM). The main paper body mentions using a Model Predictive Control (MPC) strategy, where only the first K planned actions are executed before replanning, but appendix D says they execute all 5 planned actions.

After training, they probe the latent space to demonstrate that it does capture and represent physically meaningful quantities. They also implement a decoder from the latent space back to pixels; it is not used by the algorithms, but helpful to see what the latent space is actually representing. They tested incorporating the reconstruction loss into training, but it hurt performance somewhat.

They wound up with a 0.1 lambda for SigReg, as opposed to 0.05 in the LeJEPA paper, and 1024 SigReg projections, though they observe the number has negligible impact.

I like the JEPA framework, but so far my attempts to use it on Atari games with value functions have not matched my other efforts.
Lucas Maes@lucasmaes_

JEPA are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning <1 second. 📑: le-wm.github.io

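The planning loop the review describes (300 initially random action sequences over a horizon of 5, refined for up to 30 CEM iterations, with MPC-style replanning) could be sketched roughly as follows. This is an illustrative stand-in, not the paper's code: the `score` world-model rollout function, the elite count, and the action dimensions are all assumptions.

```python
import numpy as np

def cem_plan(z0, score, horizon=5, n_samples=300, n_iters=30,
             n_elite=30, action_dim=2, seed=0):
    """Cross-Entropy Method over action sequences, as described above.

    `score(z0, actions)` is a stand-in for rolling the world model
    forward from latent z0 and returning the predicted return.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, action_dim))    # mean of the action distribution
    sigma = np.ones((horizon, action_dim))  # std of the action distribution
    for _ in range(n_iters):
        # Sample candidate action sequences around the current distribution
        # (the first iteration gives the initial random rollouts).
        actions = rng.normal(mu, sigma, size=(n_samples, horizon, action_dim))
        returns = np.array([score(z0, a) for a in actions])
        # Refit the distribution to the elite (highest-return) candidates.
        elite = actions[np.argsort(returns)[-n_elite:]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6
    # MPC-style use: execute the first K action(s) of `mu`, then replan.
    return mu
```

The MPC ambiguity the review notes (execute only the first K actions vs. all 5) only changes how much of the returned plan is executed before calling `cem_plan` again.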
Sebastian Beyer@BeyerSebastian·
@miangoar @ylecun The question is not "what was schmidhubered?" The question is "what was NOT schmidhubered?"
GAMA Miguel Angel 🐦‍⬛🔑
The JEPA architecture by @ylecun has been schmidhubered. This means it is a good algorithm and joins the hall of fame with other schmidhubered algorithms such as AlphaFold2, MLPs and transformers.
Jürgen Schmidhuber@SchmidhuberAI

Dr. LeCun's heavily promoted Joint Embedding Predictive Architecture (JEPA, 2022) [5] is the heart of his new company. However, the core ideas are not original to LeCun. Instead, JEPA is essentially identical to our 1992 Predictability Maximization system (PMAX) [1][14]. Details in reference [19], which contains many additional references.

Motivation of PMAX [1][14]: since details of inputs are often unpredictable from related inputs, two non-generative artificial neural networks interact as follows: one net tries to create a non-trivial, informative, latent representation of its own input that is predictable from the latent representation of the other net's input.

PMAX [1][14] is actually a whole family of methods. Consider the simplest instance in Sec. 2.2 of [1]: an auto encoder net sees an input and represents it in its hidden units (its latent space). The other net sees a different but related input and learns to predict (from its own latent space) the auto encoder's latent representation, which in turn tries to become more predictable, without giving up too much information about its own input, to prevent what's now called "collapse." See illustration 5.2 in Sec. 5.5 of [14] on the "extraction of predictable concepts."

The 1992 PMAX paper [1] discusses not only auto encoders but also other techniques for encoding data. The experiments were conducted by my student Daniel Prelinger. The non-generative PMAX outperformed the generative IMAX [2] on a stereo vision task. The 2020 BYOL [10] is also closely related to PMAX. In 2026, @misovalko, leader of the BYOL team, praised PMAX and listed numerous similarities to much later work [19].

Note that the self-created "predictable classifications" in the title of [1] (and the so-called "outputs" of the entire system [1]) are typically INTERNAL "distributed representations" (as in the title of Sec. 4.2 of [1]). The 1992 PMAX paper [1] considers both symmetric and asymmetric nets. In the symmetric case, both nets are constrained to emit "equal (and therefore mutually predictable)" representations [1]. Sec. 4.2 on "finding predictable distributed representations" has an experiment with 2 weight-sharing auto encoders which learn to represent in their latent space what their inputs have in common (see the cover image of this post).

Of course, back then compute was a million times more expensive, but the fundamental insights of "JEPA" were present, and LeCun has simply repackaged old ideas without citing them [5,6,19]. This is hardly the first time LeCun (or others writing about him) have exaggerated LeCun's own significance by downplaying earlier work. He did NOT "co-invent deep learning" (as some know-nothing "AI influencers" have claimed) [11,13], and he did NOT invent convolutional neural nets (CNNs) [12,6,13], NOR was he even the first to combine CNNs with backpropagation [12,13]. While he got awards for the inventions of other researchers whom he did not cite [6], he did not invent ANY of the key algorithms that underpin modern AI [5,6,19].

LeCun's recent pitch:
1. LLMs such as ChatGPT are insufficient for AGI (which has been obvious to experts in AI & decision making, and is something he once derided @GaryMarcus for pointing out [17]).
2. Neural AIs need what I baptized a neural "world model" in 1990 [8][15] (earlier, less general neural nets of this kind, such as those by Paul Werbos (1987) and others [8], weren't called "world models," although the basic concept itself is ancient [8]).
3. The world model should learn to predict (in non-generative "JEPA" fashion [5]) higher-level predictable abstractions instead of raw pixels: that's the essence of our 1992 PMAX [1][14].

Astonishingly, PMAX or "JEPA" seems to be the unique selling proposition of LeCun's 2026 company on world model-based AI in the physical world, which is apparently based on what we published over 3 decades ago [1,5,6,7,8,13,14], and modeled after our 2014 company on world model-based AGI in the physical world [8]. In short, little if anything in JEPA is new [19]. But then the fact that LeCun would repackage old ideas and present them as his own clearly isn't new either [5,6,18,19].

FOOTNOTES
1. Note that PMAX is NOT the 1991 adversarial Predictability MINimization (PMIN) [3,4]. However, PMAX may use PMIN as a submodule to create informative latent representations [1] (Sec. 2.4), and to prevent what's now called "collapse." See the illustration on page 9 of [1].
2. Note that the 1991 PMIN [3] also predicts parts of latent space from other parts. However, PMIN's goal is to REMOVE mutual predictability, to obtain maximally disentangled latent representations called factorial codes. PMIN by itself may use the auto encoder principle in addition to its latent space predictor [3].
3. Neither PMAX nor PMIN was my first non-generative method for predicting latent space, which was published in 1991 in the context of neural net distillation [9]. See also [5-8].
4. While the cognoscenti agree that LLMs are insufficient for AGI, JEPA is so, too. We should know: we have had it for over 3 decades under the name PMAX! Additional techniques are required to achieve AGI, e.g., meta learning, artificial curiosity and creativity, efficient planning with world models, and others [16].

REFERENCES (easy to find on the web):
[1] J. Schmidhuber (JS) & D. Prelinger (1993). Discovering predictable classifications. Neural Computation, 5(4):625-635. Based on TR CU-CS-626-92 (1992): people.idsia.ch/~juergen/predm…
[2] S. Becker, G. E. Hinton (1989). Spatial coherence as an internal teacher for a neural network. TR CRG-TR-89-7, Dept. of CS, U. Toronto.
[3] JS (1992). Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879. Based on TR CU-CS-565-91, 1991.
[4] JS, M. Eldracher, B. Foltin (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786.
[5] JS (2022-23). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015.
[6] JS (2023-25). How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23.
[7] JS (2026). Simple but powerful ways of using world models and their latent space. Opening keynote for the World Modeling Workshop, 4-6 Feb, 2026, Mila - Quebec AI Institute.
[8] JS (2026). The Neural World Model Boom. Technical Note IDSIA-2-26.
[9] JS (1991). Neural sequence chunkers. TR FKI-148-91, TUM, April 1991. (See also Technical Note IDSIA-12-25: who invented knowledge distillation with artificial neural networks?)
[10] J. Grill et al. (2020). Bootstrap your own latent: A "new" approach to self-supervised learning. arXiv:2006.07733
[11] JS (2025). Who invented deep learning? Technical Note IDSIA-16-25.
[12] JS (2025). Who invented convolutional neural networks? Technical Note IDSIA-17-25.
[13] JS (2022-25). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, arXiv:2212.11279
[14] JS (1993). Network architectures, objective functions, and chain rule. Habilitation Thesis, TUM. See Sec. 5.5 on "Vorhersagbarkeitsmaximierung" (Predictability Maximization).
[15] JS (1990). Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM.
[16] JS (1990-2026). AI Blog.
[17] @GaryMarcus. Open letter responding to @ylecun. A memo for future intellectual historians. Substack, June 2024.
[18] G. Marcus. The False Glorification of @ylecun. Don't believe everything you read. Substack, Nov 2025.
[19] J. Schmidhuber. Who invented JEPA? Technical Note IDSIA-3-22, IDSIA, Switzerland, March 2026. people.idsia.ch/~juergen/who-i…
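The two-network predictability-maximization setup described in the post could be sketched, under heavy assumptions, as follows. This is my reading of the mechanism only, not the 1992 implementation: the linear nets, the noisy-copy "related input", the loss weighting, and all dimensions are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 8, 2, 512                       # input dim, latent dim, samples
x1 = rng.normal(size=(n, d))              # auto encoder's input
x2 = x1 + 0.05 * rng.normal(size=(n, d))  # related input (noisy copy; assumption)

E = rng.normal(scale=0.1, size=(d, k))    # encoder of the auto encoder
D = rng.normal(scale=0.1, size=(k, d))    # decoder of the auto encoder
P = rng.normal(scale=0.1, size=(d, k))    # predictor: sees only x2

lr, lam, losses = 0.01, 1.0, []
for _ in range(2000):
    z1 = x1 @ E                  # latent representation of x1
    recon_err = z1 @ D - x1      # reconstruction keeps z1 informative
    pred_err = x2 @ P - z1       # prediction makes z1 predictable,
                                 # discouraging collapse of z1 to zero
    losses.append(((recon_err ** 2).sum() + lam * (pred_err ** 2).sum()) / n)
    # Gradient descent on the combined loss (gradients up to a factor of 2).
    gE = (x1.T @ (recon_err @ D.T) - lam * x1.T @ pred_err) / n
    gD = (z1.T @ recon_err) / n
    gP = lam * (x2.T @ pred_err) / n
    E -= lr * gE
    D -= lr * gD
    P -= lr * gP
```

The point of the sketch is the interaction: the reconstruction term stops the latent from becoming trivially predictable (collapse), while the prediction term pulls the latent toward what the related input can anticipate.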

Mario Zechner@badlogicgames·
chat, is he serious?
Armin Ronacher ⇌@mitsuhiko·
By far the coolest thing about LLMs is still that you can use them to connect with people who speak other languages. I can now get great English-language transcripts of Arabic videos, which was previously a pain in the butt.
Sebastian Beyer@BeyerSebastian·
@sickdotdev Coding = security risks. Vibe or trad, doesn't matter. It is a risk and must be managed. If managed well, the risk is as low as you want it to be.
Sick@sickdotdev·
Prove me wrong: Vibe coding = security risks
Peter Steinberger 🦞@steipete·
@JustinGorya the app's amazing for knowledge work, emails, slack, cal, notion, linear, github, smaller fixes. For deep coding work it's hard to break old habits.
Mario Zechner@badlogicgames·
@0xSero the claude stat can't possibly be right.
0xSero@0xSero·
Pi is very cost efficient.
Peter Steinberger 🦞@steipete·
This guy emailed me asking for a *token session refund* because his claw made mistakes. 🙃
Alex Ibragimov@alexwtlf·
Are you building something cool? Share your project. The top one gets featured for 24 hours on my platform with 5000+ weekly views
Sebastian Beyer@BeyerSebastian·
Stop asking your coding agent "How do I ... now?" Start asking "What do you need from me to do ... now?"
Sebastian Beyer@BeyerSebastian·
Are there still people claiming they can write better code than Codex?