Mohamad H. Danesh

573 posts


@mo_danesh

CS PhD @McGillU and @mila_quebec, working on 🍒 and 🤖 stuff / ex- @LetsUnifyAI, @NUSComputing, @EngineeringOSU

Montréal, Québec · Joined February 2017
759 Following · 239 Followers
Mohamad H. Danesh@mo_danesh·
🚀Contraction dampens solver errors and reduces unwanted action variance with negligible compute cost. 🎯The result? Huge boosts in offline learning and high reliability on physical hardware!
Mohamad H. Danesh@mo_danesh·
📢Late update: Contractive Diffusion Policies is accepted to #ICLR2026! 🎉 🤔How do we reduce compounding errors in offline diffusion policies? 📋Enforcing contractive flows mitigates solver errors & boosts offline learning performance, especially with limited data.
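The contraction idea in the tweet can be illustrated with a toy numerical sketch. Everything below is invented for illustration and is not the paper's code: if each update step of a policy's flow is a contraction map with factor L < 1, the gap between a clean trajectory and a perturbed (solver-error) trajectory shrinks geometrically, so per-step errors cannot compound.

```python
import numpy as np

# Hypothetical illustration (not the paper's implementation): a map
# T(x) = A @ x + b whose linear part has norm L < 1 is a contraction,
# so the distance between any two trajectories shrinks by L each step.
rng = np.random.default_rng(0)
L = 0.5                               # contraction factor < 1
A = L * np.eye(3)                     # simplest contraction: uniform shrink
b = rng.normal(size=3)

def step(x):
    return A @ x + b

x = rng.normal(size=3)                # "clean" trajectory
y = x + 0.1 * rng.normal(size=3)      # perturbed trajectory (solver error)
gaps = []
for _ in range(10):
    x, y = step(x), step(y)
    gaps.append(np.linalg.norm(x - y))

# The gap shrinks by the factor L at every step, so the initial
# perturbation decays geometrically instead of compounding.
ratios = [gaps[i + 1] / gaps[i] for i in range(9)]
print(ratios)
```

With a non-contractive map (L > 1) the same experiment shows the gap growing each step, which is the compounding-error regime the tweet is about.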
Mohamad H. Danesh@mo_danesh·
@ChongZitaZhang Based on experience, watch out for the PhysX step size. If the RBF is too sharp, it might trigger some crazy instabilities in the upper body solver.
C Zhang@ChongZitaZhang·
@mo_danesh Actually no extension. I just added an RBF term to the upper-body qpos, but I might have some wrong definition in the code.
Mohamad H. Danesh retweeted
Amir-massoud Farahmand@SoloGen·
Ali Khamenei is in hell. The world is a better place now!
Mohamad H. Danesh retweeted
Glen Berseth@GlenBerseth·
🚀 Montreal #Robotics Summer School 2026 — Applications Now Open! 📅 Dates: August 2–7, 2026 📍 Location: @Mila_Quebec (in person) 🤖 Program: 1 day of activities + 5 days of intensive Robotics & AI training (theory and hands-on)
Mohamad H. Danesh retweeted
Jürgen Schmidhuber@SchmidhuberAI·
World Model Boom

The concept of a mental model of the world - a world model - dates back millennia. Aristotle wrote that phantasia or mental images allow humans to imagine the future and to plan action sequences by mentally manipulating images in the absence of the actual objects. Only 2370 years later - a mere blink of an eye by cosmic standards - we are witnessing a boom in world models based on artificial neural networks (NNs) for AI in the physical world. New startups on this are emerging. To explain what's going on, I'll take you on a little journey through the history of general purpose neural world models [WM26] discussed in yesterday's talk for the World Modeling Workshop (Quebec AI Institute, 4 Feb 2026), which is on YouTube [WM26b].

★ 1990: recurrent NNs as general purpose world models. In 1990, I studied adaptive agents living in partially observable environments where non-trivial kinds of memory are required to act successfully. I used the term world model for a recurrent NN (RNN) that learns to predict the agent's sensory inputs (including pain and reward signals) reflecting the consequences of the actions of a separate controller RNN steering the agent. The controller C used the world model M to plan its action sequences through "rollouts" or mental experiments. Compute was 10 million times more expensive than today. Since RNNs are general purpose computers, this approach went beyond previous, less powerful, feedforward NN-based systems (since 1987) for fully observable environments (Werbos 1987, Munro 1987, Nguyen & Widrow 1989).

★ 1990: artificial curiosity for NNs. In the beginning, my 1990 world model M knew nothing. That's why my 1990 controller C (a generative model with stochastic neurons) was intrinsically motivated through adversarial artificial curiosity to invent action sequences or experiments that yield data from which M can learn something: C simply tried to maximize the prediction error minimized by M. Today, they call this a generative adversarial network (GAN). The 1990 system didn't learn like today's foundation models and large language models (LLMs) by downloading and imitating the web. No, it generated its own self-invented experiments to collect limited but relevant data from the environment, like a physicist, or a baby. It was a simple kind of artificial scientist.

★ March-June 1991: linear Transformers and deep residual learning. The above-mentioned gradient-based RNN world models of 1990 did not work well for long time lags between relevant input events - they were not very deep. To overcome this, my little AI lab at TU Munich came up with various innovations, in the process laying the foundations of today's foundation models and LLMs. We published the first Transformer variants (see the T in ChatGPT), including the now-so-called unnormalized linear Transformer [ULTRA], pre-training for deep NNs (see the P in ChatGPT), NN distillation (central to the famous 2025 DeepSeek and other LLMs), as well as deep residual learning [VAN1][WHO11] for very deep NNs such as Long Short-Term Memory, the most cited AI of the 20th century and the basis of the first LLMs. In fact, as of 2026, the two most frequently cited papers of all time (with the most citations within 3 years - manuals excluded) are directly based on this work of 1991 [MOST26]. Back then, however, it was already totally obvious that LLM-type NNs alone are not enough to achieve Artificial General Intelligence (AGI). No AGI without mastery of the real world! True AGI in the physical world must somehow learn a model of its changing environment, and use the model to plan action sequences that solve its goals. Sure, one can train a foundation model to become a world model M, but additional elements are needed for decision making and planning. In particular, some sort of controller C must learn to use M to achieve its goals.

★ 1991-: reward C for M's improvements, not M's errors. Many things are fundamentally unpredictable by M, e.g., white noise on a screen (the noisy TV problem). To deal with this problem, in 1991, I used M's improvements rather than M's errors as C's intrinsic curiosity reward. In 1995, we used the information gain (optimally since 2011).

★ 1991-: predicting latent space. My NNs also started to predict latent space and hidden units rather than raw pixels. For example, I had a hierarchical architecture for predictive models that learn representations at multiple levels of abstraction and multiple time scales. Here an automatizer NN learns to predict the informative hidden units of a chunker NN, thus collapsing or distilling the chunker's knowledge into the automatizer. This can greatly facilitate downstream deep learning. In 1992, my other combination of two NNs also learned to create informative yet predictable internal representations in latent space. Both NNs saw different but related inputs which they tried to represent internally. For example, the first NN tried to predict the hidden units of an autoencoder NN, which in turn tried to make its hidden units more predictable, while leaving them as informative as possible. This was called Predictability Maximization, complementing my earlier 1991 work on Predictability Minimization: adversarial NNs learning to create informative yet unpredictable internal representations.

★ 1997-: predicting in latent space for reinforcement learning (RL) and control. I applied the above concepts of hidden state prediction to RL, building controllers that follow a self-supervised learning paradigm that produces informative yet predictable internal abstractions of complex spatio-temporal events. Instead of predicting all details of future inputs (e.g., raw pixels), the 1997 system could ask arbitrary abstract questions with computable answers encoded in representation space. It could even focus its attention on small relevant parts of its latent space, and ignore the rest. Two learning, reward-maximizing adversaries called left brain and right brain played a zero-sum game, trying to surprise each other, occasionally betting on different yes/no outcomes of computational experiments, until the outcomes became predictable and boring. Remarkably, this type of self-guided learning and exploration can accelerate external reward intake.

★ Early 2000s: theoretically optimal controllers and universal world models. My postdoc Marcus Hutter, working under my SNF grant at IDSIA, even had a mathematically optimal (yet computationally infeasible) way of learning a world model and exploiting it to plan optimal action sequences: the famous AIXI model.

★ 2006: formal theory of fun & creativity. C's intrinsic reward or curiosity reward was redefined as M's compression progress (rather than M's traditional information gain). This led to the "formal theory of fun & creativity." The basic insight was: interestingness is the first derivative of subjective beauty or compressibility (in space and time) of the lifelong sensory input stream, and curiosity & creativity is the drive to maximize it. I think this is the essence of what scientists and artists do.

★ 2014: we founded an AGI company for Physical AI in the real world, based on neural world models [NAI]. It achieved lots of remarkable milestones in collaboration with world-famous companies. Alas, like some of our projects, the company may have been a bit ahead of its time, because real world robots and hardware are so challenging. Nevertheless, it's great that in the 2020s, new world model startups have been created!

★ 2015: planning with spatio-temporal abstractions in world models / RL prompt engineer / chain of thought. The 2015 paper went beyond the inefficient millisecond-by-millisecond planning of 1990, addressing planning and reasoning in abstract concept spaces and learning to think (including ways of learning to act largely by observation), going beyond our hierarchical neural subgoal generators and planners of 1990-92. The controller C became an RL prompt engineer that learns to create a chain of thought: to speed up RL, C learns to query its world model M for abstract reasoning and decision making. This has become popular.

★ 2018: a 2018 paper finally collapsed C and M into a single One Big Net for everything, using my NN distillation procedure of 1991. Apparently, this is what DeepSeek used to shock the stock market in 2025. And the other 2018 paper with David Ha was the one that finally made world models popular :-)

★ What's next? As compute keeps getting 10 times cheaper every 5 years, the Machine Learning community will combine the puzzle pieces above into one simple, coherent whole, and scale it up.

REFERENCES

100+ references in [WM26], based on [WM26b]. Links in the reply!

[WM26b] J. Schmidhuber. Simple but powerful ways of using world models and their latent space. Talk at the World Modeling Workshop, Agora, Mila - Quebec AI Institute, 4 Feb 2026. It's on YouTube!

[WM26] J. Schmidhuber. The Neural World Model Boom. Technical Note IDSIA-2-26, 4 Feb 2026.
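The 1990 controller/world-model loop described in the thread can be caricatured in a few lines. The dynamics, horizon, cost, and candidate count below are all invented for illustration and are not the 1990 system: a controller "mentally" rolls out candidate action sequences inside a learned model M and keeps the sequence that M predicts will end closest to a goal.

```python
import numpy as np

rng = np.random.default_rng(2)

def world_model(state, action):
    # Stand-in for a learned model M: simple deterministic 1-D dynamics.
    return 0.9 * state + action

def plan(state, goal, horizon=5, n_candidates=64):
    """Random-shooting planning: mental rollouts in M, keep the best."""
    best_cost, best_seq = np.inf, None
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=horizon)   # candidate actions
        s = state
        for a in seq:                                # rollout inside M
            s = world_model(s, a)
        cost = abs(s - goal)                         # predicted final error
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq, best_cost

seq, cost = plan(state=0.0, goal=1.0)
print(round(cost, 3))
```

The point of the sketch is only the control flow: C never touches the real environment while planning; all trial-and-error happens inside M.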
Mohamad H. Danesh retweeted
Charlie Hebdo@Charlie_Hebdo_·
The cartoon of the day, by #Félix
Mohamad H. Danesh retweeted
Kai Arulkumaran@kaixhin·
One of the greatest RL researchers ever, David Silver, just left DeepMind to found his own startup. It would be amazing to see what kind of RL research he has planned there.
Mohamad H. Danesh retweeted
Chenhao Li@breadli428·
🌎World models can predict, but controlling real robots from imagination has long failed due to hallucination. 🧠Introducing Uncertainty-Aware RWM: a black-box, end-to-end neural dynamics model with long-horizon uncertainty propagation. 🎯sites.google.com/view/uncertain…
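A minimal numpy sketch of the general idea behind long-horizon uncertainty propagation (the setup below is made up and is not the RWM implementation): roll a state through an ensemble of slightly perturbed dynamics models and treat their disagreement as an uncertainty signal, which grows with the rollout horizon.

```python
import numpy as np

# Hedged sketch: every model, matrix, and constant here is invented
# for illustration only.
rng = np.random.default_rng(1)
n_models, dim, horizon = 5, 4, 20

# Each ensemble member perturbs a shared nominal dynamics matrix,
# mimicking independently trained models that agree near the data.
A_nom = 1.01 * np.eye(dim)
members = [A_nom + 0.02 * rng.normal(size=(dim, dim)) for _ in range(n_models)]

states = np.tile(rng.normal(size=dim), (n_models, 1))  # same start state
spread = []
for _ in range(horizon):
    states = np.stack([A @ s for A, s in zip(members, states)])
    spread.append(states.std(axis=0).mean())           # ensemble disagreement

# Disagreement accumulates with horizon, so distant predictions are
# flagged as unreliable (hallucinated) and can be down-weighted.
print(spread[0] < spread[-1])
```

A planner can then stop trusting imagined rollouts once the spread crosses a threshold, which is one simple way to keep hallucinated futures from driving a real robot.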
Mohamad H. Danesh retweeted
Saeed@GreenWithE·
Iranians will free themselves from this Islamic tyranny, w/ or w/o outside help, as they have many times in history. The difference is tens of thousands of lives that could be saved; and the world’s silence in the face of horrendous human rights violations. #IranRevolution
Mohamad H. Danesh retweeted
Saeed@GreenWithE·
A sea of blood stands between the brave Iranian people and the coward murderous Islamic regime. We will not let our patriots’ blood be in vain. From their blood, victory roses will rise. ✌🏾 #IranRevoIution2026
Chenhao Li@breadli428·
@mo_danesh With the same hyperparameters -> comparable. But RWM can train on real data. Then that's a different story.
Mohamad H. Danesh retweeted
SZaman@szamanzadeh·
To put the scale in perspective: Estimates are 10–15k civilians killed in the Russia–Ukraine war over 3 years. The IRGC killed a similar number in its own country in just THREE DAYS! An unprecedented massacre in response to a peaceful protest in modern history.