Philip J. Ball

165 posts

@philipjohnball

Research Scientist @GoogleDeepMind: Genie 2 + 3. Not the science writer. Prev @berkeley_ai, @Waymo, @MSFTResearch, @Cambridge_Uni, @UniofOxford.

London, UK · Joined October 2015
692 Following · 1.7K Followers
Philip J. Ball @philipjohnball
Most exciting Genie 3 result for me so far, the Waymo folk are insanely good to work with:
Google DeepMind @GoogleDeepMind

Genie 3 🤝 @Waymo The Waymo World Model generates photorealistic, interactive environments to train autonomous vehicles. This helps the cars navigate rare, unpredictable events before encountering them in reality. 🧵

Philip J. Ball retweeted
Jack Parker-Holder @jparkerholder
World models existed, and were quite well defined, before the recent hype. Here’s a great list of references - the 2018 paper was a huge inspiration for Genie and some of my PhD work. @philipjohnball and I worked on small scale world models in 2019 on a single shared GPU… RIP Marvin
Jürgen Schmidhuber @SchmidhuberAI

World Model Boom

The concept of a mental model of the world - a world model - dates back millennia. Aristotle wrote that phantasia, or mental images, allow humans to imagine the future and to plan action sequences by mentally manipulating images in the absence of the actual objects. Only 2,370 years later - a mere blink of an eye by cosmic standards - we are witnessing a boom in world models based on artificial neural networks (NNs) for AI in the physical world. New startups in this area are emerging. To explain what's going on, I'll take you on a little journey through the history of general-purpose neural world models [WM26], discussed in yesterday's talk for the World Modeling Workshop (Quebec AI Institute, 4 Feb 2026), which is on YouTube [WM26b].

★ 1990: recurrent NNs as general-purpose world models. In 1990, I studied adaptive agents living in partially observable environments where non-trivial kinds of memory are required to act successfully. I used the term "world model" for a recurrent NN (RNN) that learns to predict the agent's sensory inputs (including pain and reward signals), reflecting the consequences of the actions of a separate controller RNN steering the agent. The controller C used the world model M to plan its action sequences through "rollouts" or mental experiments. Compute was 10 million times more expensive than today. Since RNNs are general-purpose computers, this approach went beyond previous, less powerful, feedforward-NN-based systems (since 1987) for fully observable environments (Werbos 1987, Munro 1987, Nguyen & Widrow 1989).

★ 1990: artificial curiosity for NNs. In the beginning, my 1990 world model M knew nothing. That's why my 1990 controller C (a generative model with stochastic neurons) was intrinsically motivated, through adversarial artificial curiosity, to invent action sequences or experiments that yield data from which M can learn something: C simply tried to maximize the prediction error that M minimized. Today, this is called a generative adversarial network (GAN). The 1990 system didn't learn like today's foundation models and large language models (LLMs), by downloading and imitating the web. No, it generated its own self-invented experiments to collect limited but relevant data from the environment, like a physicist, or a baby. It was a simple kind of artificial scientist.

★ March-June 1991: linear Transformers and deep residual learning. The gradient-based RNN world models of 1990 did not work well for long time lags between relevant input events - they were not very deep. To overcome this, my little AI lab at TU Munich came up with various innovations, in the process laying the foundations of today's foundation models and LLMs. We published the first Transformer variants (see the T in ChatGPT), including the now-so-called unnormalized linear Transformer [ULTRA]; pre-training for deep NNs (see the P in ChatGPT); NN distillation (central to the famous 2025 DeepSeek and other LLMs); as well as deep residual learning [VAN1][WHO11] for very deep NNs such as Long Short-Term Memory, the most cited AI of the 20th century and the basis of the first LLMs. In fact, as of 2026, the two most frequently cited papers of all time (with the most citations within 3 years - manuals excluded) are directly based on this work of 1991 [MOST26]. Back then, however, it was already totally obvious that LLM-type NNs alone are not enough to achieve Artificial General Intelligence (AGI). No AGI without mastery of the real world! True AGI in the physical world must somehow learn a model of its changing environment, and use the model to plan action sequences that achieve its goals. Sure, one can train a foundation model to become a world model M, but additional elements are needed for decision making and planning. In particular, some sort of controller C must learn to use M to achieve its goals.

★ 1991-: reward C for M's improvements, not M's errors. Many things are fundamentally unpredictable by M, e.g., white noise on a screen (the noisy TV problem). To deal with this problem, in 1991, I used M's improvements rather than M's errors as C's intrinsic curiosity reward. In 1995, we used the information gain (optimally since 2011).

★ 1991-: predicting latent space. My NNs also started to predict latent space and hidden units rather than raw pixels. For example, I had a hierarchical architecture for predictive models that learn representations at multiple levels of abstraction and multiple time scales. Here an automatizer NN learns to predict the informative hidden units of a chunker NN, thus collapsing or distilling the chunker's knowledge into the automatizer. This can greatly facilitate downstream deep learning. In 1992, my other combination of two NNs also learned to create informative yet predictable internal representations in latent space. Both NNs saw different but related inputs which they tried to represent internally: the first NN tried to predict the hidden units of an autoencoder NN, which in turn tried to make its hidden units more predictable, while leaving them as informative as possible. This was called Predictability Maximization, complementing my earlier 1991 work on Predictability Minimization: adversarial NNs learning to create informative yet unpredictable internal representations.

★ 1997-: predicting in latent space for reinforcement learning (RL) and control. I applied the above concepts of hidden-state prediction to RL, building controllers that follow a self-supervised learning paradigm producing informative yet predictable internal abstractions of complex spatio-temporal events. Instead of predicting all details of future inputs (e.g., raw pixels), the 1997 system could ask arbitrary abstract questions with computable answers encoded in representation space. It could even focus its attention on small relevant parts of its latent space, and ignore the rest. Two learning, reward-maximizing adversaries called "left brain" and "right brain" played a zero-sum game, trying to surprise each other, occasionally betting on different yes/no outcomes of computational experiments, until the outcomes became predictable and boring. Remarkably, this type of self-guided learning and exploration can accelerate external reward intake.

★ Early 2000s: theoretically optimal controllers and universal world models. My postdoc Marcus Hutter, working under my SNF grant at IDSIA, even had a mathematically optimal (yet computationally infeasible) way of learning a world model and exploiting it to plan optimal action sequences: the famous AIXI model.

★ 2006: formal theory of fun & creativity. C's intrinsic curiosity reward was redefined as M's compression progress (rather than M's traditional information gain). This led to the "formal theory of fun & creativity." The basic insight was: interestingness is the first derivative of subjective beauty or compressibility (in space and time) of the lifelong sensory input stream, and curiosity & creativity is the drive to maximize it. I think this is the essence of what scientists and artists do.

★ 2014: we founded an AGI company for physical AI in the real world, based on neural world models [NAI]. It achieved many remarkable milestones in collaboration with world-famous companies. Alas, like some of our projects, the company may have been a bit ahead of its time, because real-world robots and hardware are so challenging. Nevertheless, it's great that in the 2020s, new world-model startups have been created!

★ 2015: planning with spatio-temporal abstractions in world models / RL prompt engineer / chain of thought. The 2015 paper went beyond the inefficient millisecond-by-millisecond planning of 1990, addressing planning and reasoning in abstract concept spaces and learning to think (including ways of learning to act largely by observation), going beyond our hierarchical neural subgoal generators and planners of 1990-92. The controller C became an RL prompt engineer that learns to create a chain of thought: to speed up RL, C learns to query its world model M for abstract reasoning and decision making. This has become popular.

★ 2018: a 2018 paper finally collapsed C and M into a single One Big Net for everything, using my NN distillation procedure of 1991. Apparently, this is what DeepSeek used to shock the stock market in 2025. And the other 2018 paper, with David Ha, was the one that finally made world models popular :-)

★ What's next? As compute keeps getting 10 times cheaper every 5 years, the machine learning community will combine the puzzle pieces above into one simple, coherent whole, and scale it up.

REFERENCES

100+ references in [WM26], based on [WM26b]. Links in the reply!

[WM26b] J. Schmidhuber. Simple but powerful ways of using world models and their latent space. Talk at the World Modeling Workshop, Agora, Mila - Quebec AI Institute, 4 Feb 2026. It's on YouTube!

[WM26] J. Schmidhuber. The Neural World Model Boom. Technical Note IDSIA-2-26, 4 Feb 2026.

Philip J. Ball retweeted
fofr @fofrAI
You are a fish, you must escape the kitchen
Philip J. Ball retweeted
Tim Sweeney @TimSweeneyEpic
Genie 3 is amazing. I prompted it to remake Jill of the Jungle (1992 Epic game) in 3D and it did a reasonable job. With more esoteric prompts, it tends to fall back into things it knows; a prompt to recreate ZZT from 1991 made a Blade Runner styled 3D game.
Google @Google

Introducing Project Genie: An experimental research prototype powered by Genie 3, our world model, that lets you prompt an interactive world into existence — and then step inside 🌎

Philip J. Ball retweeted
Demis Hassabis @demishassabis
Thrilled to launch Project Genie, an experimental prototype of the world's most advanced world model. Create entire playable worlds to explore in real-time just from a simple text prompt - kind of mindblowing really! Available to Ultra subs in the US for now - have fun exploring!
Philip J. Ball @philipjohnball
Workflow: Imagen -> Veo 3 -> Genie 3
Philip J. Ball retweeted
Rui Huang @RuiHuang_art
I got the @GoogleDeepMind team to test Genie 3 with one of my artworks
Philip J. Ball retweeted
Matt McGill @MattMcGill_
Genie 3 for when your Veo clip ends too soon. Imagen -> Veo -> Genie 3.
Kvltgames - Wishlist Dverghold on Steam!
Does this require a pre-existing environment it uses/is trained on for frame generation, like the Quake 2 demo that is trained on a pre-existing level from the game? Or can it just generate any conceivable level structure on the fly? If not, why wouldn't I just use the original level it is based on?
Philip J. Ball retweeted
Jakob Bauer @jkbr_ai
Something we discovered by accident: what happens if we start Genie 3 from a video and a completely unrelated prompt? Turns out the model really, really wants to make it work, to the point where it emulates itself. The prompt in this one is about a T. rex on a tropical island.
Jake @ExUnoOmnia
@philipjohnball How do you have access to this already?