Ali Razavi
180 posts

@catamorphist
Research Scientist @GoogleDeepmind. Working on generative models (Veo, Imagen, ...) in the GenMedia team.



SIMA 2 🤝 Genie 3: We tested SIMA 2’s abilities in simulated 3D worlds created by our world model Genie 3. It demonstrated unprecedented adaptability, navigating its surroundings and taking meaningful steps toward goals.

Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today’s models work, why they work, and where they’re heading. 🧵 You’ll find the link and a few highlights in the thread. We’d love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!


🚨🎬 Big news from Video Arena! @GoogleDeepMind’s latest Veo 3.1 now ranks #1 on both the Text-to-Video and Image-to-Video leaderboards. 🏆 This is a +30-point leap from Veo 3.0 → 3.1, making it the first model to break 1400 in Video Arena history! Huge congrats to the @GoogleDeepMind team for pushing the frontier of video generation forward! More details in the thread 🧵

Today we're announcing Gauss, our first autoformalization agent that just completed Terry Tao & Alex Kontorovich's Strong Prime Number Theorem project in 3 weeks—an effort that took human experts 18+ months of partial progress.




Genie 3 is here - it can generate an entire world simulation that you can interact with in real time, just from a text prompt! It's pretty mind-blowing really when you stop to think about it, and it's rapidly improving - one day we will be able to build the Holodeck for real!

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵


Want to be part of a team redefining SOTA for generative video models? Excited about building models that can reach billions of users? The Veo team is hiring! We are looking for amazing researchers and engineers, in North America and Europe. Details below:

Thrilled to share our latest work on SciVid, to appear at #ICCV2025! 🎉 SciVid offers cross-domain evaluation of video models in scientific applications, including medical CV, animal behavior, & weather forecasting 🧪🌍📽️🪰🐭🫀🌦️ #AI4Science #FoundationModel #CV4Science [1/5]🧵







