Marco Pavone

241 posts


@drmapavone

Prof @Stanford, Distinguished Research Scientist and AV research lead @nvidia. PhD from @MITAeroAstro. Robotics, autonomous systems, AI. Opinions are my own.

Stanford, CA USA · Joined November 2018
67 Following · 5.3K Followers
Marco Pavone@drmapavone·
A central challenge in #physical #AI is data scarcity: vision-language-action (#VLA) models are fundamentally limited by the availability of high-quality robotics demonstrations.

In our recent work, we introduce R&B-EnCoRe (arxiv.org/pdf/2602.08167), a framework that enables models to self-bootstrap embodied #reasoning by leveraging synthetic visuo-textual data together with limited embodiment-specific experience. In essence, R&B-EnCoRe allows models to learn how to reason in an embodied setting.

Our approach treats reasoning as a latent variable and uses self-supervised refinement to learn reasoning strategies that are directly predictive of successful control—without human annotations, reward engineering, or external verifiers.

We validate the approach across a range of embodiments—including manipulation, navigation, and autonomous driving—and across model scales from 1B to 30B parameters, observing consistent improvements:
💪 +28% task success in real-world manipulation
🦿 +101% score in legged locomotion navigation
🚗 −21% collision rate in autonomous driving

Overall, this work highlights a promising direction: aligning internet-scale priors with embodiment-specific data to enable scalable, self-improving physical intelligence.

Kudos to an amazing team: Milan Ganai, Katie Luo, @JonasFrey96, Clark Barrett
🌐 Website: milanganai.github.io/rnb-encore/
📄 Paper: arxiv.org/pdf/2602.08167
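The self-supervised refinement loop described above can be sketched roughly as follows. This is a minimal toy sketch, not the R&B-EnCoRe code: `sample_reasoning` and `rollout_success` are hypothetical stand-ins for the model's reasoning sampler and an environment rollout.

```python
# Minimal sketch of treating reasoning as a latent variable that is
# refined by keeping only traces predictive of successful control.
# All functions are hypothetical stand-ins, not the R&B-EnCoRe code.

def sample_reasoning(obs, k=8):
    # Stand-in: sample k candidate reasoning traces for an observation.
    return [f"trace-{obs}-{i}" for i in range(k)]

def rollout_success(obs, trace):
    # Stand-in: execute the action the trace induces and check task
    # success (a deterministic toy proxy here).
    return sum(map(ord, obs + trace)) % 3 == 0

def self_bootstrap(observations):
    """One refinement round: keep (obs, trace) pairs whose induced
    control succeeded; these become the next fine-tuning dataset,
    with no human annotations or external verifier."""
    kept = []
    for obs in observations:
        for trace in sample_reasoning(obs):
            if rollout_success(obs, trace):
                kept.append((obs, trace))
    return kept

dataset = self_bootstrap(["pick-cup", "open-door"])
```

Iterating this loop is what lets the model align its internet-scale priors with the limited embodiment-specific experience it actually collects.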
Marco Pavone@drmapavone·
Excitingly, @nvidia #Alpamayo 1.5 is now available within Autoware: github.com/autowarefounda… Grateful to @ShinpeiKato and the rest of the TIER IV team for helping democratize the development of AV solutions. I look forward to seeing #Alpamayo’s adoption continue to grow! As Jensen said, “Everything that moves will be autonomous.” Together, we are making big strides toward this vision! More about Alpamayo 1.5: huggingface.co/blog/drmapavon… @NVIDIADRIVE @NVIDIAAI
Marco Pavone@drmapavone·
Ψ₀ (psi-lab.ai/Psi0) is an open foundation model for universal humanoid loco-manipulation—and, more broadly, one of the first and most comprehensive ecosystems for developing humanoid vision-language-action models trained from egocentric data. It advances the state of the art in performance while shedding light on key aspects of model development, including how to effectively structure the training process.

📄 Paper: arxiv.org/abs/2603.12263

Kudos to @yuewang314 for spearheading such an impactful effort—excited to be part of this collaboration!
Yue Wang@yuewang314

Introducing Ψ₀ (psi-lab.ai/Psi0) — an open foundation model for universal humanoid loco-manipulation.
🏆 Outperforms GR00T N1.6 by 40%+ overall success rate
📉 Uses only ~10% of the pre-training data
📦 Fully open-source: model, data, code, and deployment pipeline
1/10

Marco Pavone@drmapavone·
Jensen today announced Alpamayo 1.5 at #NVIDIAGTC!

#Alpamayo 1.5 is a major update to Alpamayo 1—@nvidia’s open 10B-parameter chain-of-thought reasoning VLA model, first introduced at #CES. Built on the #Cosmos-Reason2 VLM backbone and post-trained with RL, it adds support for navigation guidance, flexible multi-camera setups, configurable camera parameters, and user question answering. The result is an interactive, steerable reasoning engine for the AV community. We’re also releasing post-training scripts to help researchers and developers adapt the model.

Additionally, we’ve significantly expanded the Alpamayo open platform across data and simulation, including releasing highly requested reasoning labels for the PhysicalAI Autonomous Vehicles dataset (huggingface.co/datasets/nvidi…), as well as our chain-of-causation auto-labeling pipeline.

🔎 Learn more about Alpamayo 1.5 and the latest extensions to the Alpamayo open platform: huggingface.co/blog/drmapavon… (please note that most of the links will become active in the next few days.)

Happy building—and stay tuned for more in the coming months! @NVIDIADRIVE @NVIDIAAI
Marco Pavone@drmapavone·
What does it take to build autonomous vehicles that can reason about the world they drive in?

Tomorrow at #NVIDIAGTC, Patrick Liu and I will take a deep dive into #Alpamayo—a family of #reasoning-based vision–language–action (#VLA) models that form a core component of the Alpamayo open platform (huggingface.co/blog/drmapavon…).

We’ll cover three main topics:
- How reasoning-based VLA models like Alpamayo 1 are designed and built
- What it takes to bring Alpamayo 1 to production, including some of our latest results
- Several exciting announcements about the expansion of the Alpamayo open platform

If you're working on autonomous driving, robotics, or foundation models for physical AI, this session will offer a look at where the field is heading.

Session details:
📅 Monday, Mar 16 | 3:00 PM PDT
📍 #NVIDIAGTC 2026
🔗 nvda.ws/4rze5oj

Looking forward to seeing many of you there. @NVIDIADRIVE @NVIDIAAI
Marco Pavone@drmapavone·
Excited to share CoVer-VLA — a contrastive verifier and hierarchical test-time scaling framework that bridges the intention–action gap in generalist robot policies. We show that allocating compute to reasoning and verification at deployment can be more effective than scaling policy training alone.

🌐 Website: cover-vla.github.io
📄 Paper: arxiv.org/abs/2602.12281
🤗 Models: huggingface.co/cover-vla
💻 Code: github.com/cover-vla/cove…

Work led by @jackyk02, in collaboration with @Azaliamirh and @chelseabfinn
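The core test-time-scaling idea—spend extra inference compute sampling candidates and let a verifier pick—can be sketched as a toy best-of-N loop. Everything below is a hypothetical stand-in, not the CoVer-VLA API.

```python
# Toy best-of-N sketch of verifier-guided test-time scaling: sample N
# candidate plans from a policy, score each with a learned verifier,
# and execute the highest-scoring one. Stand-in functions throughout.

def policy_sample(instruction, n=8):
    # Stand-in for sampling n candidate action plans from a policy.
    return [f"{instruction}/plan-{i}" for i in range(n)]

def verifier_score(instruction, plan):
    # Stand-in for a contrastive verifier rating how well the plan
    # matches the instruction's intent (a deterministic toy score).
    return sum(map(ord, instruction + plan)) % 10

def best_of_n(instruction, n=8):
    """Allocate extra inference compute—more samples plus verification—
    instead of a single greedy policy rollout."""
    candidates = policy_sample(instruction, n)
    return max(candidates, key=lambda p: verifier_score(instruction, p))

chosen = best_of_n("stack the red block")
```

The hierarchical version applies the same select-by-verifier step at more than one level of the policy's output (e.g., reasoning and actions), which is where the deployment-time compute pays off.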
Marco Pavone@drmapavone·
@nvidia #Alpamayo 1 has now surpassed 100,000 downloads! 🚀

Since its announcement at #CES, Alpamayo 1 has been adopted across a wide range of use cases — and it’s been incredibly exciting to see the community put it to work in so many creative and impactful ways.

Curious to learn more about Alpamayo 1 — and the latest extensions to the Alpamayo ecosystem that we’ll be unveiling at #NVIDIAGTC? Join my talk at NVIDIA GTC 2026: “From Research to Production: How Alpamayo Accelerates Autonomous Vehicle Development”
📅 Monday, March 16 | 3:00 PM PDT
📍 San Jose, CA
🔗 nvidia.com/gtc/session-ca…

Looking forward to connecting with many of you there.

#NVIDIAGTC #AutonomousVehicles #AI #Robotics @NVIDIAAI @NVIDIADRIVE
NVIDIA DRIVE@NVIDIADRIVE

Alpamayo 1 is now @huggingface’s top-downloaded robotics model with 100K downloads and counting. 🎉 It helps researchers and autonomous-driving practitioners develop and evaluate vision-language-action models for complex autonomous-driving scenarios, especially rare long-tail events.
🔗 Get started with Alpamayo 1 today: nvda.ws/3OnZoWU
🎥 Watch the deep-dive: nvda.ws/4tJxvbN

Marco Pavone@drmapavone·
Deep dive on @NVIDIA Alpamayo 1 (reasoning-based model for AVs) is now up. Watch the full recording: youtube.com/watch?v=V9E4GX… @NVIDIADRIVE @NVIDIAAI
NVIDIA DRIVE@NVIDIADRIVE

💨 How fast can an autonomous vehicle think? With Alpamayo 1, NVIDIA's 10B-parameter chain-of-thought reasoning model, the distilled version can reason in real time. Hear Marco Pavone (@drmapavone), Yan Wang, Yurong You, and Wenhao Ding from our AV Research team break down Alpamayo 1 and what's next for reasoning in autonomous driving. 🔁 Watch the replay: nvda.ws/3O5gKb3

Marco Pavone@drmapavone·
Reminder - Join me and my collaborators for a *live* discussion on @nvidia Alpamayo 1 (huggingface.co/nvidia/Alpamay…), a reasoning-based vision–language–action (VLA) model for autonomous driving.
🎥 Livestream: Inside NVIDIA Alpamayo 1: Making Autonomous Vehicles Reason
🗓 February 11
⏰ 9:00am PST
📍 Watch here: youtube.com/watch?v=V9E4GX…
Marco Pavone@drmapavone·
Join me and my collaborators for a *live* discussion on @nvidia Alpamayo 1 (huggingface.co/nvidia/Alpamay…), a reasoning-based vision–language–action (VLA) model for autonomous driving.

🎥 Livestream: Inside NVIDIA Alpamayo 1: Making Autonomous Vehicles Reason
🗓 February 11
⏰ 9:00am PST
📍 Watch here: youtube.com/watch?v=V9E4GX…

As NVIDIA CEO Jensen Huang put it: “The ChatGPT moment for physical AI is here — when machines begin to understand, reason, and act in the real world. Robotaxis are among the first to benefit. Alpamayo brings reasoning to autonomous vehicles, allowing them to think through rare scenarios, drive safely in complex environments, and explain their driving decisions — it’s the foundation for safe, scalable autonomy.”

During the livestream, we’ll cover:
- How #reasoning-based #VLA models like #Alpamayo 1 are designed and built
- Applications ranging from end-to-end #autonomy to reasoning-driven auto-labeling
- Key opportunities and challenges in developing reasoning models for #Physical #AI

I’ll be joined by core Alpamayo 1 developers @yan_wang_9 @YurongYou @wenhaoding95, and we’ll take questions live from the community.

📖 Ahead of time, you might enjoy this overview of the Alpamayo ecosystem: huggingface.co/blog/drmapavon…

And if you’re attending @NVIDIAGTC (March 16–19) and would like to meet some of the Alpamayo team in person, you can use my employee code for 25% off your conference pass: nvidia.com/gtc/?ncid=GTC-…

Hope to see you at the livestream! @NVIDIAAI @NVIDIADRIVE
Marco Pavone retweeted
NVIDIA DRIVE@NVIDIADRIVE·
Open models. Open datasets. Safer AVs. @drmapavone and Ed Schmerling explain how NVIDIA’s Alpamayo ecosystem uses multi-sensor data, reasoning models, and tools to advance safe autonomous driving. Watch the livestream 👉 nvda.ws/4bL2m1d
Marco Pavone retweeted
NVIDIA DRIVE@NVIDIADRIVE·
Alpamayo open models and datasets are accelerating AV safety—see how NVIDIA is redefining safe Level 4 autonomous vehicles, live. Join @drmapavone and Ed Schmerling for a deep dive into NVIDIA’s Alpamayo open ecosystem, from diverse multi-sensor data to state-of-the-art AV reasoning models.
🗓️ Wednesday, January 21
⏰ 9:00–10:00am PT
👉 Add to your calendar: nvda.ws/3Zft4Yk
Marco Pavone@drmapavone·
It’s incredibly exciting to see how quickly the community is engaging with the @nvidia Alpamayo ecosystem for developing reasoning-based autonomous vehicles (huggingface.co/blog/drmapavon…)!

In this instance, TIER IV is showcasing Alpamayo 1’s reasoning capabilities in Tokyo, integrated with Autoware and ROS. Fantastic work, @ShinpeiKato and the @tier_iv_global team! 👏

Quick highlights about Alpamayo:

Alpamayo 1:
- Among Hugging Face’s top 10 overall trending models
- Among the top 3 most downloaded models on Hugging Face when filtered by 'robotics'

Alpamayo PhysicalAI–Autonomous-Vehicles dataset:
- Trending in Hugging Face’s top 10 overall datasets

Happy developing! 🚀

#AutonomousVehicles #Robotics #AI #Reasoning #HuggingFace #Autoware #ROS #AutonomousDriving #PhysicalAI #Alpamayo #RobotLearning @NVIDIAAI @NVIDIADRIVE
Shinpei KATO (加藤真平)@ShinpeiKato

If you train Alpamayo properly, it looks quite usable in Japan too! It's the open-source era for both autonomous driving and world models!

Marco Pavone@drmapavone·
More on #reasoning in Vision-Language-Action (#VLA) models ---

Traditional VLA models decide what action to take by decomposing complex situations into their most salient factors. But reasoning models can do much more. When viewed as implicit world models operating in a semantic space, they can be used counterfactually—exploring multiple “what if” scenarios before acting.

In our recent paper, Counterfactual VLA (CF-VLA, arxiv.org/pdf/2512.24426), we show that counterfactual reasoning consistently improves trajectory accuracy, safety, and reasoning quality.

Key contributions:
- Self-reflective counterfactual reasoning: CF-VLA reflects on predicted meta-actions, anticipates consequences, and revises plans before execution—enabling causal self-correction.
- Automated data pipeline: A novel data pipeline generates counterfactual data, forming a self-improving loop for reasoning and action.
- Adaptive thinking in autonomous driving: CF-VLA focuses reasoning on the most challenging scenarios, improving performance while keeping test-time computation efficient.

Paper: arxiv.org/pdf/2512.24426

#AI #Robotics #VisionLanguageAction #AutonomousSystems #MachineLearning #CounterfactualReasoning @NVIDIAAI @NVIDIADRIVE
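The "what if" pattern—predict each meta-action's consequence before committing—can be illustrated with a toy sketch. All names here are hypothetical stand-ins, not the CF-VLA implementation.

```python
# Toy sketch of counterfactual "what if" reasoning before acting:
# enumerate candidate meta-actions, roll each through an implicit
# world model, and discard those whose predicted outcome is unsafe.
# Everything below is a hypothetical stand-in, not the CF-VLA code.

META_ACTIONS = ["keep_lane", "brake", "lane_change_left"]

def world_model(state, meta_action):
    # Stand-in for the implicit world model: a predicted future
    # described in a semantic space.
    if state == "obstacle_ahead" and meta_action == "keep_lane":
        return "collision"
    return "clear"

def counterfactual_select(state):
    """Reflect on each meta-action's predicted consequence and keep
    only the safe ones; revise the plan before execution."""
    safe = [a for a in META_ACTIONS if world_model(state, a) != "collision"]
    return safe[0] if safe else "brake"
```

The adaptive-thinking angle would then amount to skipping this loop on easy states and running it only when the scene is ambiguous, keeping test-time compute bounded.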
Marco Pavone@drmapavone·
On the heels of the Alpamayo announcement — @nvidia's fully open ecosystem for accelerating the development of reasoning-based autonomous vehicles — I’m excited to share our latest advances in researching reasoning-based Physical AI models.

Starting with Latent‑CoT‑Drive (LCDrive), a novel approach that learns to reason in a *latent* action-aligned space for end-to-end driving decision-making. Traditional vision-language-action models rely on natural language for chain-of-thought reasoning — but is language the best medium for encoding driving decisions? In our paper, we explore this question and introduce a latent representation that integrates both action proposals and predictions of future outcomes, enabling richer reasoning and improved performance.

🔍 Key Contributions
- Latent reasoning for driving: LCDrive rethinks reasoning in vision–language–action (VLA) models using latent chain-of-thought tokens aligned with driving actions and a latent world model.
- Effective training framework: Combines latent CoT cold-start, world model training, and closed-loop reinforcement learning, tailored for latent reasoning models.
- Empirical gains: Shows faster inference and higher driving quality compared to non-reasoning and text-reasoning baselines.

This work shows that latent reasoning provides a compelling representation for reasoning-based VLA models.

📄 Full paper here: arxiv.org/pdf/2512.10226

#AutonomousVehicles #AutonomousDriving #PhysicalAI #ReasoningAI #Alpamayo @NVIDIAAI @NVIDIADRIVE
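The contrast with text chain-of-thought can be made concrete with a toy sketch: rather than generating intermediate words, the model iterates a small latent state and decodes an action from it. Every component below is a hypothetical stand-in, not the LCDrive architecture.

```python
# Toy sketch of latent chain-of-thought: instead of emitting text,
# the model refines a small latent vector over a few reasoning steps
# and decodes an action from the final latent. All functions are
# hypothetical stand-ins, not the LCDrive code.

def encode(obs):
    # Stand-in perception encoder: observation -> latent vector.
    return [float(ord(c) % 7) for c in obs[:4]]

def latent_step(z):
    # Stand-in for one learned latent reasoning update.
    return [0.5 * v + 1.0 for v in z]

def decode_action(z):
    # Stand-in action head over the final latent.
    return "brake" if sum(z) > 8 else "cruise"

def latent_cot_drive(obs, steps=3):
    """Reason with latent tokens (no text), then decode an action."""
    z = encode(obs)
    for _ in range(steps):
        z = latent_step(z)
    return decode_action(z)
```

Because the intermediate "thoughts" are short latent vectors rather than token sequences, this style of reasoning is one plausible source of the faster inference the paper reports.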
Marco Pavone@drmapavone·
🚀 Exciting news from #CES2026! In his keynote today, Jensen announced @nvidia Alpamayo — a *fully open* ecosystem of models, simulation tools, and datasets designed to accelerate reasoning-based autonomous vehicle (AV) architectures and advance the path to Level 4 autonomous driving.

Alpamayo brings together several technologies we’ve developed to enable reasoning-based vision–language–action (VLA) models for AVs. Our goal is to provide researchers and developers with a flexible, fast, and scalable platform for evaluating and training reasoning-based AV architectures in realistic closed-loop settings.

Explore Alpamayo:
-- Press Release: nvidianews.nvidia.com/news/alpamayo-…
-- Hugging Face Blog: huggingface.co/blog/drmapavon…
-- Tech Blog: developer.nvidia.com/blog/building-…
-- Alpamayo 1 reasoning model: research.nvidia.com/publication/20…
-- Physical AI AV Dataset: huggingface.co/datasets/nvidi…
-- AlpaSim simulator: github.com/NVlabs/alpasim

I’m incredibly proud of the @nvidia AV Research team (research.nvidia.com/labs/avg/) and our many @nvidia collaborators whose contributions made this possible. More releases and features are coming soon — we can’t wait to see what the community builds with Alpamayo!

💡 Want to help grow the Alpamayo ecosystem? We’re hiring:
[Sr.] Research Scientist: nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…
[Sr.] Research Engineer: nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…

#AutonomousVehicles #AutonomousDriving #AI #Simulation #ReasoningAI #OpenEcosystem #Alpamayo @NVIDIAAI @NVIDIADRIVE
Marco Pavone@drmapavone·
High-quality motion annotation is a critical enabling technology for unlocking Physical AI. We've just released FoundationMotion, an automated motion labeling pipeline for generalized spatial detection, tracking, and understanding of object behaviors. When fine-tuned on our annotations, the open-source Qwen and NVILA models outperform state-of-the-art closed-source models on spatial understanding tasks across autonomous driving, robotics, and everyday scenarios.

As always, we are making everything publicly available:
📜 Paper: arxiv.org/abs/2512.10927
🌐 Project page: yulugan.com/projects/Found…
💻 Code: github.com/Wolfv0/Foundat…
🕸️ Models: huggingface.co/WoWolf/models
📊 Dataset: huggingface.co/datasets/WoWol…
👉 Interactive demo: huggingface.co/spaces/yulu2/F…

Outstanding work spearheaded by @Boyiliee, with phenomenal collaborators from @MIT, @NVIDIAAI, @UMich, and @UCBerkeley.
Boyi Li@Boyiliee

Introducing FoundationMotion. A large-scale, video-derived motion annotation dataset & auto-labeling pipeline + advanced models for motion understanding. Fully open-source: code, datasets, and models, free to use and build on.

Understanding motion is core to physical reasoning, yet today’s leading models still struggle with simple spatial actions like “turn right” or “move up” or “flip the toast” - mainly due to the lack of large, fine-grained motion datasets.

We present FoundationMotion, a fully automated pipeline that:
• detects & tracks objects in videos
• extracts trajectories
• uses LLMs + frames to generate rich motion captions & QA pairs
→ creating large-scale, high-quality motion datasets at scale.

After fine-tuning the open-source models Qwen and NVILA on our annotations, these models now outperform the closed-source Gemini-3-Flash and GPT-5.1 on spatial understanding tasks across autonomous driving, robotics, and everyday scenarios.

📜 Paper: arxiv.org/abs/2512.10927
🌐 Webpage: yulugan.com/projects/Found…
💻 Code: github.com/Wolfv0/Foundat…
🕸️ Model: huggingface.co/WoWolf/models
📊 Dataset: huggingface.co/datasets/WoWol…
👉 Interactive Demo: huggingface.co/spaces/yulu2/F…

Let’s move research forward together. FoundationMotion is also referred to as Wolf V2 🐺, the second chapter in the Wolf series: wolfv0.github.io.
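The detect → track → trajectory → caption/QA recipe outlined in the thread can be sketched in miniature. Everything here is a hypothetical stand-in (in the real pipeline an LLM writes the captions; a template stands in for it below), not the FoundationMotion code.

```python
# Rough sketch of the auto-labeling recipe: detect and track objects,
# turn tracks into trajectories, and emit a motion caption plus a QA
# pair per object. All functions are hypothetical stand-ins.

def detect_and_track(frames):
    # Stand-in detector/tracker: one mock object moving to the right.
    return {"obj0": [(t, 2 * t) for t in range(len(frames))]}

def trajectory_direction(track):
    # Reduce a track to a coarse motion label from its endpoints.
    (x0, _), (x1, _) = track[0], track[-1]
    dx = x1 - x0
    return "right" if dx > 0 else "left" if dx < 0 else "still"

def caption_motion(frames):
    """Produce a motion caption and a QA pair for every tracked object
    (an LLM does this step in the real pipeline; a template here)."""
    labels = []
    for obj, track in detect_and_track(frames).items():
        direction = trajectory_direction(track)
        labels.append({
            "object": obj,
            "caption": f"{obj} moves {direction}",
            "qa": (f"Which way does {obj} move?", direction),
        })
    return labels

labels = caption_motion([0, 1, 2])
```

Run over large video corpora, this loop is what turns raw footage into the fine-grained motion supervision the quoted thread says current models lack.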

Marco Pavone@drmapavone·
🚀 Strengthening Robot Safety with Multimodal Defenses

I’m excited to share our recent work, “Preventing Robotic Jailbreaking via Multimodal Domain Adaptation,” now available on arXiv: arxiv.org/pdf/2509.23281

As vision-language models (VLMs) become foundational components of modern robot autonomy, VLM-enabled robots also become increasingly vulnerable to jailbreaking attacks—adversarial prompts that can bypass safety filters and trigger unsafe or harmful behaviors in real-world robotic systems. This poses a significant challenge for the safe deployment of AI in autonomous vehicles, maritime robots, quadrupeds, and other embodied platforms.

📌 In this work, we introduce J-DAPT, a lightweight framework for robust multimodal jailbreak detection that delivers near-perfect detection performance across multiple robotic domains with minimal overhead.

Our results demonstrate that it is indeed possible to effectively enhance safety defenses for vision-language models in robotics—an important step toward trustworthy and reliable autonomous systems.

📄 Read the full paper: arxiv.org/pdf/2509.23281

A great collaboration with the research groups of George Pappas and Mauro Conti.

#Robotics #AI #Safety #MachineLearning #MultimodalAI
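The general shape of such a defense—a lightweight detector over fused vision and language embeddings, screening prompts before they reach the policy—can be sketched as a toy. Names, weights, and thresholds below are hypothetical, not the J-DAPT implementation.

```python
# Hedged toy sketch of a multimodal jailbreak detector: score a prompt
# using fused text and image embeddings and flag it before it reaches
# the robot's policy. All components are hypothetical stand-ins.

def embed_text(prompt):
    # Stand-in for a frozen VLM text encoder (values in [0, 1]).
    return [ord(c) % 5 / 4 for c in prompt[:8]]

def embed_image(tag):
    # Stand-in for a frozen VLM vision encoder (values in [0, 1]).
    return [ord(c) % 3 / 2 for c in tag[:8]]

def fused_score(prompt, image_tag, w_text=0.7, w_image=0.3):
    """Toy linear detector over per-modality average activations."""
    t = sum(embed_text(prompt)) / len(embed_text(prompt))
    v = sum(embed_image(image_tag)) / len(embed_image(image_tag))
    return w_text * t + w_image * v

def is_jailbreak(prompt, image_tag, threshold=0.6):
    # In the real setting the weights and threshold would come from
    # lightweight domain adaptation on embodiment-specific data.
    return fused_score(prompt, image_tag) > threshold
```

The "domain adaptation" part of the title is the key lever: the same lightweight detector is re-fit cheaply per robotic domain (AVs, maritime, quadrupeds) rather than retrained from scratch.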
Marco Pavone@drmapavone·
🚗 Imitation learning is everywhere—but is it enough?

So far, imitation learning—most commonly via behavior cloning (BC)—remains the go-to approach for training real-world autonomous vehicle (AV) driving policies. Yet BC operates in an open-loop (OL) fashion, overlooking the critical interdependence among inputs, outputs, and future states that comes with closed-loop (CL) operation. The result? The notorious—but often overlooked—OL–CL gap ⚠️

To address this challenge and encourage broader adoption of CL techniques, we’ve just published a survey (research.nvidia.com/publication/20…) presenting a comprehensive taxonomy of closed-loop training methods for end-to-end driving. Our framework organizes approaches along three key axes:
- Action generation
- Environment response generation
- Training objectives

💡 Bottom line: enabling technologies—like neural rendering, generative world models, and scalable RL—have now matured, making closed-loop AV training ready for wide-scale adoption.

We’d love to hear your thoughts—drop a comment and join the discussion! 💬

And as a reminder, we are hiring for full-time research scientist and research engineer positions:
🔹 [Sr.] Research Scientist: nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…
🔹 [Sr.] Research Engineer: nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…

@NVIDIADRIVE @NVIDIAAI @nvidia
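The OL-CL gap itself has a classic one-line intuition that a toy 1-D example makes concrete: a cloned policy with a tiny per-step error looks fine when scored on logged expert states, but the error compounds once the policy drives on the states it produces itself. The numbers below are illustrative, not from the survey.

```python
# Toy 1-D illustration of the open-loop vs closed-loop (OL-CL) gap:
# a behavior-cloned policy with a small constant bias.

EXPERT_ACTION = 1.0
BIAS = 0.05  # small imitation error at every step

def cloned_policy(state):
    # Behavior-cloned policy: slightly wrong everywhere.
    return EXPERT_ACTION + BIAS

def open_loop_error(steps=50):
    # Per-step action error, evaluated on expert states only:
    # stays at ~0.05 no matter how long the horizon is.
    return max(abs(cloned_policy(s) - EXPERT_ACTION) for s in range(steps))

def closed_loop_error(steps=50):
    # Roll the policy forward on its *own* states: the bias accumulates,
    # so the final state error grows linearly with the horizon (~2.5 here).
    expert_state = policy_state = 0.0
    for _ in range(steps):
        expert_state += EXPERT_ACTION
        policy_state += cloned_policy(policy_state)
    return abs(policy_state - expert_state)
```

Closed-loop training methods attack exactly this accumulation, by letting the training signal see the states the policy itself induces rather than only the logged expert states.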