

Marco Pavone
@drmapavone
Prof @Stanford, Distinguished Research Scientist and AV research lead @nvidia. PhD from @MITAeroAstro. Robotics, autonomous systems, AI. Opinions are my own.





Alpamayo 1 is now @huggingface’s top-downloaded robotics model, with 100K downloads and counting. 🎉 It helps researchers and autonomous-driving practitioners develop and evaluate vision-language-action models for complex autonomous-driving scenarios, especially rare long-tail events.
🔗 Get started with Alpamayo 1 today: nvda.ws/3OnZoWU
🎥 Watch the deep-dive: nvda.ws/4tJxvbN


💨 How fast can an autonomous vehicle think? Alpamayo 1 is NVIDIA's 10B-parameter chain-of-thought reasoning model, and its distilled version can reason in real time. Hear Marco Pavone (@drmapavone), Yan Wang, Yurong You, and Wenhao Ding from our AV Research team break down Alpamayo 1 and what's next for reasoning in autonomous driving.
🔁 Watch the replay: nvda.ws/3O5gKb3


Join me and my collaborators for a *live* discussion on @nvidia Alpamayo 1 (huggingface.co/nvidia/Alpamay…), a reasoning-based vision–language–action (VLA) model for autonomous driving.

🎥 Livestream: Inside NVIDIA Alpamayo 1: Making Autonomous Vehicles Reason
🗓 February 11
⏰ 9:00am PST
📍 Watch here: youtube.com/watch?v=V9E4GX…

As NVIDIA CEO Jensen Huang put it: “The ChatGPT moment for physical AI is here — when machines begin to understand, reason, and act in the real world. Robotaxis are among the first to benefit. Alpamayo brings reasoning to autonomous vehicles, allowing them to think through rare scenarios, drive safely in complex environments, and explain their driving decisions — it’s the foundation for safe, scalable autonomy.”

During the livestream, we’ll cover:
- How #reasoning-based #VLA models like #Alpamayo 1 are designed and built
- Applications ranging from end-to-end #autonomy to reasoning-driven auto-labeling
- Key opportunities and challenges in developing reasoning models for #Physical #AI

I’ll be joined by core Alpamayo 1 developers @yan_wang_9 @YurongYou @wenhaoding95, and we’ll take questions live from the community.

📖 Ahead of time, you might enjoy this overview of the Alpamayo ecosystem: huggingface.co/blog/drmapavon…

And if you’re attending @NVIDIAGTC (March 16–19) and would like to meet some of the Alpamayo team in person, you can use my employee code for 25% off your conference pass: nvidia.com/gtc/?ncid=GTC-…

Hope to see you at the livestream! @NVIDIAAI @NVIDIADRIVE






If you train Alpamayo properly, it looks like it could be quite useful in Japan too! It's the era of open source for autonomous driving and world models alike!









Introducing FoundationMotion: a large-scale, video-derived motion annotation dataset and auto-labeling pipeline, plus advanced models for motion understanding. Fully open source: code, datasets, and models, free to use and build on.

Understanding motion is core to physical reasoning, yet today’s leading models still struggle with simple spatial actions like “turn right,” “move up,” or “flip the toast” - mainly due to the lack of large, fine-grained motion datasets.

We present FoundationMotion, a fully automated pipeline that:
• detects & tracks objects in videos
• extracts trajectories
• uses LLMs + frames to generate rich motion captions & QA pairs
→ creating large-scale, high-quality motion datasets.

After fine-tuning the open-source models Qwen and NVILA on our annotations, these models outperform the closed-source Gemini-3-Flash and GPT-5.1 on spatial understanding tasks across autonomous driving, robotics, and everyday scenarios.

📜 Paper: arxiv.org/abs/2512.10927
🌐 Webpage: yulugan.com/projects/Found…
💻 Code: github.com/Wolfv0/Foundat…
🕸️ Model: huggingface.co/WoWolf/models
📊 Dataset: huggingface.co/datasets/WoWol…
👉 Interactive Demo: huggingface.co/spaces/yulu2/F…

Let’s move research forward together. FoundationMotion is also referred to as Wolf V2 🐺, the second chapter in the Wolf series: wolfv0.github.io.
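To make the pipeline's shape concrete, here is a minimal sketch of the track → trajectory → caption flow. All names (`Track`, `extract_trajectory`, `caption_motion`) are illustrative stand-ins, not the actual FoundationMotion API: the real pipeline runs a video object detector/tracker and uses LLMs plus frames for captioning, whereas this toy uses hand-written tracks and a rule-based captioner in place of the LLM.

```python
from dataclasses import dataclass

@dataclass
class Track:
    """One tracked object: per-frame (x, y) bounding-box centers.
    In the real pipeline these would come from a detector + tracker."""
    object_id: int
    label: str
    centers: list  # [(x, y), ...] in image coordinates

def extract_trajectory(track: Track) -> tuple:
    """Reduce per-frame centers to a net displacement (dx, dy)."""
    (x0, y0), (x1, y1) = track.centers[0], track.centers[-1]
    return x1 - x0, y1 - y0

def caption_motion(track: Track, min_move: float = 5.0) -> str:
    """Rule-based stand-in for the LLM captioner: map a trajectory
    to a coarse motion phrase like 'moves right' or 'moves up'."""
    dx, dy = extract_trajectory(track)
    if abs(dx) < min_move and abs(dy) < min_move:
        return f"the {track.label} stays still"
    if abs(dx) >= abs(dy):
        direction = "right" if dx > 0 else "left"
    else:
        # Image y grows downward, so negative dy means the object moved up.
        direction = "up" if dy < 0 else "down"
    return f"the {track.label} moves {direction}"

# Toy example: a car drifting rightward across four frames.
car = Track(0, "car", [(10, 50), (20, 50), (35, 51), (60, 52)])
print(caption_motion(car))  # -> "the car moves right"
```

Such (trajectory, caption) pairs are the kind of supervision the post describes: generated automatically at scale, then used to fine-tune VLMs on motion understanding.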






🚗 What does Level 4 autonomy 𝘢𝘤𝘵𝘶𝘢𝘭𝘭𝘺 mean? @drmapavone, NVIDIA Director of Autonomous Vehicle Research and Stanford professor, breaks down the breakthroughs enabling L4 autonomy and the full-stack safety system that makes it possible.
🎥 Watch the explainer: nvda.ws/3Lvu0nO