Yanjiang Guo
@Yanjiang_Guo
90 posts

CS PhD & EE Undergrad @Tsinghua_Uni. Visiting PhD Student @Stanford.

Joined November 2021
426 Following · 1K Followers
Pinned Tweet
Yanjiang Guo @Yanjiang_Guo
Excited to share VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

We explore improving VLA inside a learned world model, and find that the key is to jointly improve VLA and WM! Website: sites.google.com/view/vlaw-arxiv
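A minimal, hypothetical sketch of the iterative co-improvement loop described above: alternate between (1) improving the VLA policy on rollouts imagined inside the learned world model and (2) refitting the world model on interaction data collected by the improved policy. Every class, method, and number below is an illustrative stand-in, not the paper's API.

```python
import random


class WorldModel:
    """Stand-in learned world model: predicts the next observation from an action."""

    def step(self, obs, action):
        # Placeholder dynamics; the real model would be a learned video/dynamics network.
        return obs + action

    def fit(self, real_trajectories):
        # Placeholder update from newly collected real interaction data.
        pass


class VLAPolicy:
    """Stand-in vision-language-action policy."""

    def act(self, obs, instruction):
        # Placeholder action; a real policy conditions on images + language.
        return random.uniform(-1.0, 1.0)

    def improve(self, imagined_rollouts):
        # Placeholder policy update (e.g. RL or filtered imitation) on rollouts
        # generated inside the world model.
        pass


def co_improve(policy, world_model, num_iterations=3, horizon=5):
    for it in range(num_iterations):
        # (1) Imagine rollouts inside the world model and improve the policy on them.
        imagined, obs = [], 0.0
        for _ in range(horizon):
            action = policy.act(obs, instruction="pick up the cube")
            next_obs = world_model.step(obs, action)
            imagined.append((obs, action, next_obs))
            obs = next_obs
        policy.improve(imagined)

        # (2) Collect real data with the improved policy and refit the world model,
        #     so imagined rollouts stay accurate in the next iteration.
        real = [(o, a, o + a) for (o, a, _) in imagined]  # placeholder "real" data
        world_model.fit(real)
        print(f"iteration {it}: {len(imagined)} imagined steps, {len(real)} real steps")


if __name__ == "__main__":
    co_improve(VLAPolicy(), WorldModel())
```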
Yanjiang Guo retweeted
Joy He-Yueya @JoyHeYueya
Scientists often make breakthroughs by synthesizing ideas across papers. In our new paper, we ask whether a language model can anticipate this process: given two parent papers, can it generate the core insight of a future paper built on them? 🧵⬇️
Yanjiang Guo retweeted
Chelsea Finn @chelseabfinn
LLM post-training used to mean fine-tuning to a downstream task. Robotics has been stuck in this setting, needing task-specific fine-tuning for best performance. π07 changes this: it works out of the box & outperforms fine-tuned specialists. Details: pi.website/pi07
Yanjiang Guo retweeted
ali @aliuahma
i'm so excited to finally share what we've been working on @rhodaai! taking lessons from success in LLMs, we identify autoregressive video generation as a scalable objective for training robot policies. 1/n
Jagdeep Singh @startupjag

After operating in stealth for the last 18 months @rhodaai , we’re excited today to finally show the world what we’ve been working on. We believe we’re on a path to physical AGI with the launch of our brand new foundation model, the Direct Video Action (DVA) model.

Yanjiang Guo retweeted
Yunzhu Li @YunzhuLiYZ
For a long time, I was skeptical about action-conditioned video prediction for robotics. Many models look impressive, but once you ask them to handle long-horizon manipulation with real physical interaction, things quickly fall apart (e.g., Genie is amazing but mostly focused on navigation). This project changed my mind.

I'm beyond excited to share Interactive World Simulator, a project we have been working on for the past ~1.5 years 🤖 One of the first world models that produces convincing results for long-horizon robotic manipulation involving complex physical interactions, across a diverse range of objects (rigid objects, deformables, ropes, object piles). It directly unlocks scalable data generation for robotic policy training and policy evaluation.

Try it yourself (no installation needed): yixuanwang.me/interactive_wo… Play directly with the simulator in your browser.

Key Takeaways:
1️⃣ 15 Hz long-horizon action-conditioned video prediction for 10+ minutes on a single RTX 4090 GPU
2️⃣ Visual and dynamic fidelity: people often ask how much sim data equals one real data point. In our experiments, it turns out to be close to one-to-one using the Interactive World Simulator
3️⃣ Stress testing matters: we emphasize interactive stress testing to understand robustness and stability and to build trust in the simulator
4️⃣ The model is trained with only ~6 hours of real-world random interaction data on a single GPU. Imagine what happens if we scale this 1000× or even 1M×

Huge credit to @YXWangBot, who led this effort with countless hours of work on data collection, training recipes, and system design. I'm incredibly proud of the work he did here! Enjoy the demos and videos. We also fully open-sourced the codebase for anyone interested in applying this to their own tasks. #Robotics #RobotLearning #WorldModels #EmbodiedAI
Yixuan Wang @YXWangBot

1/ World models are getting popular in robotics 🤖✨ But there’s a big problem: most are slow and break physical consistency over long horizons.
2/ Today we’re releasing Interactive World Simulator: an action-conditioned world model that supports stable long-horizon interaction.
3/ Key result: ✅ 10+ minutes of interactive prediction ✅ 15 FPS ✅ on a single RTX 4090🔥
4/ Why this matters: it unlocks two critical robotics applications: 🚀 Scalable data generation for policy training 🧪 Faithful policy evaluation
5/ You can play with our world model NOW at yixuanwang.me/interactive_wo… NO git clone, NO pip install, NO python. Just click and play!
NOTE ⚠️ ALL videos here are generated purely by our model in pixel space! They are **NOT** from a real camera
More details coming 👇 (1/9) #Robotics #AI #MachineLearning #WorldModels #RobotLearning #ImitationLearning

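The core interaction pattern behind an action-conditioned world simulator like the one announced above can be sketched as a simple closed loop: at each control step the user or a policy supplies an action, the model predicts the next frame purely in pixel space, and the loop repeats at a fixed rate (the thread quotes 15 FPS on one RTX 4090). The model below is a toy placeholder for illustration, not the released system.

```python
import time

import numpy as np


class ToyActionConditionedWorldModel:
    """Placeholder: predicts the next 64x64 RGB frame from the current frame and an action."""

    def predict_next_frame(self, frame, action):
        # A real system runs a learned video-prediction network; here we just
        # shift pixels by the action so the example stays self-contained.
        dx = int(action[0] * 5)
        return np.roll(frame, shift=dx, axis=1)


def interactive_rollout(model, num_steps=30, hz=15.0):
    frame = np.zeros((64, 64, 3), dtype=np.uint8)
    dt = 1.0 / hz
    for step in range(num_steps):
        action = np.array([np.sin(step / 5.0), 0.0])  # stand-in for user/policy input
        start = time.time()
        frame = model.predict_next_frame(frame, action)
        elapsed = time.time() - start
        # Sleep off the remaining budget to hold the target control rate.
        time.sleep(max(0.0, dt - elapsed))
    return frame


if __name__ == "__main__":
    interactive_rollout(ToyActionConditionedWorldModel())
```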
RoboPapers @RoboPapers
Pretraining is essential for good performance on a wide variety of robotics tasks, and so most vision-language-action models build off of a vision-language model (VLM) trained on a wide variety of image-language data. But how does the choice of VLM translate to downstream robotics performance? Jianke Zhang and @GYanjiang join us to talk about this key part of the robot policy, looking at a wide variety of different VLMs and how they perform. Interestingly, they see that performance on auxiliary tasks like question answering did not lead to downstream improvements in control. To learn more, watch episode 65 of RoboPapers now, with @chris_j_paxton and @DJiafei!
Yanjiang Guo retweeted
Marcel Torné @marceltornev
We equipped PI policies with memory! And taught our robots to do long-horizon real-world tasks such as preparing the items for a recipe, cooking a grilled cheese, and cleaning the kitchen!
Physical Intelligence @physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

Yanjiang Guo retweeted
Chelsea Finn @chelseabfinn
We added short-term visual memory + long-term text memory to pi models. 🤖
Enables robots to:
- complete tasks up to 15 min long
- cook grilled cheese while keeping track of time
- adapt in-context
Paper & videos: pi.website/memory
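As a rough illustration of the two-tier memory these posts describe, a sliding window of recent frames (short-term visual memory) can sit alongside an append-only log of language notes (long-term semantic/text memory), with the policy conditioned on both at every step. Names and structure here are assumptions for illustration, not the pi implementation.

```python
from collections import deque


class TwoTierMemory:
    """Short-term visual memory (sliding window of frames) plus long-term text memory."""

    def __init__(self, visual_window: int = 8):
        self.frames = deque(maxlen=visual_window)  # short-term: recent camera frames
        self.text_log = []                         # long-term: appended language notes

    def observe(self, frame):
        self.frames.append(frame)

    def remember(self, note: str):
        # e.g. "bread placed in pan at 12:03, flip in two minutes"
        self.text_log.append(note)

    def policy_context(self):
        # What the policy would be conditioned on at every control step.
        return {"recent_frames": list(self.frames), "semantic_memory": list(self.text_log)}


if __name__ == "__main__":
    mem = TwoTierMemory(visual_window=8)
    for t in range(20):
        mem.observe(f"frame_{t}")  # placeholder for an image tensor
    mem.remember("bread placed in pan, flip in 2 minutes")
    ctx = mem.policy_context()
    print(len(ctx["recent_frames"]), "recent frames;", len(ctx["semantic_memory"]), "note(s)")
```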
Yanjiang Guo retweeted
Lihan Zha @LihanZha
Today's state-of-the-art VLAs struggle to generalize zero-shot to new robot embodiments, despite training on extensive multi-embodiment data. We introduce Language-Action Pre-training (LAP) and LAP-3B, the first VLA to achieve substantial zero-shot transfer to unseen real-world robot embodiments, by simply aligning the action representation with language. Everything is open-sourced! Try it out on your own robot: 🌐 lap-vla.github.io
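One common way to "align action representation with language", shown purely as a hedged illustration (the post does not spell out LAP's exact scheme), is to discretize each action dimension into bins and emit the bin indices as plain text, so actions live in the same token space the language model already handles.

```python
import numpy as np

NUM_BINS = 256
LOW, HIGH = -1.0, 1.0  # assumed normalized action range


def action_to_text(action: np.ndarray) -> str:
    """Map a continuous action vector to a space-separated string of bin indices."""
    bins = np.clip(((action - LOW) / (HIGH - LOW) * (NUM_BINS - 1)).round(), 0, NUM_BINS - 1)
    return " ".join(str(int(b)) for b in bins)


def text_to_action(text: str) -> np.ndarray:
    """Invert the mapping: bin indices back to the centers of their bins."""
    bins = np.array([int(tok) for tok in text.split()], dtype=np.float64)
    return LOW + (bins + 0.5) / NUM_BINS * (HIGH - LOW)


if __name__ == "__main__":
    a = np.array([0.12, -0.5, 0.99])
    s = action_to_text(a)
    print(s, "->", text_to_action(s))
```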
Yanjiang Guo retweeted
Tian Gao @TianGao_19
Long-tail scenarios remain a major challenge for autonomous driving. Unusual events, like accidents or construction zones, are underrepresented in driving data, yet require semantic and commonsense reasoning grounded in control. We propose SteerVLA, a framework that uses VLM reasoning to steer a driving policy via grounded, fine-grained language instructions.
Paper: arxiv.org/abs/2602.08440
Website: steervla.github.io
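The two-level structure the SteerVLA post describes, a VLM that reasons about rare scenes and emits a grounded language instruction which a low-level driving policy then conditions on, can be sketched roughly as below. Both components are stubs and the interface is an assumption, not the paper's API.

```python
def vlm_reason(scene_description: str) -> str:
    """Stand-in for VLM reasoning over camera input; returns a fine-grained instruction."""
    if "construction" in scene_description:
        return "merge left within 30 meters and slow to 20 km/h past the cones"
    return "keep lane and maintain current speed"


def driving_policy(scene_description: str, instruction: str) -> dict:
    """Stand-in low-level policy conditioned on the instruction; returns controls."""
    slow = "slow" in instruction
    return {
        "steer": -0.2 if "merge left" in instruction else 0.0,
        "throttle": 0.1 if slow else 0.4,
    }


if __name__ == "__main__":
    scene = "two lanes, construction cones blocking the right lane ahead"
    instruction = vlm_reason(scene)                # infrequent, high-level reasoning
    controls = driving_policy(scene, instruction)  # frequent, low-level control
    print(instruction, controls)
```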