Tsung-Yi Lin
@TsungYiLinCV
Principal Research Scientist @Nvidia | Ex-@Google Brain Team | Computer Vision & Machine Learning



Fully agree with the sentiment that much of computer vision research (concretely, the work not meant for “human consumption”) should be grounded in robotics. But as a robotics researcher, I think the more nuanced question is: how can we *rethink* these intermediate representations for embodied intelligence rather than discarding them?

Why? The challenge, as Vincent’s article also points out, is precisely the lack of perception-action data at scale. This is why, IMO, intermediate representations are *preferable rather than obsolete*: they open up training from scalable data sources. This can include even the vision/language encoders people love and use in robot learning; it’s hard to imagine training low-level visual representations or high-level language understanding purely from limited robot data. The same goes for intermediate representations at the structure level: world modeling, learning from Internet videos, learning from humans, and simulation, many of which still rely on 3D representations too.
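As a minimal sketch of this argument (assuming a generic PyTorch setup; the encoder, feature dimension, and action dimension are illustrative assumptions, not from the thread): the vision encoder is the intermediate representation trained on scalable non-robot data, and only a small action head touches the scarce robot data.

```python
# Sketch: a frozen, web-pretrained vision encoder as an intermediate
# representation for a robot policy. Names/dims are illustrative assumptions.
import torch
import torch.nn as nn

class PolicyWithFrozenEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int = 768, action_dim: int = 7):
        super().__init__()
        self.encoder = encoder
        # Freeze the representation learned from scalable (non-robot) data.
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Only this small head is trained on limited robot demonstrations.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feat = self.encoder(obs)  # intermediate visual representation
        return self.head(feat)        # low-dim action (e.g. end-effector delta)
```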

Here we introduce SAGE: Scalable Agentic 3D Scene Generation for Embodied AI, which uses agents to generate sim-ready 3D scenes that follow user demands at scale, ready for robot action generation. Paper, code, and the SAGE-10k dataset are all released! nvlabs.github.io/sage/
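For intuition only, here is a rough sketch of what an agentic scene-generation loop can look like; this is not SAGE’s actual interface (see nvlabs.github.io/sage/ for that), and `propose_layout`, `check_physics`, and the JSON layout format are hypothetical names.

```python
# Hypothetical sketch of an agentic scene-generation loop; NOT SAGE's real
# API. All names (propose_layout, check_physics) are invented for illustration.
import json

def generate_scene(user_request: str, propose_layout, check_physics, max_rounds: int = 5):
    """Ask an LLM agent for a layout, verify it, and iterate on failures."""
    feedback = ""
    for _ in range(max_rounds):
        # Agent step: an LLM returns a JSON layout (assets, poses, scales).
        layout = json.loads(propose_layout(user_request, feedback))
        # Verification step: e.g. collision/support checks in a simulator.
        ok, feedback = check_physics(layout)
        if ok:
            return layout  # sim-ready: export to USD/URDF downstream
    raise RuntimeError("No physically valid layout found")
```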





Our team won 2nd place in the BEHAVIOR Challenge at NeurIPS 🏅 I’ll present our team’s solution on Sunday; feel free to stop by!
Event time: 11:00 AM - 1:45 PM PST, December 7
Event link: luma.com/9r2nskbz
GitHub link: github.com/mli0603/openpi…



Most VLM benchmarks watch the world; few ask how actions *change* it through a robot’s eyes. Embodied cognition tells us that intelligence isn’t just watching; it’s enacted through interaction.
👉 We introduce ENACT: a benchmark that tests whether VLMs can track the evolution of a home-scale environment from a robot’s egocentric view.
🌐 enact-embodied-cognition.github.io
📄 enact-embodied-cognition.github.io/enact.pdf
1/N
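For a sense of what such a probe can look like, here is an illustrative evaluation loop; it is not the official ENACT harness, and the episode fields (`action`, `frame_before`, `choices`, `label`) are assumed names for illustration.

```python
# Illustrative sketch of an ENACT-style probe (not the official harness;
# see enact-embodied-cognition.github.io). Field names are assumptions.
def score_world_tracking(vlm, episodes):
    """Ask a VLM to infer how an action changed the scene between two
    egocentric frames, and compare against ground truth."""
    correct = 0
    for ep in episodes:
        prompt = (
            f"Before and after a robot performs '{ep['action']}', "
            "what changed in the scene? Answer with one of: "
            + ", ".join(ep["choices"])
        )
        answer = vlm(images=[ep["frame_before"], ep["frame_after"]], text=prompt)
        correct += int(answer.strip() == ep["label"])
    return correct / len(episodes)
```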



We raised a $28M seed from Threshold Ventures, AIX Ventures, and NVentures (Nvidia’s venture capital arm), alongside 10+ unicorn founders and top AI researchers, to build reasoning models that generate real-time simulations and games.

Models are bottlenecked by the lack of practical simulations that can act as reinforcement learning environments. Human self-expression is bounded by the tools that let us create alternate realities.

At Moonlake, we are building a future where anyone can create interactive worlds, bring their child-like wonder to life, learn within them, and, most importantly, share experiences with the people they care about.

More in 🧵





Ranked #1 on @Meta's Physical Reasoning Leaderboard on @huggingface for a reason. 👏 🔥 🏆
Cosmos Reason enables robots and AI agents to reason like humans by leveraging prior knowledge, physics, and common sense to intelligently interact with the real world.
This state-of-the-art reasoning VLM excels in physical AI applications like:
📊 Data curation and annotation
🤖 Robot planning and reasoning
▶️ Video analytics AI agents
See the leaderboard → nvda.ws/4mLUmjd
Check out Cosmos Reason → nvda.ws/425mMfF
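As a hedged sketch, querying a reasoning VLM like this typically goes through the standard Hugging Face transformers vision-to-text classes; the checkpoint id below is a placeholder (find the real one via nvda.ws/425mMfF), and the actual model card may require a specific chat template.

```python
# Hedged sketch: generic transformers usage for a vision-language model.
# The model id is a placeholder, not a confirmed Cosmos Reason checkpoint.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "<cosmos-reason-checkpoint>"  # placeholder; see nvda.ws/425mMfF
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto")

frame = Image.open("robot_view.png")  # e.g. one frame from a robot video
prompt = "Is it physically plausible for the cup to balance on the ball? Explain."
inputs = processor(images=frame, text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```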





