Yu Lei

423 posts

@_OutofMemory_

PhD student @UTCompSci | Learn to understand ourselves and build intelligence.🤖🧠👁️

Austin, TX · Joined July 2023

2.3K Following · 608 Followers

Pinned Tweet

Yu Lei@_OutofMemory_·20 Apr

🤖Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]). But why does it work? How can we improve it further? Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments.

😮Key insight: it's all about representations.
- Alignment → enables transfer
- Discernibility → enables adaptation

⚖️Both are necessary: it's better to have more aligned representations, but the model must still be able to discern the domains. We term this structured representation alignment.

⬇️Let's take a deep dive into that:
Paper: arxiv.org/pdf/2604.13645
Website: science-of-co-training.github.io

Yu Lei retweeted

Youngsun Wi@WiYoungsun·17 Feb

Dexterous hands vary widely—so do tactile modalities. 🖐️🌈 Our vision on tactile human-to-robot transfer: 🔓 Not tied to specific hardware ♻️ Reuse human tactile demos across embodiments Presenting TactAlign, a cross-sensor tactile alignment for cross-embodiment policy transfer.

Yu Lei retweeted

Demis Hassabis@demishassabis·6d

Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video You can even give it your own videos & iterate on your ideas:

Yu Lei retweeted

Yuke Zhu@yukez·6d

I will be in Vienna in two weeks to give a keynote at #ICRA2026. I'll share our recent progress on building generalist humanoid robots and show some of the latest results. Check out my talk on June 3: 2026.ieee-icra.org/program/keynot…

Yu Lei retweeted

Sapient Intelligence@Sapient_Int·19 May

Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.

Yu Lei@_OutofMemory_·10 May

@wenhaocha1 @thoma_gu Deeply supervised (aligned) network

Wenhao Chai@wenhaocha1·10 May

@thoma_gu After chatting with Mingyang: he thinks that in pixel space we could apply multiple representation alignment, and I think that's true for both REPA-like and FD-loss-like styles. I totally agree repr is super important! I'm just saying pixel does not equal no repr (?

Wenhao Chai@wenhaocha1·10 May

In my view, pixel space matters differently for VLMs and generative models (including UMMs). For VLMs we always fine-tune end-to-end, so I don't see a fundamental advantage to going encoder-free. For generative models, new features often hit a VAE/RAE ceiling; font rendering is the canonical example. And since end-to-end fine-tuning is off the table there, I suspect that on any product roadmap requiring sustained iteration, pixel may even be cheaper than latent. That's my take.

Jiatao Gu@thoma_gu

An encoder can be frozen or jointly trained with the backbone; the latter can be trained from scratch or finetuned from a different training stage. I don’t see these as huge distinctions. I think the original post is against the naive pixel patches and claiming to be NMMs.

Yu Lei retweeted

Figure@Figure_robot·8 May

We taught two F.03 robots to clean a room and make a bed in under 2 minutes - fully autonomous.

Yu Lei@_OutofMemory_·8 May

@alpercanbe A culture.

Alper Canberk@alpercanbe·8 May

just a glimpse of the world we are building

Shy Yang@shyyang

I love having artists on the team

Yu Lei@_OutofMemory_·8 May

Love this.

Danfei Xu@danfei_xu

Submit your CoRL workshop proposal! This year @RLioutikov and I wanted to make the workshop more "workshopy". Main changes are:
- Half-day events only
- Limited speaker slots
- Challenge- and participation-driven
- A post-workshop artifact (white paper, report, paper, etc.) summarizing the discussions

Yu Lei retweeted

Zhengyi “Zen” Luo@zhengyiluo·8 May

Open-sourcing the whole package here! The last pieces of our SONIC open-source release (data collection, gr00t VLA post-training, and inference) just hit the repo! Train your autonomous policies on G1 whole-body with SONIC and gr00t N1.7! 🧑‍💻Code: github.com/NVlabs/GR00T-W… 📑Docs: nvlabs.github.io/GR00T-WholeBod…

Zhengyi “Zen” Luo@zhengyiluo

SONIC is now open-source! Generalist whole-body teleoperation for EVERYONE! Our team has long been building comprehensive pipelines for whole-body control, kinematic planner, and teleoperation, and they will all be shared. This will be a continuous update; inference code + model already there, training code and gr00t integration coming soon! Code: github.com/NVlabs/GR00T-W… Docs: nvlabs.github.io/GR00T-WholeBod… Site: nvlabs.github.io/GEAR-SONIC/

Yu Lei retweeted

Tairan He@TairanHe99·30 Apr

GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-V…

Tairan He@TairanHe99

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵

Yu Lei retweeted

Heng Yang@hankyang94·29 Apr

Just had a single-author paper accepted to #RSS2026! arxiv.org/abs/2604.21456

Motivated by growing interest in differentiable world models and physics simulators, we ask whether there is a unified principle for combining sampling-based global “exploration” with gradient-based local “exploitation” in trajectory and policy optimization with differentiable dynamics.

By viewing control through the control-as-inference lens (recasting optimization as sampling from an unnormalized Boltzmann distribution defined by an energy function), Tempered Sequential Monte Carlo (TSMC) naturally integrates importance sampling with gradient-based Hamiltonian Monte Carlo.

The key idea behind TSMC is to define a tempering path that gradually transforms an easy-to-sample prior into a complex, multi-modal posterior or, equivalently, deforms a convex energy landscape into a nonconvex one (graduated non-convexity)!

We implement TSMC for both trajectory and policy optimization. On small- to medium-scale problems, it appears broadly applicable and compares favorably with state-of-the-art baselines. Excited to explore whether TSMC can scale to large-scale planning with complex, high-dimensional dynamics!

Yu Lei@_OutofMemory_·28 Apr

@VectorWang2 Like it!!!

Vector Wang@VectorWang2·28 Apr

Building up a mini Tidybot for <$2k in total. More dynamic (real time, no speed-up) and more robust. Plugged into the upcoming X-bot universe with OpenClaw agentic applications. github.com/TidyBot-Servic…

Yu Lei@_OutofMemory_·27 Apr

@jiazhi_yang2024 Congrats Jiazhi!

Jiazhi Yang@jiazhi_yang2024·27 Apr

🤗RISE got accepted to RSS 2026! Can't wait to see everybody in Sydney! 💻Code: github.com/OpenDriveLab/R…

Jiazhi Yang@jiazhi_yang2024

RISE (3/N) To address this bottleneck, we introduce RISE: Reinforcement learning via Imagination for SElf-improving robots. RISE shifts the learning environment from the physical world to a Compositional World Model, which first emulates future observations for proposed actions, then evaluates the imagined states to derive advantage for policy improvement.

Yu Lei retweeted

Yuandong Tian@tydsh·26 Apr

@MingchenZhuge Thanks for inviting me to the talk and the panel discussion! It was super fun! Talk slides at yuandong-tian.com/talks/rsi_work…. Thanks for promoting my book as well 😄

Mingchen Zhuge@MingchenZhuge

@tydsh always enjoy your presentations, whether at workshops or podcasts, as well as your insights on post-training, RSI, and even your sci-fi writing. 🥳🥳🥳 ~ recursive-workshop.github.io #RSI #ICLR2026 #破晓之钟

Yu Lei@_OutofMemory_·27 Apr

@kevin_zakka @pabbeel congrats Kevin!!

Kevin Zakka@kevin_zakka·26 Apr

Gave my PhD dissertation talk on Friday! It's been an incredible journey made possible by the best advisor who believed in me and gave me the freedom and support to explore. Thank you @pabbeel! And thank you to everyone who came to support and share this milestone with me 🙏

Yu Lei@_OutofMemory_·25 Apr

@tongzhou_mu We just said the same thing 🙂

Tongzhou Mu 🤖🦾🦿@tongzhou_mu·25 Apr

this is my dream

Neuralink@neuralink

We are working to restore mobility that was lost due to disease or spinal cord injury by allowing participants to control robotic arms with their thoughts. See how this is possible.

Yu Lei@_OutofMemory_·25 Apr

My dream of robot learning research.

Neuralink@neuralink

We are working to restore mobility that was lost due to disease or spinal cord injury by allowing participants to control robotic arms with their thoughts. See how this is possible.

Yu Lei retweeted

Rhoda AI@RhodaAI·23 Apr

Teaching a robot a new task typically means stopping operations, collecting teleoperated demonstrations, and retraining. That process takes hours at a minimum. We wanted to know if we could collapse it to seconds — from a single human demo, on the fly, no retraining required. Early research preview: we can.

Yu Lei retweeted

Galbot@GalbotRobotics·23 Apr

Introducing LDA, a latent world action foundation model that, for the first time, unifies the utilization of heterogeneous embodied data across simulation and reality, humans and robots, and varying levels of action quality and annotation. By breaking long-standing data silos in embodied intelligence, LDA enables the field, much like GPT did for language, to benefit continuously from scaling data, marking the transition into a new era of scalable learning. #Galbot #Robotics #Innovation #AI #Technology #Humanoid #WorldModel