Yu Lei

423 posts

@_OutofMemory_

PhD student @UTCompSci | Learn to understand ourselves and build intelligence.🤖🧠👁️

Austin, TX Joined July 2023
2.3K Following 608 Followers
Pinned Tweet
Yu Lei
Yu Lei@_OutofMemory_·
🤖Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]). But why does it work? How can we improve it further? Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments. 😮Key insight: it’s all about representations. - Alignment → enables transfer - Discernibility → enables adaptation ⚖️Both are necessary: it's better to have more aligned representations, but the model must be able to discern the domains. We term this structured representation alignment. ⬇️Let’s take a deep dive into that: Paper: arxiv.org/pdf/2604.13645 Website: science-of-co-training.github.io
5
66
385
61.7K
Yu Lei retweeted
Youngsun Wi
Youngsun Wi@WiYoungsun·
Dexterous hands vary widely—so do tactile modalities. 🖐️🌈 Our vision on tactile human-to-robot transfer: 🔓 Not tied to specific hardware ♻️ Reuse human tactile demos across embodiments Presenting TactAlign, a cross-sensor tactile alignment for cross-embodiment policy transfer.
5
36
173
32.9K
Yu Lei retweeted
Demis Hassabis
Demis Hassabis@demishassabis·
Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video You can even give it your own videos & iterate on your ideas:
378
944
9.5K
912.6K
Yu Lei retweeted
Yuke Zhu
Yuke Zhu@yukez·
I will be in Vienna in two weeks to give a keynote at #ICRA2026. I'll share our recent progress on building generalist humanoid robots and show some of the latest results. Check out my talk on June 3: 2026.ieee-icra.org/program/keynot…
3
5
89
5K
Yu Lei retweeted
Sapient Intelligence
Sapient Intelligence@Sapient_Int·
Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.
150
435
3K
489.2K
Wenhao Chai
Wenhao Chai@wenhaocha1·
@thoma_gu After chatting with Mingyang: he thinks that in pixel space we could apply multiple representation alignments, and I think that's true for both REPA-like and FD-loss-like styles. I totally agree repr is super important! Just saying that pixel does not equal no repr (?
2
0
8
912
Wenhao Chai
Wenhao Chai@wenhaocha1·
In my view, pixel space matters differently for VLMs and generative models (including UMMs). For VLMs we always fine-tune end-to-end, so I don't see a fundamental advantage to going encoder-free. For generative models, new features often hit a VAE/RAE ceiling; font rendering is the canonical example. And since end-to-end fine-tuning is off the table there, I suspect that on any product roadmap requiring sustained iteration, pixel may even be cheaper than latent. That's my take.
Jiatao Gu@thoma_gu

An encoder can be frozen or jointly trained with the backbone; in the latter case it can be trained from scratch or finetuned from a different training stage. I don’t see these as huge distinctions. I think the original post is against naive pixel patches that claim to be NMMs.

3
5
73
19.7K
Yu Lei retweeted
Figure
Figure@Figure_robot·
We taught two F.03 robots to clean a room and make a bed in under 2 minutes - fully autonomous.
672
1.1K
8.4K
1.4M
Yu Lei retweeted
Tairan He
Tairan He@TairanHe99·
GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-V…
Tairan He@TairanHe99

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵

6
98
615
114.3K
Yu Lei retweeted
Heng Yang
Heng Yang@hankyang94·
Just had a single-author paper accepted to #RSS2026! arxiv.org/abs/2604.21456 Motivated by growing interest in differentiable world models and physics simulators, we ask whether there is a unified principle for combining sampling-based global “exploration” with gradient-based local “exploitation” in trajectory and policy optimization with differentiable dynamics. By viewing control through the control-as-inference lens—recasting optimization as sampling from an unnormalized Boltzmann distribution defined by an energy function—Tempered Sequential Monte Carlo (TSMC) naturally integrates importance sampling with gradient-based Hamiltonian Monte Carlo. The key idea behind TSMC is to define a tempering path that gradually transforms an easy-to-sample prior into a complex, multi-modal posterior—or equivalently, deforms a convex energy landscape into a nonconvex one (graduated non-convexity)! We implement TSMC for both trajectory and policy optimization. On small- to medium-scale problems, it appears broadly applicable and compares favorably with state-of-the-art baselines. Excited to explore whether TSMC can scale to large-scale planning with complex, high-dimensional dynamics!
4
20
202
12.3K
Vector Wang
Vector Wang@VectorWang2·
Building a mini Tidybot for <$2k in total. More dynamic (real time, no speed-up) and more robust. Plugged into the upcoming X-bot universe with OpenClaw agentic applications. github.com/TidyBot-Servic…
16
31
332
37.8K
Yu Lei retweeted
Yuandong Tian
Yuandong Tian@tydsh·
@MingchenZhuge Thanks for inviting me for the talk and the panel discussion! It was super fun! Talk slides in yuandong-tian.com/talks/rsi_work…. Thanks for promoting my book as well 😄
Mingchen Zhuge@MingchenZhuge

@tydsh always enjoy your presentations, whether at workshops or podcasts, as well as your insights on post-training, RSI, and even your sci-fi writing. 🥳🥳🥳 ~ recursive-workshop.github.io #RSI #ICLR2026 #破晓之钟

1
4
28
4.7K
Kevin Zakka
Kevin Zakka@kevin_zakka·
Gave my PhD dissertation talk on Friday! It's been an incredible journey made possible by the best advisor who believed in me and gave me the freedom and support to explore. Thank you @pabbeel! And thank you to everyone who came to support and share this milestone with me 🙏
63
15
654
31.5K
Yu Lei
Yu Lei@_OutofMemory_·
@tongzhou_mu We just say the same words🙂
0
0
0
62
Yu Lei retweeted
Rhoda AI
Rhoda AI@RhodaAI·
Teaching a robot a new task typically means stopping operations, collecting teleoperated demonstrations, and retraining. That process takes hours at a minimum. We wanted to know if we could collapse it to seconds — from a single human demo, on the fly, no retraining required. Early research preview: we can.
9
14
84
7.2K
Yu Lei retweeted
Galbot
Galbot@GalbotRobotics·
Introducing LDA, a latent world action foundation model that, for the first time, unifies the utilization of heterogeneous embodied data across simulation and reality, humans and robots, and varying levels of action quality and annotation. By breaking long-standing data silos in embodied intelligence, LDA enables the field, much like GPT did for language, to benefit continuously from scaling data, marking the transition into a new era of scalable learning. #Galbot #Robotics #Innovation #AI #Technology #Humanoid #WorldModel
5
39
244
37.4K