Zhecheng Yuan

210 posts


@fancy_yzc

PhD @Tsinghua University, IIIS. Interested in reinforcement learning, representation learning, robotics.

Joined July 2021
618 Following · 606 Followers
Pinned Tweet
Zhecheng Yuan @fancy_yzc
👐How can we leverage multi-source human motion data, transform it into robot-feasible behaviors, and deploy it across diverse scenarios? 
👤🤖Introducing 𝐇𝐄𝐑𝐌𝐄𝐒: a versatile human-to-robot embodied learning framework tailored for mobile bimanual dexterous manipulation.
Guozheng Ma @Guozheng_Ma
Our "What Makes Value Learning Efficient in Residual RL?" accepted to #ICML2026 as a ✨Spotlight✨! Value learning silently fails in residual RL. We pinpoint why, and propose 𝐃𝐀𝐖𝐍: a minimal fix that delivers ~5× faster convergence across benchmarks, policies, and modalities. 📄 Preprint: arxiv.org/abs/2602.10539
Zhecheng Yuan retweeted
Oier Mees @oier_mees
🎉 𝗺𝗶𝗺𝗶𝗰-𝘃𝗶𝗱𝗲𝗼 𝗵𝗮𝘀 𝗯𝗲𝗲𝗻 𝗮𝗰𝗰𝗲𝗽𝘁𝗲𝗱 𝘁𝗼 #𝗥𝗦𝗦𝟮𝟬𝟮𝟲 𝗶𝗻 𝗦𝘆𝗱𝗻𝗲𝘆! Sharing this one with a lot of pride.

This work was led by my two students, @jjonpai and @Liam862373. Both are Master's students at @ETH whom I have the privilege of advising at @Microsoft, and TAs for my new robot learning course at ETHZ. Watching them take an ambitious idea and carry it all the way to RSS has been one of my highlights of the past year. They are going to do extraordinary things.

For those new to the work: mimic-video is a Video-Action Model that grounds robot policies in pretrained video models (NVIDIA Cosmos) instead of static image-text VLM backbones. The result is 𝟭𝟬𝘅 𝗯𝗲𝘁𝘁𝗲𝗿 𝘀𝗮𝗺𝗽𝗹𝗲 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 📉 and 𝟮𝘅 𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 ⏩ than standard VLAs, with performance that scales directly with video model quality, and no expensive video reconstruction at inference time.

And we have recently 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲𝗱 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴: code, models, the works. If you are building in robot learning, please take it, break it, build on it. We genuinely cannot wait to see where the community takes this.

Last but not least: huge shoutout to the team at @mimicrobotics, who brought world-class robot systems, deep dexterity expertise, and a genuine research partnership that shaped this work from day one. This is exactly the kind of partnership between @Microsoft and cutting-edge robotics startups that I believe moves the field fastest. See you in Sydney! 🇦🇺

Code: github.com/mimic-video/mi…
Project: mimic-video.github.io
Paper: arxiv.org/abs/2512.15692
Oier Mees @oier_mees (quoted tweet)

Excited to introduce mimic-video, a new class of Video-Action Model that achieves 10x better data efficiency 📉and trains ⏩ 2x faster than standard VLAs! mimic-video.github.io
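As background on the "Video-Action Model" idea, here is a minimal sketch of grounding a trainable action head in a frozen video backbone. Every module and shape is a stand-in for illustration; this is not the mimic-video implementation, which builds on NVIDIA Cosmos:

```python
import torch
import torch.nn as nn

class VideoActionPolicy(nn.Module):
    """Toy video-to-action policy: frozen 'video backbone' stand-in plus
    a small trainable action head."""
    def __init__(self, in_dim=4 * 3 * 32 * 32, feat_dim=256, action_dim=7):
        super().__init__()
        # Placeholder for a pretrained video model; frozen during training.
        self.video_encoder = nn.Sequential(
            nn.Flatten(start_dim=1), nn.Linear(in_dim, feat_dim), nn.ReLU())
        for p in self.video_encoder.parameters():
            p.requires_grad = False
        # Only this action head receives gradients.
        self.action_head = nn.Linear(feat_dim, action_dim)

    def forward(self, frames):  # frames: (batch, T, C, H, W)
        return self.action_head(self.video_encoder(frames))

policy = VideoActionPolicy()
actions = policy(torch.randn(2, 4, 3, 32, 32))  # two 4-frame toy clips
print(actions.shape)  # torch.Size([2, 7])
```

The design point the tweet makes is that the backbone already understands video dynamics, so only the lightweight head needs robot data, which is where the claimed sample-efficiency gain comes from.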

Zhecheng Yuan retweeted
Yifan Zhang @yifan_zhang_
Scaling KL-Regularized Policy Gradient and REINFORCE Is All You Need. Our ICLR 2026 paper, “On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning,” will be presented at Pavilion 4, Riocentro Convention and Event Center, today!

Glad to see that V4 and V3.2 have adopted the corrected KL formulation presented in our paper.

Project Page: github.com/complex-reason…
Paper: arxiv.org/abs/2505.17508

It would be even better if they used the REINFORCE estimator instead of the GRPO estimator in future versions! IN REINFORCE WE TRUST.
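For context on the objective family the paper studies, here is a toy KL-regularized REINFORCE update for a categorical policy with a frozen reference. The estimator, rewards, and step size are illustrative only, not the corrected formulation the paper derives:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.zeros(5, requires_grad=True)        # trainable policy
ref_logits = torch.randn(5)                        # frozen reference policy
rewards = torch.tensor([1.0, 0.0, 0.5, 0.0, 0.2])  # per-action rewards
beta = 0.1                                         # KL penalty weight

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()
    # REINFORCE term: reward-weighted score function.
    pg_loss = -rewards[a] * dist.log_prob(a)
    # Exact KL(pi || pi_ref), tractable for categorical policies.
    kl = F.kl_div(F.log_softmax(ref_logits, -1),
                  F.log_softmax(logits, -1),
                  log_target=True, reduction="sum")
    loss = pg_loss + beta * kl
    loss.backward()
    with torch.no_grad():
        logits -= 0.5 * logits.grad
        logits.grad.zero_()

print(F.softmax(logits, -1))  # mass shifts toward high-reward actions
```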
Zhecheng Yuan retweeted
Minghuan Liu @ericliuof97
Now it's really an era of building!
Jake (softservo) @soft_servo (quoted tweet)

Codex to @sendcutsend! This STEP part was generated with GPT 5.5. By reusing the same Python script, it was able to one-shot export a flat-pattern DXF file for sheet-metal cutting on sendcutsend. Can't wait to see these parts IRL.

Zhecheng Yuan retweeted
Paul Zhou @zhiyuan_zhou_
Do you ever find that finetuning a VLA overfits to the target task, to the point where generalist ability is lost and even minor deviations beyond the SFT data break the policy? We found an extremely simple solution: directly merge the base and finetuned policies in weight space 🤯 👇🧵
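A minimal sketch of the weight-space merging idea, assuming plain linear interpolation of matching state dicts (the thread's exact recipe may differ):

```python
import torch

def merge_policies(base_sd, finetuned_sd, alpha=0.5):
    """Linear interpolation between a base and a finetuned checkpoint,
    a common model-merging recipe."""
    return {k: (1 - alpha) * base_sd[k] + alpha * finetuned_sd[k]
            for k in base_sd}

# Usage with any pair of architecture-matched checkpoints:
base = torch.nn.Linear(4, 2)
tuned = torch.nn.Linear(4, 2)
merged_sd = merge_policies(base.state_dict(), tuned.state_dict(), alpha=0.7)
base.load_state_dict(merged_sd)  # the merged policy is ready to run
```

Sweeping alpha between 0 and 1 trades off the base model's generalist ability against target-task specialization.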
Zhecheng Yuan retweeted
Jean Mercat @MercatJean
Releasing VLA Foundry: an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. End-to-end control from language pretraining to action-expert fine-tuning — no more stitching together incompatible repos.
Zhecheng Yuan retweeted
Xinhu Li @xinhuliusc
Teleoperating robots is difficult due to high DoF, so most demonstrations are collected under interface or safety constraints (e.g., keyboard or joystick). For example, a joystick can move a robotic arm only in a 2D plane, even though the robot can operate in a higher-dimensional space. Such demonstrations are inherently suboptimal, and directly imitating them limits what robots can learn.
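To make the interface bottleneck concrete, a toy sketch (the mapping matrix is hypothetical) of how a 2-axis joystick confines every command to a fixed plane of a 6-DoF end-effector velocity space:

```python
import numpy as np

# Joystick-to-velocity map: 2 inputs span only 2 of 6 DoF.
J = np.zeros((6, 2))
J[0, 0] = 1.0  # joystick x -> end-effector vx
J[1, 1] = 1.0  # joystick y -> end-effector vy

joystick = np.array([0.3, -0.8])  # what the operator can express
ee_velocity = J @ joystick        # what the robot actually does
print(ee_velocity)                # vz and all rotations are always zero
```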
Zhecheng Yuan retweeted
Haoran Xu✈️ICLR26 @ryanxhr
Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. arxiv.org/abs/2604.14265 🧵[1/7]
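For readers new to the framing, the standard behavior-regularized objective (with behavior policy π_β and regularization weight α) is below; VGF's optimal-transport treatment of this problem is in the paper:

```latex
\max_{\pi} \; \mathbb{E}_{s \sim d}\Big[\, \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big] \;-\; \alpha \, D_{\mathrm{KL}}\big(\pi(\cdot \mid s) \,\big\|\, \pi_{\beta}(\cdot \mid s)\big) \Big]
```

In offline RL, π_β is the dataset policy; in LLM fine-tuning, it is the pretrained reference model, which is what makes the two settings instances of one problem.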
Zhecheng Yuan retweeted
Yu Lei @_OutofMemory_
🤖Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]). But why does it work? How can we improve it further?

Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments.

😮Key insight: it's all about representations.
- Alignment → enables transfer
- Discernibility → enables adaptation

⚖️Both are necessary: it's better to have more aligned representations, but the model must be able to discern the domains. We term this structured representation alignment.

⬇️Let's take a deep dive into that:
Paper: arxiv.org/pdf/2604.13645
Website: science-of-co-training.github.io
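As a toy illustration of the two properties the thread names, here is a sketch with synthetic features and deliberately simple stand-in diagnostics (these are my own, not the paper's metrics): alignment as cosine similarity of domain mean embeddings, discernibility as the accuracy of a nearest-mean domain probe:

```python
import numpy as np

rng = np.random.default_rng(0)
sim_feats = rng.normal(0.0, 1.0, size=(500, 32))
real_feats = rng.normal(0.3, 1.0, size=(500, 32))  # shifted domain

# Alignment: cosine similarity of the domain mean embeddings.
mu_s, mu_r = sim_feats.mean(0), real_feats.mean(0)
alignment = mu_s @ mu_r / (np.linalg.norm(mu_s) * np.linalg.norm(mu_r))

# Discernibility: can a nearest-mean classifier tell the domains apart?
feats = np.vstack([sim_feats, real_feats])
labels = np.array([0] * 500 + [1] * 500)
pred = (np.linalg.norm(feats - mu_s, axis=1)
        > np.linalg.norm(feats - mu_r, axis=1)).astype(int)
discernibility = (pred == labels).mean()
print(f"alignment={alignment:.3f}, domain-probe acc={discernibility:.3f}")
```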
Zhecheng Yuan retweeted
Ksenia_TuringPost @TheTuringPost
13+ Attention mechanisms you should know:
▪️ Self-attention
▪️ Cross-attention
▪️ Causal attention
▪️ Linear Attention
▪️ Softmax attention
▪️ Sliding Window (local attention)
▪️ Global attention
▪️ FlashAttention
▪️ Multi-Head Attention (MHA)
▪️ Multi-Query Attention (MQA)
▪️ Grouped-Query Attention (GQA)
▪️ Multi-Head Latent Attention (MLA)
▪️ Interleaved Head Attention (IHA)
+ Slim Attention, KArAt, XAttention, Mixture-of-Depths Attention (MoDA)

Save the list and explore more about them here: turingpost.com/p/attention-ty…
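As a reference point for the list above, a minimal NumPy sketch of plain scaled dot-product self-attention, the base case most of these variants modify (masking the scores gives causal attention; sharing K/V projections across heads gives MQA/GQA):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-pair similarities
    return softmax(scores) @ V               # attention-weighted values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))  # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 16)
```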
Zhecheng Yuan retweeted
Lucy Shi @lucy_x_shi
1/ We just released π0.7 — a steerable generalist robot model with emergent capabilities. I want to share a bit of the backstory, because π0.7 taught me something surprising about where robot learning is heading. A thread on bittersweet lessons 🧵
Zhecheng Yuan retweeted
Sergey Levine @svlevine
We finished evaluating π0.7, our new model at Physical Intelligence. What I'm most excited about with π0.7 is that it's starting to show some surprising emergent compositional generalization: it can both perform complex tasks and learn new tasks just from instructions.
Zhecheng Yuan retweeted
Google DeepMind @GoogleDeepMind
We’re rolling out an upgrade designed to help robots reason about the physical world. 🤖 Gemini Robotics-ER 1.6 has significantly better visual and spatial understanding in order to plan and complete more useful tasks. Here’s why this is important 🧵
Zhecheng Yuan retweeted
David Bar @observie
System identification (sysid) is the process of finding the physical parameters that make a simulation match reality. If you're training an RL locomotion policy in simulation, the accuracy of your motor model directly affects how well the policy transfers to the real robot.

A recent git commit by @kevin_zakka added a sysid toolbox to MuJoCo which automates this process: you provide recorded motor data and a MuJoCo model, and it optimizes the model parameters to minimize the difference between simulated and real trajectories.

For my @RobStride_com RS02 QDD motors (17 Nm peak, 7.75:1 gear), I built a Rust tool that sends multi-sine torque excitation at 1 kHz and records position/velocity feedback. I then feed this data into MuJoCo's sysid optimizer.
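A minimal sketch of the multi-sine excitation signal described above, with illustrative frequencies, phases, and duration (the author's actual tool is written in Rust and streams to hardware):

```python
import numpy as np

fs = 1000.0                      # 1 kHz command rate
t = np.arange(0, 5.0, 1.0 / fs)  # 5 s of excitation
freqs = np.array([0.5, 1.3, 2.7, 5.1, 9.9])  # non-harmonic frequencies
phases = 2 * np.pi * np.random.default_rng(0).random(freqs.size)

# Sum of sines covers several frequencies at once, which is what makes
# multi-sine excitation informative for parameter fitting.
torque = sum(np.sin(2 * np.pi * f * t + p) for f, p in zip(freqs, phases))
torque *= 17.0 / np.abs(torque).max()  # scale to the 17 Nm peak rating
# `torque` would be streamed to the motor while logging position/velocity,
# and the log then handed to the sysid optimizer.
```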
Zhecheng Yuan retweeted
Akshay 🚀 @akshay_pachaar
A single 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 file just hit 15K GitHub stars (derived from Karpathy's coding rules).

Andrej Karpathy observed that LLMs make the same predictable mistakes when writing code: over-engineering, ignoring existing patterns, and adding dependencies you never asked for. If you've used AI coding assistants, you've hit all of these.

But here's the thing: if the mistakes are predictable, you can prevent them with the right instructions. That's exactly what this 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 does. You drop one markdown file into your repo, and it gives Claude Code a structured set of behavioral guidelines for your entire project.

This is a big deal:
- Built entirely around prompt engineering for AI coding assistants
- No framework, no complex tooling, just one .md file that shapes behavior

Developers are moving past "use AI to write code" and into "engineer the AI's behavior so the code is actually good." The Claude Code ecosystem is growing fast, and the best tools in it aren't always software. Sometimes they're just well-crafted instructions.

100% open-source. I've shared a link to the GitHub repo in the next tweet!
Zhecheng Yuan retweeted
Ankur Handa @ankurhandos
Just created a repo for the prompts I used here, if someone wants to try them out for fun: github.com/ankurhanda/cla… They are specific to build123 for now but should serve as a good reference for various other CAD tools too.
Zhecheng Yuan retweeted
Yongyuan Liang @cheryyun_l
In our new blog (written w/ @RchalYang), we discuss where, when, and how to make vibe agents come alive in the real world. cheryyunl.github.io/blog/vibe-agen…

We look at two interfaces for bringing AI agents into the physical world: code (composable, transparent, but whose reliability hinges on API design and feedback) and action (fluid, contact-rich, but compounds errors and forgets). The hybrid of both is emerging, but the harness that closes the loop does not yet exist.

Every roboticist in 2026 will need to think about which problems or tasks are fundamentally solvable by vibe agents, where that boundary lies, and what scaffolding and harness robots need.
Zhecheng Yuan retweeted
James Zou @james_y_zou
You can now give your agent deep knowledge of millions of papers in one line with #paperclip! 📎 >8 million papers natively indexed for agents. Much more thorough and often 10x faster than standard deep research. Just add the paperclip MCP (instruction below).