Zhecheng Yuan

210 posts


@fancy_yzc

PhD @Tsinghua University, IIIS. Interested in reinforcement learning, representation learning, robotics.

Joined July 2021
618 Following · 606 Followers
Pinned Tweet
Zhecheng Yuan @fancy_yzc
👐How can we leverage multi-source human motion data, transform it into robot-feasible behaviors, and deploy it across diverse scenarios? 
👤🤖Introducing 𝐇𝐄𝐑𝐌𝐄𝐒: a versatile human-to-robot embodied learning framework tailored for mobile bimanual dexterous manipulation.
Guozheng Ma @Guozheng_Ma
Our "What Makes Value Learning Efficient in Residual RL?" accepted to #ICML2026 as a ✨Spotlight✨! Value learning silently fails in residual RL. We pinpoint why, and propose 𝐃𝐀𝐖𝐍: a minimal fix that delivers ~5× faster convergence across benchmarks, policies, and modalities. 📄 Preprint: arxiv.org/abs/2602.10539
Zhecheng Yuan retweeted
Oier Mees @oier_mees
🎉 𝗺𝗶𝗺𝗶𝗰-𝘃𝗶𝗱𝗲𝗼 𝗵𝗮𝘀 𝗯𝗲𝗲𝗻 𝗮𝗰𝗰𝗲𝗽𝘁𝗲𝗱 𝘁𝗼 #𝗥𝗦𝗦𝟮𝟬𝟮𝟲 𝗶𝗻 𝗦𝘆𝗱𝗻𝗲𝘆! Sharing this one with a lot of pride.

This work was led by my two students, @jjonpai and @Liam862373. Both are Master's students at @ETH whom I have the privilege of advising at @Microsoft, and TAs for my new robot learning course at ETHZ. Watching them take an ambitious idea and carry it all the way to RSS has been one of my highlights of the past year. They are going to do extraordinary things.

For those new to the work: mimic-video is a Video-Action Model that grounds robot policies in pretrained video models (NVIDIA Cosmos) instead of static image-text VLM backbones. The result is 𝟭𝟬𝘅 𝗯𝗲𝘁𝘁𝗲𝗿 𝘀𝗮𝗺𝗽𝗹𝗲 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 📉 and 𝟮𝘅 𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 ⏩ than standard VLAs, with performance that scales directly with video model quality, and no expensive video reconstruction at inference time.

And we have recently 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲𝗱 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴: code, models, the works. If you are building in robot learning, please take it, break it, build on it. We genuinely cannot wait to see where the community takes this.

Last but not least: huge shoutout to the team at @mimicrobotics, who brought world-class robot systems, deep dexterity expertise, and a genuine research partnership that shaped this work from day one. This is exactly the kind of partnership between @Microsoft and cutting-edge robotics startups that I believe moves the field fastest. See you in Sydney! 🇦🇺

Code: github.com/mimic-video/mi…
Project: mimic-video.github.io
Paper: arxiv.org/abs/2512.15692
Oier Mees @oier_mees (quoted tweet)

Excited to introduce mimic-video, a new class of Video-Action Model that achieves 10x better data efficiency 📉and trains ⏩ 2x faster than standard VLAs! mimic-video.github.io
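As background on the "Video-Action Model" idea, here is a minimal sketch of grounding a trainable action head in a frozen video backbone. Every module and shape is a stand-in for illustration; this is not the mimic-video implementation, which builds on NVIDIA Cosmos:

```python
import torch
import torch.nn as nn

class VideoActionPolicy(nn.Module):
    """Toy video-to-action policy: frozen 'video backbone' stand-in plus
    a small trainable action head."""
    def __init__(self, in_dim=4 * 3 * 32 * 32, feat_dim=256, action_dim=7):
        super().__init__()
        # Placeholder for a pretrained video model; frozen during training.
        self.video_encoder = nn.Sequential(
            nn.Flatten(start_dim=1), nn.Linear(in_dim, feat_dim), nn.ReLU())
        for p in self.video_encoder.parameters():
            p.requires_grad = False
        # Only this action head receives gradients.
        self.action_head = nn.Linear(feat_dim, action_dim)

    def forward(self, frames):  # frames: (batch, T, C, H, W)
        return self.action_head(self.video_encoder(frames))

policy = VideoActionPolicy()
actions = policy(torch.randn(2, 4, 3, 32, 32))  # two 4-frame toy clips
print(actions.shape)  # torch.Size([2, 7])
```

The design point the tweet makes is that the backbone already understands video dynamics, so only the lightweight head needs robot data, which is where the claimed sample-efficiency gain comes from.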

Zhecheng Yuan retweeted
Yifan Zhang @yifan_zhang_
Scaling KL-Regularized Policy Gradient and REINFORCE Is All You Need. Our ICLR 2026 paper, “On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning,” will be presented at Pavilion 4, Riocentro Convention and Event Center, today!

Glad to see that V4 and V3.2 have adopted the corrected KL formulation presented in our paper.

Project Page: github.com/complex-reason…
Paper: arxiv.org/abs/2505.17508

It would be even better if they used the REINFORCE estimator instead of the GRPO estimator in future versions! IN REINFORCE WE TRUST.
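For context on the objective family the paper studies, here is a toy KL-regularized REINFORCE update for a categorical policy with a frozen reference. The estimator, rewards, and step size are illustrative only, not the corrected formulation the paper derives:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.zeros(5, requires_grad=True)        # trainable policy
ref_logits = torch.randn(5)                        # frozen reference policy
rewards = torch.tensor([1.0, 0.0, 0.5, 0.0, 0.2])  # per-action rewards
beta = 0.1                                         # KL penalty weight

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()
    # REINFORCE term: reward-weighted score function.
    pg_loss = -rewards[a] * dist.log_prob(a)
    # Exact KL(pi || pi_ref), tractable for categorical policies.
    kl = F.kl_div(F.log_softmax(ref_logits, -1),
                  F.log_softmax(logits, -1),
                  log_target=True, reduction="sum")
    loss = pg_loss + beta * kl
    loss.backward()
    with torch.no_grad():
        logits -= 0.5 * logits.grad
        logits.grad.zero_()

print(F.softmax(logits, -1))  # mass shifts toward high-reward actions
```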
Zhecheng Yuan retweeted
Minghuan Liu @ericliuof97
Now it's really an era of building!
Jake (softservo) @soft_servo (quoted tweet)

Codex to @sendcutsend! This STEP part was generated with GPT 5.5. By reusing the same Python script, it was able to one-shot export a flat-pattern DXF file for sheet-metal cutting on sendcutsend. Can't wait to see these parts IRL.

Zhecheng Yuan retweeted
Paul Zhou @zhiyuan_zhou_
Do you ever find that finetuning a VLA overfits to the target task, to the point where generalist ability is lost and even minor deviations beyond the SFT data break the policy? We found an extremely simple solution: directly merge the base and finetuned policies in weight space 🤯 👇🧵
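A minimal sketch of the weight-space merging idea, assuming plain linear interpolation of matching state dicts (the thread's exact recipe may differ):

```python
import torch

def merge_policies(base_sd, finetuned_sd, alpha=0.5):
    """Linear interpolation between a base and a finetuned checkpoint,
    a common model-merging recipe."""
    return {k: (1 - alpha) * base_sd[k] + alpha * finetuned_sd[k]
            for k in base_sd}

# Usage with any pair of architecture-matched checkpoints:
base = torch.nn.Linear(4, 2)
tuned = torch.nn.Linear(4, 2)
merged_sd = merge_policies(base.state_dict(), tuned.state_dict(), alpha=0.7)
base.load_state_dict(merged_sd)  # the merged policy is ready to run
```

Sweeping alpha between 0 and 1 trades off the base model's generalist ability against target-task specialization.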
Zhecheng Yuan retweeted
Jean Mercat @MercatJean
Releasing VLA Foundry: an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. End-to-end control from language pretraining to action-expert fine-tuning — no more stitching together incompatible repos.
Zhecheng Yuan retweeted
Xinhu Li @xinhuliusc
Teleoperating robots is difficult due to high DoF, so most demonstrations are collected under interface or safety constraints (e.g., keyboard or joystick). For example, a joystick can move a robotic arm only in a 2D plane, even though the robot can operate in a higher-dimensional space. Such demonstrations are inherently suboptimal, and directly imitating them limits what robots can learn.
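To make the interface bottleneck concrete, a toy sketch (the mapping matrix is hypothetical) of how a 2-axis joystick confines every command to a fixed plane of a 6-DoF end-effector velocity space:

```python
import numpy as np

# Joystick-to-velocity map: 2 inputs span only 2 of 6 DoF.
J = np.zeros((6, 2))
J[0, 0] = 1.0  # joystick x -> end-effector vx
J[1, 1] = 1.0  # joystick y -> end-effector vy

joystick = np.array([0.3, -0.8])  # what the operator can express
ee_velocity = J @ joystick        # what the robot actually does
print(ee_velocity)                # vz and all rotations are always zero
```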
Zhecheng Yuan retweeted
Haoran Xu✈️ICLR26 @ryanxhr
Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. arxiv.org/abs/2604.14265 🧵[1/7]
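For readers new to the framing, the standard behavior-regularized objective (with behavior policy π_β and regularization weight α) is below; VGF's optimal-transport treatment of this problem is in the paper:

```latex
\max_{\pi} \; \mathbb{E}_{s \sim d}\Big[\, \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big] \;-\; \alpha \, D_{\mathrm{KL}}\big(\pi(\cdot \mid s) \,\big\|\, \pi_{\beta}(\cdot \mid s)\big) \Big]
```

In offline RL, π_β is the dataset policy; in LLM fine-tuning, it is the pretrained reference model, which is what makes the two settings instances of one problem.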
Zhecheng Yuan retweeted
Yu Lei @_OutofMemory_
🤖Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]). But why does it work? How can we improve it further?

Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments.

😮Key insight: it's all about representations.
- Alignment → enables transfer
- Discernibility → enables adaptation

⚖️Both are necessary: it's better to have more aligned representations, but the model must be able to discern the domains. We term this structured representation alignment.

⬇️Let's take a deep dive into that:
Paper: arxiv.org/pdf/2604.13645
Website: science-of-co-training.github.io
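As a toy illustration of the two properties the thread names, here is a sketch with synthetic features and deliberately simple stand-in diagnostics (these are my own, not the paper's metrics): alignment as cosine similarity of domain mean embeddings, discernibility as the accuracy of a nearest-mean domain probe:

```python
import numpy as np

rng = np.random.default_rng(0)
sim_feats = rng.normal(0.0, 1.0, size=(500, 32))
real_feats = rng.normal(0.3, 1.0, size=(500, 32))  # shifted domain

# Alignment: cosine similarity of the domain mean embeddings.
mu_s, mu_r = sim_feats.mean(0), real_feats.mean(0)
alignment = mu_s @ mu_r / (np.linalg.norm(mu_s) * np.linalg.norm(mu_r))

# Discernibility: can a nearest-mean classifier tell the domains apart?
feats = np.vstack([sim_feats, real_feats])
labels = np.array([0] * 500 + [1] * 500)
pred = (np.linalg.norm(feats - mu_s, axis=1)
        > np.linalg.norm(feats - mu_r, axis=1)).astype(int)
discernibility = (pred == labels).mean()
print(f"alignment={alignment:.3f}, domain-probe acc={discernibility:.3f}")
```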
Zhecheng Yuan retweeted
Ksenia_TuringPost @TheTuringPost
13+ Attention mechanisms you should know:
▪️ Self-attention
▪️ Cross-attention
▪️ Causal attention
▪️ Linear Attention
▪️ Softmax attention
▪️ Sliding Window (local attention)
▪️ Global attention
▪️ FlashAttention
▪️ Multi-Head Attention (MHA)
▪️ Multi-Query Attention (MQA)
▪️ Grouped-Query Attention (GQA)
▪️ Multi-Head Latent Attention (MLA)
▪️ Interleaved Head Attention (IHA)
+ Slim Attention, KArAt, XAttention, Mixture-of-Depths Attention (MoDA)

Save the list and explore more about them here: turingpost.com/p/attention-ty…
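As a reference point for the list above, a minimal NumPy sketch of plain scaled dot-product self-attention, the base case most of these variants modify (masking the scores gives causal attention; sharing K/V projections across heads gives MQA/GQA):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-pair similarities
    return softmax(scores) @ V               # attention-weighted values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))  # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 16)
```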
Zhecheng Yuan retweeted
Lucy Shi @lucy_x_shi
1/ We just released π0.7 — a steerable generalist robot model with emergent capabilities. I want to share a bit of the backstory, because π0.7 taught me something surprising about where robot learning is heading. A thread on bittersweet lessons 🧵
Zhecheng Yuan retweeted
Sergey Levine @svlevine
We finished evaluating π0.7, our new model at Physical Intelligence. What I'm most excited about with π0.7 is that it's starting to show some surprising emergent compositional generalization: it can both perform complex tasks and learn new tasks just from instructions.
Zhecheng Yuan retweeted
Google DeepMind @GoogleDeepMind
We’re rolling out an upgrade designed to help robots reason about the physical world. 🤖 Gemini Robotics-ER 1.6 has significantly better visual and spatial understanding in order to plan and complete more useful tasks. Here’s why this is important 🧵
Zhecheng Yuan retweeted
David Bar @observie
System identification (sysid) is the process of finding the physical parameters that make a simulation match reality. If you're training an RL locomotion policy in simulation, the accuracy of your motor model directly affects how well the policy transfers to the real robot.

A recent git commit by @kevin_zakka added a sysid toolbox to MuJoCo which automates this process: you provide recorded motor data and a MuJoCo model, and it optimizes the model parameters to minimize the difference between simulated and real trajectories.

For my @RobStride_com RS02 QDD motors (17 Nm peak, 7.75:1 gear), I built a Rust tool that sends multi-sine torque excitation at 1 kHz and records position/velocity feedback. I then feed this data into MuJoCo's sysid optimizer.
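A minimal sketch of the multi-sine excitation signal described above, with illustrative frequencies, phases, and duration (the author's actual tool is written in Rust and streams to hardware):

```python
import numpy as np

fs = 1000.0                      # 1 kHz command rate
t = np.arange(0, 5.0, 1.0 / fs)  # 5 s of excitation
freqs = np.array([0.5, 1.3, 2.7, 5.1, 9.9])  # non-harmonic frequencies
phases = 2 * np.pi * np.random.default_rng(0).random(freqs.size)

# Sum of sines covers several frequencies at once, which is what makes
# multi-sine excitation informative for parameter fitting.
torque = sum(np.sin(2 * np.pi * f * t + p) for f, p in zip(freqs, phases))
torque *= 17.0 / np.abs(torque).max()  # scale to the 17 Nm peak rating
# `torque` would be streamed to the motor while logging position/velocity,
# and the log then handed to the sysid optimizer.
```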
Zhecheng Yuan retweeted
Akshay 🚀 @akshay_pachaar
A single 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 file just hit 15K GitHub stars (derived from Karpathy's coding rules).

Andrej Karpathy observed that LLMs make the same predictable mistakes when writing code: over-engineering, ignoring existing patterns, and adding dependencies you never asked for. If you've used AI coding assistants, you've hit all of these.

But here's the thing: if the mistakes are predictable, you can prevent them with the right instructions. That's exactly what this 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 does. You drop one markdown file into your repo, and it gives Claude Code a structured set of behavioral guidelines for your entire project.

This is a big deal:
- Built entirely around prompt engineering for AI coding assistants
- No framework, no complex tooling, just one .md file that shapes behavior

Developers are moving past "use AI to write code" and into "engineer the AI's behavior so the code is actually good." The Claude Code ecosystem is growing fast, and the best tools in it aren't always software. Sometimes they're just well-crafted instructions.

100% open-source. I've shared a link to the GitHub repo in the next tweet!
Zhecheng Yuan retweeted
Ankur Handa @ankurhandos
Just created a repo for the prompts I used here, if someone wants to try them out for fun: github.com/ankurhanda/cla… They are specific to build123 for now but should serve as a good reference for various other CAD tools too.
Zhecheng Yuan retweeted
Yongyuan Liang @cheryyun_l
In our new blog (written w/ @RchalYang), we discuss where, when, and how to make vibe agents come alive in the real world. cheryyunl.github.io/blog/vibe-agen…

We look at two interfaces for bringing AI agents into the physical world: code (composable, transparent, but whose reliability hinges on API design and feedback) and action (fluid, contact-rich, but compounds errors and forgets). The hybrid of both is emerging, but the harness that closes the loop does not yet exist.

Every roboticist in 2026 will need to think about which problems or tasks are fundamentally solvable by vibe agents, where that boundary lies, and what scaffolding and harness robots need.
Zhecheng Yuan retweeted
James Zou @james_y_zou
You can now give your agent deep knowledge of millions of papers in one line with #paperclip! 📎 >8 million papers natively indexed for agents. Much more thorough and often 10x faster than standard deep research. Just add the paperclip MCP (instruction below).