Chengzhi Mao
@ChengzhiM
Researcher in Machine Learning and Computer Vision. Assistant Professor at Rutgers CS. Previously a Research Scientist at Google. PhD from Columbia.

Joined September 2018
324 Following · 186 Followers
Pinned Tweet
Chengzhi Mao @ChengzhiM
Call for Papers is OPEN! We want your work on Actionable Perception, VLA models, and Robot Manipulation.
🗓️ Deadline: May 4 (AoE)
✅ Non-archival (dual submission welcome!)
Details & Submissions: 🔗 activis-workshop.github.io
#CVPR2026 #AI #Robotics #VLA
Yinghui He @yinghui_he_
RLVR gives sparse supervision; On-Policy Self-Distillation often requires high-quality demonstrations. Our new method, ✨SD-Zero✨, gets the best of both worlds: we use the model's self-revision to turn binary rewards into dense token-level supervision. No external teacher. No curated demonstrations.

🚨 Introducing Self-Distillation Zero (SD-Zero), which trains one model to play two roles: (1) a "Generator" that makes attempts, and (2) a "Reviser" that conditions on the generator's failed/successful attempt + binary reward to produce a better answer. ‼️Even WRONG attempts can become the training signal.‼️

🔗 Paper: arxiv.org/abs/2604.12002

🏆 SD-Zero brings 10%+ improvement over base models (Qwen3-4B, Olmo3-7B) on math & code reasoning, beating GRPO and vanilla On-Policy Self-Distillation under the same training budget. SD-Zero also enables iterative self-evolution.
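The tweet describes the SD-Zero loop in enough detail to sketch it. Below is a minimal, hypothetical rendering in PyTorch/Hugging Face: gpt2 stands in for the actual base models, and the verifier, revision-prompt format, and filtering details are assumptions for illustration, not the paper's implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in for Qwen3-4B / Olmo3-7B
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def verify(answer: str, problem: dict) -> int:
    """Binary verifiable reward (e.g., exact match or unit tests).
    `problem["reference"]` is a placeholder for whatever the verifier checks."""
    return int(problem["reference"] in answer)

def rollout(prompt: str, max_new: int = 64) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new, do_sample=True,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

def sd_zero_step(problem: dict) -> float:
    # Role 1: the Generator makes an attempt; a verifier scores it 0/1.
    attempt = rollout(problem["question"])
    reward = verify(attempt, problem)

    # Role 2: the Reviser conditions on the attempt + binary reward to
    # produce a better answer -- even a WRONG attempt is useful context.
    revision = rollout(f"{problem['question']}\n"
                       f"Previous attempt (reward={reward}): {attempt}\n"
                       f"Revised answer:")

    # Self-distill: train the policy on the reviser's tokens, turning the
    # sparse 0/1 reward into dense token-level supervision.
    # (In practice one would likely keep only verified-correct revisions.)
    full = tok(problem["question"] + " " + revision, return_tensors="pt").input_ids
    labels = full.clone()
    labels[:, :len(tok(problem["question"]).input_ids)] = -100  # mask the prompt
    loss = model(full, labels=labels).loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```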
Chengzhi Mao @ChengzhiM
How does visual perception actually serve robotic action? 🤔 Announcing the #CVPR2026 Workshop: ActiVis — "Bridging Vision, Language, and Action: What's Missing in Actionable Visual Perception for Robotics." Submit your paper and join us in Denver this June! 📍
Chengzhi Mao @ChengzhiM
It turns out, the best way to track the world is to learn how to generate it. 📍
Chengzhi Mao @ChengzhiM
We found that Video Diffusion Models naturally solve this. They don't just hallucinate pixels; they inherently learn motion in the early, noisy stages of generation, independent of appearance. By tapping into this, we can track visually identical objects without any supervision.
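To make "tapping into" early-stage diffusion features concrete, here is a hedged sketch: `DiffusionBackbone` is a toy stand-in for a pretrained video diffusion denoiser (the real method would read intermediate U-Net activations), and the noise schedule and timestep are illustrative. The point being sketched is matching points across frames by feature similarity, with no tracking supervision.

```python
import torch
import torch.nn.functional as F

class DiffusionBackbone(torch.nn.Module):
    """Placeholder for a pretrained denoiser whose intermediate activations
    we read out at a chosen (early, high-noise) timestep."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = torch.nn.Conv2d(3, feat_dim, 3, padding=1)
    def features(self, frame, t):
        noise = torch.randn_like(frame)
        alpha = 1.0 - t / 1000.0                 # toy noise schedule
        noisy = alpha * frame + (1 - alpha) * noise
        return self.net(noisy)

@torch.no_grad()
def track_point(backbone, frame_a, frame_b, xy, t=800):
    """Find where the pixel at `xy` in frame_a moved to in frame_b by
    cosine similarity of diffusion features -- no appearance labels."""
    fa = backbone.features(frame_a[None], t)[0]          # (C, H, W)
    fb = backbone.features(frame_b[None], t)[0]
    q = fa[:, xy[1], xy[0]]                              # query feature
    sim = F.cosine_similarity(fb.flatten(1).T, q[None], dim=1)
    idx = sim.argmax().item()
    H, W = fb.shape[1:]
    return (idx % W, idx // W)                           # matched (x, y)

# Usage: two 64x64 RGB frames; a large t (early, noisy stage) emphasizes
# motion-sensitive structure over fine appearance detail.
backbone = DiffusionBackbone()
a, b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
print(track_point(backbone, a, b, xy=(10, 20)))
```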
Chengzhi Mao @ChengzhiM
#NeurIPS2025 Poster #3611 Computer vision has a dirty secret: most object trackers are actually just "recognizers." They track by looking at colors and textures, not movement. That’s why they fail when two objects look identical.
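The failure mode is easy to demonstrate with a toy appearance-only matcher (color-histogram template matching here, standing in for "recognizer"-style trackers): two visually identical objects score exactly the same, so appearance alone cannot say which detection continues the track.

```python
import numpy as np

def color_hist(patch, bins=8):
    # Normalized 3D color histogram of an RGB patch.
    h, _ = np.histogramdd(patch.reshape(-1, 3), bins=(bins,) * 3,
                          range=[(0, 256)] * 3)
    return h.ravel() / h.sum()

rng = np.random.default_rng(0)
template = rng.integers(0, 256, (16, 16, 3))   # object appearance at time t
object_a = template.copy()                     # two identical-looking objects
object_b = template.copy()                     # at different positions at t+1

sim = lambda p, q: float(np.minimum(p, q).sum())         # histogram intersection
print(sim(color_hist(template), color_hist(object_a)))   # 1.0
print(sim(color_hist(template), color_hist(object_b)))   # 1.0 -- a tie:
# appearance cannot break it; only motion can.
```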
Chengzhi Mao @ChengzhiM
In O‘ahu 🌴—this time for #ICCV2025 (Oct 19–23)! I’ll be speaking at the TrustFM Workshop on Oct 20, 4 PM (HST), Room 308B: “Seeing Through Words: Understanding and Controlling Vision via Language.” Come chat about how language helps interpret and steer vision foundation models!
Chengzhi Mao retweeted
Lihao Sun @1e0sun
🚨New #ACL2025 paper! Today’s “safe” language models can look unbiased—but alignment can actually make them more biased implicitly by reducing their sensitivity to race-related associations. 🧵Find out more below!
Chengzhi Mao @ChengzhiM
#ICML 2024 How do large language models (LLMs) reach their decisions? Our latest research project, SelfIE, is the first to use an LLM to explain the same LLM's internals. The interpretation can be used for safety alignment and understanding hallucinations. selfie.cs.columbia.edu
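The core trick (having the model verbalize its own hidden state) can be sketched as below. The prompt wording, layer choice, and injection point are illustrative assumptions on my part; the actual procedure is described at selfie.cs.columbia.edu.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def interpret_hidden_state(text, layer=6, pos=-1):
    # 1) Run the model and grab a hidden state we want to explain.
    ids = tok(text, return_tensors="pt").input_ids
    hs = model(ids, output_hidden_states=True).hidden_states[layer][0, pos]

    # 2) Build an interpretation prompt with a placeholder token whose
    #    embedding we overwrite with that hidden state.
    prompt = 'The word "_" means'
    p_ids = tok(prompt, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(p_ids)
    slot = 3  # index of the "_" token here (assumption; verify per tokenizer)
    embeds[0, slot] = hs

    # 3) Let the same model verbalize the injected state.
    out = model.generate(inputs_embeds=embeds, max_new_tokens=20,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0], skip_special_tokens=True)

print(interpret_hidden_state("The Eiffel Tower is located in"))
```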
Chengzhi Mao @ChengzhiM
@hongyangzh @cvondrick @djhsu The amount of training data is the same for single-task and multi-task. Our guess is that multi-task training biases the learned features toward the robust ones.
Hongyang Zhang @hongyangzh
@cvondrick @djhsu @ChengzhiM Very interesting work! Intuitively, is it because more tasks bring more training data, so the adversarial generalization is better and the robustness is strengthened?
Carl Vondrick @cvondrick
What causes adversarial examples? Latest #ECCV2020 paper from @ChengzhiM and Amogh shows that deep networks are vulnerable partly because they are trained on too few tasks. Just by increasing tasks, we strengthen robustness for each task individually. arxiv.org/pdf/2007.07236…
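The claim in this thread is that training one shared backbone on more tasks, with the data budget held fixed, improves robustness. A minimal sketch of that setup (architecture and task heads are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_classes=10, n_tasks=3):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared features
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # one linear head per task (e.g., classification, rotation, ...)
        self.heads = nn.ModuleList(
            nn.Linear(32, n_classes) for _ in range(n_tasks))
    def forward(self, x):
        z = self.backbone(x)
        return [head(z) for head in self.heads]

model = MultiTaskNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()

x = torch.rand(8, 3, 32, 32)                             # a batch of images
labels = [torch.randint(0, 10, (8,)) for _ in range(3)]  # per-task labels

# Sum the per-task losses: every task regularizes the shared backbone,
# which is the mechanism the tweet credits for improved robustness.
loss = sum(ce(out, y) for out, y in zip(model(x), labels))
opt.zero_grad(); loss.backward(); opt.step()
```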