John Zhou

61 posts

@johnlyzhou

CS PhD student @UCLA, previously @Columbia | Scalable reinforcement learning

Los Angeles, CA · Joined August 2021

360 Following · 136 Followers

Pinned Tweet

John Zhou@johnlyzhou·Jun 10

Hierarchical methods for offline goal-conditioned RL (GCRL) can scale to very distant goals that stymie flat (non-hierarchical) policies — but are they really necessary? Paper: arxiv.org/abs/2505.14975 Project page: johnlyzhou.github.io/saw/ Code: github.com/johnlyzhou/saw Thread ↓

7.5K

John Zhou retweeted

Haoran Xu✈️ICLR26@ryanxhr·Apr 20

Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. arxiv.org/abs/2604.14265 🧵[1/7]

176

13.3K

John Zhou retweeted

Ebrahim Feghhi@ebrahim_feghhi·Mar 17

Excited to introduce LightBeam, a CTC decoder for speech neuroprostheses that drastically cuts memory load while achieving state-of-the-art (SOTA) results. Paper: arxiv.org/abs/2603.14002 Code: github.com/ebrahimfeghhi/… Co-authors: @hu_is_lionel @nrhadidi @JonathanCKao Thread 🧵

984

John Zhou retweeted

Dan Lee@Danicmhlee·Mar 12

We're moving to a future vision of fully synthetic pre-training for LLMs. Our new work explores using Neural Cellular Automata to embed reasoning before language training even begins! I'm deeply grateful to @seungwookh, @akarshkumar0101, and @pulkitology for their mentorship, guidance, and deep insights throughout this work.

Seungwook Han@seungwookh

Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)

1.5K

John Zhou retweeted

Omar Rayyan@omarrayyann·Feb 11

MolmoSpaces provides singular scale and diversity. We built a benchmark that puts that scale to use. MolmoSpaces-Bench evaluates zero-shot policies across thousands of environments previously unseen to them under systematic variation, providing insights that go beyond a success rate % More Below:

Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

156

13.1K

John Zhou retweeted

Omar Rayyan@omarrayyann·Feb 10

Very excited to release Contact-Anchored Policies (CAP) 🧢 today! Check out this thread for more details on that and on our in-the-loop simulation evaluations:

Mahi Shafiullah 🏠🤖@notmahi

Best ideas are often the simplest in hindsight. Meet Contact-Anchored Policies (CAP)🧢: by conditioning policies on physical contact (vs language) we achieve env & embodiment generalization with super low resources. This policy ⬇️ learned to pick from scratch w/ 16 hrs of data 🧵

10.8K

John Zhou@johnlyzhou·Feb 10

@edward_s_hu I asked 🫡

421

Edward Hu@edward_s_hu·Feb 10

Nobody asked, but here's 4 world model papers that I read early on in my PhD which I still ponder over now. - Value Equivalence Principle - Learning Awareness Models - Embedded Agency (figure pic below), Big World Hypothesis See the thread for details:

287

17.2K

John Zhou retweeted

Seohong Park@seohong_park·Dec 13

We developed a new action-chunking RL method that scales well! tl;dr: Use *longer* action chunks for value learning (than for policy learning).

Qiyang (Colin) Li@qiyang_li

Action chunking is drawing growing interest in RL, yet its theoretical properties are still understudied. We are excited to share some insights on when we should use action chunking in Q-learning + a new algo (DQC) to tackle hard long-horizon tasks! colinqiyangli.github.io/dqc 🧵 1/N

273

34K

John Zhou retweeted

Micah Goldblum@micahgoldblum·Dec 11

We built methods to handle both (1) and (2), but I’ll focus on a stupid simple trick that works particularly well: adversarial training. Adversarial training makes input gradients better behaved, in turn making gradient-based planning fast and easy. 8/11

7.5K

John Zhou@johnlyzhou·Nov 30

@zhiyuan_zhou_ Hey Paul, would love to meet up and learn more about the benefits of advantage conditioning!

115

Paul Zhou@zhiyuan_zhou_·Nov 29

I’ll be at #NeurIPS2025 in San Diego! Happy to chat about advantage conditioning and RECAP, and the making of pi*06, and robot learning and RL in general. Also presenting two RL papers 👇

Paul Zhou@zhiyuan_zhou_

Very excited to finally share what I’ve been up to @physical_int for the past 6 months: developing advantage-conditioned VLAs! We are finally moving beyond imitating teleop data, and towards improving models with suboptimal deployment data using scalable real-world RL. 👇🧵

118

16.6K

John Zhou@johnlyzhou·Nov 30

@TongheZhang01 I’ll be presenting Thursday morning but mostly free otherwise - happy to hash out times in DMs!

Tonghe Zhang@TongheZhang01·Nov 29

@johnlyzhou oh that's wonderful! will be presenting poster on Friday but definitely we can talk earlier! what's your availability?

114

Tonghe Zhang@TongheZhang01·Nov 29

If you are interested in RL, VLM, VLA, and efficient real world data collection for manipulators, come and chat with me at San Diego from Dec 3rd to 7th.

191

30.3K

John Zhou@johnlyzhou·Nov 29

If you’re at #NeurIPS2025 from Tuesday to Sunday and interested in any of: offline RL, offline-to-online finetuning, VLM value functions/reward models/VLAs, or RL for real-world robots, please reach out and let’s chat!

1.1K

John Zhou@johnlyzhou·Nov 26

@jaesikyoon_ Hi Jaesik, I really enjoyed your MCTD work and would love to chat more about it at NeurIPS!

105

Jaesik Yoon@jaesikyoon_·Nov 25

I’ll be attending NeurIPS next week. Happy to connect and discuss ideas around diffusion-based planning, generative search, and reasoning with generative models!

558

John Zhou retweeted

Chang Shi@sshchang·Nov 19

As a robotics researcher, I believe accurately modeling complex interactions between agents would be a big step for scaling up robot learning from unlabeled video. Looking forward to some inspiring discussion with the Cohere Labs Embodied AI community!

Cohere Labs@Cohere_Labs

Don't miss our Embodied AI group's session this week on November 21st with @sshchang for a presentation on "FLAM: Scaling Latent Action World Models with Factorization." Thanks to @nahidalam and Cole Harrison for organizing this event! ✨ Learn more: cohere.com/events/cohere-…

4.5K

John Zhou retweeted

Seohong Park@seohong_park·Oct 29

We scaled up an "alternative" paradigm in RL: *divide and conquer*. Compared to Q-learning (TD learning), divide and conquer can naturally scale to much longer horizons. Blog post: seohong.me/blog/rl-withou… Paper: arxiv.org/abs/2510.22512 ↓

505

76.3K

John Zhou retweeted

Seohong Park@seohong_park·Oct 25

I had a fun chat with @chris_j_paxton and @micoolcho about the scalability of RL for robotics!

RoboPapers@RoboPapers

Offline reinforcement learning is crucial for robotics, but does it scale? We talk to @seohong_park , who discusses how for long-horizon manipulation problems the answer may be no — at least not yet. But there are tricks that you can use to make it work effectively. Watch episode #38 of RoboPapers with @micoolcho and @chris_j_paxton now!

12.1K

John Zhou retweeted

Jiaxun Cui 🐿️@cuijiaxun·Oct 23

Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)

Yuandong Tian@tydsh

Several of my team members + myself are impacted by this layoff today. Welcome to connect :)

110

1.4M

John Zhou retweeted

Seohong Park@seohong_park·Oct 9

Introducing *dual representations*! tl;dr: We represent a state by the "set of similarities" to all other states. This dual perspective has lots of nice properties and practical benefits in RL. Blog post: seohong.me/blog/dual-repr… Paper: arxiv.org/abs/2510.06714 ↓

118

938

176.1K

John Zhou retweeted

UCLA Samueli Engineering@UCLAengineering·Oct 9

Congrats to @UCLA Assoc. Prof. Jonathan Kao of @ECE_UCLA @dgsomucla, on receiving the National Institutes of Health @NIH Director’s Pioneer Award, a 5-year, $5.5 million grant to fund Kao’s research into noninvasive assistive devices for the paralyzed. samueli.ucla.edu/ucla-samueli-p…

689

John Zhou@johnlyzhou·Sep 19

SAW will be presented as a spotlight @NeurIPSConf 2025! Check out more details below:

John Zhou@johnlyzhou

806
