
John Zhou
@johnlyzhou
CS PhD student @UCLA, previously @Columbia | Scalable reinforcement learning




Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

Best ideas are often the simplest in hindsight. Meet Contact-Anchored Policies (CAP)🧢: by conditioning policies on physical contact (vs language) we achieve env & embodiment generalization with super low resources. This policy ⬇️ learned to pick from scratch w/ 16 hrs of data 🧵

Action chunking is drawing growing interest in RL, yet its theoretical properties are still understudied. We are excited to share some insights on when we should use action chunking in Q-learning + a new algo (DQC) to tackle hard long-horizon tasks! colinqiyangli.github.io/dqc 🧵1/N


Very excited to finally share what I’ve been up to @physical_int for the past 6 months: developing advantage-conditioned VLAs! We are finally moving beyond imitating teleop data, and towards improving models with suboptimal deployment data using scalable real-world RL. 👇🧵





Don't miss our Embodied AI group's session this week on November 21st with @sshchang for a presentation on "FLAM: Scaling Latent Action World Models with Factorization." Thanks to @nahidalam and Cole Harrison for organizing this event! ✨ Learn more: cohere.com/events/cohere-…



Offline reinforcement learning is crucial for robotics, but does it scale? We talk to @seohong_park, who discusses how, for long-horizon manipulation problems, the answer may be no, at least not yet. But there are tricks you can use to make it work effectively. Watch episode #38 of RoboPapers with @micoolcho and @chris_j_paxton now!

Several of my team members and I are impacted by today's layoff. Feel free to connect :)





Hierarchical methods for offline goal-conditioned RL (GCRL) can scale to very distant goals that stymie flat (non-hierarchical) policies — but are they really necessary? Paper: arxiv.org/abs/2505.14975 Project page: johnlyzhou.github.io/saw/ Code: github.com/johnlyzhou/saw Thread ↓




