Nikhil Barhate

1.2K posts

@nikhilbarhate99

ML @scale_AI | prev @AMD @mila_quebec

San Francisco, CA · Joined June 2015

888 Following · 234 Followers

Nikhil Barhate retweeted

Taco Cohen@TacoCohen·5d

Apparently it is not well known and not easy to see that this "simple masked loss" is EXACTLY gradient-equivalent to PPO-Clip (at least for one way of computing the mask). Here's how to see this:

The standard token-level PPO-Clip objective is the rather unintuitive
J_t = min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t)

To understand what's going on, split by cases:
1) Positive advantage: J_t = r_t A_t if r_t <= 1 + eps, else constant (clipped)
2) Negative advantage: J_t = r_t A_t if r_t >= 1 - eps, else constant (clipped)

So when we differentiate, we either get grad (r_t A_t) = r_t A_t grad log pi_t, or we get 0 if the token got clipped.

So we can use the objective
J_t = M_t A_t r_t
with gradient
grad J_t = M_t r_t A_t grad log pi_t,
where
M_t = stop-grad((A_t >= 0 AND r_t <= 1 + eps) OR (A_t < 0 AND r_t >= 1 - eps))

In other words, PPO-Clip is gradient-equivalent to a simple masked loss. The loss value may differ, but it produces identical gradients.

And so we see that actually, PPO-Clip is really quite intuitive. John just wanted to make sure that we are paying attention.
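The gradient-equivalence claim in the post above can be checked numerically. The following is a minimal sketch, not the author's code: it treats a single token whose only parameter is its log-probability, uses an illustrative eps = 0.2, and estimates gradients with central finite differences instead of an autodiff library. All function names here are hypothetical. The mask is evaluated once at the current point and then held fixed, which stands in for the stop-grad.

```python
import math

EPS = 0.2  # clip range; illustrative value, not taken from the post


def ratio(logp, logp_old):
    """Importance ratio r_t = pi(a_t) / pi_old(a_t), from log-probabilities."""
    return math.exp(logp - logp_old)


def ppo_clip(logp, logp_old, adv, eps=EPS):
    """Token-level PPO-Clip objective: min(r A, clip(r, 1-eps, 1+eps) A)."""
    r = ratio(logp, logp_old)
    r_clipped = min(max(r, 1.0 - eps), 1.0 + eps)
    return min(r * adv, r_clipped * adv)


def clip_mask(logp, logp_old, adv, eps=EPS):
    """M_t from the post: 1.0 where the token is not clipped, else 0.0."""
    r = ratio(logp, logp_old)
    keep = (adv >= 0 and r <= 1.0 + eps) or (adv < 0 and r >= 1.0 - eps)
    return 1.0 if keep else 0.0


def masked_objective(logp, logp_old, adv, m):
    """J_t = M_t A_t r_t, with M_t passed in as a constant."""
    return m * adv * ratio(logp, logp_old)


def central_diff(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def gradients(r, adv):
    """Return (d PPO-Clip / d logp, d masked / d logp) at importance ratio r."""
    logp, logp_old = math.log(r), 0.0
    m = clip_mask(logp, logp_old, adv)  # computed once: plays the role of stop-grad
    g_clip = central_diff(lambda x: ppo_clip(x, logp_old, adv), logp)
    g_mask = central_diff(lambda x: masked_objective(x, logp_old, adv, m), logp)
    return g_clip, g_mask


if __name__ == "__main__":
    # Ratios on both sides of the clip range, for both advantage signs.
    for r in (0.5, 0.9, 1.1, 1.5):
        for adv in (-2.0, 1.0):
            g_clip, g_mask = gradients(r, adv)
            print(f"r={r:.1f} A={adv:+.1f} clip={g_clip:+.4f} masked={g_mask:+.4f}")
```

The test points deliberately avoid r = 1 - eps and r = 1 + eps, where the PPO-Clip objective has a kink and a finite-difference estimate is not meaningful.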

wh@nrehiew_

Official confirmation that Periodic Labs uses a simple masked importance sampling RL loss

323

43.3K

Nikhil Barhate retweeted

Tiffany Zhao@tiffzhao05·28 Apr

I left Google DeepMind, moved from SF to NYC, all within 2 weeks to join @quadrillion_ai — to build the future of automated research intelligence with the highest slope founder and most talent dense team. I grew up in Silicon Valley — the old Facebook office was my second home. I’d hang out there after school, drawing with my crayons while looking around at the sea of computers with lines of code. Since a young age, I felt empowered to have an array of interests beyond tech: piano, ballet, figure skating, art. The valley embraced diversity of thought, and that’s what inspired me to stay for Stanford and my career thus far. But today, SF is one big hive-mind. So, I moved to NYC, away from family and friends to build a company that doesn’t need to rely on a bubble to survive. I’m meeting customers day after day in all kinds of verticals, connecting with them in different ways and seeing our product bring real value. Here, I’m able to live in diversity of thought. I’m excited to build the future of research in the city of opportunity. Let’s chat if this excites you.

337

212.5K

Nikhil Barhate retweeted

Chongyi Zheng@chongyiz1·22 Apr

1/ Reinforcement learning is usually framed as maximizing rewards. But can we cast it as reaching the right goals? New blog on bridging RL, goal-conditioned RL, and stochastic shortest path: iclr-blogposts.github.io/2026/blog/2026… Also #ICLR2026 Poster: Thu 10:30 AM–1:00 PM, P4 #4611. 🧵⬇️

144

22.4K

Nikhil Barhate retweeted

Yu Lei@_OutofMemory_·20 Apr

🤖Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]). But why does it work? How can we improve it further?

Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments.

😮Key insight: it’s all about representations.
- Alignment → enables transfer
- Discernibility → enables adaptation

⚖️Both are necessary: it's better to have more aligned representations, but the model must be able to discern the domains. We term this structured representation alignment.

⬇️Let’s take a deep dive into that:
Paper: arxiv.org/pdf/2604.13645
Website: science-of-co-training.github.io

383

59.9K

Nikhil Barhate retweeted

Max Fu@letian_fu·1 Apr

Robotics: coding agents’ next frontier. So how good are they? We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability. From @NVIDIA @Berkeley_AI @CMU_Robotics @StanfordAILab capgym.github.io 🧵

128

632

158.2K

Nikhil Barhate retweeted

Mingchen Zhuge@MingchenZhuge·10 Apr

🫱 Introducing Neural Computers: what if AI does not just use computers better, but begins to become the running computer itself?

Beyond today's conventional computers, agents, and world models, Neural Computers (NCs) are new frontiers where computation, memory, and I/O move into a learned runtime state. We ask whether parts of the runtime can move inward into the learning system itself.

This is our first step toward the Completely Neural Computer (CNC): a general-purpose neural computer with stable execution, explicit reprogramming, and durable capability reuse.

Work done with Mingchen Zhuge (@MingchenZhuge), Changsheng Zhao, Haozhe Liu (@HaoZhe65347), Zijian Zhou (@ZijianZhou524), Shuming Liu (@shuming96), Wenyi Wang (@Wenyi_AI_Wang), Ernie Chang (@erniecyc), Gael Le Lan, Junjie Fei, Wenxuan Zhang, Zhipeng Cai (@cai_zhipeng), Zechun Liu (@zechunliu), Yunyang Xiong (@YoungXiong1), Yining Yang, Yuandong Tian (@tydsh), Yangyang Shi, Vikas Chandra (@vikasc), Juergen Schmidhuber (@SchmidhuberAI)

133

525

209.3K

Nikhil Barhate retweeted

Anirudh Goyal@anirudhg9119·8 Apr

Reasoning doesn’t have to mean longer chains of thought: PDR = draft in parallel → distill into a compact workspace → refine, and shift the Pareto frontier. arxiv.org/abs/2510.01123

Alexandr Wang@alexandr_wang

3/ we’re also releasing contemplating mode, which orchestrates multiple agents that reason in parallel designed to handle complex scientific & reasoning queries. in our testing we found it competitive w/ other extreme reasoning models such as Gemini Deep Think & GPT Pro.

297

40.6K

Nikhil Barhate retweeted

Hojoon Lee@hojoon_ai·7 Apr

We scaled off-policy RL to sim-to-real. To our knowledge, FlashSAC is the fastest and most performant RL algorithm across IsaacLab, MuJoCo Playground, and many more, all with a single set of hyperparameters. Project page: holiday-robot.github.io/FlashSAC Paper: arxiv.org/pdf/2604.04539

293

27.2K

Nikhil Barhate retweeted

chuyi shang@chuyishang·24 Mar

Wrote a deep dive on implementing a language model from scratch in JAX and scaling it with distributed training! If you’re coming from PyTorch and want to see how the same ideas look in JAX, or just want a hands-on intro to distributed training, check out this blog post: chuyishang.com/blog/2026/jax-… Comes with code + an assignment and test cases so you can follow along!

602

32.7K

Nikhil Barhate retweeted

Olga Zaghen@olgazaghen·23 Mar

🔮 Working on ML on curved manifolds? Don't miss out on Jacobi Fields! 🔮 I wrote a quick, highly visual and hopefully accessible introduction to the topic: "Jacobi Fields in Machine Learning" 🤠 Check it out here: olgatticus.github.io/blog/jacobi-fi…!

451

25.9K

Nikhil Barhate retweeted

Ian Osband@IanOsband·20 Mar

Something is rotten with policy gradient. PG has become *the* RL loss for LLMs. But it’s not even good at basic RL. Even on MNIST with bandit feedback, vanilla PG performs far worse than cross-entropy because it wastes gradient budget. Delightful Policy Gradient: arxiv.org/abs/2603.14608…

464

176.1K

Nikhil Barhate retweeted

Peter Tong@TongPetersb·4 Mar

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

219

1.1K

215.8K

Nikhil Barhate retweeted

William Shen@shenbokui·6 Mar

Excited to introduce Uni-1, our new multimodal model that *unifies* understanding and generation. TLDR: a team of ~15 researchers is going pound-for-pound with nano banana and gpt image 🧵

Jiaming Song@baaadas

Excited to introduce Uni-1, our new *unified* multimodal model that does both understanding and generation: lumalabs.ai/uni-1 TLDR: I think Uni-1 @LumaLabsAI is > GPT Image 1.5 in many cases, and toe-to-toe with Nano Banana Pro/2. (showcase below)

540

77.1K

Nikhil Barhate retweeted

Jason Ramapuram@jramapuram·26 Feb

Autoregressive models dominate, but what if we treat multimodal generation as discrete order-agnostic iterative refinement? Excited to share our systematic study on the design space of Tri-Modal Masked Diffusion Models (MDMs). We pre-trained the first Tri-Modal MDM from scratch on (text,), (image, text), and (audio, text). The same model can do ASR, TTS, T2I, captioning and native text generation. What I'm the most proud of in this work is the scientific rigor. Over 3,500 training runs. Principled hyperparameter transfer. Honest results. Carefully controlled ablations across multiple different axes of entanglement. A thread on our empirical findings (arXiv: arxiv.org/abs/2602.21472)

238

39.5K

Nikhil Barhate retweeted

Amandeep Kumar@Amandeep__kumar·11 Feb

🚀 Unlocking Standard Diffusion Transformers on Representation Encoders Why do standard DiTs fail to converge on high-dimensional features like DINOv2? 📉 We found the answer isn't just "more parameters"—it's Geometry. Introducing Riemannian Flow Matching with Jacobi Regularization (RJF) 📄 Paper: arxiv.org/abs/2602.10099

122

786

68.8K

Nikhil Barhate retweeted

Alan Baade@BaadeAlan·17 Feb

What's the right space to diffuse in: Raw Data or Latents? Why not both! In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference. w/ @drfeifei+... 1/n

540

119.9K

Nikhil Barhate retweeted

Charlie Ruan@charlie_ruan·18 Feb

Releasing the official SkyRL + Harbor integration: a standardized way to train terminal-use agents with RL. From the creators of Terminal-Bench, Harbor is a widely adopted framework for evaluating terminal-use agents on any task expressible as a Dockerfile + instruction + test script. This integration extends it: the same tasks you evaluate on, you can now RL-train on. Blog: novasky-ai.notion.site/skyrl-harbor 🧵

243

34.1K

Nikhil Barhate retweeted

Oscar Davis@osclsd·16 Feb

You like discrete diffusion, but it's too slow? 🥀 You like test-time inference, but it's for continuous methods? 😩 We fixed it. Introducing Categorical Flow Maps: continuously sample discrete data in a single step 🚀💫 How? 🧵⬇️ 💪 Co-led with @FEijkelboom, @daan_roos_

643

104.4K

Nikhil Barhate retweeted

Tyler Griggs@tyler_griggs_·13 Feb

SkyRL now implements the Tinker API. Now, training scripts written for Tinker can run on your own GPUs with zero code changes using SkyRL's FSDP2, Megatron, and vLLM backends. Blog: novasky-ai.notion.site/skyrl-tinker 🧵

235

56.7K

Nikhil Barhate retweeted

Yibo Yang@YiboYang·12 Feb

We've known that diffusion models are theoretically very good lossy data compressors, but how can we actually implement this idea in practice? I discuss this and related topics in a new review article on diffusion-based generative compression: arxiv.org/abs/2601.18932

148

13.7K
