Nikhil Barhate

1.2K posts

@nikhilbarhate99

ML @scale_AI | prev @AMD @mila_quebec

San Francisco, CA · Joined June 2015
888 Following · 234 Followers
Nikhil Barhate retweeted
Taco Cohen @TacoCohen
Apparently it is not well known, and not easy to see, that this "simple masked loss" is EXACTLY gradient-equivalent to PPO-Clip (at least for one way of computing the mask). Here's how to see this.

The standard token-level PPO-Clip objective is the rather unintuitive

J_t = min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t)

To understand what's going on, split by cases:
1) Positive advantage: J_t = r_t A_t if r_t <= 1 + eps, else constant (clipped)
2) Negative advantage: J_t = r_t A_t if r_t >= 1 - eps, else constant (clipped)

So when we differentiate, we either get grad (r_t A_t) = r_t A_t grad log pi_t, or we get 0 if the token got clipped. So we can use the objective

J_t = M_t A_t r_t, with gradient grad J_t = M_t r_t A_t grad log pi_t,

where

M_t = stop-grad((A_t >= 0 AND r_t <= 1 + eps) OR (A_t < 0 AND r_t >= 1 - eps))

In other words, PPO-Clip is gradient-equivalent to a simple masked loss. The loss value may differ, but it produces identical gradients. And so we see that, actually, PPO-Clip is really quite intuitive. John just wanted to make sure that we are paying attention.
wh @nrehiew_

Official confirmation that Periodic Labs uses a simple masked importance sampling RL loss

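A quick numeric check of the claim (a minimal PyTorch sketch of the argument above, not Periodic Labs' actual code; all names here are illustrative):

```python
# Minimal sketch: the stop-grad masked loss reproduces PPO-Clip's gradients.
import torch

def ppo_clip_loss(logp, logp_old, adv, eps=0.2):
    # Standard PPO-Clip objective (negated, since optimizers minimize).
    r = torch.exp(logp - logp_old)
    return -torch.min(r * adv, torch.clamp(r, 1 - eps, 1 + eps) * adv).mean()

def masked_loss(logp, logp_old, adv, eps=0.2):
    # M_t from the thread: pass gradients only through unclipped tokens.
    r = torch.exp(logp - logp_old)
    with torch.no_grad():  # stop-grad mask
        m = torch.where(adv >= 0, r <= 1 + eps, r >= 1 - eps).float()
    return -(m * r * adv).mean()

logp_old = torch.randn(10_000)
logp = (logp_old + 0.3 * torch.randn(10_000)).requires_grad_()
adv = torch.randn(10_000)

g_ppo, = torch.autograd.grad(ppo_clip_loss(logp, logp_old, adv), logp)
g_mask, = torch.autograd.grad(masked_loss(logp, logp_old, adv), logp)
print(torch.allclose(g_ppo, g_mask))  # True: same gradients, different loss values
```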
Nikhil Barhate retweeted
Tiffany Zhao @tiffzhao05
I left Google DeepMind and moved from SF to NYC, all within 2 weeks, to join @quadrillion_ai — to build the future of automated research intelligence with the highest-slope founder and most talent-dense team.

I grew up in Silicon Valley — the old Facebook office was my second home. I'd hang out there after school, drawing with my crayons while looking around at the sea of computers with lines of code. From a young age, I felt empowered to have an array of interests beyond tech: piano, ballet, figure skating, art. The valley embraced diversity of thought, and that's what inspired me to stay for Stanford and my career thus far.

But today, SF is one big hive-mind. So I moved to NYC, away from family and friends, to build a company that doesn't need to rely on a bubble to survive. I'm meeting customers day after day in all kinds of verticals, connecting with them in different ways and seeing our product bring real value. Here, I'm able to live in diversity of thought.

I'm excited to build the future of research in the city of opportunity. Let's chat if this excites you.
Nikhil Barhate retweeted
Chongyi Zheng @chongyiz1
1/ Reinforcement learning is usually framed as maximizing rewards. But can we cast it as reaching the right goals? New blog on bridging RL, goal-conditioned RL, and stochastic shortest path: iclr-blogposts.github.io/2026/blog/2026… Also #ICLR2026 Poster: Thu 10:30 AM–1:00 PM, P4 #4611. 🧵⬇️
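The classic bridge here (my gloss, not necessarily the blog post's exact framing): give the agent reward -1 on every step until it reaches the goal, and the optimal value function becomes the negative shortest-path distance, so reward maximization is a stochastic shortest path problem. A toy check on a deterministic chain:

```python
# Value iteration on a 5-state chain with the goal at state 4 and reward -1
# per step: V*(s) converges to -(distance to goal), i.e. shortest path in disguise.
N, GOAL = 5, 4
V = [0.0] * N
for _ in range(100):
    for s in range(N):
        if s == GOAL:
            continue  # goal is absorbing with value 0
        left, right = max(s - 1, 0), min(s + 1, N - 1)
        V[s] = -1 + max(V[left], V[right])
print(V)  # [-4.0, -3.0, -2.0, -1.0, 0.0]
```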
Nikhil Barhate retweeted
Yu Lei @_OutofMemory_
🤖 Co-training is everywhere (sim↔real [e.g., GR00T, LBM], human↔robot [e.g., PI, EgoScale], even non-robot data [e.g., PI, LBM]). But why does it work? And how can we improve it further?

Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments.

😮 Key insight: it's all about representations.
- Alignment → enables transfer
- Discernibility → enables adaptation

⚖️ Both are necessary — it's better to have more aligned representations, but the model must still be able to discern the domains. We term this structured representation alignment.

⬇️ Let's take a deep dive:
Paper: arxiv.org/pdf/2604.13645
Website: science-of-co-training.github.io
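For a concrete handle on "alignment", one common way to quantify similarity between sim and real features is linear CKA (my illustrative choice, not necessarily the metric used in the paper):

```python
# Linear CKA between two feature matrices (rows = paired examples).
# High CKA suggests aligned representations; discernibility can still be
# probed separately, e.g., with a linear classifier on the domain label.
import torch

def linear_cka(X, Y):
    X = X - X.mean(dim=0)
    Y = Y - Y.mean(dim=0)
    num = (X.T @ Y).norm() ** 2                      # ||Y^T X||_F^2
    den = (X.T @ X).norm() * (Y.T @ Y).norm()        # ||X^T X||_F ||Y^T Y||_F
    return num / den

sim_feats = torch.randn(256, 64)                     # features from sim rollouts
real_feats = sim_feats + 0.1 * torch.randn(256, 64)  # nearly aligned real features
print(linear_cka(sim_feats, real_feats))             # close to 1.0
```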
Nikhil Barhate retweeted
Max Fu @letian_fu
Robotics: coding agents’ next frontier. So how good are they? We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability. From @NVIDIA @Berkeley_AI @CMU_Robotics @StanfordAILab capgym.github.io 🧵
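The core loop such a benchmark exercises looks roughly like this (a hypothetical sketch; `generate_code`, `run_in_sandbox`, and `task` are stand-ins, not CaP-X's actual API):

```python
# Hypothetical execute-observe-refine loop for a code-writing robot agent.
def refine(generate_code, run_in_sandbox, task, max_iters=5):
    feedback = ""
    for _ in range(max_iters):
        code = generate_code(task, feedback)  # agent writes perception/control code
        ok, log = run_in_sandbox(code)        # execute in sim (or on a real robot)
        if ok:
            return code                       # tests passed; keep this program
        feedback = log                        # feed errors/outcomes back to the agent
    return None                               # failed to converge within the budget
```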
Nikhil Barhate retweeted
Mingchen Zhuge @MingchenZhuge
🫱 Introducing Neural Computers: what if AI does not just use computers better, but begins to become the running computer itself?

Beyond today's conventional computers, agents, and world models, Neural Computers (NCs) are a new frontier where computation, memory, and I/O move into a learned runtime state. We ask whether parts of the runtime can move inward, into the learning system itself.

This is our first step toward the Completely Neural Computer (CNC): a general-purpose neural computer with stable execution, explicit reprogramming, and durable capability reuse.

Work done with Mingchen Zhuge (@MingchenZhuge), Changsheng Zhao, Haozhe Liu (@HaoZhe65347), Zijian Zhou (@ZijianZhou524), Shuming Liu (@shuming96), Wenyi Wang (@Wenyi_AI_Wang), Ernie Chang (@erniecyc), Gael Le Lan, Junjie Fei, Wenxuan Zhang, Zhipeng Cai (@cai_zhipeng), Zechun Liu (@zechunliu), Yunyang Xiong (@YoungXiong1), Yining Yang, Yuandong Tian (@tydsh), Yangyang Shi, Vikas Chandra (@vikasc), Juergen Schmidhuber (@SchmidhuberAI)
Nikhil Barhate retweeted
Hojoon Lee @hojoon_ai
We scaled off-policy RL to sim-to-real. To our knowledge, FlashSAC is the fastest and most performant RL algorithm across IsaacLab, MuJoCo Playground, and many more, all with a single set of hyperparameters. Project page: holiday-robot.github.io/FlashSAC Paper: arxiv.org/pdf/2604.04539
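For orientation, the heart of SAC's off-policy update is an entropy-regularized TD target (a generic sketch of vanilla SAC, not FlashSAC's implementation):

```python
# Generic soft actor-critic TD target (vanilla SAC, for orientation only).
import torch

def sac_target(reward, done, next_q1, next_q2, next_logp, gamma=0.99, alpha=0.2):
    # Clipped double-Q minus an entropy bonus, bootstrapped one step ahead.
    next_v = torch.min(next_q1, next_q2) - alpha * next_logp
    return reward + gamma * (1.0 - done) * next_v
```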
Nikhil Barhate retweeted
chuyi shang @chuyishang
Wrote a deep dive on implementing a language model from scratch in JAX and scaling it with distributed training! If you’re coming from PyTorch and want to see how the same ideas look in JAX, or just want a hands-on intro to distributed training, check out this blog post: chuyishang.com/blog/2026/jax-… Comes with code + an assignment and test cases so you can follow along!
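The core distributed-training pattern such posts cover is data parallelism: replicate parameters, shard the batch across devices, and all-reduce gradients. A minimal JAX sketch (my own, not taken from the blog):

```python
# Data-parallel SGD step in JAX: per-device gradients averaged with pmean.
import functools
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(w, x, y):
    grads = jax.grad(loss_fn)(w, x, y)
    grads = jax.lax.pmean(grads, axis_name="devices")  # all-reduce across devices
    return w - 0.1 * grads

n = jax.local_device_count()
w = jnp.broadcast_to(jnp.zeros(3), (n, 3))  # replicated parameters
x = jnp.ones((n, 8, 3))                     # per-device batch shards
y = jnp.ones((n, 8))
w = train_step(w, x, y)                     # one synchronized update
```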
Nikhil Barhate retweeted
Olga Zaghen @olgazaghen
🔮 Working on ML on curved manifolds? Don't miss out on Jacobi Fields! 🔮 I wrote a quick, highly visual and hopefully accessible introduction to the topic: "Jacobi Fields in Machine Learning" 🤠 Check it out here: olgatticus.github.io/blog/jacobi-fi…!
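For context (the standard definition, stated here for orientation, not a summary of the blog post): a Jacobi field $J$ measures how an infinitesimal family of geodesics spreads along a geodesic $\gamma$, and it satisfies the Jacobi equation

$$\frac{D^2 J}{dt^2} + R(J, \dot\gamma)\,\dot\gamma = 0,$$

where $R$ is the Riemann curvature tensor, so curvature directly controls how quickly nearby geodesics converge or diverge.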
Nikhil Barhate retweeted
Ian Osband @IanOsband
Something is rotten with policy gradient. PG has become *the* RL loss for LLMs. But it’s not even good at basic RL. Even on MNIST with bandit feedback, vanilla PG performs far worse than cross-entropy because it wastes gradient budget. Delightful Policy Gradient: arxiv.org/abs/2603.14608…
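The bandit-feedback setup is easy to reproduce (a minimal PyTorch sketch of the experimental setting, not the paper's proposed fix): sample a class from the policy, observe only whether it was right, and compare the resulting REINFORCE loss to full cross-entropy.

```python
# MNIST-as-bandit: PG only gets a reward for the sampled class; CE sees the label.
import torch
import torch.nn.functional as F

logits = torch.randn(32, 10, requires_grad=True)  # stand-in classifier outputs
labels = torch.randint(0, 10, (32,))

# Supervised baseline: full label information every step.
ce_loss = F.cross_entropy(logits, labels)

# Bandit feedback: sample an action, observe reward in {0, 1}, REINFORCE.
probs = F.softmax(logits, dim=-1)
actions = torch.multinomial(probs, 1).squeeze(-1)
reward = (actions == labels).float()
logp = F.log_softmax(logits, dim=-1).gather(1, actions[:, None]).squeeze(-1)
pg_loss = -(reward * logp).mean()  # zero gradient whenever the guess was wrong
```

The last comment is the "wasted gradient budget" point: every wrongly guessed example contributes nothing to the vanilla PG update, while cross-entropy learns from all of them.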
Nikhil Barhate retweeted
Peter Tong @TongPetersb
Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]
Nikhil Barhate retweeted
William Shen @shenbokui
Excited to introduce Uni-1, our new multimodal model that *unifies* understanding and generation. TLDR: a team of ~15 researchers is going pound-for-pound with nano banana and gpt image 🧵
Jiaming Song @baaadas

Excited to introduce Uni-1, our new *unified* multimodal model that does both understanding and generation: lumalabs.ai/uni-1 TLDR: I think Uni-1 @LumaLabsAI is > GPT Image 1.5 in many cases, and toe-to-toe with Nano Banana Pro/2. (showcase below)

Nikhil Barhate retweeted
Jason Ramapuram @jramapuram
Autoregressive models dominate, but what if we treat multimodal generation as discrete, order-agnostic iterative refinement?

Excited to share our systematic study of the design space of Tri-Modal Masked Diffusion Models (MDMs). We pre-trained the first Tri-Modal MDM from scratch on (text,), (image, text), and (audio, text) data. The same model can do ASR, TTS, T2I, captioning, and native text generation.

What I'm most proud of in this work is the scientific rigor: over 3,500 training runs, principled hyperparameter transfer, honest results, and carefully controlled ablations across multiple axes of entanglement.

A thread on our empirical findings (arXiv: arxiv.org/abs/2602.21472)
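For readers new to MDMs, the training objective is roughly "BERT at every masking rate": sample a masking level, corrupt, and reconstruct. A generic single-modality sketch (illustrative only, not the paper's tri-modal loss; the time-dependent weighting is omitted):

```python
# Generic masked-diffusion-style training step (weighting terms omitted).
import torch
import torch.nn.functional as F

def mdm_step(model, tokens, mask_id):
    B, L = tokens.shape
    t = torch.rand(B, 1)                 # masking level per sequence, t ~ U(0, 1)
    masked = torch.rand(B, L) < t        # mask each token i.i.d. with prob t
    corrupted = torch.where(masked, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)            # (B, L, vocab)
    # Reconstruct originals at masked positions (assumes at least one is masked).
    return F.cross_entropy(logits[masked], tokens[masked])
```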
Nikhil Barhate retweeted
Amandeep Kumar @Amandeep__kumar
🚀 Unlocking Standard Diffusion Transformers on Representation Encoders Why do standard DiTs fail to converge on high-dimensional features like DINOv2? 📉 We found the answer isn't just "more parameters"—it's Geometry. Introducing Riemannian Flow Matching with Jacobi Regularization (RJF) 📄 Paper: arxiv.org/abs/2602.10099
Nikhil Barhate retweeted
Alan Baade @BaadeAlan
What's the right space to diffuse in: Raw Data or Latents? Why not both! In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference. w/ @drfeifei+... 1/n
Nikhil Barhate retweeted
Charlie Ruan @charlie_ruan
Releasing the official SkyRL + Harbor integration: a standardized way to train terminal-use agents with RL. From the creators of Terminal-Bench, Harbor is a widely adopted framework for evaluating terminal-use agents on any task expressible as a Dockerfile + instruction + test script. This integration extends it: the same tasks you evaluate on, you can now RL-train on. Blog: novasky-ai.notion.site/skyrl-harbor 🧵
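To make "Dockerfile + instruction + test script" concrete, a task is roughly a directory like the following (a hypothetical layout for illustration; see the blog for Harbor's actual schema). The test's pass/fail outcome is presumably what the RL integration uses as a reward signal.

```
my-task/
├── Dockerfile        # environment the agent's terminal session runs in
├── instruction.md    # natural-language task handed to the agent
└── tests/
    └── test.sh       # exits 0 iff the task was solved
```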
Nikhil Barhate retweeted
Oscar Davis @osclsd
You like discrete diffusion, but it's too slow? 🥀 You like test-time inference, but it's for continuous methods? 😩 We fixed it. Introducing Categorical Flow Maps: continuously sample discrete data in a single step 🚀💫 How? 🧵⬇️ 💪 Co-led with @FEijkelboom, @daan_roos_
Nikhil Barhate retweeted
Tyler Griggs @tyler_griggs_
SkyRL now implements the Tinker API. Now, training scripts written for Tinker can run on your own GPUs with zero code changes using SkyRL's FSDP2, Megatron, and vLLM backends. Blog: novasky-ai.notion.site/skyrl-tinker 🧵
Nikhil Barhate retweeted
Yibo Yang @YiboYang
We've known that diffusion models are theoretically very good lossy data compressors, but how can we actually implement this idea in practice? I discuss this and related topics in a new review article on diffusion-based generative compression: arxiv.org/abs/2601.18932