Aravind Venugopal

14 posts

@avenugo2

ML PhD student @ Carnegie Mellon

Pittsburgh, PA · Joined August 2024

25 Following · 21 Followers

Pinned Tweet

Aravind Venugopal @avenugo2 · Apr 25

1/ 🧵 Generative world models implicitly encode the geometry of the world. We present Occupancy Reward Shaping, a method to extract temporal geometry as information-rich rewards for goal-reaching tasks. with @JiayuChen98666 @chongyiz1 @ben_eysenbach Paper: arxiv.org/abs/2604.20627 Visit our poster at ICLR Poster Session 5, Pavilion 4 (9:30am-noon): iclr.cc/virtual/2026/p…

5.4K

Aravind Venugopal retweeted

Pulkit Agrawal @pulkitology · Apr 29

Eka means unity -- “one,” in Sanskrit and “first” in Finnish. We’re building intelligence for the physical world in its native language: forces. Until now, robotics faced a tradeoff — generality or speed. The real world requires both. Robotics also faced a data problem. Our Vision–Force–Action (VFA) model — the first of its kind — breaks the generality-speed tradeoff and the data barrier. It's a new foundation uniting performance, generality, and safety for putting capable robots in everyone's hands. Today, I am excited to share our journey of pushing robots beyond human limits. Today, dexterity becomes scalable. Today, I welcome you to the Era of Eka. Co-founded with @haarnoja, and so thrilled and grateful to be working with a dream team at @EkaRobotics. Learn more: ekarobotics.com

221

315K

Aravind Venugopal @avenugo2 · Apr 26

Thanks @JesseFarebro ! We haven't written out the connection to log p(g|s,a) in our paper but a score-matching-based objective (for a diffusion occupancy model) analogous to eq. 5 in our paper should give a bound on log p(g|s,a), usable as a reward bonus. We use your TD-flow formulation for learning the flow-matching occupancy model.

Jesse Farebrother @JesseFarebro · Apr 26

@avenugo2 Cool work! Was curious if you tried using the likelihood of g as a reward bonus?

Aravind Venugopal retweeted

Mahsa Bastankhah @MBastankhah · Apr 25

We will be presenting our poster “demystifying the mechanism behind emergent exploration in goal conditioned RL” today @iclr_conf Time: 3:15-5:45 pm Location: Pavilion 4 P4-#3404 @ben_eysenbach @GraceLiu78 @Dilip_Arumugam

Grace Liu@GraceLiu78

NEW PAPER: "Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL" How do RL algorithms develop sophisticated exploration strategies without explicit rewards? We provide insight into this question by studying Single-Goal Contrastive RL (SGCRL). [1/9]

3.1K

Aravind Venugopal @avenugo2 · Apr 25

9/ 🧵 I thank my advisor Jeff Schneider, co-author Jiayu Chen and my amazing collaborators Xudong Wu, Chongyi Zheng and Ben Eysenbach.

118

Aravind Venugopal @avenugo2 · Apr 25

8/ 🧵 Paper + code: aravindvenu7.github.io/website/ors/ In the near future, we hope to scale up this method and test whether it can function as a large-scale, general-purpose reward model.

137

Aravind Venugopal retweeted

Fahim Tajwar @FahimTajwar10 · Feb 5

Are we done with new RL algorithms? Turns out we might have been optimizing the wrong objective. Introducing MaxRL, a framework to bring maximum likelihood optimization to RL settings. Paper + code + project website: zanette-labs.github.io/MaxRL/ 🧵 1/n

161

808

207.3K

Aravind Venugopal retweeted

Fahim Tajwar @FahimTajwar10 · May 28

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n

137

828

86.5K
