Aravind Venugopal

14 posts

@avenugo2

ML PhD student @ Carnegie Mellon

Pittsburgh, PA · Joined August 2024
25 Following · 21 Followers
Aravind Venugopal reposted
Pulkit Agrawal (@pulkitology)
Eka means unity -- “one,” in Sanskrit and “first” in Finnish. We’re building intelligence for the physical world in its native language: forces. Until now, robotics faced a tradeoff — generality or speed. The real world requires both. Robotics also faced a data problem. Our Vision–Force–Action (VFA) model — the first of its kind — breaks the generality-speed tradeoff and the data barrier. It's a new foundation uniting performance, generality, and safety for putting capable robots in everyone's hands. Today, I am excited to share our journey of pushing robots beyond human limits. Today, dexterity becomes scalable. Today, I welcome you to the Era of Eka. Co-founded with @haarnoja, and so thrilled and grateful to be working with a dream team at @EkaRobotics. Learn more: ekarobotics.com
Aravind Venugopal (@avenugo2)
Thanks @JesseFarebro! We haven't written out the connection to log p(g|s,a) in our paper, but a score-matching-based objective (for a diffusion occupancy model) analogous to Eq. 5 in our paper should give a bound on log p(g|s,a), usable as a reward bonus. We use your TD-flow formulation for learning the flow-matching occupancy model.
Jesse Farebrother (@JesseFarebro)
@avenugo2 Cool work! Was curious if you tried using the likelihood of g as a reward bonus?
Aravind Venugopal reposted
Aravind Venugopal (@avenugo2)
9/ 🧵 I thank my advisor Jeff Schneider, co-author Jiayu Chen and my amazing collaborators Xudong Wu, Chongyi Zheng and Ben Eysenbach.
Aravind Venugopal reposted
Fahim Tajwar (@FahimTajwar10)
Are we done with new RL algorithms? Turns out we might have been optimizing the wrong objective. Introducing MaxRL, a framework to bring maximum likelihood optimization to RL settings. Paper + code + project website: zanette-labs.github.io/MaxRL/ 🧵 1/n
Aravind Venugopal reposted
Fahim Tajwar (@FahimTajwar10)
RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n