Anirudh Vemula
@vvanirudh

327 posts

Roboticist@Aurora. Primarily work in Robot Planning, Reinforcement Learning, and Optimization. Previously PhD@CMU, CS@IITB and SPG@Apple

Pittsburgh, PA · Joined March 2015
472 Following · 412 Followers
Anirudh Vemula retweeted
Dylan Foster 🐢 @canondetortugas
Now that I have started using twitter somewhat regularly, let me take a minute to advertise the RL theory lecture notes I have been developing with Sasha Rakhlin: arxiv.org/abs/2312.16730
[image attachment]
Anirudh Vemula retweeted
Vaishnavh Nagarajan @_vaishnavh
🗣️ “Next-token predictors can’t plan!” ⚔️ ​​“False! Every distribution is expressible as product of next-token probabilities!” 🗣️ In work w/ @GregorBachmann1 , we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴 arxiv.org/abs/2403.06963
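For context, the rebuttal quoted in this tweet is the probability chain rule: any joint distribution over a token sequence factorizes autoregressively, so next-token predictors are expressive enough in principle. A reference statement of that standard identity:

```latex
% Chain rule: any joint distribution factorizes into next-token conditionals
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p\bigl(x_t \mid x_1, \dots, x_{t-1}\bigr)
```

Expressiveness in this sense is distinct from whether a trained next-token predictor can actually plan, which is the gap the thread and paper take up.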
Wenxuan Zhou @Wenxuan_Zhou
Life updates: Successfully finished my Ph.D. thesis defense! It’s been an incredible journey of exploring the possibilities of robots and RL. I’m actively seeking full-time scientist/engineer positions in AI/Robotics. Looking forward to new adventures! ⛵️
[image attachment]
Anirudh Vemula retweeted
AK @_akhaliq
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

paper page: huggingface.co/papers/2307.15…

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
[image attachment]
Ching-An Cheng @chinganc_rl
Will be at #ICML2023 this week and present 4 cool papers on offline RL, lifelong RL and RL with exogenous processes. Looking forward to meeting new and old friends. Ping me if you wanna meet up. 🌺
Anirudh Vemula retweeted
Gokul Swamy @g_k_swamy
I'm rarely as excited about a paper as our #ICML2023 paper: we develop an algorithm for doing inverse reinforcement learning w/o an expensive RL inner loop, providing an *exponential* speedup. Works *extremely* well in practice. Joint work w/ @sanjibac, @zstevenwu, and Drew Bagnell. [1/n]
[image attachment]
Anirudh Vemula retweeted
Micah Corah @CorahMicah
I am delighted to say that I will be joining the Colorado School of Mines @CSatMines 💻🤖 as an Assistant Professor 👨‍🏫 this January! #academia #AcademicTwitter
Francesco Orabona @bremen79
That's it! There is much more on this topic, so let me know if this is interesting to you and if you think I should write a blog post on it. 6/6
Francesco Orabona @bremen79
A mini-thread about optimization algorithms and "implicit preconditioners". If you optimize a function where the Hessian is ill-conditioned, gradient descent will be very slow (left fig). However, if you precondition it, it will go straight to the minimum (right fig). 1/6
[image attachment: left, gradient descent on an ill-conditioned quadratic; right, preconditioned descent]
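A minimal numeric sketch of the point in tweet 1/6 above (the quadratic, the step sizes, and the choice P = H^{-1} are illustrative assumptions, not taken from the thread): on an ill-conditioned quadratic, plain gradient descent crawls along the low-curvature direction, while a preconditioned step jumps straight to the minimum.

```python
# Sketch (illustrative assumptions): f(x) = 0.5 * x^T H x with an
# ill-conditioned Hessian; the minimum is at the origin.
import numpy as np

H = np.diag([100.0, 1.0])  # condition number 100

def gd(P, steps, lr):
    """Preconditioned gradient descent: x <- x - lr * P @ grad f(x)."""
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = x - lr * P @ (H @ x)  # grad f(x) = H x
    return x

# Plain GD (P = I): stability caps lr near 1/100, so the eigenvalue-1
# direction contracts by only 0.99 per step: ~0.61 remains after 50 steps.
print(gd(np.eye(2), steps=50, lr=0.01))       # approx [0.0, 0.605]

# Preconditioned GD with P = H^{-1}: exact minimum in a single step.
print(gd(np.linalg.inv(H), steps=1, lr=1.0))  # [0.0, 0.0]
```

With P = H^{-1} the update is exactly a Newton step, which is the sense in which a good preconditioner removes the ill-conditioning.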
Nan Jiang @nanjiang_cs
@g_k_swamy @vvanirudh @yus167 This one and your IRL paper are on my ICML "shopping list". My first in-person big conf since the pandemic, and so much looking forward to connecting with old & new friends who like theory-inspired thinking in RL :)
Anirudh Vemula @vvanirudh
If this has been a long thread, this is the one tweet to pay attention to: look at the example figure to understand the awesomeness of PDAM. MBPO: O(2^H) computation per iteration, and it converges to a bad model. LAMPS-MM: O(H) computation per iteration, and it converges to a good model.
[image attachment]
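To make the claimed per-iteration gap concrete, a toy back-of-the-envelope comparison (the horizons are chosen arbitrarily; only the 2^H-vs-H growth rates come from the tweet):

```python
# Toy arithmetic: per-iteration cost growing as 2^H (MBPO, as claimed
# above) vs. H (LAMPS-MM, as claimed above), for a few horizons H.
for H in (5, 10, 20, 40):
    print(f"H = {H:>2}:  O(2^H) ~ {2**H:>16,}   O(H) ~ {H}")
```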
Anirudh Vemula @vvanirudh
Our paper on a new (lazy) approach to model-based RL that is both computationally efficient and avoids the objective mismatch problem has been accepted to ICML! Excited to present it in Honolulu this summer! arxiv.org/abs/2303.00694