Anirudh Vemula
@vvanirudh

327 posts

Roboticist@Aurora. Primarily work in Robot Planning, Reinforcement Learning, and Optimization. Previously PhD@CMU, CS@IITB and SPG@Apple

Pittsburgh, PA · Joined March 2015
472 Following · 412 Followers
Anirudh Vemula retweeted
Dylan Foster 🐢 @canondetortugas
Now that I have started using twitter somewhat regularly, let me take a minute to advertise the RL theory lecture notes I have been developing with Sasha Rakhlin: arxiv.org/abs/2312.16730
[image attachment]
Anirudh Vemula retweeted
Vaishnavh Nagarajan @_vaishnavh
🗣️ “Next-token predictors can’t plan!” ⚔️ ​​“False! Every distribution is expressible as product of next-token probabilities!” 🗣️ In work w/ @GregorBachmann1 , we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴 arxiv.org/abs/2403.06963
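For context, the rebuttal quoted in this tweet is the probability chain rule: any joint distribution over a token sequence factorizes autoregressively, so next-token predictors are expressive enough in principle. A reference statement of that standard identity:

```latex
% Chain rule: any joint distribution factorizes into next-token conditionals
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p\bigl(x_t \mid x_1, \dots, x_{t-1}\bigr)
```

Expressiveness in this sense is distinct from whether a trained next-token predictor can actually plan, which is the gap the thread and paper take up.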
Wenxuan Zhou @Wenxuan_Zhou
Life updates: Successfully finished my Ph.D. thesis defense! It’s been an incredible journey of exploring the possibilities of robots and RL. I’m actively seeking full-time scientist/engineer positions in AI/Robotics. Looking forward to new adventures! ⛵️
[image attachment]
Anirudh Vemula retweeted
AK @_akhaliq
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

paper page: huggingface.co/papers/2307.15…

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
[image attachment]
Ching-An Cheng @chinganc_rl
Will be at #ICML2023 this week and present 4 cool papers on offline RL, lifelong RL and RL with exogenous processes. Looking forward to meeting new and old friends. Ping me if you wanna meet up. 🌺
Anirudh Vemula retweeted
Gokul Swamy @g_k_swamy
I'm rarely as excited about a paper as our #ICML2023 paper: we develop an algorithm for doing inverse reinforcement learning w/o an expensive RL inner loop, providing an *exponential* speedup. Works *extremely* well in practice. Joint work w/ @sanjibac, @zstevenwu, and Drew Bagnell. [1/n]
[image attachment]
Anirudh Vemula retweeted
Micah Corah @CorahMicah
I am delighted to say that I will be joining the Colorado School of Mines @CSatMines 💻🤖 as an Assistant Professor 👨‍🏫 this January! #academia #AcademicTwitter
Francesco Orabona @bremen79
That's it! There is much more on this topic, so let me know if this is interesting to you and if you think I should write a blog post on it. 6/6
Francesco Orabona @bremen79
A mini-thread about optimization algorithms and "implicit preconditioners". If you optimize a function where the Hessian is ill-conditioned, gradient descent will be very slow (left fig). However, if you precondition it, it will go straight to the minimum (right fig). 1/6
[image attachment: left, gradient descent on an ill-conditioned quadratic; right, preconditioned descent]
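A minimal numeric sketch of the point in tweet 1/6 above (the quadratic, the step sizes, and the choice P = H^{-1} are illustrative assumptions, not taken from the thread): on an ill-conditioned quadratic, plain gradient descent crawls along the low-curvature direction, while a preconditioned step jumps straight to the minimum.

```python
# Sketch (illustrative assumptions): f(x) = 0.5 * x^T H x with an
# ill-conditioned Hessian; the minimum is at the origin.
import numpy as np

H = np.diag([100.0, 1.0])  # condition number 100

def gd(P, steps, lr):
    """Preconditioned gradient descent: x <- x - lr * P @ grad f(x)."""
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = x - lr * P @ (H @ x)  # grad f(x) = H x
    return x

# Plain GD (P = I): stability caps lr near 1/100, so the eigenvalue-1
# direction contracts by only 0.99 per step: ~0.61 remains after 50 steps.
print(gd(np.eye(2), steps=50, lr=0.01))       # approx [0.0, 0.605]

# Preconditioned GD with P = H^{-1}: exact minimum in a single step.
print(gd(np.linalg.inv(H), steps=1, lr=1.0))  # [0.0, 0.0]
```

With P = H^{-1} the update is exactly a Newton step, which is the sense in which a good preconditioner removes the ill-conditioning.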
Nan Jiang @nanjiang_cs
@g_k_swamy @vvanirudh @yus167 This one and your IRL paper are on my ICML "shopping list". My first in-person big conf since the pandemic, and so much looking forward to connecting with old & new friends who like theory-inspired thinking in RL :)
Anirudh Vemula @vvanirudh
If this has been a long thread, this is the one tweet to pay attention to: look at the example figure to understand the awesomeness of PDAM. MBPO: O(2^H) computation per iteration, and it converges to a bad model. LAMPS-MM: O(H) computation per iteration, and it converges to a good model.
[image attachment]
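To make the claimed per-iteration gap concrete, a toy back-of-the-envelope comparison (the horizons are chosen arbitrarily; only the 2^H-vs-H growth rates come from the tweet):

```python
# Toy arithmetic: per-iteration cost growing as 2^H (MBPO, as claimed
# above) vs. H (LAMPS-MM, as claimed above), for a few horizons H.
for H in (5, 10, 20, 40):
    print(f"H = {H:>2}:  O(2^H) ~ {2**H:>16,}   O(H) ~ {H}")
```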
Anirudh Vemula @vvanirudh
Our paper on a new (lazy) approach to model-based RL that is both computationally efficient and avoids the objective mismatch problem has been accepted to ICML! Excited to present it in Honolulu this summer! arxiv.org/abs/2303.00694