Gugan Thoppe

195 posts

@GThoppe

Asst. Prof. @IIScCSA | PhD @TIFRScience | Postdocs @TechnionLive, @DukeU | Part of @Indiaacm eminent speaker panel | #ReinforcementLearning #RLTheory

Bharat · Joined March 2020

132 Following · 250 Followers

Pinned Tweet

Gugan Thoppe@GThoppe·10 Jun

Policy Iteration's super-powers (monotonic improvement and guaranteed convergence) vanish under general function approximation. To bring them back, we introduce Reliable Policy Iteration (RPI): arxiv.org/abs/2506.07134. #ReinforcementLearning #RL @EshwarSR @today_itself @DalalGal

Gugan Thoppe retweeted

Pratyush Kumar@pratykumar·6 Mar

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

Gugan Thoppe retweeted

IIT Madras@iitmadras·25 Jan

@iitmadras is delighted and proud to share that our Director, Prof. V. Kamakoti, has been selected for the prestigious Padma Shri Awards 2026 by the Government of India. A distinguished academician and eminent researcher, Prof. Kamakoti has made transformative contributions in the fields of computer science, systems engineering and interdisciplinary research. His visionary leadership has further strengthened IIT Madras’s commitment to cutting-edge research, technological innovation, a thriving startup ecosystem and inclusive institutional development. The Padma Shri Awards recognise his outstanding service and exemplary leadership in the field of Science and Technology, which stands as a testament to his enduring impact in academia. The entire IIT Madras community takes pride in this well-deserved national recognition and extends its heartfelt congratulations, wishing him continued success in service of the nation. @EduMinOfIndia #PadmaAwards2026 #Padmashri2026 #PadmaAwards

Gugan Thoppe@GThoppe·24 Jan

@starlitmatcha Intuitively, Q-learning works because it mimics value iteration: close the gap between Q_n and TQ_n. Technically, it works because it is a stochastic fixed-point iteration with respect to the Bellman operator T, i.e., Q_{n+1} = Q_n + a_n [TQ_n - Q_n + noise], and T is a contraction.
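The update in this reply can be sketched on a tiny tabular problem. Everything below is an illustrative assumption rather than anything from the thread: the 2-state MDP, the discount factor, uniform random exploration, and the step size a_n = n^(-0.7) are all made up for the demo. Value iteration applies T exactly; Q-learning replaces TQ_n with a one-sample estimate, so the sampling error plays the role of the "noise" term.

```python
import random

# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
GAMMA = 0.9
# P[s][a] = list of (probability, next_state); R[s][a] = reward.
P = {
    0: {0: [(0.8, 0), (0.2, 1)], 1: [(0.1, 0), (0.9, 1)]},
    1: {0: [(0.5, 0), (0.5, 1)], 1: [(1.0, 1)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}


def zeros():
    return {s: {a: 0.0 for a in P[s]} for s in P}


def bellman(Q):
    """Exact Bellman optimality operator T applied to Q."""
    return {
        s: {
            a: R[s][a]
            + GAMMA * sum(p * max(Q[s2].values()) for p, s2 in P[s][a])
            for a in P[s]
        }
        for s in P
    }


def sup_dist(Q1, Q2):
    """Sup-norm distance, the norm in which T is a GAMMA-contraction."""
    return max(abs(Q1[s][a] - Q2[s][a]) for s in P for a in P[s])


def value_iteration(tol=1e-10):
    """Iterate Q <- TQ until the gap between Q and TQ is below tol."""
    Q = zeros()
    while True:
        TQ = bellman(Q)
        if sup_dist(TQ, Q) < tol:
            return TQ
        Q = TQ


def q_learning(steps=200_000, seed=0):
    """Q_{n+1} = Q_n + a_n [TQ_n - Q_n + noise], one (s, a) entry at a time."""
    rng = random.Random(seed)
    Q = zeros()
    visits = zeros()
    for _ in range(steps):
        # Uniform exploration over state-action pairs (an assumption).
        s = rng.choice(list(P))
        a = rng.choice(list(P[s]))
        probs, nexts = zip(*P[s][a])
        s2 = rng.choices(nexts, weights=probs)[0]
        visits[s][a] += 1
        a_n = visits[s][a] ** -0.7  # assumed step-size schedule
        # One-sample estimate of (TQ)(s, a); its error is the noise term.
        target = R[s][a] + GAMMA * max(Q[s2].values())
        Q[s][a] += a_n * (target - Q[s][a])
    return Q


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    print("sup-norm gap to the fixed point:", sup_dist(q_hat, q_star))
```

The sup-norm gap printed at the end compares the noisy iterate against the fixed point that value iteration finds with the exact operator.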

Khushi@starlitmatcha·23 Jan

Also understood the contrast: Monte Carlo waits until the episode ends whereas TD learning updates the value function step by step. Q-learning works because it uses these TD updates. So you don’t pick actions directly. You learn Q-values and choose actions based on them.

Khushi@starlitmatcha·23 Jan

Spent time with Q-learning and value-based RL today. Fun thing I noticed: the Bellman equation follows the same logic as dynamic programming.

Gugan Thoppe@GThoppe·23 Jan

Attending #AAAI2026 in Singapore and working on RL theory or algorithm design? I’d love to meet! Happy to chat about our recent #AISTATS2026 work extending policy iteration and conservative policy iteration beyond the tabular setting to function approximation.

Gugan Thoppe@GThoppe·23 Jan

Can Q-learning in the average-reward setup, involving semi-norm contraction, have optimal rates without parameter-dependent step-sizes? Yes—using Polyak–Ruppert averaging. Catch our #AAAI2026 poster! Location: Hall 4 Date: Sunday, Jan 25
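As a rough illustration of what Polyak-Ruppert averaging does, here is a scalar toy, not the paper's average-reward or semi-norm setting. The map T(x) = 0.5x + 1, the Gaussian noise, and the step size a_n = n^(-0.6) are all assumptions made for the sketch. The step size is chosen without reference to the contraction modulus, and the running average of the iterates is reported alongside the last iterate.

```python
import random


def T(x):
    """Toy contraction with modulus 0.5 (hypothetical); fixed point x* = 2."""
    return 0.5 * x + 1.0


def averaged_iteration(steps=100_000, seed=0):
    """Run x_{n+1} = x_n + a_n [T(x_n) - x_n + noise] and average the iterates."""
    rng = random.Random(seed)
    x = 0.0      # last iterate
    x_bar = 0.0  # running Polyak-Ruppert average of x_1, ..., x_n
    for n in range(1, steps + 1):
        a_n = n ** -0.6  # assumed schedule; does not use the modulus 0.5
        noise = rng.gauss(0.0, 1.0)
        x += a_n * (T(x) - x + noise)
        x_bar += (x - x_bar) / n  # incremental mean update
    return x, x_bar


if __name__ == "__main__":
    last, avg = averaged_iteration()
    print("last iterate error:    ", abs(last - 2.0))
    print("averaged iterate error:", abs(avg - 2.0))
```

Averaging smooths out the noise that the slowly decaying step size leaves in the last iterate, which is the usual motivation for the technique.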

Gugan Thoppe@GThoppe·22 Jan

Happy to share that this work has been accepted to #AISTATS2026!

Gugan Thoppe@GThoppe·22 Jan

Here is a link to our tutorial from the recent Reinforcement Learning (#RL) Workshop 2026: youtube.com/live/1o2PUBPgY…. This six-part series covers RL fundamentals, key theoretical challenges, our recent #RPI-based solutions, and a hands-on component.

Gugan Thoppe retweeted

President of India@rashtrapatibhvn·23 Dec

President Droupadi Murmu presents Vigyan Yuva-Shanti Swarup Bhatnagar Awards to:
• Dr Dibyendu Das in Chemistry.
• Dr Waliur Rahaman in Earth Science.
• Prof. Arkaprava Basu in Engineering Sciences.
• Prof. Sabyasachi Mukherjee in Mathematics and Computer Science.

Gugan Thoppe@GThoppe·6 Dec

Wormhole at #NeurIPS2025

Gugan Thoppe retweeted

IISc CSA@IIScCSA·5 Dec

The Walmart Centre of Tech Excellence at CSA proudly presents the 2026 edition of the Reinforcement Learning Workshop. This workshop aims to expose students and researchers in India to state-of-the-art Reinforcement Learning (RL) research. Register now: events.csa.iisc.ac.in/rlworkshop2026/

Gugan Thoppe@GThoppe·3 Dec

At the beginning of the talk!

Gugan Thoppe@GThoppe·3 Dec

Before the talk by #RL guru Richard Sutton at #NeurIPS2025!

Gugan Thoppe@GThoppe·2 Dec

If you are interested in the theory and applications of #RL, I would love to chat with you! Let me also know if you're interested in academic positions in India. @iiscbangalore @csa

Gugan Thoppe@GThoppe·2 Dec

Paper: arxiv.org/abs/2507.05077

Gugan Thoppe@GThoppe·2 Dec

Presenting at #NeurIPS2025: Exhibit Hall C, D, E, #1803, Wed 3 Dec, 11 a.m. to 2 p.m. PST

Gugan Thoppe@GThoppe·10 Nov

Analyzing fixed-point iterations for semi-norm contractions is challenging due to the latter's non-monotonicity. My Ph.D. student, Ankur Naskar, recently showed how to handle it and derive parameter-free convergence rates! Happy to note that this work got accepted to #AAAI2026.

IISc CSA@IIScCSA

Accepted to #AAAI2026: “Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to Q-Learning,” by Ankur Naskar, Gugan Thoppe (@GThoppe), and Vijay Gupta. This work shows how to handle the non-monotonicity of semi-norms in #RL. arxiv.org/pdf/2508.05984
