Gugan Thoppe

195 posts

@GThoppe

Asst. Prof. @IIScCSA. PhD @TIFRScience. Postdocs @TechnionLive, @DukeU. Part of @Indiaacm eminent speaker panel. #ReinforcementLearning #RLTheory

Bharat · Joined March 2020
132 Following · 250 Followers
Gugan Thoppe retweeted
Pratyush Kumar @pratykumar
📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…
209
1.3K
6.8K
737.8K
Gugan Thoppe retweeted
IIT Madras @iitmadras
@iitmadras is delighted and proud to share that our Director, Prof. V. Kamakoti, has been selected for the prestigious Padma Shri Awards 2026 by the Government of India. A distinguished academician and eminent researcher, Prof. Kamakoti has made transformative contributions in the fields of computer science, systems engineering and interdisciplinary research. His visionary leadership has further strengthened IIT Madras’s commitment to cutting-edge research, technological innovation, a thriving startup ecosystem and inclusive institutional development. The Padma Shri Awards recognise his outstanding service and exemplary leadership in the field of Science and Technology, which stands as a testament to his enduring impact in academia. The entire IIT Madras community takes pride in this well-deserved national recognition and extends its heartfelt congratulations, wishing him continued success in service of the nation. @EduMinOfIndia #PadmaAwards2026 #Padmashri2026 #PadmaAwards
24
68
390
19K
Gugan Thoppe @GThoppe
@starlitmatcha Intuitively, Q-learning works because it mimics value iteration: close the gap between Q_n and TQ_n. Technically, it works since it is a stochastic fixed-point iteration with respect to the Bellman operator T, i.e., Q_{n+1} = Q_n + a_n [TQ_n - Q_n + noise], and T is a contraction.
0
0
1
41
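The stochastic fixed-point view in the reply above can be sketched on a toy problem. This is a minimal illustration, not code from the thread: the 2-state, 2-action MDP, the discount factor 0.9, the step sizes a_n = (n+1)^-0.7, and all function names are assumptions made for the example. It runs the deterministic iteration Q <- TQ (value iteration) next to the noisy iteration Q_{n+1} = Q_n + a_n [TQ_n - Q_n + noise], where the noise comes from replacing the expectation inside T with a single sampled next state.

```python
import random

# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],      # R[s][a] = reward for taking action a in state s
     [0.0, 2.0]]
GAMMA = 0.9           # discount factor, so T is a 0.9-contraction in the sup norm


def bellman(Q):
    """Apply the Bellman optimality operator T to a Q-table."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * max(Q[t]) for t in range(2))
             for a in range(2)] for s in range(2)]


def value_iteration(iters=500):
    """Deterministic fixed-point iteration Q <- TQ."""
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        Q = bellman(Q)
    return Q


def q_learning(steps=50_000, seed=0):
    """Synchronous Q-learning: Q_{n+1} = Q_n + a_n [TQ_n - Q_n + noise]."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(steps):
        a_n = (n + 1) ** -0.7          # assumed step-size schedule
        new_Q = [row[:] for row in Q]
        for s in range(2):
            for a in range(2):
                t = 0 if rng.random() < P[s][a][0] else 1   # sample next state
                # One-sample target: its conditional mean is (TQ_n)(s, a),
                # so target = (TQ_n)(s, a) + zero-mean noise.
                target = R[s][a] + GAMMA * max(Q[t])
                new_Q[s][a] = Q[s][a] + a_n * (target - Q[s][a])
        Q = new_Q
    return Q


def max_gap(Q1, Q2):
    """Sup-norm distance between two Q-tables."""
    return max(abs(Q1[s][a] - Q2[s][a]) for s in range(2) for a in range(2))
```

Calling `max_gap(q_learning(), value_iteration())` gives the sup-norm distance between the noisy iterate and the fixed point of T. On this toy problem it should be small, which is the sense in which Q-learning "mimics" value iteration.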
Khushi @starlitmatcha
Also understood the contrast: Monte Carlo waits until the episode ends whereas TD learning updates the value function step by step. Q-learning works because it uses these TD updates. So you don’t pick actions directly. You learn Q-values and choose actions based on them.
1
0
7
839
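The Monte Carlo versus TD contrast in the post above can be made concrete with a small policy-evaluation sketch. Everything here is an assumption chosen for illustration: a 5-state symmetric random walk with reward 1 only on exiting to the right, no discounting, a constant TD step size of 0.02, and sample-average first-visit Monte Carlo. TD(0) updates inside the episode loop after every transition, while Monte Carlo waits for the episode to finish and then uses the observed return.

```python
import random

N_STATES = 5                                        # non-terminal states 0..4
TRUE_V = [(i + 1) / 6 for i in range(N_STATES)]     # known values for this walk


def transitions(rng):
    """Yield (state, reward, next_state) for one episode; next_state None = terminal."""
    s = N_STATES // 2
    while s is not None:
        nxt = s + rng.choice((-1, 1))
        if nxt < 0:
            r, nxt = 0.0, None          # fell off the left end
        elif nxt >= N_STATES:
            r, nxt = 1.0, None          # fell off the right end
        else:
            r = 0.0
        yield s, r, nxt
        s = nxt


def evaluate(episodes=5000, alpha=0.02, seed=0):
    """Estimate state values with first-visit Monte Carlo and with TD(0)."""
    rng = random.Random(seed)
    v_mc, counts = [0.5] * N_STATES, [0] * N_STATES
    v_td = [0.5] * N_STATES
    for _ in range(episodes):
        visited, G = [], 0.0
        for s, r, nxt in transitions(rng):
            # TD(0): update immediately, bootstrapping from the current estimate.
            bootstrap = 0.0 if nxt is None else v_td[nxt]
            v_td[s] += alpha * (r + bootstrap - v_td[s])
            visited.append(s)
            G += r
        # Monte Carlo: update only after the episode has ended. With no
        # discounting and reward only on the final step, G is the return
        # from every state visited in this episode.
        for s in set(visited):
            counts[s] += 1
            v_mc[s] += (G - v_mc[s]) / counts[s]
    return v_mc, v_td
```

Both estimates returned by `evaluate()` can be compared against `TRUE_V`. The difference is in when the update happens and what the target is, not in what is being estimated.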
Khushi @starlitmatcha
Spent time with Q-learning and value-based RL today. Fun thing I noticed: the Bellman equation follows the same logic as dynamic programming.
4
1
23
10K
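The link to dynamic programming noted above shows up when the Bellman recursion is written as a memoized function: the optimal k-step value of a state is defined through the (k-1)-step values of its successors, and each subproblem is solved once. The chain below (4 states, actions left/right, reward 1 for landing on the last state, discount 0.5) is a hypothetical example chosen so the numbers can be checked by hand.

```python
from functools import lru_cache

N_STATES = 4          # hypothetical chain 0..3; landing on state 3 pays reward 1
ACTIONS = (-1, +1)    # move left or right, clipped at the ends
GAMMA = 0.5           # assumed discount factor


def step(s, a):
    """Deterministic transition: return (next_state, reward)."""
    nxt = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


@lru_cache(maxsize=None)
def value(s, k):
    """Optimal k-step value: V_k(s) = max_a [ r(s, a) + GAMMA * V_{k-1}(s') ]."""
    if k == 0:
        return 0.0
    return max(r + GAMMA * value(nxt, k - 1)
               for nxt, r in (step(s, a) for a in ACTIONS))
```

Here `value(0, 3)` evaluates to 0.25 (three moves right, with the reward discounted twice) and `value(3, 3)` to 1.75 (staying at the rewarding end for three steps).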
Gugan Thoppe @GThoppe
Attending #AAAI2026 in Singapore and working on RL theory or algorithm design? I’d love to meet! Happy to chat about our recent #AISTATS2026 work extending policy iteration and conservative policy iteration beyond the tabular setting to function approximation.
0
1
2
186
Gugan Thoppe @GThoppe
Can Q-learning in the average-reward setup, involving semi-norm contraction, have optimal rates without parameter-dependent step-sizes? Yes, using Polyak–Ruppert averaging. Catch our #AAAI2026 poster! Location: Hall 4. Date: Sunday, Jan 25.
1
0
3
163
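Polyak–Ruppert averaging, mentioned in the post above, means reporting the running average of the iterates rather than the last iterate. The sketch below is a scalar toy under stated assumptions, not the average-reward Q-learning or semi-norm setting of the poster: the map F(x) = 0.5x + 1 (a norm contraction with fixed point 2), unit-variance Gaussian noise, and the step size a_n = (n+1)^-0.7 are all chosen for illustration. The step size does not use the contraction factor 0.5, which is the sense of "parameter-free" assumed here.

```python
import random

X_STAR = 2.0   # fixed point of the assumed map F below


def F(x):
    """Hypothetical contraction with factor 0.5 and fixed point X_STAR."""
    return 0.5 * x + 1.0


def averaged_iteration(steps=100_000, seed=0):
    """Run x_{n+1} = x_n + a_n [F(x_n) - x_n + noise] and track the running mean."""
    rng = random.Random(seed)
    x, x_bar = 0.0, 0.0
    for n in range(steps):
        a_n = (n + 1) ** -0.7                 # step size independent of the factor 0.5
        noisy_F = F(x) + rng.gauss(0.0, 1.0)  # noisy evaluation of F
        x += a_n * (noisy_F - x)
        x_bar += (x - x_bar) / (n + 1)        # Polyak-Ruppert running average
    return x, x_bar
```

`averaged_iteration()` returns the last iterate and the averaged iterate, and both can be compared against `X_STAR`. In this toy the average smooths out the step-size-scale fluctuations of the last iterate.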
Gugan Thoppe @GThoppe
Here is a link to our tutorial from the recent Reinforcement Learning (#RL) Workshop 2026: youtube.com/live/1o2PUBPgY…. This six-part series covers RL fundamentals, key theoretical challenges, our recent #RPI-based solutions, and a hands-on component.
0
0
3
130
Gugan Thoppe retweeted
President of India @rashtrapatibhvn
President Droupadi Murmu presents Vigyan Yuva-Shanti Swarup Bhatnagar Awards to:
• Dr Dibyendu Das in Chemistry.
• Dr Waliur Rahaman in Earth Science.
• Prof. Arkaprava Basu in Engineering Sciences.
• Prof. Sabyasachi Mukherjee in Mathematics and Computer Science.
26
153
723
34.1K
Gugan Thoppe retweeted
IISc CSA @IIScCSA
The Walmart Centre of Tech Excellence at CSA proudly presents the 2026 edition of the Reinforcement Learning Workshop. This workshop aims to expose students and researchers in India to state-of-the-art Reinforcement Learning (RL) research. Register now: events.csa.iisc.ac.in/rlworkshop2026/
1
5
13
1.4K
Gugan Thoppe @GThoppe
At the beginning of the talk!
1
0
0
91
Gugan Thoppe @GThoppe
If you are interested in the theory and applications of #RL, I would love to chat with you! Let me also know if you're interested in academic positions in India. @iiscbangalore @csa
0
0
0
74
Gugan Thoppe @GThoppe
Analyzing fixed-point iterations for semi-norm contractions is challenging due to the latter's non-monotonicity. My Ph.D. student, Ankur Naskar, recently showed how to handle it and derive parameter-free convergence rates! Happy to note that this work got accepted to #AAAI2026.
IISc CSA @IIScCSA

Accepted to #AAAI2026: “Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to Q-Learning,” by Ankur Naskar, Gugan Thoppe (@GThoppe), and Vijay Gupta. This work shows how to handle the non-monotonicity of semi-norms in #RL. arxiv.org/pdf/2508.05984

0
0
1
200