Nishanth Anand
@itsNVA7

657 posts

Ph.D. student in Continual Reinforcement Learning @Mila_Quebec, @mcgillu, @rllabmcgill

Montreal, Canada · Joined August 2013
615 Following · 978 Followers

Pinned Tweet
Nishanth Anand @itsNVA7 ·
Continual learning should be viewed through the lens of the stability-plasticity dilemma, and solving it requires rethinking learning architectures. I argue why in my first continual learning blog: itsnva7.substack.com/p/continual-le…
3 replies · 13 reposts · 116 likes · 7.4K views
Nishanth Anand retweeted

Siva Reddy @sivareddyg ·
Montreal deep tech scene is getting hot!! Many recent hires of Cohere, Mistral, Periodic Labs, Poolside are all based in Montreal. And now, AMI will have an office here 🔥 It's a no-brainer, though. @Mila_Quebec has the highest concentration of deep learning expertise with interdisciplinary connections. Thanks to recent US regulation changes on immigration, no more brain drain! Let's build more in Canada!
Yann LeCun@ylecun

Unveiling our new startup Advanced Machine Intelligence (AMI Labs). We just completed our seed round: $1.03B / 890M€, one of the largest seeds ever, probably the largest for a European company. We're hiring! [the background image is the Veil Nebula - a picture I took from my backyard, most appropriate for an unveiling] More details here: techcrunch.com/2026/03/09/yan…

18 replies · 49 reposts · 751 likes · 71.9K views
Nishanth Anand retweeted

Cohere Labs @Cohere_Labs ·
Don't forget to tune in tomorrow, Tuesday, March 3rd for a session with @itsNVA7 focused on "The permanent and transient framework for continual reinforcement learning." Learn more: cohere.com/events/cohere-…
Cohere Labs@Cohere_Labs

Our Reinforcement Learning group is excited to welcome @itsNVA7 for a presentation on "The permanent and transient framework for continual reinforcement learning" on Tuesday, March 3rd. Thanks to @rahul_narava and Gusti Triandi Winata for organizing this event! 🔥 Learn more: cohere.com/events/cohere-…

0 replies · 2 reposts · 9 likes · 1.4K views
Nishanth Anand @itsNVA7 ·
This post is inspired by numerous discussions with my Ph.D. Supervisor, Doina Precup. And thanks to @khurram and my friends for their valuable comments on the initial draft.
0 replies · 0 reposts · 4 likes · 381 views
Nishanth Anand @itsNVA7 ·
This strategy ensures rapid learning from a rare event by leveraging the transient system, which also shields the permanent system from abrupt changes demanded by the raw online experience, thereby preserving previously learned knowledge.
1 reply · 0 reposts · 3 likes · 441 views
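As a rough illustration of the two-system idea described above: a fast "transient" estimate absorbs the raw online TD errors, while a slow "permanent" estimate is touched only at consolidation time. This is a minimal tabular sketch based on the tweet's description, not the framework's actual implementation; the class name, step sizes, and consolidation rule are illustrative assumptions.

```python
class PermanentTransientValue:
    """Sketch of a two-timescale value estimate: a slow 'permanent'
    table plus a fast 'transient' correction (tabular, for clarity)."""

    def __init__(self, n_states, alpha_fast=0.5, alpha_slow=0.01, gamma=0.99):
        self.perm = [0.0] * n_states   # slow, stable knowledge
        self.trans = [0.0] * n_states  # fast, plastic correction
        self.alpha_fast = alpha_fast
        self.alpha_slow = alpha_slow
        self.gamma = gamma

    def value(self, s):
        # The prediction the agent acts on is the sum of both systems.
        return self.perm[s] + self.trans[s]

    def td_update(self, s, r, s2, done):
        # Only the transient table absorbs the raw online TD error,
        # shielding the permanent table from abrupt changes.
        target = r + (0.0 if done else self.gamma * self.value(s2))
        self.trans[s] += self.alpha_fast * (target - self.value(s))

    def consolidate(self):
        # Periodically distill the combined estimate into the permanent
        # table at a slow rate, then reset the transient correction.
        for s in range(len(self.perm)):
            self.perm[s] += self.alpha_slow * (self.value(s) - self.perm[s])
            self.trans[s] = 0.0
```

A rare event then drives a large, fast transient update (rapid learning), while the permanent table only drifts toward it slowly during consolidation, preserving prior knowledge.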
Nishanth Anand @itsNVA7 ·
I am excited to share the past 5+ years of my PhD work on continual reinforcement learning at @Cohere_Labs on March 3rd! Doina and I spent significant time and thought developing this framework; we believe this holds the key to continual learning.
Cohere Labs@Cohere_Labs

Our Reinforcement Learning group is excited to welcome @itsNVA7 for a presentation on "The permanent and transient framework for continual reinforcement learning" on Tuesday, March 3rd. Thanks to @rahul_narava and Gusti Triandi Winata for organizing this event! 🔥 Learn more: cohere.com/events/cohere-…

0 replies · 4 reposts · 35 likes · 1.9K views
Nishanth Anand retweeted

Ewan Morrison @MrEwanMorrison ·
Since AI has been added to sinus surgery - "Two (patients) suffered strokes after surgeons accidentally damaged carotid arteries while the system allegedly misinformed them about where their instruments were inside patients' heads."
Hedgie@HedgieMarkets

🦔 Since Johnson & Johnson added AI to its TruDi Navigation System for sinus surgery in 2021, the FDA has received reports of at least 100 malfunctions and adverse events, up from 8 before the AI was added. At least 10 patients were injured. Two suffered strokes after surgeons accidentally damaged carotid arteries while the system allegedly misinformed them about where their instruments were inside patients' heads.

My Take

Medical device makers are racing to add AI to their products because it looks good in marketing materials and investor presentations. One lawsuit alleges the company pushed AI into TruDi "as a marketing tool" to claim it had "new and novel technology," and set a goal of only 80% accuracy before shipping it. Eighty percent accuracy is fine for a playlist recommendation. It's not fine for software telling a surgeon where his instrument is inside someone's skull.

The FDA has now authorized over 1,350 AI-enabled medical devices, double the number from 2022. Researchers found that 43% of recalls for these devices happened less than a year after approval, twice the rate of non-AI devices. This is what happens when AI becomes a checkbox for fundraising and marketing instead of a technology you deploy because it actually works better. The rush to put AI on everything is running ahead of anyone's ability to know if it's safe. Patients are the ones finding out.

Hedgie🤗

36 replies · 1.8K reposts · 7.9K likes · 488.8K views
Nishanth Anand @itsNVA7 ·
8 years ago, I sat in Doina’s RL class as an enthusiastic MSc student. This winter, I’m co-teaching that same class with her at McGill 👨‍🏫 Life comes full circle 🔁
[3 photos]
1 reply · 1 repost · 69 likes · 4.3K views
Nishanth Anand retweeted

David Abel @dabelcs ·
Thrilled to share our new #NeurIPS2025 paper done at @GoogleDeepMind, Plasticity as the Mirror of Empowerment We prove every agent faces a trade-off between its capacity to adapt (plasticity) and its capacity to steer (empowerment) Paper: david-abel.github.io/plasticity.pdf 🧵🧵🧵👇
[photo]
25 replies · 67 reposts · 450 likes · 101.6K views
Nishanth Anand retweeted

Khurram Javed @kjaved_ ·
The Dwarkesh/Andrej interview is worth watching. Like many others in the field, my introduction to deep learning was Andrej’s CS231n. In this era when many are involved in wishful thinking driven by simple pattern matching (e.g., extrapolating scaling laws without nuance), it’s refreshing to hear an influential voice that is tethered to reality.

One clarification for the podcast is that when Andrej says humans don’t use reinforcement learning, he is really saying humans don't use returns as learning targets. His example of LLMs struggling to learn to solve math problems from outcome-based rewards also elucidates the problem with learning directly from returns.

Fortunately for RL, this exact problem is solved by temporal difference (TD) learning. All sample-efficient RL algorithms that show human-like learning (e.g., sample-efficient learning on Atari, and our work on learning from experience directly on a robot) rely on TD learning.

Now Andrej is not primarily an RL person; he is looking at RL through the lens of LLMs these days, and all RL done in LLMs uses returns as targets, so it’s understandable that he is assuming that RL is all about learning from observed returns. But this assumption leads him to the incorrect conclusion that we need process-based dense rewards for RL to work.

If you embrace TD learning, then you don't necessarily need a dense reward. Once you have learned a value function that encodes useful knowledge about the world, you can learn on the fly in the absence of rewards, just like humans and animals. This is possible because in TD learning there is no difference between learning from an unexpected reward and learning from an unexpected change in perceived value.
Dwarkesh Patel@dwarkesh_sp

The @karpathy interview 0:00:00 – AGI is still a decade away 0:30:33 – LLM cognitive deficits 0:40:53 – RL is terrible 0:50:26 – How do humans learn? 1:07:13 – AGI will blend into 2% GDP growth 1:18:24 – ASI 1:33:38 – Evolution of intelligence & culture 1:43:43 - Why self driving took so long 1:57:08 - Future of education Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!

14 replies · 45 reposts · 449 likes · 196.3K views
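Khurram's distinction between learning from observed returns and TD learning can be made concrete with a toy example: TD(0) updates each state's value toward a bootstrapped target built from the very next reward and the current estimate at the next state, rather than waiting for the full return. This is a minimal sketch, not from the thread; the chain environment, step size, and episode count are illustrative assumptions.

```python
import random

def td0_chain(n_states=5, episodes=500, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) prediction on a simple random-walk chain.

    States 0..n_states-1; the agent steps left or right uniformly at
    random. Stepping off the right end gives reward 1, off the left
    end gives 0; both terminate the episode.
    """
    rng = random.Random(seed)
    v = [0.0] * n_states
    for _ in range(episodes):
        s = n_states // 2  # start each episode in the middle
        while True:
            s2 = s + rng.choice([-1, 1])
            if s2 < 0:
                r, done = 0.0, True
            elif s2 >= n_states:
                r, done = 1.0, True
            else:
                r, done = 0.0, False
            # TD target: immediate reward plus the bootstrapped estimate
            # of the next state (no waiting for the observed return).
            target = r if done else r + gamma * v[s2]
            v[s] += alpha * (target - v[s])
            if done:
                break
            s = s2
    return v
```

The learned values increase from left to right toward the rewarding end, showing how value propagates backward through the chain from a single sparse terminal reward.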
Nishanth Anand retweeted

Milad Aghajohari @MAghajohari ·
Multi-Agent RL fails in real life. Agents cooperating to solve tasks remains a utopia.
- No scalable algorithms for general-sum games.
- In a simple apple-harvesting game, PPO agents overharvest and ruin bushes.
Advantage Alignment (ICLR 2025 Oral📢) is a huge step forward. 1/n
[photo]
3 replies · 20 reposts · 89 likes · 8.8K views
Martin Klissarov @MartinKlissarov ·
Thrilled to share I've joined @GoogleDeepMind in London! Grateful to work with the brilliant @egrefen & a great team towards open-ended autonomous assistants. To celebrate, I took my daughter to see the best view in London—she made sure I saw nothing. Guess I need an assistant!
[photo]
30 replies · 7 reposts · 522 likes · 25.6K views