Nishanth Anand
@itsNVA7

657 posts

Ph.D. student in Continual Reinforcement Learning @Mila_Quebec, @mcgillu, @rllabmcgill

Montreal, Canada · Joined August 2013
615 Following · 978 Followers

Pinned Tweet
Nishanth Anand @itsNVA7 ·
Continual learning should be viewed through the lens of the stability-plasticity dilemma, and solving it requires rethinking learning architectures. I argue why in my first continual learning blog: itsnva7.substack.com/p/continual-le…
3 replies · 13 reposts · 116 likes · 7.4K views
Nishanth Anand retweeted

Siva Reddy @sivareddyg ·
Montreal deep tech scene is getting hot!! Many recent hires of Cohere, Mistral, Periodic Labs, Poolside are all based in Montreal. And now, AMI will have an office here 🔥 It's a no-brainer, though. @Mila_Quebec has the highest concentration of deep learning expertise with interdisciplinary connections. Thanks to recent US regulation changes on immigration, no more brain drain! Let's build more in Canada!
Yann LeCun@ylecun

Unveiling our new startup Advanced Machine Intelligence (AMI Labs). We just completed our seed round: $1.03B / 890M€, one of the largest seeds ever, probably the largest for a European company. We're hiring! [the background image is the Veil Nebula - a picture I took from my backyard, most appropriate for an unveiling] More details here: techcrunch.com/2026/03/09/yan…

18 replies · 49 reposts · 751 likes · 71.9K views
Nishanth Anand retweeted

Cohere Labs @Cohere_Labs ·
Don't forget to tune in tomorrow, Tuesday, March 3rd for a session with @itsNVA7 focused on "The permanent and transient framework for continual reinforcement learning." Learn more: cohere.com/events/cohere-…
Cohere Labs@Cohere_Labs

Our Reinforcement Learning group is excited to welcome @itsNVA7 for a presentation on "The permanent and transient framework for continual reinforcement learning" on Tuesday, March 3rd. Thanks to @rahul_narava and Gusti Triandi Winata for organizing this event! 🔥 Learn more: cohere.com/events/cohere-…

0 replies · 2 reposts · 9 likes · 1.4K views
Nishanth Anand @itsNVA7 ·
This post is inspired by numerous discussions with my Ph.D. Supervisor, Doina Precup. And thanks to @khurram and my friends for their valuable comments on the initial draft.
0 replies · 0 reposts · 4 likes · 381 views
Nishanth Anand @itsNVA7 ·
This strategy ensures rapid learning from a rare event by leveraging the transient system, which also shields the permanent system from abrupt changes demanded by the raw online experience, thereby preserving previously learned knowledge.
1 reply · 0 reposts · 3 likes · 441 views
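As a rough illustration of the two-system idea described above: a fast "transient" estimate absorbs the raw online TD errors, while a slow "permanent" estimate is touched only at consolidation time. This is a minimal tabular sketch based on the tweet's description, not the framework's actual implementation; the class name, step sizes, and consolidation rule are illustrative assumptions.

```python
class PermanentTransientValue:
    """Sketch of a two-timescale value estimate: a slow 'permanent'
    table plus a fast 'transient' correction (tabular, for clarity)."""

    def __init__(self, n_states, alpha_fast=0.5, alpha_slow=0.01, gamma=0.99):
        self.perm = [0.0] * n_states   # slow, stable knowledge
        self.trans = [0.0] * n_states  # fast, plastic correction
        self.alpha_fast = alpha_fast
        self.alpha_slow = alpha_slow
        self.gamma = gamma

    def value(self, s):
        # The prediction the agent acts on is the sum of both systems.
        return self.perm[s] + self.trans[s]

    def td_update(self, s, r, s2, done):
        # Only the transient table absorbs the raw online TD error,
        # shielding the permanent table from abrupt changes.
        target = r + (0.0 if done else self.gamma * self.value(s2))
        self.trans[s] += self.alpha_fast * (target - self.value(s))

    def consolidate(self):
        # Periodically distill the combined estimate into the permanent
        # table at a slow rate, then reset the transient correction.
        for s in range(len(self.perm)):
            self.perm[s] += self.alpha_slow * (self.value(s) - self.perm[s])
            self.trans[s] = 0.0
```

A rare event then drives a large, fast transient update (rapid learning), while the permanent table only drifts toward it slowly during consolidation, preserving prior knowledge.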
Nishanth Anand @itsNVA7 ·
I am excited to share the past 5+ years of my PhD work on continual reinforcement learning at @Cohere_Labs on March 3rd! Doina and I spent significant time and thought developing this framework; we believe this holds the key to continual learning.
Cohere Labs@Cohere_Labs

Our Reinforcement Learning group is excited to welcome @itsNVA7 for a presentation on "The permanent and transient framework for continual reinforcement learning" on Tuesday, March 3rd. Thanks to @rahul_narava and Gusti Triandi Winata for organizing this event! 🔥 Learn more: cohere.com/events/cohere-…

0 replies · 4 reposts · 35 likes · 1.9K views
Nishanth Anand retweeted

Ewan Morrison @MrEwanMorrison ·
Since AI has been added to sinus surgery - "Two (patients) suffered strokes after surgeons accidentally damaged carotid arteries while the system allegedly misinformed them about where their instruments were inside patients' heads."
Hedgie@HedgieMarkets

🦔 Since Johnson & Johnson added AI to its TruDi Navigation System for sinus surgery in 2021, the FDA has received reports of at least 100 malfunctions and adverse events, up from 8 before the AI was added. At least 10 patients were injured. Two suffered strokes after surgeons accidentally damaged carotid arteries while the system allegedly misinformed them about where their instruments were inside patients' heads.

My Take

Medical device makers are racing to add AI to their products because it looks good in marketing materials and investor presentations. One lawsuit alleges the company pushed AI into TruDi "as a marketing tool" to claim it had "new and novel technology," and set a goal of only 80% accuracy before shipping it. Eighty percent accuracy is fine for a playlist recommendation. It's not fine for software telling a surgeon where his instrument is inside someone's skull.

The FDA has now authorized over 1,350 AI-enabled medical devices, double the number from 2022. Researchers found that 43% of recalls for these devices happened less than a year after approval, twice the rate of non-AI devices. This is what happens when AI becomes a checkbox for fundraising and marketing instead of a technology you deploy because it actually works better. The rush to put AI on everything is running ahead of anyone's ability to know if it's safe. Patients are the ones finding out.

Hedgie🤗

36 replies · 1.8K reposts · 7.9K likes · 488.8K views
Nishanth Anand @itsNVA7 ·
8 years ago, I sat in Doina’s RL class as an enthusiastic MSc student. This winter, I’m co-teaching that same class with her at McGill 👨‍🏫 Life comes full circle 🔁
[3 photos]
1 reply · 1 repost · 69 likes · 4.3K views
Nishanth Anand retweeted

David Abel @dabelcs ·
Thrilled to share our new #NeurIPS2025 paper done at @GoogleDeepMind, Plasticity as the Mirror of Empowerment We prove every agent faces a trade-off between its capacity to adapt (plasticity) and its capacity to steer (empowerment) Paper: david-abel.github.io/plasticity.pdf 🧵🧵🧵👇
[photo]
25 replies · 67 reposts · 450 likes · 101.6K views
Nishanth Anand retweeted

Khurram Javed @kjaved_ ·
The Dwarkesh/Andrej interview is worth watching. Like many others in the field, my introduction to deep learning was Andrej’s CS231n. In this era when many are involved in wishful thinking driven by simple pattern matching (e.g., extrapolating scaling laws without nuance), it’s refreshing to hear an influential voice that is tethered to reality.

One clarification for the podcast is that when Andrej says humans don’t use reinforcement learning, he is really saying humans don't use returns as learning targets. His example of LLMs struggling to learn to solve math problems from outcome-based rewards also elucidates the problem with learning directly from returns.

Fortunately for RL, this exact problem is solved by temporal difference (TD) learning. All sample-efficient RL algorithms that show human-like learning (e.g., sample-efficient learning on Atari, and our work on learning from experience directly on a robot) rely on TD learning.

Now Andrej is not primarily an RL person; he is looking at RL through the lens of LLMs these days, and all RL done in LLMs uses returns as targets, so it’s understandable that he is assuming that RL is all about learning from observed returns. But this assumption leads him to the incorrect conclusion that we need process-based dense rewards for RL to work.

If you embrace TD learning, then you don't necessarily need a dense reward. Once you have learned a value function that encodes useful knowledge about the world, you can learn on the fly in the absence of rewards, just like humans and animals. This is possible because in TD learning there is no difference between learning from an unexpected reward and learning from an unexpected change in perceived value.
Dwarkesh Patel@dwarkesh_sp

The @karpathy interview 0:00:00 – AGI is still a decade away 0:30:33 – LLM cognitive deficits 0:40:53 – RL is terrible 0:50:26 – How do humans learn? 1:07:13 – AGI will blend into 2% GDP growth 1:18:24 – ASI 1:33:38 – Evolution of intelligence & culture 1:43:43 - Why self driving took so long 1:57:08 - Future of education Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!

14 replies · 45 reposts · 449 likes · 196.3K views
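Khurram's distinction between learning from observed returns and TD learning can be made concrete with a toy example: TD(0) updates each state's value toward a bootstrapped target built from the very next reward and the current estimate at the next state, rather than waiting for the full return. This is a minimal sketch, not from the thread; the chain environment, step size, and episode count are illustrative assumptions.

```python
import random

def td0_chain(n_states=5, episodes=500, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) prediction on a simple random-walk chain.

    States 0..n_states-1; the agent steps left or right uniformly at
    random. Stepping off the right end gives reward 1, off the left
    end gives 0; both terminate the episode.
    """
    rng = random.Random(seed)
    v = [0.0] * n_states
    for _ in range(episodes):
        s = n_states // 2  # start each episode in the middle
        while True:
            s2 = s + rng.choice([-1, 1])
            if s2 < 0:
                r, done = 0.0, True
            elif s2 >= n_states:
                r, done = 1.0, True
            else:
                r, done = 0.0, False
            # TD target: immediate reward plus the bootstrapped estimate
            # of the next state (no waiting for the observed return).
            target = r if done else r + gamma * v[s2]
            v[s] += alpha * (target - v[s])
            if done:
                break
            s = s2
    return v
```

The learned values increase from left to right toward the rewarding end, showing how value propagates backward through the chain from a single sparse terminal reward.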
Nishanth Anand retweeted

Milad Aghajohari @MAghajohari ·
Multi-Agent RL fails in real life. Agents cooperating to solve tasks remains a utopia.
- No scalable algorithms for general-sum games.
- In a simple apple-harvesting game, PPO agents overharvest and ruin bushes.
Advantage Alignment (ICLR 2025 Oral📢) is a huge step forward. 1/n
[photo]
3 replies · 20 reposts · 89 likes · 8.8K views
Martin Klissarov @MartinKlissarov ·
Thrilled to share I've joined @GoogleDeepMind in London! Grateful to work with the brilliant @egrefen & a great team towards open-ended autonomous assistants. To celebrate, I took my daughter to see the best view in London—she made sure I saw nothing. Guess I need an assistant!
[photo]
30 replies · 7 reposts · 522 likes · 25.6K views