Anirbit

172 posts

@anirbit_maths

Lecturer in ML, The University of Manchester. Action Editor @ TMLR. Associate Editor @ ACM-TOPML.

Joined January 2025

107 Following · 52 Followers

Anirbit@anirbit_maths·3d

@sidgairo18 This is exactly the mess that TMLR solved. There are no scores in round 1 of reviews. There is only a yes/no decision after rebuttals. There are enough reasons why every system needs to converge to this.

Siddhartha Gairola@sidgairo18·4d

Food for thought - 🤔 I've been thinking about this long and hard - having been reviewing for popular ML / CV conferences (ICML, ICLR, NeurIPS, CVPR, ICCV, ECCV) - with the community submitting papers across these, it only makes sense to have a uniform reviewer form, guidelines, rules and format across these conferences. Personally I have a real hard time calibrating my scale from 1-10 (ICLR) to 1-6 for CVPR, then comes ICML which also has 1-6 but 3,4 are weak reject/accept instead of 3,4 as borderline reject/accept (for CVPR). This only gets trickier and worse when you add ICCV, ECCV, NeurIPS into the mix. Then, you add NLP related conferences and Robotics ones, to make the entire system more and more confusing - with uncalibrated reviewer scores coming - which may or may not truly reflect the reviewer's intentions. Happy to hear the thoughts of others. cc: @icmlconf @CVPR @NeurIPSConf @iclr_conf @ICCVConference @eccvconf

Anirbit@anirbit_maths·8 Mar

India's #Sarvam LLMs are among the best-in-class LLMs in their size categories. Both are reasoning models implementing Grouped Query Attention (GQA) and Multi-Head Latent Attention (MLA).

Pratyush Kumar@pratykumar

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

Anirbit@anirbit_maths·23 Feb

@akshayrangamani Hello Akshay 😁 For a start, gradient-flow is a PDE! Way too much of modern AI is hinged on understanding gradient-flows. Then SDEs underlie so much of noisy-S/GD, which we understand through their density evolution by the FPS PDE.

the machine learner has logged on@akshayrangamani·23 Feb

@anirbit_maths Isn't the post saying PDEs aren't really represented in modern AI? What kind of PDE concepts are you thinking of?
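
For readers following this exchange, a minimal sketch of the SDE-to-PDE link mentioned in the reply, assuming that "FPS PDE" refers to a Fokker-Planck-type equation and that noisy gradient descent is modelled by overdamped Langevin dynamics with constant isotropic noise. The symbols $L$ (loss), $\beta$ (inverse temperature), $W_t$ (Brownian motion) and $\rho$ (density of the iterate) are notation chosen here, not taken from the thread.

```latex
% Gradient flow on a loss L (the noiseless continuous-time limit of gradient descent):
\frac{d\theta_t}{dt} = -\nabla L(\theta_t)

% Assumed model of noisy GD: overdamped Langevin SDE with inverse temperature \beta
d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t

% Fokker-Planck PDE for the density \rho(\theta, t) of \theta_t
\frac{\partial \rho}{\partial t} = \nabla \cdot \big(\rho\,\nabla L\big) + \beta^{-1}\,\Delta \rho
```

Under suitable growth conditions on $L$, the stationary solution of this PDE is the Gibbs density $\rho_\infty \propto e^{-\beta L}$, which is one way the long-time behaviour of the noisy dynamics can be studied.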

Sanjeev Arora@prfsanjeevarora·22 Feb

Nice set of books, but not sure if many AI experts know **all** these topics.

dr. jack morris@jxmnop

it always disappointed me that such a small subset of mathematical ideas matter for AI. i miss doing real math

Anirbit@anirbit_maths·22 Feb

I mentioned this "test of AGI" in so many conversations 🙄 Only when @demishassabis says it, people start to talk about it 🙂

Rohan Paul@rohanpaul_ai

Demis Hassabis’s “Einstein test” for defining AGI: Train a model on all human knowledge but cut it off at 1911, then see if it can independently discover general relativity (as Einstein did by 1915); if yes, it’s AGI.

Anirbit@anirbit_maths·20 Feb

Live translation of lectures by #SarvamAI was a highlight at the #IndiaAISummit2026 🔥

RapperPandit@RapperPandit

🚨 My God! I did not understand why this video is viral; the prof is just giving a lecture normally, shifting from one language to another. Then I came across the post by @thebetterindia and learned that it was a real-time translation.

Anirbit@anirbit_maths·6 Feb

@thegautamkamath My anecdotal evidence is that rebuttals in general (for the majority?) do no good to the authors. Are there statistics on how many papers crossed the acceptance threshold after rebuttals?

Gautam Kamath@thegautamkamath·6 Feb

Suppose one of NeurIPS/ICML/ICLR decided to do away with all rebuttals. Acceptances/rejections would be decided by the reviewers and the ACs, without input from the authors beyond the submissions. Which would you, as an author and a reviewer jointly, prefer?

Anirbit@anirbit_maths·6 Feb

My post-doc mentor gets the COPSS award! Incredible! Congratulations to @weijie444 ! My students are still building projects on top of Weijie's works! community.amstat.org/copss/awards/p…

Anirbit@anirbit_maths·28 Jan

#MyXAnniversary 😎 It started with posting about our paper on provable training of nets of any size 😁

Anirbit@anirbit_maths·22 Jan

@kamalikac Is there a pathway from mechanistic interpretability to task-specific neural architecture search? I would love to know your views on this 🙂

Kamalika Chaudhuri@kamalikac·21 Jan

I am looking for more topics for blog-posts; please DM your suggestions! Topics can be anything related to AI, privacy/security/safety, generalization, LLMs, career advice. A reminder that I cannot blog about my employers or anything specific/sensitive to them.

Anirbit@anirbit_maths·22 Jan

@andrewgwils I thought overparametrization was a silver bullet. Then LLMs happened, where it seems that almost always #training-tokens >>> #parameters?

Andrew Gordon Wilson@andrewgwils·22 Jan

Have you ever reversed your position on a strongly held technical belief? What was the belief and what convinced you to change your mind?

Anirbit@anirbit_maths·22 Jan

Seems our work on using Villani functions to prove neural training is now among the most-read recent papers in the IMA journal, II! 💥 This is the first (only?) truly “beyond-NTK” proof. Do check our related recent work, arxiv.org/abs/2503.10428…

Anirbit@anirbit_maths·13 Jan

@pfau I think parameterized PDEs provide a natural notion of in/out-domain. No amount of data can encompass fluid-flow at all possible viscosities. There is always an unseen viscosity value at which one can ask for predictions - and possibly falter?

David Pfau@pfau·12 Jan

This is the key difference between in-domain and out-of-domain generalization, and we still have not truly solved out-of-domain generalization. It just turns out you can build world changing technology by throwing so much data at things that the entire universe is in-domain.

Niels Rogge@NielsRogge

One of the best visual explanations I've ever seen for why scaling Transformers works, but is suboptimal, as it's just brute-forcing things, by @YesThisIsLion (co-author of the Transformer) on @MLStreetTalk "In the (rejected) paper "Intelligent Matrix Exponentiation", they show the decision boundary of a classic MLP with a ReLu/Tanh activation function on the classic Spiral dataset." "You can see they both technically solve it with great scores on the test set. Next, they show the decision boundary of the "M-layer" they propose in the paper. And it represents the spiral ... as a spiral!" "Shouldn't we? If the data is a spiral... shouldn't we represent it as a spiral?" "If you look back at the decision boundaries of the MLP, it's clear that you just have these tiny, piecewise separations without learning the concept of a spiral. That's what I mean!" "If you train these things enough, it can fit the spiral and get a high accuracy. But there's no indication that the MLP actually understands a spiral. When you represent it as a spiral, it extrapolates correctly, cause the spiral just keeps going out."

Anirbit@anirbit_maths·21 Dec

This was the slide where I outlined the 2 key questions which I think are foundational to progress with neural operators & #AI4Science. ACM IKDD #CODS2025 gave a platform for such discussions between new academics and subject stalwarts in the audience 💥

Anirbit@anirbit_maths

Gave my "new faculty highlight" talk at the ACM IKDD #CODS 2025 - where I outlined a vision for neural operator research - and reviewed our 2 #TMLR papers from 2024, in the theme.

Anirbit@anirbit_maths·20 Dec

Gave my "new faculty highlight" talk at the ACM IKDD #CODS 2025 - where I outlined a vision for neural operator research - and reviewed our 2 #TMLR papers from 2024, in the theme.

Anirbit@anirbit_maths·16 Dec

@deepcohen As much as I agree with the idea of giving "falsifiable predictions", I feel it would be near-impossible to publish such papers. A lot of theory is cool to do but not done because of this official requirement. I hope to be wrong about this!

Jeremy Cohen@deepcohen·15 Dec

So, we should focus on theories that can reliably predict “the small things” about deep learning, and gradually broaden the scope of what we can predict, until we have theory that can reliably predict “the big things” about deep learning too.

Jeremy Cohen@deepcohen·15 Dec

The goal of deep learning theory/science is to guide practice. But most practical questions are >1 paper away from being legitimately answered by theory. How, then, can we make progress, without access to the ideal reward signal of “does this theory give us a SOTA algorithm?” …

Anirbit@anirbit_maths·11 Dec

@PreetumNakkiran Of course one can't rule out that tomorrow a training guarantee might emerge that critically leverages some subnet property. If that happens, that could be a serious bolstering of the view of mechanistic interpretability.

Anirbit@anirbit_maths·11 Dec

@PreetumNakkiran "understand" means opposite things to these 2 communities. As someone wanting provable training, I am far less concerned about the complexity of my proof. Mechanistics are hoping the other way, that key features of big models somehow exist in simpler subnets.

Preetum Nakkiran@PreetumNakkiran·11 Dec

“Theory of deep learning” went through similar discussions about its goals & purpose some ~5yrs ago. Someone should write about the relations between mech-interp & theory: two communities w/ fundamentally similar motivations (“understand neural nets”), but very different methods.

David Bau@davidbau

At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: davidbau.com/archives/2025/…

Anirbit@anirbit_maths·10 Dec

@gowthami_s David Pfau's results are from years ago. It was clear right then that AI-for-Science works 🙂

Gowthami@gowthami_s·9 Dec

Tbh, this NeurIPS changed my perspective on "AI for science"! It looks like things are working, and there's also a lot of interest from traditional companies - both in material discovery and biotech. A topic worth exploring for the current generation of PhDs!

Anirbit@anirbit_maths·10 Dec

A recent paper improved one of my PhD 1st year results by 0.63. Surprising that our upper bound held for 9 yrs 🤣 These are things only maths people get excited about 😁 Still open: Are 2 layers sufficient for a net to compute the maximum of n numbers? 💥
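
As background to the open question above, here is a minimal sketch of the standard pairwise construction, which computes the maximum of n numbers using only ReLU units in about log2(n) rounds. It does not settle whether 2 layers suffice, and it is not taken from the papers mentioned in the post; `relu`, `max2` and `relu_max` are hypothetical helper names chosen for illustration.

```python
def relu(x: float) -> float:
    """The ReLU activation: x if positive, else 0."""
    return max(x, 0.0)


def max2(a: float, b: float) -> float:
    """max(a, b) written with one hidden layer of three ReLU units.

    Uses max(a, b) = relu(a - b) + b, with b = relu(b) - relu(-b).
    """
    return relu(a - b) + relu(b) - relu(-b)


def relu_max(values):
    """Return (maximum, number_of_rounds) via a pairwise tournament.

    Each round corresponds to one hidden ReLU layer; an unpaired element
    is carried forward unchanged (the identity is relu(x) - relu(-x)).
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one number")
    rounds = 0
    while len(values) > 1:
        nxt = [max2(values[i], values[i + 1])
               for i in range(0, len(values) - 1, 2)]
        if len(values) % 2 == 1:
            nxt.append(values[-1])
        values = nxt
        rounds += 1
    return values[0], rounds


result, rounds = relu_max([3.0, -1.0, 7.0, 2.0, 5.0])
print(result, rounds)
```

The number of rounds grows like the ceiling of log2(n), which is why the question of whether a fixed depth of 2 can suffice is a genuine one.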
