
Puneesh Deora (@puneeshdeora), PhD student at UBC working on the theory of deep learning:

📢 Late post. Our recent work studies when and why spectrum-aware optimizers like Muon generalize better than standard Euclidean gradient descent (GD). 🧵 below (1/N)
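For intuition (my illustration, not the authors' code): a spectrum-aware step orthogonalizes the gradient so every singular direction moves at the same rate, while a Euclidean GD step inherits the gradient's spectrum. The SVD version below is the idealized update that Muon approximates with Newton-Schulz iterations on a momentum buffer.

```python
import numpy as np

def euclidean_gd_step(W, G, lr=0.1):
    """Standard GD: the step inherits the gradient's spectrum, so an
    ill-conditioned gradient moves the weights almost entirely along
    its top singular directions."""
    return W - lr * G

def spectral_step(W, G, lr=0.1):
    """Idealized spectrum-aware (Muon-style) step: replace the
    gradient's singular values with 1 via the SVD, so every singular
    direction moves at the same rate. Muon itself approximates this
    orthogonalization with Newton-Schulz iterations instead of an
    exact SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)
```

With a low-stable-rank gradient, the GD step is dominated by one or two directions, while the spectral step still makes progress along all of them.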

If I were a grad student today, I would: 1) not write papers, 2) push my (agent-written) code to a public repo ~weekly, 3) maintain (via agents) a writeup.tex (manually verified) and a skill.md in the repo, and 4) work towards establishing skill usage as the new "citation" format.

We are pleased to share that using Gauss, we have completed a ~200K LOC formalization of Maryna Viazovska’s 2022 Fields Medal theorems on optimal sphere packing in dimensions 8 and 24. This is the only Fields Medal-winning result from this century to be completely formalized, and is the largest single-purpose Lean formalization in history. We are honored to have assisted @SidharthHarihar1 and the rest of the sphere packing team in this achievement. math.inc/sphere-packing

Idk if people know this, but Google Scholar does not index the second reference format type (it does index the first), and the second is the BibTeX you get from arXiv.

OpenAI's Sébastien Bubeck says deep expertise is more important than ever in the AI age: to get maximum value from AI, you need enough real understanding to describe the problem clearly. "This creates the gap between people who keep studying and those who rely too much on AI."

I ran experiments with GPT-5.2-Pro on the 20 latest arXiv preprints to see if it can:
- independently prove the main theorem
- find big mistakes in papers

Statistics:
- 1 easy proof could be re-derived by GPT-5.2-Pro
- 1 contained a critical error
- 1 more case is quite interesting ⬇️

Some thoughts on AI and mathematics, inspired by "First Proof."

New paper studies when spectral gradient methods (e.g., Muon) help in deep learning:
1. We identify a pervasive form of ill-conditioning in DL: post-activation matrices have low stable rank.
2. We then explain why spectral methods can perform well despite this.
Long thread below.
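For concreteness, a minimal sketch (mine, not the paper's) of stable rank, ||A||_F^2 / ||A||_2^2, the quantity behind point 1:

```python
import numpy as np

def stable_rank(A):
    """Stable rank ||A||_F^2 / ||A||_2^2: a smooth surrogate for rank
    that drops toward 1 when a few singular values dominate."""
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

rng = np.random.default_rng(0)
H = rng.standard_normal((256, 512))      # generic matrix: many comparable singular values
H_low = np.outer(rng.standard_normal(256),
                 rng.standard_normal(512)) + 0.01 * H  # one dominant direction

print(stable_rank(H))      # large, far from 1
print(stable_rank(H_low))  # close to 1: "low stable rank"
```

A matrix with stable rank near 1 is exactly the ill-conditioned case where a Euclidean GD step concentrates in one direction, which is where the spectral update above differs most.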