Shuning Shang

26 posts

@susieshang

CS PhD @Princeton | Prev. undergrad @ZJU_China | I'm interested in ML Theory

Princeton, NJ · Joined February 2024
297 Following · 177 Followers
Pinned Tweet
Shuning Shang retweeted
Yiping Wang @ypwang61
We improve a 32-year-old lower bound on a challenging open problem, Ramsey numbers, simply by scaling autoresearch. ⭕ Proves R(3,17) >= 93. The previous bound of 92 was obtained in 1994. Google's AlphaEvolve (2026) matched the previous result but did not beat it. All of this could be done with Claude Code / Codex + a CPU server. Graphs and the evolution history are available at github.com/ypwang61/Scale… [1/n]
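For context: R(3,17) >= 93 means there exists a graph on 92 vertices with no triangle and no independent set of 17 vertices. The sketch below shows how such a witness graph can be checked in principle. The helper names are hypothetical, this is not the verification code from the linked repository, and the brute-force independent-set search only scales to tiny graphs (enumerating all 17-subsets of 92 vertices is infeasible, so a real check would need a dedicated clique solver on the complement graph). The 5-cycle example certifies the classical bound R(3,3) >= 6.

```python
from itertools import combinations

# Hypothetical helpers for illustration; brute force, suitable only for tiny graphs.

def _adjacency(n, edges):
    """Build adjacency sets for an undirected simple graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def is_triangle_free(n, edges):
    """A triangle exists iff the two endpoints of some edge share a neighbour."""
    adj = _adjacency(n, edges)
    return all(not (adj[u] & adj[v]) for u, v in edges)

def has_independent_set(n, edges, k):
    """Check every k-subset of vertices for one with no internal edge."""
    adj = _adjacency(n, edges)
    return any(
        all(v not in adj[u] for u, v in combinations(subset, 2))
        for subset in combinations(range(n), k)
    )

def certifies_r3_lower_bound(n, edges, t):
    """True iff this n-vertex graph witnesses R(3, t) >= n + 1."""
    return is_triangle_free(n, edges) and not has_independent_set(n, edges, t)

# The 5-cycle has no triangle and no independent set of size 3, so R(3,3) >= 6.
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(certifies_r3_lower_bound(5, c5, 3))  # True
```

The same two properties (triangle-free, independence number below t) are what any claimed R(3,t) lower-bound graph must satisfy; only the search strategy changes at scale.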
Shuning Shang retweeted
Boaz Barak @boazbaraktcs
Some thoughts... Imagine that there is an AI that, if you ask it a question such as "Is T true?" where T is a precise mathematical statement, either gives you a proof or a disproof, or tells you it doesn't know, but never makes a mistake. Moreover, imagine that every 6 months there is a new model that can handle more and more difficult problems T, eventually reaching the level of the Riemann Hypothesis or P vs NP.

I think different people would handle it differently. For some mathematicians, this might be the most exciting time in the history of math, and they would use these models to explore terrain faster than ever before. The norms of the math community as a whole would have to change, with a premium not so much on settling questions as on finding ways to map out and simplify the relations between problems. Personally, I think that an "egoless" but brilliant AI could be put to great use in simplifying old proofs rather than proving new results. IMO there has been far too little effort invested in the former.

For other people, it may be very different from the reasons that they got into math in the first place. Or (like me) they would feel that the moment is one where the main story is AI and its impact on humanity. I used to care very deeply about the unique games conjecture, but I guess I am "monogamous" in my intellectual life (as in my personal one) and these days too focused on AI alignment to even try attacking it with AI. (It's also still too soon; models are not yet good enough.)

I would still love to see the UGC get resolved, especially if I could interrogate the AI to explain the proof to me in a way tailored to my taste and understanding. But I won't deny that if the UGC gets resolved by an AI, it will feel very bittersweet, and like the end of an era of sitting in coffee shops, sometimes by myself and sometimes with others, spending hours talking and working out (mostly wrong) ideas on pads of paper.
jacob tsimerman @Jacob_Tsimerman

I want to clarify my thoughts on problem-solving in mathematics, and the potential consequences of AI for the field. For context, I'm quoting here my post in reply to Daniel Litt (whom, echoing others, I find very clear, grounded, and insightful in his thinking).

The claim

The short version is that I think problem-solving is an immense and pervasive part of modern mathematical research. Consequently, if human problem-solving disappears by virtue of AIs becoming strictly and substantially better at it, then most of the time currently spent by modern mathematical researchers will have to be spent on an activity that is altogether pretty different. Whether such an activity is viable as a professional endeavour is something I am unsure of, but I strongly encourage others to think about it and try to envision it, so that if/when the time comes, we can steer such a future into being.

Allow me to make this somewhat concrete: by problem-solving I mean questions of the form “is T true? If so, find a proof. If not, find a disproof.”, where T is a precise mathematical statement. I'll also include “find an example of S, if there is one”, where S is some structure (variety/category/property/isomorphism/…).

The argument

Ok. Now, as I said (and some have echoed), I spend ~all of my time problem-solving as my primary goal. This has sub-goals, but my entire main research field disappears if someone solves the Zilber-Pink Conjecture in its more general form. This is a single conjecture (precisely stated!), and lots of mathematicians, postdocs, and graduate students are engaged in picking apart special cases of it, trying strategies, finding analogies to develop intuition, etc. Of course, lots of motivation and intuition and analogizing and understanding have gone into deciding to make the ZP conjecture a focus! But the fact remains that this is what is now being worked on ~all of the time by this community. This is true of many mathematicians.
They have a problem (or ten) and spend most of their time working on it. If someone solves it, they have to find a different problem. This can be a big, disorienting process involving a lot of energy, and is neither trivial nor always fun (though often rewarding in the end).

People have written a lot about theory-building vs. problem-solving, and I want first of all to clarify that I have nothing against theory-building or theory-builders! It is a valuable part of mathematics, and while there are differences in perspective between the “camps”, there is way more mutual respect and agreement. However, I gather there is a perception that theory-builders spend most of their time not problem-solving, and I think this is largely untrue. Now, I'm not primarily a theory-builder (though I've partaken a LITTLE BIT by necessity), so I am outside of my comfort zone. As such, I apologize for mistakes and welcome corrections!

But theory-building constantly runs through problem-solving. Let's say you want to define the right notion of a cohomology theory. Of course you must make candidate definitions. But then what does it mean for it to be the right one? Well, you start asking if it has natural properties. These are T statements. Does it satisfy a Künneth formula? Is it functorial in the right way? When you have the wrong one, you have to find the properties it's missing, and when you have the right one, you have to prove that it indeed has those properties.

Again, I am not saying, nor do I believe, that this makes problem-solving “real math” and theory-building lesser. I am just trying to draw attention to the way I think research mathematicians operate and mathematics is practiced. To put all this a different way, imagine you had access to an AI oracle that could resolve statements T, but somehow lacked any creativity to build technology or make definitions (I think this is unlikely, but for the purpose of this thought experiment, let's imagine it).
How would your mathematics change if you were a theory-builder? Well, you make a definition and want to know if it's the right one. You immediately ask your oracle a thousand questions, from “are these basic properties true?” to “ooh, so is this deep conjecture true?”, and start getting back answers and amending your definitions. You could invent and resolve entire research directions in days. But the confusion you would have had to push through to flesh out your theory would largely (probably not entirely) be instantly resolved, and the whole process sped up tremendously by your oracle. A big part of the process would be gone. This is very, very different from modern mathematics.

One more thought

This post is too long already, but I've seen some people say that they only do mathematics to find truth, and others valorize that as the only virtuous way to be. I do not do mathematics only to find truth. I do it largely because I enjoy it and I am good at it. I also find it beautiful and am grateful I get to spend my days understanding beautiful things. But I enjoy the challenge, the process, resolving confusions, finding strategies, grappling with problems. I would like to push for this being de-stigmatized. Mathematicians are people who need money, housing, food, love, exercise, and a great deal of other stuff, including various forms of meaning. There are many people whose primary enjoyment of math comes through problem-solving in one of its incarnations. If that disappears, that is not a trivial issue, and many of them might not want to do it anymore (even if there were some way to proceed).

Shuning Shang retweeted
Yinghui He @yinghui_he_
RLVR gives sparse supervision; On-Policy Self-Distillation often requires high-quality demonstrations. Our new method, ✨SD-Zero✨, gets the best of both worlds: we use the model's self-revision to turn binary rewards into dense token-level supervision. No external teacher. No curated demonstrations. 🚨 Introducing Self-Distillation Zero (SD-Zero), which trains one model to play two roles: (1) a “Generator” that makes attempts, and (2) a “Reviser” that conditions on the generator's failed/successful attempt + binary reward to produce a better answer. ‼️Even WRONG attempts can become the training signal.‼️ 🔗Paper: arxiv.org/abs/2604.12002 🏆 SD-Zero brings a 10%+ improvement over base models (Qwen3-4B; Olmo3-7B) on math & code reasoning, beating GRPO and vanilla On-Policy Self-Distillation under the same training budget. SD-Zero also enables iterative self-evolution.
Shuning Shang retweeted
Boris Hanin @BorisHanin
🚨 2026 @Princeton ML Theory Summer School 🔥 Learn from amazing researchers and meet your peers.
Mini-courses by:
- Subhabrata Sen @subhabratasen90
- Lenaic Chizat @LenaicChizat
- Sinho Chewi
- Elliot Paquette @poseypaquet
- Elad Hazan @HazanPrinceton
- Surya Ganguli @SuryaGanguli (to be confirmed)
August 3-14, 2026. Apply by March 31, 2026. Link 👇
Sponsored by @NSF, @PrincetonAInews, @EPrinceton, @JaneStreetGroup, @DARPA, @PrincetonPLI, Princeton NAM, Princeton AI2, Princeton PACM
Shuning Shang retweeted
Kaiyue Wen @wen_kaiyue
(1/n) Introducing Hyperball — an optimizer wrapper that keeps weight & update norm constant and lets you control the effective (angular) step size directly. Result: sustained speedups across scales + strong hyperparameter transfer.
Shuning Shang retweeted
idan shenfeld @IdanShenfeld
People keep saying 2026 will be the year of continual learning. But there are still major technical challenges to making it a reality. Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning! (1/n)
Shuning Shang retweeted
Abhishek Panigrahi @Abhishek_034
Distillation is a key step in training LLMs—but with so many possible teachers, picking the right one is hard. The best model is often not the best teacher. We propose GRACE to identify the best teacher for a given student to learn math tasks. It's a cost-efficient gradient score that can exhibit low regret across many teacher families and scales. Bonus: GRACE also guides key choices like teacher generation temperature. Theory + empirics. (1/6). Joint work w @BingbinL, @SadhikaMalladi, @ShamKakade6, @SurbhiGoel_ Arxiv: arxiv.org/abs/2511.02833 Blog: unprovenalgos.github.io/GRACE
Shuning Shang retweeted
rapha @rapha_gl
GPT-5 is proof that synthetic data just keeps working! And that OpenAI has the best synthetic data team in the world 👁️ @SebastienBubeck the team has our eyeballs on you! 🙌
Shuning Shang retweeted
Yong Lin @Yong18850571
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B's 82.4%.
* 8B > 671B: Our 8B model matches DeepSeek-671B on MiniF2F.
📚 Leading on MathOlympiadBench (IMO-level problems):
* Solves 73 vs. 50 for the 671B DeepSeek Prover.
🔓 Website: blog.goedel-prover.com
🔓 Model 32B: huggingface.co/Goedel-LM/Goed…
🔓 Model 8B: huggingface.co/Goedel-LM/Goed…
🔓 Data and training pipeline will be released soon.
Amazing Collaborators: @sangertang1999 @Lyubh22 @__zrrr__ @juihuichung @thomaszhao1998 @pero733858111 @thiiis_user @EmilyJge @JingruoS5931 @wujiayun12 @GesiJiri68334 @davidjesusacu @KaiyuYang4 @hongzhou__lin @YejinChoinka @danqi_chen @prfsanjeevarora @chijinML
Narutatsu Ri @narutatsuri
【Life Update】 I’m happy to share that I will be starting a CS PhD at @PrincetonPLI under Prof. Sanjeev Arora and supported by a Gordon Wu Fellowship. I'm forever indebted to my advisors (Prof. Kathy McKeown, Daniel Hsu, Nakul Verma) and collaborators. Excited for the fall!
Shuning Shang retweeted
Jingfeng Wu @uuujingfeng
1/3 Sharing two new papers on accelerating GD via large stepsizes! Classical GD analysis assumes small stepsizes for stability. However, in practice, GD is often used with large stepsizes, which lead to instability. See my slides for more details: uuujf.github.io/postdoc/wu2025…
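As a textbook illustration of the classical stability condition mentioned above (not the analysis in the two papers, which is about how large stepsizes can nonetheless accelerate GD), consider GD on the one-dimensional quadratic f(x) = (L/2)x². Each step multiplies x by (1 - ηL), so the classical analysis requires η < 2/L; beyond that threshold the iterates oscillate with growing magnitude. The function name and parameters below are illustrative.

```python
def gd_on_quadratic(curvature, stepsize, x0=1.0, steps=20):
    """Run gradient descent on f(x) = curvature / 2 * x**2; return the final iterate.

    The gradient is curvature * x, so each update x <- x - stepsize * curvature * x
    multiplies x by (1 - stepsize * curvature). The iterates shrink iff
    |1 - stepsize * curvature| < 1, i.e. 0 < stepsize < 2 / curvature.
    """
    x = x0
    for _ in range(steps):
        x -= stepsize * curvature * x
    return x

L = 4.0
print(gd_on_quadratic(L, stepsize=0.4 / L))  # factor 0.6 per step: converges toward 0
print(gd_on_quadratic(L, stepsize=2.5 / L))  # factor -1.5 per step: oscillates and blows up
```

On a single quadratic, large stepsizes only hurt; the interesting question the thread raises is why they help on other objectives despite this local instability.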
Shuning Shang retweeted
Jason Lee @jasondeanlee
When you discretize and reparametrize to theta=Bw, you can no longer use a constant lr. We found exactly this phenomenon in our work on scaling laws for neural nets that are not 1-homogeneous: arxiv.org/abs/2504.19983, slides dropbox.com/scl/fi/n53x2pe…, and YouTube youtube.com/live/2ELvfieIy… It is really easy and tempting to confuse continuous time with actual runtime/samples. But from the slides you can see that the sqrt speedup in the exponent is fake; it exists only in continuous time. Now, if it were noiseless, it would not actually violate any minimax lower bound. However, SGD/GD still can't be discretized, but other algorithms probably can (e.g., for noiseless linear regression you can just solve OLS/PCR/ridge, and I would guess some similar method would work for the two-layer net too). Though noiseless is a pretty niche setting and not our focus, especially in pretraining.
Shuning Shang retweeted
Yongqi Chen @BrianChen112900
Huge thanks to the amazing team! 🚀 Excited about the progress we’ve made—and even more excited to keep pushing video generation to be faster and easier for everyone to use!
Hao AI Lab @haoailab

Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers:
- A simple, consistent Python API
- State-of-the-art model performance optimizations
- Optimized implementations of popular models
Blog: hao-ai-lab.github.io/blogs/fastvide…

Shuning Shang retweeted
Zhiyuan Li @zhiyuanli_
Excited to share our new method ✏️PENCIL! It decouples space complexity from time complexity in LLM reasoning by allowing the model to recursively erase and generate thoughts. Joint work w. my student @chenxiao_yang_, along with @BartomNati and @McAllesterDavid.
Chenxiao Yang @chenxiao_yang_

I've discovered a truly marvelous idea for building AGI, but Twitter's space limit won't let me explain it! Damn! 😫 Introducing ✏️PENCIL, a new LLM reasoning paradigm that generates and erases thoughts, enabling longer and deeper thinking with shorter context. #ICML2025 🧵1/n 🤯 Theoretically, ✏️PENCIL is Turing-complete with optimal space and time complexity, and thus can solve arbitrary computable problems efficiently. This is something fundamentally impossible for ✒️CoT. This post is based on the paper “PENCIL: Long thoughts with short memory”, accepted at ICML 2025, joint work with Nathan Srebro, David McAllester, and @zhiyuanli_ Paper: arxiv.org/pdf/2503.14337 Github: github.com/chr26195/PENCIL Expand thread for full details ⬇️
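To make "generates and erases thoughts" concrete, here is a toy sketch of an erasure rule of the form C [CALL] T [SEP] A [RETURN] ⇒ C A, where C is the preceding context, T the intermediate thoughts, and A the answer. This rule is our reading of the PENCIL paper rather than something stated in these tweets; the token strings, helper names, and hand-written token stream are assumptions (in PENCIL the model itself is trained to emit such tokens).

```python
# Toy illustration of thought erasure; token names and stream are hypothetical.
CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"

def reduce_once(tokens):
    """Rewrite C [CALL] T [SEP] A [RETURN] into C A for the innermost call."""
    assert tokens and tokens[-1] == RETURN
    sep = len(tokens) - 1 - tokens[::-1].index(SEP)           # last [SEP]
    call = sep - 1 - tokens[sep - 1::-1].index(CALL)          # last [CALL] before it
    return tokens[:call] + tokens[sep + 1:-1]

def generate_with_erasure(stream):
    """Append tokens one at a time, erasing thoughts whenever [RETURN] appears.

    Returns the final context and the longest context held at any point.
    """
    context, peak = [], 0
    for tok in stream:
        context.append(tok)
        peak = max(peak, len(context))
        if tok == RETURN:
            context = reduce_once(context)
    return context, peak

# Nested calls: the inner call is erased first, then the outer one.
stream = ["x", CALL, "u", CALL, "v", SEP, "b", RETURN, "w", SEP, "c", RETURN]
final, peak = generate_with_erasure(stream)
print(final)              # ['x', 'c']
print(peak, len(stream))  # 8 12
```

Even in this toy, the context never holds all 12 generated tokens at once, which is the space-versus-time decoupling the thread describes.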

Shuning Shang retweeted