Shuning Shang

26 posts

@susieshang

CS PhD @Princeton | Prev undergrad @ZJU_China I'm interested in ML Theory

Princeton, NJ · Joined February 2024

297 Following · 177 Followers

Pinned Tweet

Shuning Shang@susieshang·7 May

Excited to share my first PhD work! We show that imperfect rewards in policy gradient are not always harmful; some can be benign or even beneficial. A lot of fun working with @hubstrauss @stanleyrwei @prfsanjeevarora and @noamrazin 😀

Noam Razin@noamrazin

📰 RL for LMs often relies on imperfect proxy rewards, which can lead to reward hacking. But are incorrect rewards necessarily harmful? Turns out, they can also be benign or even beneficial! This has implications for reward model evaluation and verifiable reward design. 🧵

11.2K

Shuning Shang retweeted

Yiping Wang@ypwang61·8 May

We improve a 32-year-old lower bound in a challenging open problem, Ramsey numbers, simply by scaling autoresearch. ⭕ Proves R(3,17) >= 93. The previous bound of 92 was obtained in 1994. Google’s AlphaEvolve (2026) matched the previous result but did not beat it. All of this could be done with Claude Code / Codex + a CPU server. Graphs and evolving history are available at github.com/ypwang61/Scale… [1/n]
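For readers unfamiliar with how such a bound is certified: a lower bound R(s,t) >= n+1 is witnessed by a graph on n vertices that contains no clique of size s and no independent set of size t. The sketch below checks that property by brute force on a tiny case (the 5-cycle, which shows R(3,3) >= 6). The function name `is_ramsey_witness` is hypothetical and is not taken from the linked repository; exhaustive enumeration like this is only practical for very small graphs, so a 92-vertex witness for R(3,17) would presumably need a smarter independent-set search.

```python
from itertools import combinations


def is_ramsey_witness(n, edges, s, t):
    """Return True if the graph on vertices 0..n-1 has no clique of
    size s and no independent set of size t (hypothetical helper)."""
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = True

    def uniform(subset, value):
        # True when every pair in the subset has adjacency == value.
        return all(adj[u][v] == value for u, v in combinations(subset, 2))

    no_clique = not any(uniform(c, True) for c in combinations(range(n), s))
    no_indep = not any(uniform(c, False) for c in combinations(range(n), t))
    return no_clique and no_indep


# The 5-cycle is triangle-free and has no independent set of size 3,
# so it witnesses R(3,3) >= 6.
c5 = [(i, (i + 1) % 5) for i in range(5)]
# The 6-cycle fails: vertices {0, 2, 4} form an independent set of size 3.
c6 = [(i, (i + 1) % 6) for i in range(6)]

print(is_ramsey_witness(5, c5, 3, 3))
print(is_ramsey_witness(6, c6, 3, 3))
```

Under the same convention, the claim R(3,17) >= 93 corresponds to a triangle-free graph on 92 vertices with no independent set of size 17.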

324

52K

Shuning Shang retweeted

Boaz Barak@boazbaraktcs·5 May

Some thoughts... Imagine that there is an AI that if you ask it questions such as "Is T true?" when T is a precise mathematical statement, then it gives you either a proof, or disproof, or tells you it doesn't know, but it never makes a mistake. And moreover, imagine that every 6 months, there is a new model that can handle more and more difficult problems T, eventually reaching the level of the Riemann Hypothesis or P vs NP. I think different people would handle it differently. For some mathematicians, this might be the most exciting time in the history of math, and they would use these models to explore terrains at a rate higher than before. The whole norms of the math community would have to change, with a premium not so much on settling questions but on finding ways to map out and simplify the relation between problems. Personally, I think that an "egoless" but brilliant AI could be put to great use in simplifying old proofs rather than proving new results. IMO there has been far too little effort invested in the former. For other people, it may be very different than the reasons that they got into math in the first place. Or (like me) they would feel that the moment is one where the main story is AI and its impact on humanity. I used to care very deeply about the unique games conjecture, but I guess I am "monogamous" in my intellectual life (as in my personal one..) and these days too focused on AI alignment to even try attacking it with AI. (It's also still too soon - models are not yet good enough.) I would still love to see the UGC get resolved, especially if I could interrogate the AI to explain the proof to me in a way tailored to my taste and understanding. But I won't deny that if the UGC gets resolved by an AI, it will feel very bittersweet, and an end of an era of sitting in coffee shops, sometimes by myself and sometimes with others, spending hours talking and working out (mostly wrong) ideas on pads of paper.

jacob tsimerman@Jacob_Tsimerman

I want to clarify my thoughts on problem-solving in mathematics, and the potential consequences of AI for the field. For context, I’m quoting here my post in reply to Daniel Litt (who, echoing others, I find very clear, grounded, and insightful in his thinking).

The claim

The short version is that I think problem-solving is an immense, and pervasive part of modern mathematical research. Consequently, if human problem-solving disappears by virtue of the AIs becoming strictly and substantially better at it, then most of the time currently spent by modern mathematical researchers will have to be spent on an activity that is altogether pretty different. Whether such an activity is viable as a professional endeavour is something I am unsure of, but strongly encourage others to think about and try to envision, so that if/when the time comes, we can steer such a future into being.

Allow me to make this somewhat concrete: by problem-solving I mean questions of the form “is T true? If so find a proof. If not, find a disproof.” where T is a precise mathematical statement. I’ll also include “find an example of S, if there is one” where S is some structure (variety/category/property/isomorphism/….).

The argument

Ok. Now as I said (and some have echoed) I spend ~all of my time problem-solving as my primary goal. This has sub-goals, but my entire main research field disappears if someone solves the Zilber-Pink Conjecture in its more general form. This is a single conjecture (precisely stated!) and lots of mathematicians, postdocs, and graduate students are engaged in picking apart special cases of it, trying strategies, finding analogies to develop intuition, etc. Of course, lots of motivation and intuition and analogizing and understanding have gone into deciding to make the ZP conjecture a focus! But the fact remains that this is now what is being worked on ~all of the time by this community.

This is true of many mathematicians. They have a problem (or ten) and spend most of their time doing it. If someone solves it, they have to find a different problem. This can be a big, disorienting process involving a lot of energy, and is neither trivial nor always fun (though often rewarding in the end).

People have written a lot about Theory building vs. Problem-solving, and I want to first of all clarify I have nothing against theory building or theory builders! It is a valuable part of mathematics, and while there are differences in perspective between the “camps” there is way more mutual respect and agreement. However, I gather there is a perception that theory-builders spend most of their time not-problem-solving, and I think this is largely untrue.

Now I’m not a theory-builder primarily (though I’ve partaken a LITTLE BIT by necessity) so I am outside of my comfort zone. As such, I apologize for mistakes and welcome corrections! But theory-building constantly runs through problem-solving. Let’s say you want to define the right notion of a cohomology theory. Of course you must make candidate definitions. But then what does it mean for it to be the right one? Well, you start asking if it has natural properties. These are T statements. Does it satisfy a Kunneth formula? Is it functorial in the right way? When you have the wrong one you have to find the properties it’s missing, and when you have the right one you have to prove that it indeed has those properties. Again, I am not saying nor do I believe that this makes problem-solving “real math” and theory-building lesser. I am just trying to draw attention to the way I think research mathematicians operate, and mathematics is practiced.

To put all this a different way, imagine you had access to an AI oracle that could resolve statements T, but somehow lacked any creativity to build technology or make definitions (I think this is unlikely, but for the purpose of this thought experiment let’s imagine it). How would your mathematics change, if you were a theory builder? Well, you make a definition, and want to know if it’s the right one. You immediately ask your oracle a thousand questions. From “are these basic properties true” to “ooh, so is this deep conjecture true?” and start getting back answers, and amending your definitions. You could invent and resolve entire research directions in days. But the confusion you would have had to push through to flesh out your theory would largely (probably not entirely) be instantly resolved and the whole process sped up tremendously by your oracle. A big part of the process would be gone. This is very very different to modern mathematics.

One more thought

This post is too long already, but I’ve seen some people say that they only do mathematics to find truth and others valourize that as the only virtuous way to be. I do not do mathematics only to find truth. I do it largely because I enjoy it and I am good at it. I also find it beautiful and am grateful I get to spend my days understanding beautiful things. But I enjoy the challenge, the process, resolving confusions, finding strategies, grappling with problems. I would like to push for this being de-stigmatized. Mathematicians are people who need money, housing, food, love, exercise, and a great deal of other stuff including various forms of meaning. There are many people whose primary enjoyment of math comes through problem solving in one of its incarnations. If that disappears, that is not a trivial issue and many of them might not want to do it anymore (even if there were some way to proceed).

121

17.3K

Shuning Shang@susieshang·28 Apr

@chengyun01 Congrats🥳

Yun Cheng@chengyun01·27 Apr

Our Contextual Drag got the 🏆 Best Paper Award at the ICLR 2026 Workshop on AI with Recursive Self-Improvement!

Mingchen Zhuge@MingchenZhuge

Excited to share our award-winning papers! 🏆 Best Paper Awards • Contextual Drag: How Errors in Context Affect LLM Reasoning • PostTrainBench: Can LLM Agents Automate LLM Post-Training? 🌟 Outstanding Paper Awards • Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning • Learning to Continually Learn via Meta-Learning Agentic Memory Designs @jeffclune @hrdkbhatnagar @CaimingXiong @chengyun01 @yimingxiong_ @shengranhu @richardxp888 @HuaxiuYaoML @prfsanjeevarora

3.6K

Shuning Shang@susieshang·28 Apr

@XingyuZhu_ Congrats!

Xingyu Zhu@XingyuZhu_·27 Apr

Happy to share that our recent paper: Contextual Drag: How Errors in Context Affect LLM Reasoning (arxiv.org/abs/2602.04288) has won the Best Paper Award in the RSI workshop of ICLR 2026!

Mingchen Zhuge@MingchenZhuge

1.9K

Shuning Shang retweeted

Yinghui He@yinghui_he_·17 Apr

RLVR gives sparse supervision; On-Policy Self-Distillation often requires high-quality demonstrations. Our new method, ✨SD-Zero✨, gets the best of both worlds – we use model’s self-revision to turn binary rewards into dense token-level supervision. No external teacher. No curated demonstrations. 🚨 Introducing Self-Distillation Zero (SD-Zero), which trains one model to play two roles: (1) “Generator” that makes attempts, and (2) “Reviser” that conditions on the generator’s failed/successful attempt + binary reward to produce a better answer. ‼️Even WRONG attempts can become the training signal.‼️ 🔗Paper: arxiv.org/abs/2604.12002 🏆 SD-Zero brings 10%+ improvement over base models (Qwen3,4B; Olmo3,7B) on math & code reasoning, beating GRPO and vanilla On-Policy Self-Distillation under the same training budget. SD-Zero also enables iterative self-evolution.

403

214.5K

Shuning Shang retweeted

Boris Hanin@BorisHanin·8 Feb

🚨 2026 @Princeton ML Theory Summer School 🔥 Learn from amazing researchers and meet your peers.

Mini-courses by:
- Subhabrata Sen @subhabratasen90
- Lenaic Chizat @LenaicChizat
- Sinho Chewi
- Elliot Paquette @poseypaquet
- Elad Hazan @HazanPrinceton
- Surya Ganguli @SuryaGanguli (to be confirmed)

August 3 - 14, 2026. Apply by March 31, 2026. Link 👇

Sponsored by @NSF, @PrincetonAInews, @EPrinceton, @JaneStreetGroup, @DARPA, @PrincetonPLI, Princeton NAM, Princeton AI2, Princeton PACM

272

48K

Shuning Shang retweeted

Kaiyue Wen@wen_kaiyue·21 Jan

(1/n) Introducing Hyperball — an optimizer wrapper that keeps weight & update norm constant and lets you control the effective (angular) step size directly. Result: sustained speedups across scales + strong hyperparameter transfer.
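To make the one-line description concrete, here is a minimal sketch of what a constant-norm update could look like. It is not the authors' implementation: the function name `hyperball_style_step`, the choice to rescale the update to the weight norm, and the projection back onto the original radius are all assumptions made for illustration, based only on the sentence above.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def hyperball_style_step(w, update, lr):
    """One illustrative step (assumed form, not the published algorithm):
    rescale the raw update to the weight norm, take the step, then
    project the result back onto the sphere of the original radius."""
    radius = norm(w)
    u_norm = norm(update)
    if radius == 0.0 or u_norm == 0.0:
        return list(w)
    # Update now has the same norm as the weights, so lr acts as a
    # relative step size that does not depend on the raw update scale.
    scaled = [radius * x / u_norm for x in update]
    stepped = [wi - lr * ui for wi, ui in zip(w, scaled)]
    s_norm = norm(stepped)
    if s_norm == 0.0:
        return list(w)
    # Projection keeps the weight norm constant across steps.
    return [radius * x / s_norm for x in stepped]


w = [3.0, 4.0]
new_w = hyperball_style_step(w, update=[1.0, 0.0], lr=0.1)
print(norm(w), norm(new_w))
```

In this sketch the weight norm is unchanged after the step, and multiplying the raw update by any positive constant yields the same new weights, which is one way `lr` could end up controlling the angular change directly.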

125

709

201.7K

Shuning Shang retweeted

idan shenfeld@IdanShenfeld·29 Jan

People keep saying 2026 will be the year of continual learning. But there are still major technical challenges to making it a reality. Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning! (1/n)

222

1.5K

238.5K

Shuning Shang retweeted

Cameron R. Wolfe, Ph.D.@cwolferesearch·12 Jan

Currently reading / writing about the intersection of RL and continual learning. Here are some great papers I’ve found on these topics so far:
- arxiv.org/abs/2507.05386
- arxiv.org/abs/2510.18874
- arxiv.org/abs/2509.04259
- arxiv.org/abs/2308.08747

Please share any others you’re aware of! Would love to find more work in this space to include in my writeup.

559

28.5K

Shuning Shang retweeted

Abhishek Panigrahi@Abhishek_034·5 Jan

Distillation is a key step in training LLMs—but with so many possible teachers, picking the right one is hard. The best model is often not the best teacher. We propose GRACE to identify the best teacher for a given student to learn math tasks. It's a cost-efficient gradient score that can exhibit low regret across many teacher families and scales. Bonus: GRACE also guides key choices like teacher generation temperature. Theory + empirics. (1/6). Joint work w @BingbinL, @SadhikaMalladi, @ShamKakade6, @SurbhiGoel_ Arxiv: arxiv.org/abs/2511.02833 Blog: unprovenalgos.github.io/GRACE

125

15.8K

Shuning Shang retweeted

rapha@rapha_gl·7 Aug

GPT-5 is proof that synthetic data just keeps working! And that OpenAI has the best synthetic data team in the world 👁️ @SebastienBubeck the team has our eyeballs on you! 🙌

470

292.6K

Shuning Shang retweeted

Yong Lin@Yong18850571·15 Jul

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.

🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.

🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B model matches DeepSeek-671B on MiniF2F.

📚 Leading on MathOlympiadBench (IMO-level problems)
* Solves 73 vs 50 over 671B DeepSeek Prover

🔓 Website: blog.goedel-prover.com
🔓 Model 32B: huggingface.co/Goedel-LM/Goed…
🔓 Model 8B huggingface.co/Goedel-LM/Goed…
🔓Data and training pipeline will be released soon.

Amazing Collaborators: @sangertang1999 @Lyubh22 @__zrrr__ @juihuichung @thomaszhao1998 @pero733858111 @thiiis_user @EmilyJge @JingruoS5931 @wujiayun12 @GesiJiri68334 @davidjesusacu @KaiyuYang4 @hongzhou__lin @YejinChoinka @danqi_chen @prfsanjeevarora @chijinML

261

95.3K

Shuning Shang retweeted

Wei Hu@weihu_·1 Jul

What happens behind the "abrupt learning" curve in Transformer training? Our new work (led by @GopalaniPulkit) reveals universal characteristics of Transformers' early-phase training dynamics—uncovering the implicit biases and the degenerate state the model gets stuck in. ⬇️

Pulkit Gopalani@GopalaniPulkit

Excited to announce our recent work on understanding training-time emergence in Transformers! Thread🧵(1/11)

4.2K

Shuning Shang@susieshang·12 Jun

@narutatsuri @PrincetonPLI Congrats!!

119

Narutatsu Ri@narutatsuri·10 Jun

【Life Update】 I’m happy to share that I will be starting a CS PhD at @PrincetonPLI under Prof. Sanjeev Arora and supported by a Gordon Wu Fellowship. I'm forever indebted to my advisors (Prof. Kathy McKeown, Daniel Hsu, Nakul Verma) and collaborators. Excited for the fall!

329

24K

Shuning Shang retweeted

Jingfeng Wu@uuujingfeng·4 Jun

1/3 Sharing two new papers on accelerating GD via large stepsizes! Classical GD analysis assumes small stepsizes for stability. However, in practice, GD is often used with large stepsizes, which lead to instability. See my slides for more details: uuujf.github.io/postdoc/wu2025…

113

7.4K

Shuning Shang retweeted

Jason Lee@jasondeanlee·25 May

When you discretize and reparametrize to theta=Bw, you can no longer use a constant lr. We found exactly this phenomenon in our work on scaling laws for neural nets that are not 1-homogeneous. arxiv.org/abs/2504.19983 and slides dropbox.com/scl/fi/n53x2pe… also YouTube youtube.com/live/2ELvfieIy… It is really easy and tempting to confuse continuous time with actual runtime / samples. But from the slides you can see the sqrt speedup in the exponent is fake; it's only in cts time. Now if it were noiseless, it would not actually violate any minimax lb; however, sgd/gd still can't be discretized, but other algs probably can (e.g. for noiseless linear you can just solve ols/pcr/ridge, and I would guess some similar method would work in the two-layer net also). Though noiseless is a pretty niche setting and not our focus, esp. in pretrain.

Shuning Shang retweeted

Yongqi Chen@BrianChen112900·13 May

Huge thanks to the amazing team! 🚀 Excited about the progress we’ve made—and even more excited to keep pushing video generation to be faster and easier for everyone to use!

Hao AI Lab@haoailab

Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers: - A simple, consistent Python API - State of the art model performance optimizations - Optimized implementations of popular models Blog: hao-ai-lab.github.io/blogs/fastvide…

552

Shuning Shang retweeted

Zhiyuan Li@zhiyuanli_·9 May

Excited to share our new method ✏️PENCIL! It decouples space complexity from time complexity in LLM reasoning, by allowing the model to recursively erase and generate thoughts. Joint work w. my student @chenxiao_yang_ , along with @BartomNati and @McAllesterDavid.

Chenxiao Yang@chenxiao_yang_

I've discovered a truly marvelous idea for building AGI, but Twitter's space limit won't let me explain it! Damn! 😫 Introducing ✏️PENCIL, a new LLM reasoning paradigm that generates and erases thoughts, enabling longer and deeper thinking with shorter context. #ICML2025 🧵1/n 🤯 Theoretically, ✏️PENCIL is Turing-complete with optimal space and time complexity, and thus can solve arbitrary computable problems efficiently. This is something fundamentally impossible for ✒️CoT. This post is based on the paper “PENCIL: Long thoughts with short memory” accepted in ICML 2025, a joint work with Nathan Srebro, David McAllester, and @zhiyuanli_ Paper: arxiv.org/pdf/2503.14337 Github: github.com/chr26195/PENCIL Expand thread for full details ⬇️

Shuning Shang retweeted

Wei Hu@weihu_·11 Apr

Check out our new LLM quantization algorithm that is extremely fast, requires minimal calibration data, and enables flexible bit allocation! Led by @YongyiYang7

Yongyi Yang@YongyiYang7

We are excited to introduce our new paper RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm. RaanA is a novel PTQ Algorithm that is computationally efficient, calibration-light, and adaptable to diverse deployment scenarios. 🧵 (1/6)

2.4K
