Zeyu Zheng

30 posts

@regunivers

PhD Candidate @CarnegieMellon | Seed-Prover | Combinatorial Mathematician, AI Researcher. Developing new paradigms for mathematical discovery.

Joined January 2022
71 Following, 113 Followers
Zeyu Zheng retweeted
Weihua Du @StigLidu
Check out our #ICLR2026 poster, “Generalizable End-to-End Tool-Use RL with Synthetic CodeGym”! We developed CodeGym, an automated pipeline that converts coding problems into multi-turn tool-use environments for agent RL training. We end up with 13K environments and 80K task configurations. Training on CodeGym can significantly improve LLM tool use and multi-turn interaction capabilities on OOD tasks (e.g., Tau-Bench, ALFWorld). Unfortunately, I can't make it to ICLR in person, so sharing our poster here! Paper: arxiv.org/abs/2509.17325 Dataset: huggingface.co/datasets/Vanis…
Weihua Du @StigLidu

How can we boost LLM agents’ generalizability to OOD tasks and environments? Check out CodeGym, our new project for synthesizing environments for LLM agent RL training. CodeGym is a synthetic environment generation framework for reinforcement learning on multi-turn tool-use tasks. It automatically converts static coding problems into interactive and verifiable RL training environments. Training in CodeGym leads to strong OOD generalization — for example, a Qwen2.5-32B-Instruct model achieved an 8.7-point absolute accuracy gain on τ-Bench! We’ve just released the paper, synthesis pipeline, and dataset: 📄 Paper: arxiv.org/abs/2509.17325 💻 Project: github.com/StigLidu/CodeG… 📊 Dataset: huggingface.co/datasets/Vanis… 📷 More details in the thread👇

Zeyu Zheng retweeted
Jared Duker Lichtman @jdlichtman
Erdős was the most prolific mathematician in history, after Euler. But Erdős was by far the most prolific *conjecturer* in history, posing many problems big and small. Combinatorialist Joel Spencer said that Erdős was able to generate these conjectures based on deeper meta-theories in his mind. So certainly some of these problems are more famous than others, but it may turn out that if enough of them are solved, it may lead to a more unified theory.
Thomas Bloom @thomasfbloom

Some dismiss Erdős problems as trivialities - this couldn't be further from the truth! While many are amusing novelties, some of them are the most central problems in number theory and combinatorics. A blog post with, in my view, the 10 most important: erdosproblems.com/forum/thread/b…

Zeyu Zheng retweeted
Thomas Bloom @thomasfbloom
Some dismiss Erdős problems as trivialities - this couldn't be further from the truth! While many are amusing novelties, some of them are the most central problems in number theory and combinatorics. A blog post with, in my view, the 10 most important: erdosproblems.com/forum/thread/b…
Zeyu Zheng retweeted
Jared Duker Lichtman @jdlichtman
In my doctorate, I proved the Erdős Primitive Set Conjecture, showing that the primes themselves are maximal among all primitive sets. This problem will always be in my heart: I worked on it for 4 years (even when my mentors recommended against it!) and loved every minute of it. [Primitive sets are a vast generalization of the prime numbers: A set S is called primitive if no number in S divides another.] Now Erdős#1196 is an asymptotic version of Erdős' conjecture, for primitive sets of "large" numbers. It was posed in 1966 by the Hungarian legends Paul Erdős, András Sárközy, and Endre Szemerédi. I'd been working on it for many years, and consulted/badgered many experts about it, including my mentors Carl Pomerance and James Maynard. The proof produced by GPT5.4 Pro was quite surprising, since it rejected the "gambit" that was implicit in all works on the subject since Erdős' original 1935 paper. The idea to pass from analysis to probability was so natural & tempting from a human-conceptual point of view, that it obscured a technical possibility to retain (efficient, yet counter-intuitive) analytic terminology throughout, by use of the von Mangoldt function \Lambda(n). The closest analogy I would give would be that the main openings in chess were well-studied, but AI discovers a new opening line that had been overlooked based on human aesthetics and convention. In fact, the von Mangoldt function itself is celebrated for its connection to primes and the Riemann zeta function, but its piecewise definition appears to be odd and unmotivated to students seeing it for the first time. By the same token, in Erdős#1196, the von Mangoldt weights seem odd and unmotivated but turn out to cleverly encode a fundamental identity \sum_{q|n}\Lambda(q) = \log n, which is equivalent to unique factorization of n into primes. This is the exact trick that breaks the analytic issues arising in the "usual opening".
Moreover, Terry Tao has long suspected that the applications of probability to number theory are unnecessarily complicated and this "trick" might actually clarify the general theory, which would have a broader impact than solving a single conjecture.
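[Editor's illustration, not part of the thread.] The identity \sum_{q|n}\Lambda(q) = \log n and the definition of a primitive set quoted in the post above can be checked numerically. The following is a minimal Python sketch under stated assumptions: the function names (`von_mangoldt`, `divisor_sum_of_lambda`, `is_primitive`) are hypothetical and chosen here for illustration, trial division is used only for simplicity, and nothing in it reflects the actual proof of Erdős#1196.

```python
import math


def von_mangoldt(n: int) -> float:
    """Return log p if n = p^k for a prime p and k >= 1, otherwise 0."""
    if n < 2:
        return 0.0
    # Smallest prime factor of n by trial division (n itself if n is prime).
    p = next((d for d in range(2, math.isqrt(n) + 1) if n % d == 0), n)
    # Strip every factor of p; n is a prime power exactly when nothing remains.
    m = n
    while m % p == 0:
        m //= p
    return math.log(p) if m == 1 else 0.0


def divisor_sum_of_lambda(n: int) -> float:
    """Sum the von Mangoldt function over all divisors q of n."""
    return sum(von_mangoldt(q) for q in range(1, n + 1) if n % q == 0)


def is_primitive(numbers) -> bool:
    """True if no element of the set divides a different element."""
    items = sorted(set(numbers))
    return not any(b % a == 0 for i, a in enumerate(items) for b in items[i + 1:])


if __name__ == "__main__":
    # Compare both sides of the identity for a few values of n.
    for n in (12, 30, 97, 128):
        print(n, divisor_sum_of_lambda(n), math.log(n))
    # The primes form a primitive set; adding a multiple of one breaks it.
    print(is_primitive([2, 3, 5, 7]), is_primitive([2, 3, 4]))
```

For n = 12 the divisors contributing to the sum are 2, 3, and 4 (giving log 2 + log 3 + log 2), which is how the weights recover the prime factorization 12 = 2^2 · 3.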
Boaz Barak @boazbaraktcs

This is one of the coolest such examples! See comments from Lichtman below, who proved the related primitive set conjecture arxiv.org/abs/2202.02384

Zeyu Zheng retweeted
Chenyu Zhou @ZKBquanter
🚀 New survey dropped! "Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering" LLM agents are evolving — not by changing weights, but by building better infrastructure around them. 🧵👇 1/4
Zeyu Zheng retweeted
Mehtaab Sawhney @mehtaab_sawhney
We’ve just released another paper solving five further Erdős problems with an internal model at OpenAI: arxiv.org/abs/2604.06609. Several of the proofs were especially enjoyable to digest while writing the paper. My personal favorite was the solution to Erdős Problem 1091. The question asks: if a graph G has chromatic number 4, while every small subgraph has chromatic number at most 3, must it contain an odd cycle with many diagonals? The internal model gives a very enlightening counterexample to this conjecture, and the proof was a pleasure to understand. For those so inclined, a really fun exercise is to try to reconstruct the proof from Figure 5 of the paper, which was of course produced by Codex.
Zeyu Zheng retweeted
Dan Roy @roydanroy
How are mathematicians facing the wave of rapidly advancing AI-for-math capabilities? Jeremy Avigad (CMU prof and co-author on the original 2015 system description paper for Lean) just posted a paper with his thoughts in the wake of the Math, Inc. announcement on sphere packing. andrew.cmu.edu/user/avigad/Pa… There are a lot of interesting passages in here, including a bit of the back story of the Math, Inc. bomb drop and how it was initially received by the humans working on the formalization project. But, as for how mathematics proceeds, here's the key last passage: "We need to remember our strengths: mathematicians are problem solvers and theory builders extraordinaire. Rather than fight the use of AI in mathematics, we should own it. It is not enough to keep up with current events and design benchmarks for AI researchers; we need to play an active role in deploying the technology and molding it to our purposes. We also need to learn how to raise our students with the wisdom to use the new technologies appropriately, and we need to be careful that we still manage to impart core mathematical intuitions and understanding. Figuring out how to use AI effectively to achieve our mathematical goals won’t be easy, but mathematicians have always embraced challenges—indeed, the harder, the better. If we face AI head-on and stay true to our values, mathematics will thrive. We just need to show up and get to work." The next few years should be a golden era for mathematics. For those of us working on the frontier, I hope we do well by our mathematician colleagues.
Zeyu Zheng retweeted
Mehtaab Sawhney @mehtaab_sawhney
We just posted a paper solving Erdős #846, which was solved by an internal model at OpenAI (cdn.openai.com/infinite-sets/…). While the problem can also be derived from an earlier paper in the literature, the proof by the internal model was one of the first instances where I smiled reading the proof.
Zeyu Zheng retweeted
Zhaopeng Tu @tuzhaopeng
Does an AI agent really need to "think hard" for every single step? 🤖🧠⚡ Introducing CogRouter, a framework grounded in ACT-R theory that trains agents to dynamically adapt cognitive depth across four hierarchical levels, from instinctive responses to strategic planning. 🧠 We identify a critical inefficiency in current agents where they either think too little (reflexive) or think too much (uniform deep reasoning). Real-world tasks demand step-wise heterogeneity: 1️⃣ Routine steps require instinctive responses; 2️⃣ Complex steps require strategic reasoning; 3️⃣ Fixed patterns waste tokens or fail on hard tasks. 🎓 We propose a two-stage training strategy to enable dynamic cognitive adaptation: 1️⃣ Cognition-aware SFT (CogSFT) instills stable level-specific patterns; 2️⃣ Cognition-aware Policy Optimization (CoPO) enables step-level credit assignment via confidence-aware advantage reweighting. 📊 Experiments on ALFWorld and ScienceWorld show CogRouter achieves state-of-the-art performance with superior efficiency: 🏆 82.3% success rate with Qwen2.5-7B; 🔥 Outperforms GPT-4o (+40.3%) and OpenAI-o3 (+18.3%); ⚡ Uses 62% fewer tokens than GRPO by skipping unnecessary reasoning. 🧑‍💻 Code: github.com/rhyang2021/Cog… 📷 Paper: arxiv.org/abs/2602.12662
Zeyu Zheng retweeted
Yujia Qin @TsingYoga
Happy CNY! We are glad to introduce our latest language model, Seed-2.0. We have made great progress (agents, reasoning, vision understanding, etc.) since Seed-1.8, without any distillation. Right now it's only available in CN, and it will soon be ready globally. seed.bytedance.com/en/seed2
Zeyu Zheng @regunivers
Really happy to share Seed 2.0, the strongest general-purpose model for formal mathematics to date. Vibe coding is already here. Vibe proofing is right around the corner. Model card: lf3-static.bytednsdoc.com/obj/eden-cn/la…
Thomas Zhu @hanwen_zhu

I want to point to an aspect of Seed2.0's ability—Seed2.0 is the first general LLM to incorporate agentic formal math capability, even surpassing Seed-Prover 1.5. I tested "vibe-proving" on several real-world problems in our Trae IDE.

Zeyu Zheng @regunivers
@AcerFur @fleetingbits That said, my personal perspective is still that the point at which the community largely converged on a clear, consensus formulation of the problem was around January 5. This is not meant to diminish the result at all, just to clarify how I was thinking about the timeline.
Zeyu Zheng @regunivers
@AcerFur @fleetingbits Thanks for the clarification. I fully agree that this is a very meaningful result, and personally I think that if this contribution had been made by a human, it would absolutely be worth publishing.
Zeyu Zheng @regunivers
@fleetingbits Whether this “counts” depends on perspective. One could say AI solved a genuinely new problem, but one could also note that the corrected statement had been open for only about a day. My personal view is that if this contribution were made by a human, it would be worth publishing.
Zeyu Zheng @regunivers
@fleetingbits Facts: The original statement of 728 was ambiguous, and AI quickly found a trivial counterexample. The community then discussed how to fix the statement and proposed an additional hypothesis. Only after this corrected formulation was identified did people use AI to solve it.