Bohan Lyu

79 posts

@Lyubh22

Undergrad @Tsinghua_Uni.

Joined November 2023
432 Following · 318 Followers
Bohan Lyu reposted
Ziran Yang@__zrrr__·
We released Goedel-Prover-V2, a state-of-the-art model for formal theorem proving at launch. Remarkably, it has remained at the top of the open-source formal theorem proving leaderboard for over six months. We have been excited to see so many folks cooking with our models.

Today, we are open-sourcing the full Goedel-Prover-V2 training datasets for the community:
📂 SFT (1.74M samples) huggingface.co/datasets/Goede…
📂 RL (whole proof generation + self-revision, 98k samples) huggingface.co/datasets/Goede…

We hope this helps push formal theorem proving forward. Build on it!

Amazing Collaborators: @Yong18850571 @sangertang1999 @Lyubh22 @juihuichung @thomaszhao1998 @_LaiJiang @thiiis_user @EmilyJge @JingruoS5931 @wujiayun12 @GesiJiri68334 @davidjesusacu @KaiyuYang4 @hongzhou__lin @YejinChoinka @danqi_chen @prfsanjeevarora @chijinML
3 replies · 39 reposts · 201 likes · 29K views
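For anyone looking to build on the release, here is a minimal sketch of loading the data with Hugging Face's datasets library. Note the hub paths in the post are truncated, so the repository IDs below are hypothetical placeholders, not confirmed names.

from datasets import load_dataset

# Hypothetical repo IDs; the real hub paths are truncated in the post above.
sft = load_dataset("Goedel-LM/Goedel-Prover-V2-SFT", split="train")
rl = load_dataset("Goedel-LM/Goedel-Prover-V2-RL", split="train")

print(len(sft))  # per the post, on the order of 1.74M SFT samples
print(sft[0])    # inspect a single training record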
Bohan Lyu@Lyubh22·
Goedel-Prover-V2 has been accepted to #ICLR2026! What’s more important is that, despite being released over half a year ago, it remains the #1🏆 Open-Source model (💚) on PutnamBench. Our models have reached a peak of 100K+ monthly downloads, demonstrating the demand for open math provers. We look forward to seeing more breakthroughs from open-source models/systems, especially in such a vital field that drives the future of mathematical research.
[image]
0 replies · 1 repost · 12 likes · 540 views
Bohan Lyu reposted
Chi Jin@chijinML·
This is a truly remarkable math theorem prover — well ahead of competitors, near-saturating PutnamBench, and achieving much higher solve rates on the recently concluded Putnam 2025 in a surprisingly short amount of time.
Zheng Yuan@GanjinZero

Excited to announce Seed-Prover 1.5, which is trained via large-scale agentic RL with Lean. It proved 580/660 Putnam problems and proved 11/12 in Putnam 2025 within 9 hours. Check details at github.com/ByteDance-Seed…. We will work on autoformalization toward contributing to real math!

1 reply · 8 reposts · 98 likes · 12.4K views
Bohan Lyu@Lyubh22·
@_vztu I'm a CS + finance dual major
0 replies · 0 reposts · 0 likes · 120 views
Zhengzhong Tu@_vztu·
@Lyubh22 It used to be all kinds of majors switching into economics and management; now it's economics and management switching into computer science... what an era...
2 replies · 0 reposts · 2 likes · 781 views
Bohan Lyu reposted
Yong Lin@Yong18850571·
[Life update] I’ve officially left @PrincetonPLI and joined Thinking Machines Lab @thinkymachines. It feels like the right time to look back on my journey at Princeton — one and a half years that were truly transformative. During this period, I made many friends, learned tremendously, and co-founded and co-led the Goedel Project. It was one of the most rewarding experiences of my life: a small, close-knit team of about ten people working with a clear purpose, moving fast, and ultimately building something impactful.

In mid-July, we released Goedel-Prover-V2 (32B), a model that significantly outperformed the previous state-of-the-art DeepSeek-Prover-V2-671B on formal mathematical reasoning, using nearly 20× fewer parameters and dramatically less compute. Even now, four months after release, it still sits at the top of the open-source leaderboard.

What makes this achievement especially meaningful is that we accomplished it entirely with academic resources. Competing against large industrial labs and still coming out ahead felt almost unreal. Seeing so many research teams now building on top of Goedel-Prover-V2 is deeply gratifying — it’s proof that open, academic AI can still make a real impact.

Equally fulfilling was the journey itself. Unlike industrial teams with access to large-scale, off-the-shelf RL infrastructure, we — a group of students and researchers from academia with zero prior experience in massive model training — had to build almost everything from scratch. We learned quickly, identified problems as they emerged, and fixed them with remarkable speed. Designing, scaling, and successfully training a 32B-parameter model within just three months remains one of the things I’m most proud of.

The Goedel Project began in October 2024. At that time, we had no serious experience training models that could compete with the best labs. DeepSeek-Prover-V1 and V1.5 looked unbeatable — they had started a year earlier and already set an incredibly high bar. We experimented with many ideas — agentic pipelines, divide-and-conquer methods — most of which turned out to be too costly or impractical given our limited resources. Eventually, we discovered a simple yet powerful iterative-training approach that allowed us to scale efficiently within our compute limits. Bit by bit, we caught up with DeepSeek-Prover-V1.5 — and then surpassed it.

Princeton winters are brutally cold. It was the first time I’d ever seen snow last for weeks. I spent the entire winter break at home, running experiments, analyzing results, and adjusting training methods and data again and again. That persistence paid off: in February, we released Goedel-Prover-V1-7B, which captured the top spot on the leaderboard. It was our first major milestone — proof that an academic team could compete with frontier models.

Our celebration was short-lived. In April, Kimi-Prover-72B and DeepSeek-Prover-V2-671B both arrived — and completely outperformed us. It was a tough moment. We couldn’t even host DeepSeek-Prover-V2-671B for inference locally; communication errors kept crashing our limited infrastructure. None of us had experience deploying or training models of that scale. Still, we decided to aim higher — to beat them in the next version. We began by identifying the bottlenecks in DeepSeek and Kimi’s provers, exploring every possible angle for improvement. We experimented with compiler-based feedback loops, curriculum data synthesis, self-improvement strategies, model distillation, and model merging to improve diversity during RL.

But the most critical insight was about efficiency — optimizing how we allocated limited resources across training design, data generation, and scaling. Every GPU hour had to count. After two months of exploration and countless small-scale tests, we finally established a systematic framework for the next release: Goedel-Prover-V2.

I led "The Big Run" — a nearly month-long sequence of two self-improvement fine-tuning cycles followed by one large-scale RL round. We completed training just a few days before our scheduled release, leaving barely enough time for evaluation. Those last nights were intense — running tests, fixing scripts, collecting metrics — but everything came together perfectly. When we saw the final results, we could hardly believe them: Goedel-Prover-V2 solved twice as many problems on PutnamBench as DeepSeek-Prover-V2-671B.

Many people have since asked what the “key” was — how an academic team managed to outperform frontier labs using a fraction of their resources. There isn’t a single magic trick, but rather a combination of principles that guided us:
* build solid infrastructure early
* focus on real bottlenecks instead of chasing novelty
* investigate broadly with small-scale experiments
* fix problems in real time
* optimize resource allocation carefully
* execute the final big run with precision

Each of these steps sounds simple, but together they made all the difference.

Now, at Thinking Machines Lab, I’m shifting focus beyond formal reasoning toward building general-purpose models. I’m deeply inspired by TML’s mission — developing interactive AI systems and advancing open science. I’m thrilled to begin this new chapter and look forward to sharing more in the future.
18 replies · 18 reposts · 646 likes · 78.3K views
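One common shape for the "iterative training" Yong describes is expert iteration: sample candidate proofs, keep only those the Lean compiler accepts, and fine-tune on the survivors. Here is a minimal, hypothetical sketch of one such round; the sample, verify, and finetune callables are illustrative stand-ins, not the Goedel pipeline's actual components.

from typing import Callable, List, Tuple

def expert_iteration_round(
    model,
    statements: List[str],
    sample: Callable,    # sample(model, stmt, n) -> list of candidate proof strings
    verify: Callable,    # verify(stmt, proof) -> bool, e.g. a Lean 4 compile check
    finetune: Callable,  # finetune(model, pairs) -> updated model
    n_samples: int = 8,
):
    # Keep only (statement, proof) pairs that pass formal verification.
    verified: List[Tuple[str, str]] = []
    for stmt in statements:
        for proof in sample(model, stmt, n_samples):
            if verify(stmt, proof):  # only compiler-checked proofs survive
                verified.append((stmt, proof))
                break  # one verified proof per statement is enough for this sketch
    # Fine-tune on the verified proofs; the returned model seeds the next round.
    return finetune(model, verified)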
Bohan Lyu reposted
Chi Jin@chijinML·
Super proud of my fantastic postdocs and graduate students taking their next steps at frontier labs 🎉
• Yong Lin (@Yong18850571) → Thinking Machines
• Zihan Ding (@Hanry65960814) → ByteDance
• Ahmed Khaled → Google
It’s always bittersweet to say goodbye😢 but I couldn’t be more excited to see what you achieve next!
[two images]
3 replies · 7 reposts · 244 likes · 39.7K views
Bohan Lyu reposted
Danqi Chen@danqi_chen·
I am going to present two papers at #COLM2025 tomorrow from 4:30-6:30pm, as none of our leading authors can attend due to visa issues. Haven't done poster presentations for years 🤣🤣 .... so I will do my best! #76: LongProc #80: Goedel-Prover v1
[two images]
Chi Jin@chijinML

Our Goedel-Prover V1 will be presented at COLM 2025 in Montreal this Wednesday afternoon! I won’t be there in person, but my amazing and renowned colleague @danqi_chen will be around to help with the poster — feel free to stop by!

4 replies · 27 reposts · 348 likes · 49K views
Bohan Lyu reposted
Chi Jin@chijinML·
Excited to share that I’ve been promoted to Associate Professor with tenure at Princeton!🎉 6 years may not be long, but AI research has evolved significantly during this period. Grateful to all my students, collaborators, colleagues for being with me on this remarkable journey!
[image]
149 replies · 60 reposts · 2.7K likes · 114K views
Bohan Lyu@Lyubh22·
Building upon Goedel-Prover-V2, Hilbert Prover achieved 99.2% on MiniF2F and solved over 70% of PutnamBench problems😱 Amazing news from my old home, @yuqirose's lab. At ICML this year, someone asked why the model struggled with Putnam problems. I said it was a matter of time, and now here we are! I still vividly remember explaining our V2 work to Sumanth over spaghetti and meatballs the day after the blog post went live. What a journey. Congrats! Paper: arxiv.org/abs/2509.22819
[two images]
0 replies · 4 reposts · 17 likes · 2.1K views
Bohan Lyu@Lyubh22·
I'm also more than delighted that this paper helped one of the co-authors to secure a PhD offer from Peking University!
0 replies · 0 reposts · 1 like · 148 views
Bohan Lyu@Lyubh22·
Hopefully we will also present this work at LAW@NeurIPS 2025; by the time I made the submission last month, the story had already become "Demystify the Potential of Large Language Models as World Models of Code".
[image]
Siqiao Huang@KnightNemo_

I want to quietly mention that we basically came up with the same idea of code execution as world models six months ago, and it turned out to be an EMNLP'25 paper (top 0.5% meta score). Check out: arxiv.org/abs/2502.11167 . Glad to see Meta pushing it a lot further.

1 reply · 2 reposts · 6 likes · 1.2K views
Bohan Lyu reposted
Kaiyue Wen@wen_kaiyue·
(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baselines or limited scale! E.g. Muon: ~40% speedup <0.5B & only 10% at 1.2B (8× Chinchilla)!
[image]
13 replies · 97 reposts · 446 likes · 183.5K views
Bohan Lyu reposted
Yuchen Jin@Yuchenj_UW·
Ilya Sutskever: bald Demis Hassabis: bald Noam Shazeer: bald Greg Brockman: bald forget AGI. forget curing cancer. cure baldness now. My hairline is on gradient descent.
376 replies · 323 reposts · 6.9K likes · 605.3K views