Bohan Lyu

79 posts

@Lyubh22

Undergrad @Tsinghua_Uni.

Joined November 2023
432 Following · 318 Followers
Bohan Lyu reposted
Ziran Yang@__zrrr__·
We released Goedel-Prover-V2, a state-of-the-art model for formal theorem proving at launch. Remarkably, it has remained at the top of the open-source formal theorem proving leaderboard for over six months. We have been excited to see so many folks cooking with our models.

Today, we are open-sourcing the full Goedel-Prover-V2 training datasets for the community:
📂 SFT (1.74M samples) huggingface.co/datasets/Goede…
📂 RL (whole proof generation + self-revision, 98k samples) huggingface.co/datasets/Goede…

We hope this helps push formal theorem proving forward. Build on it!

Amazing Collaborators: @Yong18850571 @sangertang1999 @Lyubh22 @juihuichung @thomaszhao1998 @_LaiJiang @thiiis_user @EmilyJge @JingruoS5931 @wujiayun12 @GesiJiri68334 @davidjesusacu @KaiyuYang4 @hongzhou__lin @YejinChoinka @danqi_chen @prfsanjeevarora @chijinML
3 replies · 39 reposts · 201 likes · 29K views
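For anyone looking to build on the release, here is a minimal sketch of loading the data with Hugging Face's datasets library. Note the hub paths in the post are truncated, so the repository IDs below are hypothetical placeholders, not confirmed names.

from datasets import load_dataset

# Hypothetical repo IDs; the real hub paths are truncated in the post above.
sft = load_dataset("Goedel-LM/Goedel-Prover-V2-SFT", split="train")
rl = load_dataset("Goedel-LM/Goedel-Prover-V2-RL", split="train")

print(len(sft))  # per the post, on the order of 1.74M SFT samples
print(sft[0])    # inspect a single training record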
Bohan Lyu@Lyubh22·
Goedel-Prover-V2 has been accepted to #ICLR2026! What’s more important is that, despite being released over half a year ago, it remains the #1🏆 Open-Source model (💚) on PutnamBench. Our models have reached a peak of 100K+ monthly downloads, demonstrating the demand for open math provers. We look forward to seeing more breakthroughs from open-source models/systems, especially in such a vital field that drives the future of mathematical research.
[image]
0 replies · 1 repost · 12 likes · 540 views
Bohan Lyu reposted
Chi Jin@chijinML·
This is a truly remarkable math theorem prover — well ahead of competitors, near-saturating PutnamBench, and achieving much higher solve rates on the recently concluded Putnam 2025 in a surprisingly short amount of time.
Zheng Yuan@GanjinZero

Excited to announce Seed-Prover 1.5, which is trained via large-scale agentic RL with Lean. It proved 580/660 Putnam problems and proved 11/12 in Putnam 2025 within 9 hours. Check details at github.com/ByteDance-Seed…. We will work on autoformalization toward contributing to real math!

1 reply · 8 reposts · 98 likes · 12.4K views
Bohan Lyu@Lyubh22·
@_vztu I'm a CS + finance dual major
0 replies · 0 reposts · 0 likes · 120 views
Zhengzhong Tu@_vztu·
@Lyubh22 It used to be all kinds of majors switching into economics and management; now it's economics and management switching into computer science... what an era...
2 replies · 0 reposts · 2 likes · 781 views
Bohan Lyu reposted
Yong Lin@Yong18850571·
[Life update] I’ve officially left @PrincetonPLI and joined Thinking Machines Lab @thinkymachines. It feels like the right time to look back on my journey at Princeton — one and a half years that were truly transformative. During this period, I made many friends, learned tremendously, and co-founded and co-led the Goedel Project. It was one of the most rewarding experiences of my life: a small, close-knit team of about ten people working with a clear purpose, moving fast, and ultimately building something impactful.

In mid-July, we released Goedel-Prover-V2 (32B), a model that significantly outperformed the previous state-of-the-art DeepSeek-Prover-V2-671B on formal mathematical reasoning, using nearly 20× fewer parameters and dramatically less compute. Even now, four months after release, it still sits at the top of the open-source leaderboard.

What makes this achievement especially meaningful is that we accomplished it entirely with academic resources. Competing against large industrial labs and still coming out ahead felt almost unreal. Seeing so many research teams now building on top of Goedel-Prover-V2 is deeply gratifying — it’s proof that open, academic AI can still make a real impact.

Equally fulfilling was the journey itself. Unlike industrial teams with access to large-scale, off-the-shelf RL infrastructure, we — a group of students and researchers from academia with zero prior experience in massive model training — had to build almost everything from scratch. We learned quickly, identified problems as they emerged, and fixed them with remarkable speed. Designing, scaling, and successfully training a 32B-parameter model within just three months remains one of the things I’m most proud of.

The Goedel Project began in October 2024. At that time, we had no serious experience training models that could compete with the best labs. DeepSeek-Prover-V1 and V1.5 looked unbeatable — they had started a year earlier and already set an incredibly high bar. We experimented with many ideas — agentic pipelines, divide-and-conquer methods — most of which turned out to be too costly or impractical given our limited resources. Eventually, we discovered a simple yet powerful iterative-training approach that allowed us to scale efficiently within our compute limits. Bit by bit, we caught up with DeepSeek-Prover-V1.5 — and then surpassed it.

Princeton winters are brutally cold. It was the first time I’d ever seen snow last for weeks. I spent the entire winter break at home, running experiments, analyzing results, and adjusting training methods and data again and again. That persistence paid off: in February, we released Goedel-Prover-V1-7B, which captured the top spot on the leaderboard. It was our first major milestone — proof that an academic team could compete with frontier models.

Our celebration was short-lived. In April, Kimi-Prover-72B and DeepSeek-Prover-V2-671B both arrived — and completely outperformed us. It was a tough moment. We couldn’t even host DeepSeek-Prover-V2-671B for inference locally; communication errors kept crashing our limited infrastructure. None of us had experience deploying or training models of that scale. Still, we decided to aim higher — to beat them in the next version. We began by identifying the bottlenecks in DeepSeek and Kimi’s provers, exploring every possible angle for improvement. We experimented with compiler-based feedback loops, curriculum data synthesis, self-improvement strategies, model distillation, and model merging to improve diversity during RL.

But the most critical insight was about efficiency — optimizing how we allocated limited resources across training design, data generation, and scaling. Every GPU hour had to count. After two months of exploration and countless small-scale tests, we finally established a systematic framework for the next release: Goedel-Prover-V2.

I led "The Big Run" — a nearly month-long sequence of two self-improvement fine-tuning cycles followed by one large-scale RL round. We completed training just a few days before our scheduled release, leaving barely enough time for evaluation. Those last nights were intense — running tests, fixing scripts, collecting metrics — but everything came together perfectly. When we saw the final results, we could hardly believe them: Goedel-Prover-V2 solved twice as many problems on PutnamBench as DeepSeek-Prover-V2-671B.

Many people have since asked what the “key” was — how an academic team managed to outperform frontier labs using a fraction of their resources. There isn’t a single magic trick, but rather a combination of principles that guided us:
* build solid infrastructure early
* focus on real bottlenecks instead of chasing novelty
* investigate broadly with small-scale experiments
* fix problems in real time
* optimize resource allocation carefully
* execute the final big run with precision

Each of these steps sounds simple, but together they made all the difference.

Now, at Thinking Machines Lab, I’m shifting focus beyond formal reasoning toward building general-purpose models. I’m deeply inspired by TML’s mission — developing interactive AI systems and advancing open science. I’m thrilled to begin this new chapter and look forward to sharing more in the future.
18 replies · 18 reposts · 646 likes · 78.3K views
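One common shape for the "iterative training" Yong describes is expert iteration: sample candidate proofs, keep only those the Lean compiler accepts, and fine-tune on the survivors. Here is a minimal, hypothetical sketch of one such round; the sample, verify, and finetune callables are illustrative stand-ins, not the Goedel pipeline's actual components.

from typing import Callable, List, Tuple

def expert_iteration_round(
    model,
    statements: List[str],
    sample: Callable,    # sample(model, stmt, n) -> list of candidate proof strings
    verify: Callable,    # verify(stmt, proof) -> bool, e.g. a Lean 4 compile check
    finetune: Callable,  # finetune(model, pairs) -> updated model
    n_samples: int = 8,
):
    # Keep only (statement, proof) pairs that pass formal verification.
    verified: List[Tuple[str, str]] = []
    for stmt in statements:
        for proof in sample(model, stmt, n_samples):
            if verify(stmt, proof):  # only compiler-checked proofs survive
                verified.append((stmt, proof))
                break  # one verified proof per statement is enough for this sketch
    # Fine-tune on the verified proofs; the returned model seeds the next round.
    return finetune(model, verified)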
Bohan Lyu reposted
Chi Jin@chijinML·
Super proud of my fantastic postdocs and graduate students taking their next steps at frontier labs 🎉
• Yong Lin (@Yong18850571) → Thinking Machines
• Zihan Ding (@Hanry65960814) → ByteDance
• Ahmed Khaled → Google
It’s always bittersweet to say goodbye😢 but I couldn’t be more excited to see what you achieve next!
[two images]
3 replies · 7 reposts · 244 likes · 39.7K views
Bohan Lyu reposted
Danqi Chen@danqi_chen·
I am going to present two papers at #COLM2025 tomorrow from 4:30-6:30pm, as none of our leading authors can attend due to visa issues. Haven't done poster presentations for years 🤣🤣 .... so I will do my best! #76: LongProc #80: Goedel-Prover v1
[two images]
Chi Jin@chijinML

Our Goedel-Prover V1 will be presented at COLM 2025 in Montreal this Wednesday afternoon! I won’t be there in person, but my amazing and renowned colleague @danqi_chen will be around to help with the poster — feel free to stop by!

4 replies · 27 reposts · 348 likes · 49K views
Bohan Lyu reposted
Chi Jin@chijinML·
Excited to share that I’ve been promoted to Associate Professor with tenure at Princeton!🎉 6 years may not be long, but AI research has evolved significantly during this period. Grateful to all my students, collaborators, colleagues for being with me on this remarkable journey!
[image]
149 replies · 60 reposts · 2.7K likes · 114K views
Bohan Lyu@Lyubh22·
Building upon Goedel-Prover-V2, Hilbert Prover achieved 99.2% on MiniF2F and solved over 70% of PutnamBench problems😱 Amazing news from my old home, @yuqirose's lab. At ICML this year, someone asked why the model struggled with Putnam problems. I said it was a matter of time, and now here we are! I still vividly remember explaining our V2 work to Sumanth over spaghetti and meatballs the day after the blog post went live. What a journey. Congrats! Paper: arxiv.org/abs/2509.22819
[two images]
0 replies · 4 reposts · 17 likes · 2.1K views
Bohan Lyu@Lyubh22·
I'm also more than delighted that this paper helped one of the co-authors to secure a PhD offer from Peking University!
0 replies · 0 reposts · 1 like · 148 views
Bohan Lyu@Lyubh22·
Hopefully we will also present this work at LAW@NeurIPS 2025; by the time I made the submission last month, the story had already become "Demystify the Potential of Large Language Models as World Models of Code".
[image]
Siqiao Huang@KnightNemo_

I want to quietly mention that we basically came up with the same idea of code execution as world models six months ago, and it turned out to be an EMNLP'25 paper (top 0.5% meta score). Check out: arxiv.org/abs/2502.11167 . Glad to see Meta pushing it a lot further.

1 reply · 2 reposts · 6 likes · 1.2K views
Bohan Lyu reposted
Kaiyue Wen@wen_kaiyue·
(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baselines or limited scale! E.g. Muon: ~40% speedup <0.5B & only 10% at 1.2B (8× Chinchilla)!
[image]
13 replies · 97 reposts · 446 likes · 183.5K views
Bohan Lyu reposted
Yuchen Jin@Yuchenj_UW·
Ilya Sutskever: bald Demis Hassabis: bald Noam Shazeer: bald Greg Brockman: bald forget AGI. forget curing cancer. cure baldness now. My hairline is on gradient descent.
376 replies · 323 reposts · 6.9K likes · 605.3K views