Lei Cui

335 posts

@wolfshowme

NLP researcher at Microsoft Research; retrogaming and console fan

Beijing · Joined November 2009
354 Following · 321 Followers
Lei Cui @wolfshowme
Just got the N64 x7, should I solder it myself? @krikzz
Replies 1 · Reposts 1 · Likes 3 · Views 959
Lei Cui retweeted
krikzz @krikzz
Black Friday sale begins! 20% off all products at krikzz.com. RT this message and you will have a chance to get an EverDrive or FXPAK for free! The winner will be chosen 02.12.2025 #GIVEAWAY
Replies 94 · Reposts 1.4K · Likes 1.2K · Views 104.5K
Lei Cui @wolfshowme
#AesCoder on Design Arena
Grace Li @grx_xce

Congrats to @MSFTResearch on AesCoder-4B, a tiny model that introduces GRPO-AR to jointly optimize functionality and code aesthetics, holding its own against models 100x its size. Our team is proud to see SOTA cite Design Arena as baseline. Official Design Arena results coming soon.

Replies 0 · Reposts 0 · Likes 1 · Views 149
Lei Cui @wolfshowme
Thrilled to introduce #PART, a new method to protect LLM reasoning from unauthorized distillation while keeping it transparent for users. By removing self-talk and reordering conclusions, we disrupt illicit training and preserve valuable information. arxiv.org/abs/2510.11545
Replies 0 · Reposts 1 · Likes 2 · Views 221
Lei Cui @wolfshowme
Beyond just text quality! We're introducing #DocReward, a model that evaluates and improves the visual structure and style of documents. In our tests, DocReward achieved a 60.8% win rate in generating human-preferred documents, compared to GPT-5's 37.7%. arxiv.org/abs/2510.11391
Replies 1 · Reposts 4 · Likes 7 · Views 1.4K
Lei Cui @wolfshowme
#Kosmos-2.5 is now in Hugging Face Transformers
Niels Rogge @NielsRogge

KOSMOS 2.5 by @Microsoft has finally been integrated into @huggingface Transformers 🙌🔥 It is an end-to-end document AI model similar to Donut/Pix2Struct, pre-trained on 357.4 million documents. It handles image-to-markdown, OCR with spatial coordinates, and chatting with documents!

Replies 1 · Reposts 0 · Likes 2 · Views 246
Lei Cui retweeted
机器之心 JIQIZHIXIN @jiqizhixin
How do you stop an LLM from getting “thrown off” by a few wild tokens? UCAS, CUHK, HKUST, and Microsoft Research researchers think they’ve cracked it with Geometric-Mean Policy Optimization (GMPO) — a twist on GRPO that tames outliers by optimizing the geometric mean of token-level rewards. Result: more stable training, +4.1% on math benchmarks, +1.4% on multimodal reasoning.
Replies 5 · Reposts 34 · Likes 164 · Views 12.4K
Lei Cui retweeted
DAIR.AI @dair_ai
Top AI Papers of The Week (July 28 - August 3):
- GEPA
- Graph-R1
- AlphaEarth
- Self-Evolving Agents
- Hierarchical Reasoning Model
- Efficient Attention Mechanisms
- Geometric-Mean Policy Optimization
Read on for more:
Replies 13 · Reposts 134 · Likes 1.2K · Views 144.1K
Lei Cui retweeted
DailyPapers @HuggingPapers
Microsoft Research introduces Geometric-Mean Policy Optimization (GMPO)! A new RL method that stabilizes LLM reasoning by maximizing the geometric mean of token-level rewards. No more unstable updates!
Replies 10 · Reposts 111 · Likes 994 · Views 61.8K
Lei Cui retweeted
Remek Kinas @KinasRemek
RL (LLM): I recently wrote about GSPO. Today there are publications on GMPO (Geometric-Mean Policy Optimization), ARPO (Agentic Reinforced Policy Optimization), IRL (Inverse RL) … Probably the most flourishing area of LLM training. Our Bielik-v3 is also already being trained with RL (GRPO, DR-GRPO, DAPO, GSPO are prepared) … we are waiting for a new base. Yesterday I finished working on the largest Polish mathematical RL training dataset: nearly 500k unique and verifiable Polish sentences. It's going to be powerful 😁 The team (Krzysiek Ociepa @ChrisOciepa, Łukasz Flis, Adrian Gwoździej, Krzysiek Wróbel, with my support) is now working at full speed. Dream team 🤩 The work practically does itself. New ideas get implemented in minutes and not much needs to be said; delivery matters most. It's great to work in a team like this. We make progress every day! Huge support from @Cyfronet ❤️🔥
Replies 6 · Reposts 8 · Likes 86 · Views 4.7K
Lei Cui retweeted
AI Native Foundation @AINativeF
8. Geometric-Mean Policy Optimization
🔑 Keywords: Geometric-Mean Policy Optimization, Policy Updates, Token-Level Rewards, Multimodal Reasoning, AI Native
💡 Category: Natural Language Processing
🌟 Research Objective:
- The research aims to stabilize policy updates in large language models through Geometric-Mean Policy Optimization (GMPO), enhancing the performance on mathematical and multimodal reasoning benchmarks.
🛠️ Research Methods:
- GMPO introduces the use of geometric mean for token-level rewards to provide a less sensitive approach to outliers and maintain stable importance sampling ratios. Comprehensive theoretical and experimental analyses are conducted to validate GMPO's design and stability benefits.
💬 Research Conclusions:
- GMPO demonstrates improved stability and a performance increase, surpassing GRPO by 4.1% on mathematical benchmarks and 1.4% on multimodal reasoning benchmarks like AIME24, AMC, MATH500, OlympiadBench, Minerva, and Geometry3K.
👉 Paper link: huggingface.co/papers/2507.20…
Replies 1 · Reposts 2 · Likes 2 · Views 98
Lei Cui retweeted
fly51fly @fly51fly
[CL] Geometric-Mean Policy Optimization Y Zhao, Y Liu, J Liu, J Chen... [Microsoft Research] (2025) arxiv.org/abs/2507.20673
Replies 0 · Reposts 3 · Likes 8 · Views 690
Lei Cui retweeted
Rosinality @rosinality
Geometric-Mean Policy Optimization
Using geometric mean for the importance ratio, similar to GSPO (arxiv.org/abs/2507.18071).
Replies 6 · Reposts 30 · Likes 245 · Views 15.7K
Lei Cui @wolfshowme
New paper: #GMPO beats GRPO by simply switching from arithmetic → geometric mean for token rewards!
✅ More stable training (no extreme importance sampling ratios)
✅ Better exploration (higher entropy throughout training)
huggingface.co/papers/2507.20…
Replies 0 · Reposts 2 · Likes 5 · Views 312
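
A rough illustration of why the arithmetic → geometric switch damps extreme importance sampling ratios. This is only a sketch, not the paper's objective: the function names and the log-probability values are made up for the example, and it leaves out everything else a real training loss would need.

```python
import math

def token_ratios(new_logprobs, old_logprobs):
    """Per-token importance ratios pi_new / pi_old, computed from log-probabilities."""
    return [math.exp(n - o) for n, o in zip(new_logprobs, old_logprobs)]

def arithmetic_mean_ratio(new_logprobs, old_logprobs):
    """Plain average of the per-token ratios (sensitive to a single large ratio)."""
    ratios = token_ratios(new_logprobs, old_logprobs)
    return sum(ratios) / len(ratios)

def geometric_mean_ratio(new_logprobs, old_logprobs):
    """Geometric mean of the per-token ratios, computed in log space for stability."""
    diffs = [n - o for n, o in zip(new_logprobs, old_logprobs)]
    return math.exp(sum(diffs) / len(diffs))

# Hypothetical log-probabilities for a 4-token response.
# Only the last token changed a lot between the old and new policy.
old_logprobs = [-0.5, -0.5, -0.5, -3.5]
new_logprobs = [-0.5, -0.5, -0.5, -0.5]

am = arithmetic_mean_ratio(new_logprobs, old_logprobs)  # (1 + 1 + 1 + e^3) / 4
gm = geometric_mean_ratio(new_logprobs, old_logprobs)   # e^(3 / 4)

print(f"arithmetic mean of ratios: {am:.3f}")
print(f"geometric mean of ratios:  {gm:.3f}")
```

With these made-up numbers, the single outlier token pulls the arithmetic mean far above 1, while the geometric mean moves much less, which is the intuition behind the "no extreme importance sampling ratios" point above.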