Lei Cui

335 posts

@wolfshowme

NLP Researcher at Microsoft Research, retrogaming and console fan

Beijing · Joined November 2009

354 Following · 321 Followers

Lei Cui@wolfshowme·10 Mar

MisterClaw: Run PicoClaw AI on Your MiSTer FPGA youtu.be/CK4210q3k4Y?si… via @YouTube @MiSTerFPGA @SipeedIO #MisterClaw #PicoClaw #MisterFPGA #OpenClaw #Retrogaming

276

Lei Cui@wolfshowme·10 Mar

github.com/wolfshow/Miste… MisterClaw meme designed by Nano Banana @MiSTerFPGA @SipeedIO

Lei Cui@wolfshowme·10 Mar

MisterClaw: World's first @SipeedIO PicoClaw AI assistant on MiSTer FPGA (DE10-Nano) #MisterFPGA #OpenClaw #PicoClaw #RetroGaming #MisterClaw github.com/wolfshow/Miste…

3.5K

Lei Cui@wolfshowme·10 Mar

Built a tiny AI assistant powered by @SipeedIO PicoClaw 🦞 on my MiSTer FPGA. 10MB RAM, 600+ arcade games, all managed via Telegram. Retro gaming meets AI. 🎮🤖 #MiSTerFPGA #OpenClaw #PicoClaw #RetroGaming

209

Lei Cui@wolfshowme·22 Jan

Just got the N64 x7, should I solder it myself? @krikzz

959

Lei Cui retweeted

krikzz@krikzz·22 Nov

Black Friday sales begins! 20% off on all products at krikzz.com. RT this message and you will have a chance to get EverDrive or FXPAK for free! Winner will be chosen 02.12.2025 #GIVEAWAY

1.4K

1.2K

104.5K

Lei Cui@wolfshowme·11 Nov

#AesCoder on Design Arena

Grace Li@grx_xce

Congrats to @MSFTResearch on AesCoder-4B, a tiny model that introduces GRPO-AR to jointly optimize functionality and code aesthetics, holding its own against models 100x its size. Our team is proud to see SOTA cite Design Arena as baseline. Official Design Arena results coming soon.

149

Lei Cui retweeted

FW@thegenerality·18 Oct

BitDistill finetunes any full-precision LLM into 1.58-bit for specific tasks with the same performance

AK@_akhaliq

Microsoft presents BitNet Distillation

443

Lei Cui@wolfshowme·15 Oct

Thrilled to introduce #PART, a new method to protect LLM reasoning from unauthorized distillation while keeping it transparent for users. By removing self-talk and reordering conclusions, we disrupt illicit training and preserve valuable information. arxiv.org/abs/2510.11545

221

Lei Cui@wolfshowme·14 Oct

Beyond just text quality! We're introducing #DocReward, a model that evaluates and improves the visual structure and style of documents. In our tests, DOCREWARD achieved a 60.8% win rate in generating human-preferred documents, compared to GPT-5's 37.7%. arxiv.org/abs/2510.11391

1.4K

Lei Cui@wolfshowme·29 Aug

#Kosmos-2.5 now in Hugging Face

Niels Rogge@NielsRogge

KOSMOS 2.5 by @Microsoft has finally been integrated into @huggingface Transformers 🙌🔥 End-to-end document AI model similar to Donut/Pix2Struct, pre-trained on 357.4 million documents. Handles image-to-markdown, OCR with spatial coordinates and chatting with documents!

246

Lei Cui retweeted

机器之心 JIQIZHIXIN@jiqizhixin·14 Aug

How do you stop an LLM from getting “thrown off” by a few wild tokens? UCAS, CUHK, HKUST, and Microsoft Research researchers think they’ve cracked it with Geometric-Mean Policy Optimization (GMPO) — a twist on GRPO that tames outliers by optimizing the geometric mean of token-level rewards. Result: more stable training, +4.1% on math benchmarks, +1.4% on multimodal reasoning.

164

12.4K

Lei Cui retweeted

DAIR.AI@dair_ai·3 Aug

Top AI Papers of The Week (July 28 - August 3):
- GEPA
- Graph-R1
- AlphaEarth
- Self-Evolving Agents
- Hierarchical Reasoning Model
- Efficient Attention Mechanisms
- Geometric-Mean Policy Optimization
Read on for more:

134

1.2K

144.1K

Lei Cui retweeted

DailyPapers@HuggingPapers·2 Aug

Microsoft Research introduces Geometric-Mean Policy Optimization (GMPO)! A new RL method that stabilizes LLM reasoning by maximizing the geometric mean of token-level rewards. No more unstable updates!

111

994

61.8K

Lei Cui retweeted

DailyPapers@HuggingPapers·2 Aug

GMPO outperforms GRPO by 4.1% on math & 1.4% on multimodal reasoning benchmarks. It achieves better stability and performance, moving us closer to reliable AI. Learn more & get the code: Paper: huggingface.co/papers/2507.20… Code: github.com/callsys/GMPO

2.4K

Lei Cui retweeted

Remek Kinas@KinasRemek·29 Jul

RL (LLM) - I recently wrote about GSPO. And today there are publications on -> GMPO - Geometric-Mean Policy Optimization, ARPO - Agentic Reinforced Policy Optimization, IRL - Inverse RL … Probably the most flourishing area of LLM training. On our side, Bielik-v3 is also already being trained with RL (GRPO, DR-GRPO, DAPO, GSPO - prepared) … we are waiting for the new base. Yesterday I finished working on the largest Polish mathematical RL training dataset - close to 500k unique and verifiable Polish sentences. It's going to be powerful 😁 The team - Krzysiek Ociepa @ChrisOciepa, Łukasz Flis, Adrian Gwoździej, Krzysiek Wróbel, with my support - is now working at full speed. Dream team 🤩 The work goes by itself. New ideas get implemented in a few minutes, no need to say much - delivery matters most. It's great to work in a team like this. We make progress every day! Huge support from @Cyfronet ❤️🔥

4.7K

Lei Cui retweeted

AI Native Foundation@AINativeF·30 Jul

8. Geometric-Mean Policy Optimization
🔑 Keywords: Geometric-Mean Policy Optimization, Policy Updates, Token-Level Rewards, Multimodal Reasoning, AI Native
💡 Category: Natural Language Processing
🌟 Research Objective: - The research aims to stabilize policy updates in large language models through Geometric-Mean Policy Optimization (GMPO), enhancing the performance on mathematical and multimodal reasoning benchmarks.
🛠️ Research Methods: - GMPO introduces the use of geometric mean for token-level rewards to provide a less sensitive approach to outliers and maintain stable importance sampling ratios. Comprehensive theoretical and experimental analyses are conducted to validate GMPO's design and stability benefits.
💬 Research Conclusions: - GMPO demonstrates improved stability and a performance increase, surpassing GRPO by 4.1% on mathematical benchmarks and 1.4% on multimodal reasoning benchmarks like AIME24, AMC, MATH500, OlympiadBench, Minerva, and Geometry3K.
👉 Paper link: huggingface.co/papers/2507.20…

Lei Cui retweeted

fly51fly@fly51fly·30 Jul

[CL] Geometric-Mean Policy Optimization Y Zhao, Y Liu, J Liu, J Chen... [Microsoft Research] (2025) arxiv.org/abs/2507.20673

690

Lei Cui retweeted

Rosinality@rosinality·29 Jul

Geometric-Mean Policy Optimization Using geometric mean for the importance ratio, similar to GSPO (arxiv.org/abs/2507.18071).

245

15.7K

Lei Cui@wolfshowme·29 Jul

New paper: #GMPO beats GRPO by simply switching from arithmetic → geometric mean for token rewards! ✅ More stable training (no extreme importance sampling ratios) ✅ Better exploration (higher entropy throughout training) huggingface.co/papers/2507.20…
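A rough illustration of why the arithmetic → geometric switch can damp extreme values: the sketch below compares the two means on a made-up list of per-token importance ratios with one outlier. The ratio values and function names are hypothetical and this is not the GMPO objective from the paper, only the averaging step in isolation.

```python
import math


def arithmetic_mean(values):
    """Plain average: a single large value pulls the result up linearly."""
    return sum(values) / len(values)


def geometric_mean(values):
    """exp(mean(log v)): averaging in log space compresses large outliers.

    Requires strictly positive inputs, which holds for probability ratios.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical token-level importance ratios; the last token is an outlier.
ratios = [1.0, 1.1, 0.9, 1.0, 8.0]

am = arithmetic_mean(ratios)
gm = geometric_mean(ratios)

print(f"arithmetic mean: {am:.3f}")
print(f"geometric mean:  {gm:.3f}")
```

With these numbers the geometric mean stays much closer to the typical ratio of about 1 than the arithmetic mean does, which matches the "no extreme importance sampling ratios" point in the post.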

312
