Zhepei Wei
@weizhepei

134 posts

Ph.D. Student @CS_UVA | Prev @AIatMeta, @AmazonScience. Research interests: ML/NLP/LLM.

Charlottesville, VA · Joined January 2016
701 Following · 318 Followers
Pinned Tweet
Zhepei Wei @weizhepei
😢RLVR is powerful but expensive
🤯Imagine using <20% RLVR training while achieving 100% performance? Sounds surprising?
We show that minimal RLVR training is enough to know where training is going, and predict future ckpts at no training cost!
📃tinyurl.com/minimal-rlvr
🧵[1/n]
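The tweet doesn't spell out the mechanism, but one plausible reading of "predict where training is going from minimal training" is curve extrapolation: record eval scores at a handful of early checkpoints, fit a saturating curve, and read off where the run is heading. A minimal sketch of that reading only; the curve family, step counts, and accuracies below are assumptions, not the paper's method:

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating(t, a, b, c):
    # Accuracy rises toward an asymptote a as RLVR training proceeds.
    return a - b * np.exp(-c * t)

# Hypothetical eval accuracies from the first ~20% of training steps.
steps = np.array([50.0, 100.0, 150.0, 200.0, 250.0])
accs = np.array([0.41, 0.48, 0.52, 0.55, 0.57])

params, _ = curve_fit(saturating, steps, accs, p0=[0.65, 0.30, 0.01], maxfev=10_000)
print(f"Predicted accuracy at step 1500: {saturating(1500.0, *params):.3f}")
print(f"Fitted performance ceiling: {params[0]:.3f}")
```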
Zhepei Wei @weizhepei
📢 Takeaway: You only need minimal RLVR training to know where the model is heading. Observe the early training dynamics, then extrapolate future checkpoints at no training cost!
Blogpost 👇 weizhepei.notion.site/you-only-need-…
🧵[10/n]
Zhepei Wei retweeted
ChengSong Huang @ChengsongH31219
"How do you self-improve a model on open-ended tasks where you can't take a majority vote?" I got asked this in nearly every research interview I did last year. None of my answers felt clean. So we built something that doesn't need a vote, a verifier, or a judge. Meet G-Zero. 👇 paper: arxiv.org/abs/2605.09959 huggingface: huggingface.co/papers/2605.09… code: github.com/Chengsong-Huan… All experiments are done via api by @thinkymachines (1/n)
Zhepei Wei retweeted
Yu Zhang @yuz9yuz
Our new work led by @zhuofengli96475. Claude Code (or even simpler 𝘁𝗲𝗿𝗺𝗶𝗻𝗮𝗹-𝗯𝗮𝘀𝗲𝗱 𝗰𝗼𝗱𝗶𝗻𝗴 𝗵𝗮𝗿𝗻𝗲𝘀𝘀𝗲𝘀) + 𝗴𝗿𝗲𝗽, 𝗳𝗶𝗻𝗱, 𝗵𝗲𝗮𝗱, etc. can beat competitive search agents! No BM25/BERT/index! "The best retriever for agentic search is no retriever."
Zhuofeng Li @zhuofengli96475

🔥 Introducing Direct Corpus Interaction (DCI)! The best retriever for agentic search is no retriever.
🚀 We replaced the entire agentic search pipeline — embedding model, vector index, top-k retrieval — with only `grep` and `bash`. 🔧
📄 Paper: huggingface.co/papers/2605.05…
DCI unlocks the full agentic potential of Claude Sonnet 4.6: 69.0% → 80.0% on BrowseComp-Plus (+11.0, −$424).
💡 The Magic: The agent searches the raw corpus directly — `grep`, `find`, `bash`, shell pipelines — exactly like a coding agent navigating a codebase. No preprocessing. No embedding model. No vector index. No offline indexing.
📊 The Results: DCI outperforms top baselines across 13 benchmarks, with average gains of:
🔍 Agentic Search: +11.0%
🧠 Multi-hop QA: +30.7%
📈 IR Ranking: +21.5%
💡 Insights: Beyond accuracy, we conduct a series of controlled ablation studies to pinpoint the sources of DCI's gains. Specifically, we examine trajectory-level search, evidence utilization, context management, and tool usage (RQ2–RQ6).
Try it yourself!
🛠️ Code: github.com/DCI-Agent/DCI-…
🤖 Demo: huggingface.co/spaces/DCI-Age…
🔎 Eval logs: huggingface.co/datasets/DCI-A…
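To make the idea concrete: the claim is that a read-only shell over the raw corpus can serve as the entire retrieval stack. A minimal sketch of such an agent tool, assuming one plain-text file per document; the function name, directory layout, and truncation limits are illustrative, not the DCI codebase:

```python
import subprocess

CORPUS_DIR = "corpus/"  # assumed layout: one plain-text file per document

def search_corpus(command: str, timeout: int = 30) -> str:
    """Run a read-only shell pipeline (grep/find/head/...) inside the corpus dir.

    Mirrors the DCI idea of letting the agent query raw files directly;
    the tool interface and truncation here are assumptions, not the paper's code.
    """
    result = subprocess.run(
        command, shell=True, cwd=CORPUS_DIR,
        capture_output=True, text=True, timeout=timeout,
    )
    # Truncate so tool output fits in the LLM context window.
    return result.stdout[:8000] or result.stderr[:2000]

# Pipelines an agent might issue, exactly as in a codebase:
# search_corpus("grep -ril 'transformer architecture' . | head -20")
# search_corpus("grep -n -A2 'BrowseComp' some_doc.txt")
```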

Zhepei Wei retweeted
Langlin Huang @shrangoh
Can adding pure nonsense to a prompt make an LLM reason better? YES! 🤯
Introducing LoPE (Lorem Perturbation for Exploration). We found that prepending meaningless pseudo-Latin placeholder text (Lorem Ipsum) acts like magic—helping GRPO escape the "zero-advantage" trap on hard math questions!
Paper page 📄: huggingface.co/papers/2605.05…
Broadens LLM exploration: increases resample success rate by ~3x 📈
Strengthens LLM reasoning: improves average math reasoning by up to 13% 📊
Here's how it works 👇
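For context on the "zero-advantage" trap: GRPO normalizes rewards within each group of rollouts, so if every rollout for a hard question fails, all advantages are zero and that question contributes no gradient. A minimal sketch of the failure mode and a LoPE-style perturbation as described above; the filler snippet, word count, and helper names are assumptions for illustration:

```python
import random

LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua").split()

def lorem_perturb(prompt: str, n_words: int = 16, seed: int | None = None) -> str:
    # Prepend meaningless pseudo-Latin filler, leaving the question intact.
    rng = random.Random(seed)
    return " ".join(rng.choices(LOREM, k=n_words)) + "\n\n" + prompt

def group_advantages(rewards: list[float]) -> list[float]:
    # GRPO-style group normalization: identical rewards => all-zero advantages.
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [0.0] * len(rewards) if std == 0 else [(r - mean) / std for r in rewards]

# A hard question where all 8 rollouts fail: zero advantage, zero gradient.
assert group_advantages([0.0] * 8) == [0.0] * 8
# LoPE-style remedy: resample rollouts from perturbed copies of the prompt,
# hoping at least one variant elicits a correct (nonzero-reward) trajectory.
variants = [lorem_perturb("Prove that ...", seed=i) for i in range(8)]
```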
Zhepei Wei @weizhepei
🎉 TruthRL is accepted to #ICML2026!
A simple ternary reward (correct: +1; abstention: 0; incorrect: −1) helps LLMs answer more accurately and know when not to answer, significantly reducing hallucinations!
Paper + code 👇
📄 arxiv.org/abs/2509.25760
💻 github.com/facebookresear…
Zhepei Wei @weizhepei

🤔 Ever wondered why your post-training methods (SFT/RL) make LLMs reluctant to say "I don't know"?
🤩 Introducing TruthRL — a truthfulness-driven RL method that significantly reduces hallucinations while achieving accuracy and proper abstention!
📃 arxiv.org/abs/2509.25760
🧵[1/n]
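The ternary reward above translates directly to code. A minimal sketch; is_abstention is a hypothetical detector (TruthRL may identify abstentions differently), and the correctness check is assumed to come from an external verifier:

```python
def is_abstention(answer: str) -> bool:
    # Hypothetical heuristic; the paper may detect abstention another way.
    cues = ("i don't know", "i do not know", "cannot answer", "not sure")
    return any(cue in answer.lower() for cue in cues)

def truthrl_reward(answer: str, is_correct: bool) -> int:
    # Ternary reward from the tweet: correct +1, abstention 0, incorrect -1.
    if is_abstention(answer):
        return 0
    return 1 if is_correct else -1
```

The key design choice: a wrong answer scores strictly below an abstention, so under RL the policy learns to abstain when it would otherwise guess, rather than hallucinate to chase a binary correctness reward.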

Zhepei Wei retweeted
Jiaxin Huang @jiaxinhuang0229
Heading to #ICLR2026? My amazing student @ChengsongH31219 has TWO posters on LLM post-training, self-evolving, and test-time scaling! 🔥 If you are interested in these topics, please check out the work! 👀👇
📌 R-Zero: Self-Evolving Reasoning LLM from Zero Data
📅 Thu, Apr 23 • 3:15 PM – 5:45 PM
📌 CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning
📅 Fri, Apr 24 • 10:30 AM – 1:00 PM
Sadly, I won't be able to make it in person this year, but feel free to reach out to me online!
Zhepei Wei retweeted
Wei-Lin Chen @WeiLin__Chen
🚀 New paper from my internship at @Google!
LLMs can "think" for a long time only to get the answer wrong — more tokens do not always help and may be overthinking 😵‍💫
We introduce the Deep-Thinking Ratio (DTR), a new way to measure LLM reasoning effort. The idea: count the tokens the model had to think deeply to produce. 🧵
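The tweet doesn't say how "had to think deeply" is operationalized, so the criterion below is purely an assumption for illustration: call a token "deep" when the model assigned it low probability, and report the fraction of such tokens.

```python
import math

def deep_thinking_ratio(token_logprobs: list[float],
                        threshold: float = math.log(0.5)) -> float:
    # Illustrative only: "deep" = the model was uncertain about the token
    # (logprob below log 0.5). The paper's actual criterion may differ.
    if not token_logprobs:
        return 0.0
    deep = sum(1 for lp in token_logprobs if lp < threshold)
    return deep / len(token_logprobs)
```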
Tyler Griggs @tyler_griggs_
SkyRL now implements the Tinker API. Training scripts written for Tinker can run on your own GPUs with zero code changes, using SkyRL's FSDP2, Megatron, and vLLM backends.
Blog: novasky-ai.notion.site/skyrl-tinker 🧵