Zhepei Wei
@weizhepei

134 posts

Ph.D. Student @CS_UVA | Prev @AIatMeta, @AmazonScience. Research interests: ML/NLP/LLM.

Charlottesville, VA · Joined January 2016
701 Following · 318 Followers
Pinned Tweet
Zhepei Wei @weizhepei
😢RLVR is powerful but expensive
🤯Imagine using <20% RLVR training while achieving 100% performance? Sounds surprising?
We show that minimal RLVR training is enough to know where training is going, and predict future ckpts at no training cost!
📃tinyurl.com/minimal-rlvr
🧵[1/n]
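The tweet doesn't spell out the mechanism, but one plausible reading of "predict where training is going from minimal training" is curve extrapolation: record eval scores at a handful of early checkpoints, fit a saturating curve, and read off where the run is heading. A minimal sketch of that reading only; the curve family, step counts, and accuracies below are assumptions, not the paper's method:

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating(t, a, b, c):
    # Accuracy rises toward an asymptote a as RLVR training proceeds.
    return a - b * np.exp(-c * t)

# Hypothetical eval accuracies from the first ~20% of training steps.
steps = np.array([50.0, 100.0, 150.0, 200.0, 250.0])
accs = np.array([0.41, 0.48, 0.52, 0.55, 0.57])

params, _ = curve_fit(saturating, steps, accs, p0=[0.65, 0.30, 0.01], maxfev=10_000)
print(f"Predicted accuracy at step 1500: {saturating(1500.0, *params):.3f}")
print(f"Fitted performance ceiling: {params[0]:.3f}")
```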
Zhepei Wei @weizhepei
📢 Takeaway: You only need minimal RLVR training to know where the model is heading. Observe the early training dynamics, then extrapolate future checkpoints at no training cost!
Blogpost 👇 weizhepei.notion.site/you-only-need-…
🧵[10/n]
Zhepei Wei retweeted
ChengSong Huang @ChengsongH31219
"How do you self-improve a model on open-ended tasks where you can't take a majority vote?" I got asked this in nearly every research interview I did last year. None of my answers felt clean. So we built something that doesn't need a vote, a verifier, or a judge. Meet G-Zero. 👇 paper: arxiv.org/abs/2605.09959 huggingface: huggingface.co/papers/2605.09… code: github.com/Chengsong-Huan… All experiments are done via api by @thinkymachines (1/n)
Zhepei Wei retweeted
Yu Zhang @yuz9yuz
Our new work led by @zhuofengli96475. Claude Code (or even simpler 𝘁𝗲𝗿𝗺𝗶𝗻𝗮𝗹-𝗯𝗮𝘀𝗲𝗱 𝗰𝗼𝗱𝗶𝗻𝗴 𝗵𝗮𝗿𝗻𝗲𝘀𝘀𝗲𝘀) + 𝗴𝗿𝗲𝗽, 𝗳𝗶𝗻𝗱, 𝗵𝗲𝗮𝗱, etc. can beat competitive search agents! No BM25/BERT/index! "The best retriever for agentic search is no retriever."
Zhuofeng Li @zhuofengli96475

🔥 Introducing Direct Corpus Interaction (DCI)! The best retriever for agentic search is no retriever.
🚀 We replaced the entire agentic search pipeline — embedding model, vector index, top-k retrieval — with only `grep` and `bash`. 🔧
📄 Paper: huggingface.co/papers/2605.05…
DCI unlocks the full agentic potential of Claude Sonnet 4.6: 69.0% → 80.0% on BrowseComp-Plus (+11.0, −$424).
💡 The Magic: The agent searches the raw corpus directly — `grep`, `find`, `bash`, shell pipelines — exactly like a coding agent navigating a codebase. No preprocessing. No embedding model. No vector index. No offline indexing.
📊 The Results: DCI outperforms top baselines across 13 benchmarks, with average gains of:
🔍 Agentic Search: +11.0%
🧠 Multi-hop QA: +30.7%
📈 IR Ranking: +21.5%
💡 Insights: Beyond accuracy, we conduct a series of controlled ablation studies to pinpoint the sources of DCI's gains. Specifically, we examine trajectory-level search, evidence utilization, context management, and tool usage (RQ2–RQ6).
Try it yourself!
🛠️ Code: github.com/DCI-Agent/DCI-…
🤖 Demo: huggingface.co/spaces/DCI-Age…
🔎 Eval logs: huggingface.co/datasets/DCI-A…
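To make the idea concrete: the claim is that a read-only shell over the raw corpus can serve as the entire retrieval stack. A minimal sketch of such an agent tool, assuming one plain-text file per document; the function name, directory layout, and truncation limits are illustrative, not the DCI codebase:

```python
import subprocess

CORPUS_DIR = "corpus/"  # assumed layout: one plain-text file per document

def search_corpus(command: str, timeout: int = 30) -> str:
    """Run a read-only shell pipeline (grep/find/head/...) inside the corpus dir.

    Mirrors the DCI idea of letting the agent query raw files directly;
    the tool interface and truncation here are assumptions, not the paper's code.
    """
    result = subprocess.run(
        command, shell=True, cwd=CORPUS_DIR,
        capture_output=True, text=True, timeout=timeout,
    )
    # Truncate so tool output fits in the LLM context window.
    return result.stdout[:8000] or result.stderr[:2000]

# Pipelines an agent might issue, exactly as in a codebase:
# search_corpus("grep -ril 'transformer architecture' . | head -20")
# search_corpus("grep -n -A2 'BrowseComp' some_doc.txt")
```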

Zhepei Wei retweeted
Langlin Huang @shrangoh
Can adding pure nonsense to a prompt make an LLM reason better? YES! 🤯
Introducing LoPE (Lorem Perturbation for Exploration). We found that prepending meaningless pseudo-Latin placeholder text (Lorem Ipsum) acts like magic—helping GRPO escape the "zero-advantage" trap on hard math questions!
Paper page 📄: huggingface.co/papers/2605.05…
Broadens LLM exploration: increases resample success rate by ~3x 📈
Strengthens LLM reasoning: improves average math reasoning by up to 13% 📊
Here's how it works 👇
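For context on the "zero-advantage" trap: GRPO normalizes rewards within each group of rollouts, so if every rollout for a hard question fails, all advantages are zero and that question contributes no gradient. A minimal sketch of the failure mode and a LoPE-style perturbation as described above; the filler snippet, word count, and helper names are assumptions for illustration:

```python
import random

LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua").split()

def lorem_perturb(prompt: str, n_words: int = 16, seed: int | None = None) -> str:
    # Prepend meaningless pseudo-Latin filler, leaving the question intact.
    rng = random.Random(seed)
    return " ".join(rng.choices(LOREM, k=n_words)) + "\n\n" + prompt

def group_advantages(rewards: list[float]) -> list[float]:
    # GRPO-style group normalization: identical rewards => all-zero advantages.
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [0.0] * len(rewards) if std == 0 else [(r - mean) / std for r in rewards]

# A hard question where all 8 rollouts fail: zero advantage, zero gradient.
assert group_advantages([0.0] * 8) == [0.0] * 8
# LoPE-style remedy: resample rollouts from perturbed copies of the prompt,
# hoping at least one variant elicits a correct (nonzero-reward) trajectory.
variants = [lorem_perturb("Prove that ...", seed=i) for i in range(8)]
```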
Zhepei Wei @weizhepei
🎉 TruthRL is accepted to #ICML2026!
A simple ternary reward (correct: +1; abstention: 0; incorrect: −1) helps LLMs answer more accurately and know when not to answer, significantly reducing hallucinations!
Paper + code 👇
📄 arxiv.org/abs/2509.25760
💻 github.com/facebookresear…
Zhepei Wei @weizhepei

🤔 Ever wondered why your post-training methods (SFT/RL) make LLMs reluctant to say "I don't know"?
🤩 Introducing TruthRL — a truthfulness-driven RL method that significantly reduces hallucinations while achieving accuracy and proper abstention!
📃 arxiv.org/abs/2509.25760
🧵[1/n]
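The ternary reward above translates directly to code. A minimal sketch; is_abstention is a hypothetical detector (TruthRL may identify abstentions differently), and the correctness check is assumed to come from an external verifier:

```python
def is_abstention(answer: str) -> bool:
    # Hypothetical heuristic; the paper may detect abstention another way.
    cues = ("i don't know", "i do not know", "cannot answer", "not sure")
    return any(cue in answer.lower() for cue in cues)

def truthrl_reward(answer: str, is_correct: bool) -> int:
    # Ternary reward from the tweet: correct +1, abstention 0, incorrect -1.
    if is_abstention(answer):
        return 0
    return 1 if is_correct else -1
```

The key design choice: a wrong answer scores strictly below an abstention, so under RL the policy learns to abstain when it would otherwise guess, rather than hallucinate to chase a binary correctness reward.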

Zhepei Wei retweeted
Jiaxin Huang @jiaxinhuang0229
Heading to #ICLR2026? My amazing student @ChengsongH31219 has TWO posters on LLM post-training, self-evolving, and test-time scaling! 🔥 If you are interested in these topics, please check out the work! 👀👇
📌 R-Zero: Self-Evolving Reasoning LLM from Zero Data
📅 Thu, Apr 23 • 3:15 PM – 5:45 PM
📌 CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning
📅 Fri, Apr 24 • 10:30 AM – 1:00 PM
Sadly, I won't be able to make it in person this year, but feel free to reach out to me online!
Zhepei Wei retweeted
Wei-Lin Chen @WeiLin__Chen
🚀 New paper from my internship at @Google!
LLMs can "think" for a long time only to get the answer wrong — more tokens do not always help and may be overthinking 😵‍💫
We introduce the Deep-Thinking Ratio (DTR), a new way to measure LLM reasoning effort. The idea: count the tokens the model had to think deeply to produce. 🧵
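The tweet doesn't say how "had to think deeply" is operationalized, so the criterion below is purely an assumption for illustration: call a token "deep" when the model assigned it low probability, and report the fraction of such tokens.

```python
import math

def deep_thinking_ratio(token_logprobs: list[float],
                        threshold: float = math.log(0.5)) -> float:
    # Illustrative only: "deep" = the model was uncertain about the token
    # (logprob below log 0.5). The paper's actual criterion may differ.
    if not token_logprobs:
        return 0.0
    deep = sum(1 for lp in token_logprobs if lp < threshold)
    return deep / len(token_logprobs)
```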
Tyler Griggs @tyler_griggs_
SkyRL now implements the Tinker API. Training scripts written for Tinker can run on your own GPUs with zero code changes, using SkyRL's FSDP2, Megatron, and vLLM backends.
Blog: novasky-ai.notion.site/skyrl-tinker 🧵