Ran Xu

49 posts

@ritaranx

Research Scientist @GoogleDeepMind | Prev: CS PhD @EmoryUniversity | LLM, RAG, Agent, Tool-integrated reasoning, RL

Mountain View, CA · Joined September 2022
295 Following · 634 Followers
Ran Xu @ritaranx ·
Excited to be at #NeurIPS through Dec 8, and happy to connect! I’ll be presenting our Spotlight paper on complex QA and reasoning with search: 🗓️ Dec 5, 11:00 am–2:00 pm PST 📍 Exhibit C/D/E, Poster #1908. I’m also exploring full-time opportunities; DMs are open if you’d like to chat!
Ran Xu @ritaranx

🚨 Happy to share that AceSearcher was accepted to #NeurIPS2025 #Spotlight! 🔹 One LLM, two roles: Decomposer (splits queries) + Solver (combines context) 🔹 +7.6% on QA & fact verification 🔹 32B ≈ DeepSeek-V3 on DocMath 📂 Code: github.com/ritaranx/AceSe… 📑 arXiv: arxiv.org/abs/2509.24193

Alp @alpniks ·
@ritaranx @Google @GoogleDeepMind @googlecloud Great work! Tool integration is definitely essential for training LLM judges that are on par with human evaluation. How does TIR-Judge compare to human performance in more reasoning-heavy, non-verifiable domains?
Ran Xu @ritaranx ·
@Google @GoogleDeepMind @googlecloud 6/n 📊 Best-of-N on Policy Models: TIR-Judge is not only a better judge, it also makes other models better. When used to select responses in best-of-N inference, TIR-Judge improves policy accuracy by +3.9% to +6.7% on AIME, BigCodeBench, IFEval, etc. → Better downstream reasoning too 🎯
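For readers unfamiliar with best-of-N selection, here is a minimal sketch of the general idea: sample several candidate responses from a policy model, score each with a judge, and keep the top-scoring one. This is not the TIR-Judge implementation. The names `best_of_n`, `judge_score`, and `toy_judge` are hypothetical, and the toy judge is a stand-in for a real judge model.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    judge_score: Callable[[str, str], float],
) -> str:
    """Return the candidate the judge scores highest for this prompt.

    `judge_score` is a hypothetical callable standing in for a judge model;
    it takes (prompt, response) and returns a higher-is-better score.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    # max() keeps the first candidate among ties.
    return max(candidates, key=lambda response: judge_score(prompt, response))


def toy_judge(prompt: str, response: str) -> float:
    """Toy stand-in for a judge: rewards responses ending in the answer '4'."""
    return 1.0 if response.strip().endswith("4") else 0.0


# Assumed example: three sampled responses from some policy model.
samples = ["The answer is 5", "The answer is 4", "Not sure"]
picked = best_of_n("What is 2 + 2?", samples, toy_judge)
print(picked)
```

In this setup the policy model is unchanged; any accuracy gain comes purely from which sample the judge selects.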
Ran Xu @ritaranx ·
6/n Takeaways: ✅ With self-play frameworks, smaller LLMs can rival giant proprietary models ✅ We can borrow from reasoning datasets to assist search in LLMs and couple search and reasoning more tightly ✅ Great potential for domains such as finance, health, and science