Jiarui Yao

28 posts

@ExplainMiracles

UIUC CS PhD, 24

Joined May 2023
621 Following · 101 Followers
Jiarui Yao reposted
Pengcheng Wang (@PengchengWang19)
🚀 Wanna build your own customized agent with controllable workflow? Introducing AgentSPEX: a declarative DSL for building LLM agents.
- Customizable agentic workflow with a GUI builder
- Reproducible state-of-the-art results on SWE-bench Verified
- Controllable workflow with a YAML specification + sandboxed VM + Lean4 verification
🌐 Demo: agentspex.ai
💻 Code: github.com/ScaleML/AgentS…
📄 Paper: huggingface.co/papers/2604.13…
💡 (One of the) Applications: researchguide.work
(1/n) #LLM #AIAgents #AgenticAI #OpenSource #AIResearch #AgentHarness
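A minimal sketch of what a declarative, YAML-driven agent workflow could look like once parsed. Everything here is invented for illustration (the step schema, tool names, and placeholder syntax are assumptions, not AgentSPEX's actual DSL, which is defined in the paper and repo):

```python
# Hypothetical workflow spec, as it might look after parsing a YAML file into
# a dict. Step IDs, tool names, and the "{ref}" placeholder syntax are all
# illustrative assumptions.
WORKFLOW = {
    "name": "bugfix-agent",
    "steps": [
        {"id": "locate", "tool": "search_repo", "args": {"query": "{issue}"}},
        {"id": "edit",   "tool": "apply_patch", "args": {"target": "{locate}"}},
        {"id": "verify", "tool": "run_tests",   "args": {"scope": "{edit}"}},
    ],
}

def run_workflow(spec, tools):
    """Execute steps in order, threading earlier outputs into later args."""
    outputs = {}
    for step in spec["steps"]:
        # Resolve "{step_id}" references to earlier step outputs; leave
        # unresolved placeholders untouched.
        args = {k: outputs.get(v.strip("{}"), v) for k, v in step["args"].items()}
        outputs[step["id"]] = tools[step["tool"]](**args)
    return outputs

# Toy tools standing in for actions a sandboxed VM would actually run.
tools = {
    "search_repo": lambda query: f"file.py (matched '{query}')",
    "apply_patch": lambda target: f"patched {target}",
    "run_tests":   lambda scope: f"tests passed after: {scope}",
}
result = run_workflow(WORKFLOW, tools)
```

The point of the declarative style is that the workflow is data, not code: the same interpreter can run any spec, which is what makes the workflow inspectable and controllable.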
Jiarui Yao reposted
Microsoft Research (@MSFTResearch)
PlugMem transforms AI agents’ interaction histories into structured, reusable knowledge. It integrates with any agent, supports diverse tasks and memory types, and maximizes decision quality while significantly reducing memory token use: msft.it/6017Qc9vv
Jiarui Yao reposted
Ke Yang (@EmpathYang)
📰 New preprint: How can we build a task-agnostic, plug-and-play memory module for LLM agents that supports multiple memory types? We present PlugMem 🔌🧠, a plugin memory module that works across tasks by turning heterogeneous experience into knowledge. Evaluated unchanged on long-term dialogue 🗣️, multi-hop QA 🕵️, and web agents 🕸️🤖, PlugMem improves performance while using far fewer memory tokens.
📜 Paper: empathyang.github.io/files/PlugMem.…
🔨 Code: github.com/TIMAN-group/Pl…
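A toy sketch of the "experience into knowledge" idea: raw episodes are distilled into compact reusable entries, and a new task retrieves the most relevant ones. In the real system both distillation and retrieval would be LLM-based; the word-overlap heuristic and entry schema below are assumptions for illustration only:

```python
# Distill raw episodes into short reusable knowledge entries, then retrieve
# relevant entries for a new task. Both steps are toy heuristics standing in
# for LLM-based components.

def distill(episode):
    """Compress a raw episode into a compact entry (toy heuristic)."""
    return {"topic": episode["task"], "lesson": episode["outcome"]}

def retrieve(memory, query, k=2):
    """Return the k entries whose topic shares the most words with the query."""
    def overlap(entry):
        return len(set(entry["topic"].lower().split()) & set(query.lower().split()))
    return sorted(memory, key=overlap, reverse=True)[:k]

episodes = [
    {"task": "book flight to NYC",  "outcome": "confirm dates before paying"},
    {"task": "multi-hop QA on wiki", "outcome": "resolve bridge entity first"},
    {"task": "web form login",       "outcome": "retry once on a stale session"},
]
memory = [distill(e) for e in episodes]
hits = retrieve(memory, "answer a multi-hop QA question")
```

Storing distilled lessons rather than full transcripts is also what would keep the memory token footprint small at inference time.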
Jiarui Yao reposted
Cheng Qian (@qiancheng1231)
🔮 Can a world model (simulator) give today’s AI agents foresight? We tested “world model as a tool”… and found it often doesn’t help—sometimes it hurts. Check our newest paper here: arxiv.org/pdf/2601.03905… #AIagents #WorldModel #ToolUse
Jiarui Yao reposted
Peixuan Han (@peixuanhakhan)
(1/5) Super excited to release our new paper on Reinforcement Learning: "Self-Aligned Reward: Towards Effective and Efficient Reasoners"! Preprint: arxiv.org/pdf/2509.05489
Jiarui Yao reposted
Cheng Qian (@qiancheng1231)
🤝 Can LLM agents really understand us? We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands.
📄 arxiv.org/pdf/2507.22034
💻 github.com/SalesforceAIRe…
Jiarui Yao reposted
Yong Lin (@Yong18850571)
(1/4) 🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* Our 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: our 8B model matches DeepSeek-671B on MiniF2F.
📚 Leading on MathOlympiadBench (IMO-level problems): solves 73 problems vs. 50 for the 671B DeepSeek Prover.
🔓 Website: blog.goedel-prover.com
🔓 Model (32B): huggingface.co/Goedel-LM/Goed…
🔓 Model (8B): huggingface.co/Goedel-LM/Goed…
🔓 Data and training pipeline will be released soon.
Amazing collaborators: @sangertang1999 @Lyubh22 @__zrrr__ @juihuichung @thomaszhao1998 @pero733858111 @thiiis_user @EmilyJge @JingruoS5931 @wujiayun12 @GesiJiri68334 @davidjesusacu @KaiyuYang4 @hongzhou__lin @YejinChoinka @danqi_chen @prfsanjeevarora @chijinML
Jiarui Yao reposted
Noam Razin (@noamrazin)
Reward models (RMs) are key to language model post-training and inference pipelines. But, little is known about the relative pros and cons of different RM types. 📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs 🧵 1/6
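The best-known example of an RM implicitly defined by a language model is the DPO-style reward, where the score of a response is the scaled log-probability ratio between the policy and a reference model rather than the output of a separate scalar head. A minimal sketch with toy per-token log-probs standing in for real model outputs:

```python
# Implicit reward in the DPO sense:
#   r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x))
# computed here from toy per-token log-probabilities.

def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """Sequence-level implicit reward from per-token log-probs."""
    return beta * (sum(logp_policy) - sum(logp_ref))

# A response the policy likes more than the reference scores positively...
chosen = implicit_reward([-0.2, -0.1, -0.3], [-0.5, -0.4, -0.6])
# ...and one it likes less scores negatively.
rejected = implicit_reward([-1.0, -1.2], [-0.7, -0.9])
```

An explicit RM, by contrast, is a separate model with a scalar head trained directly on preference pairs; the tweet's thread is about why these two parameterizations can generalize differently.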
Jiarui Yao reposted
Shulin Tian (@shulin_tian)
🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder.
💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits?
We introduce 👓 Ego-R1: a framework for reasoning over ultra-long (i.e., spanning days and weeks) egocentric videos, supported by Chain-of-Tool-Thought (CoTT), which decomposes complex reasoning tasks into modular steps. At its core is Ego-R1-Agent-3B, an orchestrating language model trained to dynamically invoke specialized tools at each step, based on previous actions and observations, to gradually collect the necessary information and solve the task step by step. All code and data are fully open-sourced :)
🌐 Project: egolife-ai.github.io/Ego-R1
📄 Paper: arxiv.org/abs/2506.13654
💻 Code: github.com/egolife-ai/Ego…
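The orchestration loop can be sketched abstractly: at each step a policy picks a tool (or decides to answer), observes the result, and continues. The tool names and the hard-coded policy below are stand-ins invented for illustration; in Ego-R1 the policy is the trained 3B agent model:

```python
# A Chain-of-Tool-Thought-style loop: choose a tool, observe, repeat, stop
# when ready to answer. Tools and policy here are toy stand-ins.

def cott_loop(question, tools, policy, max_steps=5):
    """Run the tool-invocation loop and return the (tool, observation) trace."""
    history = []
    for _ in range(max_steps):
        action = policy(question, history)   # choose the next tool, or stop
        if action == "answer":
            break
        history.append((action, tools[action](question)))
    return history

tools = {
    "rag_day":   lambda q: "day 3 looks relevant",          # coarse retrieval
    "clip_view": lambda q: "clip shows keys on the table",  # fine-grained look
}

def policy(question, history):
    # Toy policy: coarse search first, then fine-grained viewing, then answer.
    used = [a for a, _ in history]
    if "rag_day" not in used:
        return "rag_day"
    if "clip_view" not in used:
        return "clip_view"
    return "answer"

trace = cott_loop("Where did I leave my keys?", tools, policy)
```

The key property is that only each tool's short observation enters the context, never the raw video, which is what keeps memory and API usage bounded on week-long footage.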
Jiarui Yao reposted
Xiusi Chen (@xiusi_chen)
Can LLMs make rational decisions like human experts?
📖 Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker
We introduce a novel framework that constructs a semantically grounded decision space to transparently evaluate trade-offs in hard decision-making scenarios.
📑 Paper: arxiv.org/abs/2505.21397
💻 Code: github.com/xiusic/Decisio…
🧵👇
Jiarui Yao reposted
Peixuan Han (@peixuanhakhan)
(1/5) Want to make your LLM a skilled persuader? Check out our latest paper: "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"!
For details:
📄 arXiv: arxiv.org/pdf/2505.22961
🛠️ GitHub: github.com/ulab-uiuc/ToMAP
Jiarui Yao reposted
Cheng Qian (@qiancheng1231)
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems, but can they model the real world? 🌍
📄 arXiv: arxiv.org/pdf/2505.15068
💻 Code: github.com/qiancheng0/Mod…
Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
Jiarui Yao reposted
Hanze Dong (@hendrydong)
How can we improve test-time scalability?
- Separate thinking & solution phases to control performance under a budget constraint
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
huggingface.co/papers/2505.05…
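The phase-separation idea can be sketched at its simplest: cap the thinking phase at a token budget, then explicitly close it so the model is forced into the solution phase. The `</think>` delimiter and whitespace tokenization below are simplifying assumptions; the real method applies this during generation over model tokens, and trains with GRPO under the constraint:

```python
# Cap the thinking phase at `budget` tokens, then append a closing delimiter
# so generation proceeds to the solution phase. Tokens are whitespace-split
# words here, purely for illustration.

def budget_constrained(thinking_tokens, budget, close="</think>"):
    """Truncate thinking to the budget and close the phase explicitly."""
    kept = thinking_tokens[:budget]
    return kept + [close]  # the solution is generated after the delimiter

thought = "first try substitution then check the boundary cases again".split()
prompt_tail = budget_constrained(thought, budget=4)
```

Because the budget is an explicit knob, the same trained model can trade accuracy against token cost at test time, which is where the scalability comes from.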
Jiarui Yao reposted
Xiusi Chen (@xiusi_chen)
🚀 Can we cast reward modeling as a reasoning task?
📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning
📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…
Inspired by recent advances of long chain-of-thought (CoT) on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances the RM's interpretability and performance. RM-R1 achieves state-of-the-art or near state-of-the-art performance among generative RMs on RewardBench, RM-Bench, and RMB. 🧵👇
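A generative, reasoning-style RM emits a chain of thought and then a final verdict, and the caller parses only the verdict. The `<verdict>` tag format below is an assumption for illustration; RM-R1's actual output format is specified in the paper:

```python
import re

# Parse the final preference label out of a reasoning RM's generated text.
# The <verdict>A|B</verdict> tag is a hypothetical convention.

def parse_verdict(rm_output):
    """Extract the preferred answer label, or None if no verdict was emitted."""
    m = re.search(r"<verdict>\s*([AB])\s*</verdict>", rm_output)
    return m.group(1) if m else None

rm_output = (
    "Answer A cites the correct theorem and checks the edge case; "
    "answer B skips the verification step. <verdict>A</verdict>"
)
preferred = parse_verdict(rm_output)
```

The interpretability claim falls out of this structure: unlike a scalar-head RM, the judgment comes with a readable rationale attached.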
Jiarui Yao (@ExplainMiracles)
We introduce Gradient Variance Minimization (GVM)-RAFT, a principled dynamic sampling strategy that minimizes gradient variance to improve the efficiency of chain-of-thought (CoT) training in LLMs.
- Achieves 2–4× faster convergence than RAFT
- Improves accuracy on math reasoning benchmarks
- Generalizes to reinforcement learning methods such as GRPO
- Comes with theoretical convergence guarantees
📄 Paper: arxiv.org/abs/2505.02391
🔗 Code: (expected in a few hours) github.com/RLHFlow/GVM
#LLM #MachineLearning #ReinforcementLearning #ChainOfThought #AIResearch
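The core intuition of variance-aware dynamic sampling can be sketched in a few lines: prompts whose rewards are noisier get more rollouts. The proportional-to-std rule below is a deliberate simplification (the paper derives the exact allocation from its convergence analysis), and the Bernoulli-reward assumption is illustrative:

```python
import math

# Allocate a rollout budget across prompts proportional to each prompt's
# estimated reward standard deviation. With a Bernoulli (pass/fail) reward
# and per-prompt success rate p, std = sqrt(p * (1 - p)), so prompts the
# model always solves or always fails get the fewest rollouts.

def allocate_rollouts(success_rates, total_budget):
    """Split the rollout budget proportional to per-prompt reward std."""
    stds = [math.sqrt(p * (1 - p)) for p in success_rates]
    z = sum(stds) or 1.0
    return [max(1, round(total_budget * s / z)) for s in stds]

# Success rates estimated from a few pilot rollouts per prompt.
budget = allocate_rollouts([0.5, 0.9, 0.1], total_budget=20)
```

Compared with RAFT's uniform sampling, spending the budget where the gradient estimate is noisiest is what buys the faster convergence at the same compute.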