Jiarui Yao

28 posts

@ExplainMiracles

UIUC CS PhD, 24

Joined May 2023
621 Following · 101 Followers
Jiarui Yao reposted
Pengcheng Wang (@PengchengWang19)
🚀 Wanna build your own customized agent with controllable workflow? Introducing AgentSPEX: a declarative DSL for building LLM agents.
- Customizable agentic workflow with a GUI builder
- Reproducible state-of-the-art results on SWE-bench Verified
- Controllable workflow with a YAML specification + sandboxed VM + Lean4 verification
🌐 Demo: agentspex.ai
💻 Code: github.com/ScaleML/AgentS…
📄 Paper: huggingface.co/papers/2604.13…
💡 (One of the) Applications: researchguide.work
(1/n) #LLM #AIAgents #AgenticAI #OpenSource #AIResearch #AgentHarness
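A minimal sketch of what a declarative, YAML-driven agent workflow could look like once parsed. Everything here is invented for illustration (the step schema, tool names, and placeholder syntax are assumptions, not AgentSPEX's actual DSL, which is defined in the paper and repo):

```python
# Hypothetical workflow spec, as it might look after parsing a YAML file into
# a dict. Step IDs, tool names, and the "{ref}" placeholder syntax are all
# illustrative assumptions.
WORKFLOW = {
    "name": "bugfix-agent",
    "steps": [
        {"id": "locate", "tool": "search_repo", "args": {"query": "{issue}"}},
        {"id": "edit",   "tool": "apply_patch", "args": {"target": "{locate}"}},
        {"id": "verify", "tool": "run_tests",   "args": {"scope": "{edit}"}},
    ],
}

def run_workflow(spec, tools):
    """Execute steps in order, threading earlier outputs into later args."""
    outputs = {}
    for step in spec["steps"]:
        # Resolve "{step_id}" references to earlier step outputs; leave
        # unresolved placeholders untouched.
        args = {k: outputs.get(v.strip("{}"), v) for k, v in step["args"].items()}
        outputs[step["id"]] = tools[step["tool"]](**args)
    return outputs

# Toy tools standing in for actions a sandboxed VM would actually run.
tools = {
    "search_repo": lambda query: f"file.py (matched '{query}')",
    "apply_patch": lambda target: f"patched {target}",
    "run_tests":   lambda scope: f"tests passed after: {scope}",
}
result = run_workflow(WORKFLOW, tools)
```

The point of the declarative style is that the workflow is data, not code: the same interpreter can run any spec, which is what makes the workflow inspectable and controllable.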
Jiarui Yao reposted
Microsoft Research (@MSFTResearch)
PlugMem transforms AI agents’ interaction histories into structured, reusable knowledge. It integrates with any agent, supports diverse tasks and memory types, and maximizes decision quality while significantly reducing memory token use: msft.it/6017Qc9vv
Jiarui Yao reposted
Ke Yang (@EmpathYang)
📰 New preprint: How can we build a task-agnostic, plug-and-play memory module for LLM agents that supports multiple memory types? We present PlugMem 🔌🧠, a plugin memory module that works across tasks by turning heterogeneous experience into knowledge. Evaluated unchanged on long-term dialogue 🗣️, multi-hop QA 🕵️, and web agents 🕸️🤖, PlugMem improves performance while using far fewer memory tokens.
📜 Paper: empathyang.github.io/files/PlugMem.…
🔨 Code: github.com/TIMAN-group/Pl…
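A toy sketch of the "experience into knowledge" idea: raw episodes are distilled into compact reusable entries, and a new task retrieves the most relevant ones. In the real system both distillation and retrieval would be LLM-based; the word-overlap heuristic and entry schema below are assumptions for illustration only:

```python
# Distill raw episodes into short reusable knowledge entries, then retrieve
# relevant entries for a new task. Both steps are toy heuristics standing in
# for LLM-based components.

def distill(episode):
    """Compress a raw episode into a compact entry (toy heuristic)."""
    return {"topic": episode["task"], "lesson": episode["outcome"]}

def retrieve(memory, query, k=2):
    """Return the k entries whose topic shares the most words with the query."""
    def overlap(entry):
        return len(set(entry["topic"].lower().split()) & set(query.lower().split()))
    return sorted(memory, key=overlap, reverse=True)[:k]

episodes = [
    {"task": "book flight to NYC",  "outcome": "confirm dates before paying"},
    {"task": "multi-hop QA on wiki", "outcome": "resolve bridge entity first"},
    {"task": "web form login",       "outcome": "retry once on a stale session"},
]
memory = [distill(e) for e in episodes]
hits = retrieve(memory, "answer a multi-hop QA question")
```

Storing distilled lessons rather than full transcripts is also what would keep the memory token footprint small at inference time.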
Jiarui Yao reposted
Cheng Qian (@qiancheng1231)
🔮 Can a world model (simulator) give today’s AI agents foresight? We tested “world model as a tool”… and found it often doesn’t help—sometimes it hurts. Check our newest paper here: arxiv.org/pdf/2601.03905… #AIagents #WorldModel #ToolUse
Jiarui Yao reposted
Peixuan Han (@peixuanhakhan)
(1/5) Super excited to release our new paper on Reinforcement Learning: "Self-Aligned Reward: Towards Effective and Efficient Reasoners"! Preprint: arxiv.org/pdf/2509.05489
Jiarui Yao reposted
Cheng Qian (@qiancheng1231)
🤝 Can LLM agents really understand us? We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands.
📄 arxiv.org/pdf/2507.22034
💻 github.com/SalesforceAIRe…
Jiarui Yao reposted
Yong Lin (@Yong18850571)
(1/4) 🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: solves 64 problems, with far less compute.
🧠 New SOTA on MiniF2F:
* Our 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: our 8B model matches DeepSeek-671B on MiniF2F.
📚 Leading on MathOlympiadBench (IMO-level problems): solves 73 problems vs. 50 for the 671B DeepSeek Prover.
🔓 Website: blog.goedel-prover.com
🔓 Model (32B): huggingface.co/Goedel-LM/Goed…
🔓 Model (8B): huggingface.co/Goedel-LM/Goed…
🔓 Data and training pipeline will be released soon.
Amazing collaborators: @sangertang1999 @Lyubh22 @__zrrr__ @juihuichung @thomaszhao1998 @pero733858111 @thiiis_user @EmilyJge @JingruoS5931 @wujiayun12 @GesiJiri68334 @davidjesusacu @KaiyuYang4 @hongzhou__lin @YejinChoinka @danqi_chen @prfsanjeevarora @chijinML
Jiarui Yao reposted
Noam Razin (@noamrazin)
Reward models (RMs) are key to language model post-training and inference pipelines. But, little is known about the relative pros and cons of different RM types. 📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs 🧵 1/6
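The best-known example of an RM implicitly defined by a language model is the DPO-style reward, where the score of a response is the scaled log-probability ratio between the policy and a reference model rather than the output of a separate scalar head. A minimal sketch with toy per-token log-probs standing in for real model outputs:

```python
# Implicit reward in the DPO sense:
#   r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x))
# computed here from toy per-token log-probabilities.

def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """Sequence-level implicit reward from per-token log-probs."""
    return beta * (sum(logp_policy) - sum(logp_ref))

# A response the policy likes more than the reference scores positively...
chosen = implicit_reward([-0.2, -0.1, -0.3], [-0.5, -0.4, -0.6])
# ...and one it likes less scores negatively.
rejected = implicit_reward([-1.0, -1.2], [-0.7, -0.9])
```

An explicit RM, by contrast, is a separate model with a scalar head trained directly on preference pairs; the tweet's thread is about why these two parameterizations can generalize differently.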
Jiarui Yao reposted
Shulin Tian (@shulin_tian)
🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder.
💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits?
We introduce 👓 Ego-R1: a framework for reasoning over ultra-long (i.e., spanning days and weeks) egocentric videos, supported by Chain-of-Tool-Thought (CoTT), which decomposes complex reasoning tasks into modular steps. At its core is Ego-R1-Agent-3B, an orchestrating language model trained to dynamically invoke specialized tools at each step, based on previous actions and observations, to gradually collect the necessary information and solve the task step by step. All code and data are fully open-sourced :)
🌐 Project: egolife-ai.github.io/Ego-R1
📄 Paper: arxiv.org/abs/2506.13654
💻 Code: github.com/egolife-ai/Ego…
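The orchestration loop can be sketched abstractly: at each step a policy picks a tool (or decides to answer), observes the result, and continues. The tool names and the hard-coded policy below are stand-ins invented for illustration; in Ego-R1 the policy is the trained 3B agent model:

```python
# A Chain-of-Tool-Thought-style loop: choose a tool, observe, repeat, stop
# when ready to answer. Tools and policy here are toy stand-ins.

def cott_loop(question, tools, policy, max_steps=5):
    """Run the tool-invocation loop and return the (tool, observation) trace."""
    history = []
    for _ in range(max_steps):
        action = policy(question, history)   # choose the next tool, or stop
        if action == "answer":
            break
        history.append((action, tools[action](question)))
    return history

tools = {
    "rag_day":   lambda q: "day 3 looks relevant",          # coarse retrieval
    "clip_view": lambda q: "clip shows keys on the table",  # fine-grained look
}

def policy(question, history):
    # Toy policy: coarse search first, then fine-grained viewing, then answer.
    used = [a for a, _ in history]
    if "rag_day" not in used:
        return "rag_day"
    if "clip_view" not in used:
        return "clip_view"
    return "answer"

trace = cott_loop("Where did I leave my keys?", tools, policy)
```

The key property is that only each tool's short observation enters the context, never the raw video, which is what keeps memory and API usage bounded on week-long footage.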
Jiarui Yao reposted
Xiusi Chen (@xiusi_chen)
Can LLMs make rational decisions like human experts?
📖 Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker
We introduce a novel framework that constructs a semantically grounded decision space to transparently evaluate trade-offs in hard decision-making scenarios.
📑 Paper: arxiv.org/abs/2505.21397
💻 Code: github.com/xiusic/Decisio…
🧵👇
Jiarui Yao reposted
Peixuan Han (@peixuanhakhan)
(1/5) Want to make your LLM a skilled persuader? Check out our latest paper: "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"!
For details:
📄 arXiv: arxiv.org/pdf/2505.22961
🛠️ GitHub: github.com/ulab-uiuc/ToMAP
Jiarui Yao reposted
Cheng Qian (@qiancheng1231)
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems, but can they model the real world? 🌍
📄 arXiv: arxiv.org/pdf/2505.15068
💻 Code: github.com/qiancheng0/Mod…
Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
Jiarui Yao reposted
Hanze Dong (@hendrydong)
How can we improve test-time scalability?
- Separate thinking & solution phases to control performance under a budget constraint
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
huggingface.co/papers/2505.05…
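The phase-separation idea can be sketched at its simplest: cap the thinking phase at a token budget, then explicitly close it so the model is forced into the solution phase. The `</think>` delimiter and whitespace tokenization below are simplifying assumptions; the real method applies this during generation over model tokens, and trains with GRPO under the constraint:

```python
# Cap the thinking phase at `budget` tokens, then append a closing delimiter
# so generation proceeds to the solution phase. Tokens are whitespace-split
# words here, purely for illustration.

def budget_constrained(thinking_tokens, budget, close="</think>"):
    """Truncate thinking to the budget and close the phase explicitly."""
    kept = thinking_tokens[:budget]
    return kept + [close]  # the solution is generated after the delimiter

thought = "first try substitution then check the boundary cases again".split()
prompt_tail = budget_constrained(thought, budget=4)
```

Because the budget is an explicit knob, the same trained model can trade accuracy against token cost at test time, which is where the scalability comes from.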
Jiarui Yao reposted
Xiusi Chen (@xiusi_chen)
🚀 Can we cast reward modeling as a reasoning task?
📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning
📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…
Inspired by recent advances of long chain-of-thought (CoT) on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances the RM's interpretability and performance. RM-R1 achieves state-of-the-art or near state-of-the-art performance among generative RMs on RewardBench, RM-Bench, and RMB. 🧵👇
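A generative, reasoning-style RM emits a chain of thought and then a final verdict, and the caller parses only the verdict. The `<verdict>` tag format below is an assumption for illustration; RM-R1's actual output format is specified in the paper:

```python
import re

# Parse the final preference label out of a reasoning RM's generated text.
# The <verdict>A|B</verdict> tag is a hypothetical convention.

def parse_verdict(rm_output):
    """Extract the preferred answer label, or None if no verdict was emitted."""
    m = re.search(r"<verdict>\s*([AB])\s*</verdict>", rm_output)
    return m.group(1) if m else None

rm_output = (
    "Answer A cites the correct theorem and checks the edge case; "
    "answer B skips the verification step. <verdict>A</verdict>"
)
preferred = parse_verdict(rm_output)
```

The interpretability claim falls out of this structure: unlike a scalar-head RM, the judgment comes with a readable rationale attached.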
Jiarui Yao (@ExplainMiracles)
We introduce Gradient Variance Minimization (GVM)-RAFT, a principled dynamic sampling strategy that minimizes gradient variance to improve the efficiency of chain-of-thought (CoT) training in LLMs.
- Achieves 2–4× faster convergence than RAFT
- Improves accuracy on math reasoning benchmarks
- Generalizes to reinforcement learning methods such as GRPO
- Comes with theoretical convergence guarantees
📄 Paper: arxiv.org/abs/2505.02391
🔗 Code: (expected in a few hours) github.com/RLHFlow/GVM
#LLM #MachineLearning #ReinforcementLearning #ChainOfThought #AIResearch
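The core intuition of variance-aware dynamic sampling can be sketched in a few lines: prompts whose rewards are noisier get more rollouts. The proportional-to-std rule below is a deliberate simplification (the paper derives the exact allocation from its convergence analysis), and the Bernoulli-reward assumption is illustrative:

```python
import math

# Allocate a rollout budget across prompts proportional to each prompt's
# estimated reward standard deviation. With a Bernoulli (pass/fail) reward
# and per-prompt success rate p, std = sqrt(p * (1 - p)), so prompts the
# model always solves or always fails get the fewest rollouts.

def allocate_rollouts(success_rates, total_budget):
    """Split the rollout budget proportional to per-prompt reward std."""
    stds = [math.sqrt(p * (1 - p)) for p in success_rates]
    z = sum(stds) or 1.0
    return [max(1, round(total_budget * s / z)) for s in stds]

# Success rates estimated from a few pilot rollouts per prompt.
budget = allocate_rollouts([0.5, 0.9, 0.1], total_budget=20)
```

Compared with RAFT's uniform sampling, spending the budget where the gradient estimate is noisiest is what buys the faster convergence at the same compute.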