Xiusi Chen

74 posts

@xiusi_chen

Postdoc @UofIllinois @uiuc_nlp, Ph.D. @UCLA, BS @PKU1898. RM-R1. Ex-Intern @AmazonScience (x2), @NECLabsAmerica. LLM, Neuro-Symbolic AI.

Urbana-Champaign, IL · Joined June 2012
474 Following · 663 Followers
Pinned Tweet
Xiusi Chen @xiusi_chen
🚀 Can we cast reward modeling as a reasoning task?
📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning
📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…
Inspired by recent advances in long chain-of-thought (CoT) reasoning on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances an RM's interpretability and performance. RM-R1 achieves state-of-the-art or near-state-of-the-art performance among generative RMs on RewardBench, RM-Bench, and RMB. 🧵👇
Xiusi Chen retweeted
Gaotang Li @GaotangLi
When we introduced RM-R1, reasoning reward models and rubric-based supervision were still niche. Today, they’re everywhere: from various downstream applications to the post-training of Grok-4. Excited to share that RM-R1 is accepted to ICLR. What’s next?
Xiusi Chen @xiusi_chen
📣 Our RM-R1 paper is accepted to ICLR 2026! @iclr_conf
📷 We hypothesized and validated that integrating reasoning capabilities into reward modeling significantly enhances the interpretability and performance of reward models.
📑 Paper: arxiv.org/pdf/2505.02387
📷 Code: github.com/RM-R1-UIUC/RM-…
Many thanks to all the co-authors! @GaotangLi @wzq016 @BowenJin13 @qiancheng1231 @__YuWang__ @HongruWang007 @yuz9yuz @denghui_zhang Prof. Tong Zhang @hanghangtong @hengjinlp

Xiusi Chen @xiusi_chen
📣 Our RM-R1 paper is accepted to ICLR 2026! @iclr_conf
📷 We hypothesized and validated that integrating reasoning capabilities into reward modeling significantly enhances the interpretability and performance of reward models.
📑 Paper: arxiv.org/pdf/2505.02387
📷 Code: github.com/RM-R1-UIUC/RM-…
Many thanks to all the co-authors! @GaotangLi @wzq016 @BowenJin13 @qiancheng1231 @__YuWang__ @HongruWang007 @yuz9yuz @denghui_zhang Prof. Tong Zhang @hanghangtong @hengjinlp
Xiusi Chen @xiusi_chen
🚀 Can we cast reward modeling as a reasoning task?
📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning
📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…
Inspired by recent advances in long chain-of-thought (CoT) reasoning on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances an RM's interpretability and performance. RM-R1 achieves state-of-the-art or near-state-of-the-art performance among generative RMs on RewardBench, RM-Bench, and RMB. 🧵👇

Xiusi Chen retweeted
Cheng Qian @qiancheng1231
🔮 Can a world model (simulator) give today’s AI agents foresight? We tested “world model as a tool”… and found it often doesn’t help; sometimes it hurts.
Check out our newest paper here: arxiv.org/pdf/2601.03905…
#AIagents #WorldModel #ToolUse
Xiusi Chen retweeted
Pengrun Huang @pengrun_huang
What do **property inference attacks** look like on an LLM? 🤔 Our new work **PropInfer** reveals a previously unrecognized training-data confidentiality risk in LLMs! 🚨 We show how attackers can infer confidential, *dataset-level* properties (like patient demographics) of a fine-tuning dataset. We do this by:
—Introducing the first benchmark task for property inference in LLMs
—Proposing two tailored attacks targeting LLMs
Come check out our spotlight paper "Can We Infer Confidential Properties of Training Data from LLMs?" at #NeurIPS2025 on Thu, Dec 4, 4:30 p.m.–7:30 p.m. PST, Exhibit Hall C,D,E #1313 📍 @chhaviyadav_ @kamalikac @ruihan_w
Xiusi Chen retweeted
Gaotang Li @GaotangLi
Negative log-likelihood (NLL) has long been the go-to objective for classification and SFT, but is it universally optimal? We explore when alternative objectives outperform NLL and when they don't, based on two key factors: the objective's prior-leaningness and the model's capability.
📄 Paper: arxiv.org/abs/2510.00526
💻 Code: github.com/GaotangLi/Beyo…
(1/n)
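For intuition on what an "alternative objective" can look like next to NLL, here is a minimal NumPy comparison against focal loss, my illustrative choice; the paper studies its own family of objectives. Focal loss down-weights examples the model already classifies confidently:

```python
import numpy as np

def nll_loss(p_true: float) -> float:
    """Standard negative log-likelihood on the true-class probability."""
    return -np.log(p_true)

def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss scales NLL by (1 - p)^gamma, shrinking the loss
    on examples the model already gets right with high confidence."""
    return -((1.0 - p_true) ** gamma) * np.log(p_true)

# As confidence on the true class grows, focal loss vanishes much faster:
for p in (0.5, 0.9, 0.99):
    print(f"p={p}: NLL={nll_loss(p):.4f}  focal={focal_loss(p):.4f}")
```

The comparison shows why the best objective can depend on model capability: a weak model sees many low-`p` examples where the two objectives behave similarly, while a strong model's gradient signal differs sharply between them.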
Xiusi Chen @xiusi_chen
📣 Our paper is accepted to Findings of EMNLP 2025! 📷
Decision modeling is the process of formulating an abstract representation of a decision scenario by identifying key variables, their attributes, relevant constraints, and possible courses of action, in order to evaluate trade-offs and arrive at the most rational and explainable outcome.
Many thanks to all the co-authors! @Swimmingwang04 @qiancheng1231 @HongruWang007 @peixuanhakhan @hengjinlp
Come and check out how we do it: arxiv.org/pdf/2505.21397
Xiusi Chen @xiusi_chen
Can LLMs make rational decisions like human experts?
📖 Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker
We introduce a novel framework that constructs a semantically grounded decision space to transparently evaluate trade-offs in hard decision-making scenarios.
📑 Paper: arxiv.org/abs/2505.21397
💻 Code: github.com/xiusic/Decisio…
🧵👇

Xiusi Chen retweeted
Cheng Qian @qiancheng1231
🤝 Can LLM agents really understand us? We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands.
📄 arxiv.org/pdf/2507.22034
💻 github.com/SalesforceAIRe…
Xiusi Chen retweeted
Zhenhailong Wang @zhenhailongW
Learning to perceive while learning to reason! We introduce PAPO: Perception-Aware Policy Optimization, a direct upgrade to GRPO for multimodal reasoning. PAPO relies on internal supervision signals. No extra annotations, reward models, or teacher models needed. 🧵1/3
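PAPO is described as a direct upgrade to GRPO. The GRPO step it builds on, normalizing each sampled response's reward against the other rollouts for the same prompt, can be sketched as follows. This is a generic sketch of the GRPO advantage computation, not PAPO itself:

```python
import numpy as np

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: z-score each sampled response's reward
    against the other responses drawn for the same prompt, so no
    separate value network (critic) is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four rollouts for one prompt; the best rollout gets the largest advantage,
# and the advantages are centered around zero within the group.
adv = group_relative_advantages([0.2, 0.9, 0.4, 0.1])
print(adv.argmax())  # → 1
```

PAPO's contribution, per the tweet, is adding an internal perception-aware supervision signal on top of this baseline without extra annotations or reward models.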
Xiusi Chen retweeted
Yangyi Chen @YangyiChen6666
🚀 I'm looking for full-time research scientist jobs on foundation models! I study pre-training and post-training of foundation models, and LLM-based coding agents. The figure highlights my research/publications. Please DM me if there is any good fit! Highly appreciated!
Xiusi Chen retweeted
Gaotang Li @GaotangLi
😲 Not only reasoning?! Inference scaling can now boost LLM safety! 🚀 Introducing Saffron-1:
- Reduces attack success rate from 66% to 17.5%
- Uses only 59.7 TFLOPs of compute
- Counters the latest jailbreak attacks
- No model finetuning
On the AI2 Refusals benchmark.
📖 Paper: huggingface.co/papers/2506.06…
🖥️ Code: github.com/q-rz/saffron
🌐 Webpage: q-rz.github.io/p/saffron
Xiusi Chen @xiusi_chen
Paradigm shift toward trustworthy AI:
From black-box output ➜ explainable process
From language imitation ➜ symbolic + numerical reasoning
From prompt control ➜ modular, extensible pipeline
Future integration with fine-tuning/RL will enhance robustness.
Xiusi Chen @xiusi_chen
📊 Empirical Results
Striking effect: accuracy up 30%, bias reduced.
Medical scenario: accuracy from 22% ➜ 68%
Agricultural seed selection: from 30% ➜ 76.67%
Stock investment: from 19% ➜ 68.75%
At the same time, preference bias is reduced by more than 3×, and ethical alignment is more stable!
Xiusi Chen @xiusi_chen
Can LLMs make rational decisions like human experts?
📖 Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker
We introduce a novel framework that constructs a semantically grounded decision space to transparently evaluate trade-offs in hard decision-making scenarios.
📑 Paper: arxiv.org/abs/2505.21397
💻 Code: github.com/xiusic/Decisio…
🧵👇
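The idea of a structured decision space, candidate actions scored on named attributes under hard constraints, can be illustrated with a toy structure. All names here (`DecisionSpace`, `choose`, the drug attributes) are hypothetical and are not DecisionFlow's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionSpace:
    """Toy decision space: candidate actions scored on named attributes,
    filtered by hard constraints, then ranked by a weighted trade-off."""
    weights: dict[str, float]                        # attribute -> importance
    constraints: list = field(default_factory=list)  # predicates on attributes

    def choose(self, actions: dict[str, dict[str, float]]) -> str:
        # Hard constraints prune infeasible actions up front.
        feasible = {
            name: attrs for name, attrs in actions.items()
            if all(check(attrs) for check in self.constraints)
        }
        # The outcome is explainable: the score is a transparent weighted sum.
        return max(feasible, key=lambda name: sum(
            self.weights[attr] * value for attr, value in feasible[name].items()))

space = DecisionSpace(
    weights={"efficacy": 0.7, "safety": 0.3},
    constraints=[lambda attrs: attrs["safety"] >= 0.5],  # hard safety floor
)
best = space.choose({
    "drug_a": {"efficacy": 0.9, "safety": 0.4},   # pruned: violates safety floor
    "drug_b": {"efficacy": 0.7, "safety": 0.8},
    "drug_c": {"efficacy": 0.5, "safety": 0.9},
})
print(best)  # → drug_b
```

Making the variables, constraints, and weights explicit is what lets the trade-off be audited, in contrast to asking an LLM for a decision directly.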
Xiusi Chen retweeted
Cheng Qian @qiancheng1231
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems, but can they model the real world? 🌍
📄 arXiv: arxiv.org/pdf/2505.15068
💻 Code: github.com/qiancheng0/Mod…
Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.