Xiusi Chen

74 posts

@xiusi_chen

Postdoc @UofIllinois @uiuc_nlp, Ph.D. @UCLA, BS @PKU1898. RM-R1. Ex-Intern @AmazonScience (x2), @NECLabsAmerica. LLM, Neuro-Symbolic AI.

Urbana-Champaign, IL · Joined June 2012
474 Following · 663 Followers
Pinned Tweet
Xiusi Chen @xiusi_chen
🚀 Can we cast reward modeling as a reasoning task?
📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning
📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…
Inspired by recent advances in long chain-of-thought (CoT) reasoning on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances an RM's interpretability and performance. RM-R1 achieves state-of-the-art or near-state-of-the-art performance among generative RMs on RewardBench, RM-Bench, and RMB. 🧵👇
Xiusi Chen retweeted
Gaotang Li @GaotangLi
When we introduced RM-R1, reasoning reward models and rubric-based supervision were still niche. Today, they’re everywhere: from various downstream applications to the post-training of Grok-4. Excited to share that RM-R1 is accepted to ICLR. What’s next?
Xiusi Chen @xiusi_chen
📣 Our RM-R1 paper is accepted to ICLR 2026! @iclr_conf
📷 We hypothesized and validated that integrating reasoning capabilities into reward modeling significantly enhances the interpretability and performance of reward models.
📑 Paper: arxiv.org/pdf/2505.02387
📷 Code: github.com/RM-R1-UIUC/RM-…
Many thanks to all the co-authors! @GaotangLi @wzq016 @BowenJin13 @qiancheng1231 @__YuWang__ @HongruWang007 @yuz9yuz @denghui_zhang Prof. Tong Zhang @hanghangtong @hengjinlp

Xiusi Chen @xiusi_chen
📣 Our RM-R1 paper is accepted to ICLR 2026! @iclr_conf
📷 We hypothesized and validated that integrating reasoning capabilities into reward modeling significantly enhances the interpretability and performance of reward models.
📑 Paper: arxiv.org/pdf/2505.02387
📷 Code: github.com/RM-R1-UIUC/RM-…
Many thanks to all the co-authors! @GaotangLi @wzq016 @BowenJin13 @qiancheng1231 @__YuWang__ @HongruWang007 @yuz9yuz @denghui_zhang Prof. Tong Zhang @hanghangtong @hengjinlp
Xiusi Chen @xiusi_chen
🚀 Can we cast reward modeling as a reasoning task?
📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning
📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…
Inspired by recent advances in long chain-of-thought (CoT) reasoning on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances an RM's interpretability and performance. RM-R1 achieves state-of-the-art or near-state-of-the-art performance among generative RMs on RewardBench, RM-Bench, and RMB. 🧵👇

Xiusi Chen retweeted
Cheng Qian @qiancheng1231
🔮 Can a world model (simulator) give today’s AI agents foresight? We tested “world model as a tool”… and found it often doesn’t help; sometimes it hurts.
Check out our newest paper here: arxiv.org/pdf/2601.03905…
#AIagents #WorldModel #ToolUse
Xiusi Chen retweeted
Pengrun Huang @pengrun_huang
What do **property inference attacks** look like on an LLM? 🤔 Our new work **PropInfer** reveals a previously unrecognized training-data confidentiality risk in LLMs! 🚨 We show how attackers can infer confidential, *dataset-level* properties (like patient demographics) of a fine-tuning dataset. We do this by:
—Introducing the first benchmark task for property inference in LLMs
—Proposing two tailored attacks targeting LLMs
Come check out our spotlight paper "Can We Infer Confidential Properties of Training Data from LLMs?" at #NeurIPS2025 on Thu, Dec 4, 4:30 p.m.–7:30 p.m. PST, Exhibit Hall C,D,E #1313 📍 @chhaviyadav_ @kamalikac @ruihan_w
Xiusi Chen retweeted
Gaotang Li @GaotangLi
Negative log-likelihood (NLL) has long been the go-to objective for classification and SFT, but is it universally optimal? We explore when alternative objectives outperform NLL and when they don't, based on two key factors: the objective's prior-leaningness and the model's capability.
📄 Paper: arxiv.org/abs/2510.00526
💻 Code: github.com/GaotangLi/Beyo…
(1/n)
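For intuition on what an "alternative objective" can look like next to NLL, here is a minimal NumPy comparison against focal loss, my illustrative choice; the paper studies its own family of objectives. Focal loss down-weights examples the model already classifies confidently:

```python
import numpy as np

def nll_loss(p_true: float) -> float:
    """Standard negative log-likelihood on the true-class probability."""
    return -np.log(p_true)

def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss scales NLL by (1 - p)^gamma, shrinking the loss
    on examples the model already gets right with high confidence."""
    return -((1.0 - p_true) ** gamma) * np.log(p_true)

# As confidence on the true class grows, focal loss vanishes much faster:
for p in (0.5, 0.9, 0.99):
    print(f"p={p}: NLL={nll_loss(p):.4f}  focal={focal_loss(p):.4f}")
```

The comparison shows why the best objective can depend on model capability: a weak model sees many low-`p` examples where the two objectives behave similarly, while a strong model's gradient signal differs sharply between them.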
Xiusi Chen @xiusi_chen
📣 Our paper is accepted to Findings of EMNLP 2025! 📷
Decision modeling is the process of formulating an abstract representation of a decision scenario by identifying key variables, their attributes, relevant constraints, and possible courses of action, in order to evaluate trade-offs and arrive at the most rational and explainable outcome.
Many thanks to all the co-authors! @Swimmingwang04 @qiancheng1231 @HongruWang007 @peixuanhakhan @hengjinlp
Come and check out how we do it: arxiv.org/pdf/2505.21397
Xiusi Chen @xiusi_chen
Can LLMs make rational decisions like human experts?
📖 Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker
We introduce a novel framework that constructs a semantically grounded decision space to transparently evaluate trade-offs in hard decision-making scenarios.
📑 Paper: arxiv.org/abs/2505.21397
💻 Code: github.com/xiusic/Decisio…
🧵👇

Xiusi Chen retweeted
Cheng Qian @qiancheng1231
🤝 Can LLM agents really understand us? We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands.
📄 arxiv.org/pdf/2507.22034
💻 github.com/SalesforceAIRe…
Xiusi Chen retweeted
Zhenhailong Wang @zhenhailongW
Learning to perceive while learning to reason! We introduce PAPO: Perception-Aware Policy Optimization, a direct upgrade to GRPO for multimodal reasoning. PAPO relies on internal supervision signals. No extra annotations, reward models, or teacher models needed. 🧵1/3
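PAPO is described as a direct upgrade to GRPO. The GRPO step it builds on, normalizing each sampled response's reward against the other rollouts for the same prompt, can be sketched as follows. This is a generic sketch of the GRPO advantage computation, not PAPO itself:

```python
import numpy as np

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: z-score each sampled response's reward
    against the other responses drawn for the same prompt, so no
    separate value network (critic) is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four rollouts for one prompt; the best rollout gets the largest advantage,
# and the advantages are centered around zero within the group.
adv = group_relative_advantages([0.2, 0.9, 0.4, 0.1])
print(adv.argmax())  # → 1
```

PAPO's contribution, per the tweet, is adding an internal perception-aware supervision signal on top of this baseline without extra annotations or reward models.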
Xiusi Chen retweeted
Yangyi Chen @YangyiChen6666
🚀 I'm looking for full-time research scientist jobs on foundation models! I study pre-training and post-training of foundation models, and LLM-based coding agents. The figure highlights my research/publications. Please DM me if there is any good fit! Highly appreciated!
Xiusi Chen retweeted
Gaotang Li @GaotangLi
😲 Not only reasoning?! Inference scaling can now boost LLM safety! 🚀 Introducing Saffron-1:
- Reduces attack success rate from 66% to 17.5%
- Uses only 59.7 TFLOPs of compute
- Counters the latest jailbreak attacks
- No model finetuning
On the AI2 Refusals benchmark.
📖 Paper: huggingface.co/papers/2506.06…
🖥️ Code: github.com/q-rz/saffron
🌐 Webpage: q-rz.github.io/p/saffron
Xiusi Chen @xiusi_chen
Paradigm shift toward trustworthy AI:
From black-box output ➜ explainable process
From language imitation ➜ symbolic + numerical reasoning
From prompt control ➜ modular, extensible pipeline
Future integration with fine-tuning/RL will enhance robustness.
Xiusi Chen @xiusi_chen
📊 Empirical Results
Striking effect: accuracy up 30%, bias reduced.
Medical scenario: accuracy from 22% ➜ 68%
Agricultural seed selection: from 30% ➜ 76.67%
Stock investment: from 19% ➜ 68.75%
At the same time, preference bias is reduced by more than 3×, and ethical alignment is more stable!
Xiusi Chen @xiusi_chen
Can LLMs make rational decisions like human experts?
📖 Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker
We introduce a novel framework that constructs a semantically grounded decision space to transparently evaluate trade-offs in hard decision-making scenarios.
📑 Paper: arxiv.org/abs/2505.21397
💻 Code: github.com/xiusic/Decisio…
🧵👇
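The idea of a structured decision space, candidate actions scored on named attributes under hard constraints, can be illustrated with a toy structure. All names here (`DecisionSpace`, `choose`, the drug attributes) are hypothetical and are not DecisionFlow's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionSpace:
    """Toy decision space: candidate actions scored on named attributes,
    filtered by hard constraints, then ranked by a weighted trade-off."""
    weights: dict[str, float]                        # attribute -> importance
    constraints: list = field(default_factory=list)  # predicates on attributes

    def choose(self, actions: dict[str, dict[str, float]]) -> str:
        # Hard constraints prune infeasible actions up front.
        feasible = {
            name: attrs for name, attrs in actions.items()
            if all(check(attrs) for check in self.constraints)
        }
        # The outcome is explainable: the score is a transparent weighted sum.
        return max(feasible, key=lambda name: sum(
            self.weights[attr] * value for attr, value in feasible[name].items()))

space = DecisionSpace(
    weights={"efficacy": 0.7, "safety": 0.3},
    constraints=[lambda attrs: attrs["safety"] >= 0.5],  # hard safety floor
)
best = space.choose({
    "drug_a": {"efficacy": 0.9, "safety": 0.4},   # pruned: violates safety floor
    "drug_b": {"efficacy": 0.7, "safety": 0.8},
    "drug_c": {"efficacy": 0.5, "safety": 0.9},
})
print(best)  # → drug_b
```

Making the variables, constraints, and weights explicit is what lets the trade-off be audited, in contrast to asking an LLM for a decision directly.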
Xiusi Chen retweeted
Cheng Qian @qiancheng1231
📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems, but can they model the real world? 🌍
📄 arXiv: arxiv.org/pdf/2505.15068
💻 Code: github.com/qiancheng0/Mod…
Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.