Becky Xiangyu Peng

33 posts


@beckypeng6

Senior Research Scientist @SFResearch, PhD @GeorgiaTech. She/her. NLG + MM + Agents + RL

Palo Alto, CA · Joined January 2020
55 Following · 136 Followers
Becky Xiangyu Peng reposted
Salesforce AI Research@SFResearch·
Introducing UniDoc-Bench: The First Unified Benchmark for Document-Centric Multimodal RAG
📄 Paper: bit.ly/47caWlP
Real documents mix text, tables, and charts—but most RAG benchmarks test them in isolation. We built UniDoc-Bench to change that.
📊 What's inside:
➡️ 70K PDF pages across 8 domains
➡️ 1,600 QA pairs grounding text, tables & images
➡️ Fair comparison across 4 RAG paradigms
🔍 Key finding: Text-image fusion RAG (68.4%) beats both multimodal joint retrieval (64.1%) and single-modality approaches. Current multimodal embeddings still lag behind combining strong unimodal retrievers.
💻 Code: bit.ly/47eIG1V
📊 Data: bit.ly/3LfmNbi
➡️ Work by Xiangyu Peng @beckypeng6, Can Qin @canqin001, Zeyuan Chen @ZeyuanChen, Ran Xu @stanleyran, Caiming Xiong @CaimingXiong, and Chien-Sheng Wu @jasonwu0731.
#FutureOfAI #EnterpriseAI #MultimodalAI #DocumentIntelligence
[image]
0 replies · 6 reposts · 14 likes · 3.7K views
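The "text-image fusion" paradigm named in the key finding above can be sketched as late fusion over two independent unimodal retrievers: score pages with a text retriever and an image retriever separately, then combine the scores. This is a minimal illustration of the idea, not the UniDoc-Bench implementation; all page IDs and scores below are hypothetical toy values.

```python
def fuse_scores(text_scores, image_scores, alpha=0.5):
    """Late fusion: weighted sum of per-page scores from two
    separate unimodal retrievers."""
    pages = set(text_scores) | set(image_scores)
    return {
        p: alpha * text_scores.get(p, 0.0) + (1 - alpha) * image_scores.get(p, 0.0)
        for p in pages
    }

def top_k(scores, k=2):
    """Pages ranked by fused score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy relevance scores over four PDF pages: one retriever sees the text,
# the other sees the rendered page image.
text_scores = {"p1": 0.9, "p2": 0.4, "p3": 0.1}
image_scores = {"p2": 0.8, "p3": 0.7, "p4": 0.2}

print(top_k(fuse_scores(text_scores, image_scores)))
```

The contrast with "multimodal joint retrieval" is that the latter embeds page text and image into one shared vector space and retrieves once; the benchmark's finding is that the fusion of strong unimodal retrievers currently wins.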
Becky Xiangyu Peng@beckypeng6·
@cyjustinchen Soooo happy to work with you this summer ❤️❤️❤️❤️❤️You are soooo self motivated!
0 replies · 0 reposts · 1 like · 33 views
Becky Xiangyu Peng@beckypeng6·
🚀 NuRL: Pushing LLM Reasoning to the Next Level!
💡 Hard problems? No problem! GRPO struggles with 0% pass-rate tasks, but NuRL nudges LLMs with self-generated hints → expands the learning zone.
📈 Results across 6 benchmarks & 3 models: +0.8–1.8% over GRPO on pass@1
Justin Chih-Yao Chen@cyjustinchen

🚨 NuRL: Nudging the Boundaries of LLM Reasoning
GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints effectively expands the model's learning zone 👉 consistent gains in pass@1 on 6 benchmarks w/ 3 models & raises pass@1024 on challenging tasks!
Key takeaways:
1⃣ GRPO can't learn from problems the model never solves correctly, but NuRL uses self-generated "hints" to make hard problems learnable
2⃣ Abstract, high-level hints work best—revealing too much about the answer can actually hurt performance!
3⃣ NuRL improves performance across 6 benchmarks and 3 models (+0.8–1.8% over GRPO), while using fewer rollouts during training
4⃣ NuRL works with self-generated hints (no external model needed) and shows larger gains when combined with test-time scaling
5⃣ NuRL raises the upper limit: it boosts pass@1024 by up to +7.6% on challenging datasets (e.g., GPQA, Date Understanding) 🧵

3 replies · 12 reposts · 66 likes · 6.2K views
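The "zero learning signal" point above follows directly from GRPO's group-relative advantage: when every rollout in a group gets the same reward (e.g. a 0% pass-rate problem), the normalized advantages are all zero and the sample contributes no gradient. A toy sketch of that mechanism, not the NuRL implementation; the reward values are hypothetical:

```python
import statistics

def group_advantages(rewards):
    """GRPO-style group-relative advantage: (r - mean) / std
    over a group of rollouts for the same prompt."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards)
    if sd == 0:
        # Every rollout got the same reward (e.g. all failed):
        # all advantages are zero, so no learning signal.
        return [0.0] * len(rewards)
    return [(r - mu) / sd for r in rewards]

# Hard problem, no hint: the model never solves it.
print(group_advantages([0, 0, 0, 0]))   # all-zero advantages

# Same problem after "nudging" the prompt with a self-generated hint:
# some rollouts now succeed, so a usable signal appears.
print(group_advantages([1, 0, 0, 1]))
```

The nudge makes the hard sample learnable precisely because it pushes the group's pass rate off 0%, restoring variance inside the group.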
Becky Xiangyu Peng reposted
Justin Chih-Yao Chen@cyjustinchen·
🚨 NuRL: Nudging the Boundaries of LLM Reasoning
GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints effectively expands the model's learning zone 👉 consistent gains in pass@1 on 6 benchmarks w/ 3 models & raises pass@1024 on challenging tasks!
Key takeaways:
1⃣ GRPO can't learn from problems the model never solves correctly, but NuRL uses self-generated "hints" to make hard problems learnable
2⃣ Abstract, high-level hints work best—revealing too much about the answer can actually hurt performance!
3⃣ NuRL improves performance across 6 benchmarks and 3 models (+0.8–1.8% over GRPO), while using fewer rollouts during training
4⃣ NuRL works with self-generated hints (no external model needed) and shows larger gains when combined with test-time scaling
5⃣ NuRL raises the upper limit: it boosts pass@1024 by up to +7.6% on challenging datasets (e.g., GPQA, Date Understanding) 🧵
[image]
12 replies · 77 reposts · 324 likes · 45K views
Becky Xiangyu Peng reposted
Salesforce AI Research@SFResearch·
🧠 RAG systems excel at answering questions—but what happens when there's no answer?
We introduce UAEval4RAG, a framework to evaluate how well RAG models handle unanswerable queries.
📄 Paper: bit.ly/3SeY9Ic
🔗 Code: bit.ly/4jejpZw
By categorizing unanswerable types and synthesizing challenging queries on your own data, we reveal trade-offs—and help you choose the best RAG components for your data.
#RAG #NLP #AIEvaluation #ACL25
📝 Thanks to the UAEval4RAG team: Xiangyu Peng @beckypeng6, Prafulla Kumar Choubey, Caiming Xiong @CaimingXiong, Chien-Sheng Wu @jasonwu0731
[image]
1 reply · 4 reposts · 22 likes · 2.9K views
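A toy version of the evaluation idea above: label each model answer as a refusal or an attempt, then measure the trade-off between correctly refusing unanswerable queries and over-refusing answerable ones. The refusal markers and sample answers here are hypothetical stand-ins; the framework's actual judging method and unanswerable-query taxonomy are described in the paper.

```python
REFUSAL_MARKERS = ("cannot answer", "not enough information", "no answer")

def is_refusal(answer: str) -> bool:
    """Crude keyword check for whether the model declined to answer."""
    a = answer.lower()
    return any(m in a for m in REFUSAL_MARKERS)

def evaluate(samples):
    """samples: (answer_text, is_answerable) pairs.
    Returns (refusal rate on unanswerable queries -- higher is better,
             refusal rate on answerable queries -- the over-refusal cost)."""
    unanswerable = [a for a, ok in samples if not ok]
    answerable = [a for a, ok in samples if ok]
    unans_refusal = sum(map(is_refusal, unanswerable)) / len(unanswerable)
    over_refusal = sum(map(is_refusal, answerable)) / len(answerable)
    return unans_refusal, over_refusal

samples = [
    ("Paris is the capital of France.", True),
    ("I cannot answer this from the retrieved documents.", False),
    ("There is not enough information in the context.", False),
    ("I cannot answer that question.", True),   # an over-refusal
]
print(evaluate(samples))
```

The pair of rates makes the trade-off the tweet mentions concrete: a system can trivially ace the unanswerable set by refusing everything, so both numbers have to be reported together.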
Becky Xiangyu Peng@beckypeng6·
@SFResearch This is pretty solid work: it enhances LLM reasoning by self-synthesizing reasoning paths without any human supervision or task-specific examples.
0 replies · 0 reposts · 3 likes · 143 views
Becky Xiangyu Peng reposted
Salesforce AI Research@SFResearch·
🔉 New advances in LLM reasoning capabilities accepted for oral presentation at #ICLR2025!
📎 Paper: arxiv.org/abs/2410.02108
ReGenesis introduces a novel approach where models self-improve their reasoning through abstraction-to-concrete progression - no human supervision needed.
Key findings:
▶️ Self-synthesized reasoning paths
▶️ Superior generalization to new tasks
▶️ 6.1% improvement in OOD performance
▶️ Validated across multiple model architectures
Our work opens new possibilities for developing more robust and generalizable AI systems. Stay tuned for the full presentation and see you in Singapore! #AIResearch #AIReasoning @iclr_conf
[image]
3 replies · 15 reposts · 86 likes · 6.9K views
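The abstraction-to-concrete progression described above can be sketched as a generate-and-filter loop: start from task-agnostic abstract guidelines, adapt each into a concrete reasoning structure for the question at hand, generate a full reasoning path, and keep only paths that reach the known answer as post-training data. This is a schematic reading of the announcement, not the released code; `llm` and `stub_llm` are hypothetical stand-ins for a real model call.

```python
def synthesize_paths(question, answer, llm, guidelines):
    """Abstract-to-concrete self-synthesis, sketched from the ReGenesis
    description. `llm` is any callable mapping a prompt string to a
    completion string."""
    kept = []
    for g in guidelines:
        # Abstract guideline -> concrete reasoning structure for this question.
        structure = llm(
            f"Adapt this abstract guideline to the question.\n"
            f"Guideline: {g}\nQuestion: {question}"
        )
        # Concrete structure -> full reasoning path.
        path = llm(
            f"Solve step by step following the structure.\n"
            f"Structure: {structure}\nQuestion: {question}"
        )
        # Filter by final-answer correctness; survivors become
        # fine-tuning (post-training) data.
        if answer in path:
            kept.append(path)
    return kept

# Tiny stub standing in for a real model, just to make the sketch runnable.
def stub_llm(prompt):
    return "Step 1: decompose. Step 2: compute. The answer is 42."

paths = synthesize_paths("What is 6*7?", "42", stub_llm,
                         ["Break the problem into sub-goals."])
print(len(paths))
```

Because the model writes and filters its own paths, no human-annotated reasoning trajectories are needed, which is the "no human supervision" claim in the post.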
Becky Xiangyu Peng@beckypeng6·
We evaluate ReGenesis against baseline methods on 6 OOD tasks, covering math, logic, common sense, and natural language inference. We show that ReGenesis generalizes well on OOD reasoning tasks, achieving a 6.1% improvement over the model without fine-tuning.
[image]
0 replies · 1 repost · 2 likes · 120 views
Becky Xiangyu Peng@beckypeng6·
🚨🆕🚨Introducing ReGenesis: Reasoning Generalists via Self-Improvement! Our method self-synthesizes reasoning paths, moving from abstract to concrete. 🔥While others see a 4.6% drop in OOD performance, ReGenesis delivers a 6.1% boost! 🚀 🔗arxiv.org/abs/2410.02108
[image]
6 replies · 4 reposts · 19 likes · 3.4K views
Becky Xiangyu Peng@beckypeng6·
For in-domain settings, across five in-domain tasks involving math, logic, and common sense, ReGenesis outperforms all 5 baselines by 7.1% to 18.9%.
[image]
0 replies · 1 repost · 1 like · 81 views
Becky Xiangyu Peng@beckypeng6·
@SFResearch In other words, existing methods don't generalize well on OOD tasks and cannot turn a given LLM into a reasoning generalist. To fill this gap, we explore how to self-synthesize reasoning paths as post-training data so that LLMs generalize well across various OOD reasoning tasks.
[image]
0 replies · 1 repost · 1 like · 81 views
Becky Xiangyu Peng@beckypeng6·
Existing self-synthesizing methods suffer from poor generalization to out-of-domain (OOD) reasoning tasks. Our results indicate that further post-training on data produced by these methods leads to a 4.6% decline in the given LLM's performance on such OOD tasks.
0 replies · 1 repost · 2 likes · 68 views
Becky Xiangyu Peng@beckypeng6·
@SFResearch But acquiring high-quality reasoning-trajectory data in the post-training phase demands meticulous supervision of each reasoning step, either from humans or from superior models.
0 replies · 1 repost · 2 likes · 68 views
Becky Xiangyu Peng@beckypeng6·
Recent research has demonstrated that post-training with explicit intermediate reasoning trajectories can improve the performance of large language models (LLMs) across a wide range of complicated reasoning tasks, such as mathematical reasoning, commonsense reasoning, etc.
0 replies · 1 repost · 2 likes · 112 views
Becky Xiangyu Peng@beckypeng6·
Recent research has demonstrated that post-training with explicit intermediate reasoning trajectories can improve the performance of large language models (LLMs) across a wide range of complicated reasoning tasks, such as mathematical reasoning, commonsense reasoning, etc.
Becky Xiangyu Peng@beckypeng6

🚨🆕🚨Introducing ReGenesis: Reasoning Generalists via Self-Improvement! Our method self-synthesizes reasoning paths, moving from abstract to concrete. 🔥While others see a 4.6% drop in OOD performance, ReGenesis delivers a 6.1% boost! 🚀 🔗arxiv.org/abs/2410.02108

0 replies · 1 repost · 1 like · 50 views
Becky Xiangyu Peng@beckypeng6·
So happy to graduate as Dr. No. 9 from this best AI lab @mark_riedl~ Time flies! Thanks sooo much Mark! I am sooo lucky to spend these five years with you and so many amazing labmates!
[image]
0 replies · 0 reposts · 5 likes · 146 views
Becky Xiangyu Peng reposted
Mark Riedl@mark_riedl·
Congratulations to Dr. Ashutosh Baheti, Dr. Zhiyu Lin, and Dr. Xiangyu Peng (and Dr. Peng’s husband, Dr. Yifan Lin, from another lab)
[image]
4 replies · 4 reposts · 102 likes · 6.8K views
Becky Xiangyu Peng reposted
Upol Ehsan@UpolEhsan·
As DJ Khaled would say: "ANOTHER ONE!" ☝️ 🚀 The Human-centered Explainable AI workshop is back at #CHI2024! 🎯 LLMs, LLMs, everywhere. What does it mean for Explainable AI & humans? 📍Submit to #HCXAI & find your community! 🔗 hcxai.jimdosite(.)com Pls RT. Pro tips ⤵️ 1/3
[image]
1 reply · 19 reposts · 39 likes · 13.7K views