Becky Xiangyu Peng

33 posts

@beckypeng6

Senior Research Scientist @SFResearch, PhD @GeorgiaTech. She/her. NLG + MM + Agents + RL

Palo Alto, CA · Joined January 2020

55 Following · 136 Followers

Becky Xiangyu Peng reposted

Salesforce AI Research @SFResearch · Oct 17

Introducing UniDoc-Bench: The First Unified Benchmark for Document-Centric Multimodal RAG
📄 Paper: bit.ly/47caWlP
Real documents mix text, tables, and charts, but most RAG benchmarks test them in isolation. We built UniDoc-Bench to change that.
📊 What's inside:
➡️ 70K PDF pages across 8 domains
➡️ 1,600 QA pairs grounding text, tables & images
➡️ Fair comparison across 4 RAG paradigms
🔍 Key finding: Text-image fusion RAG (68.4%) beats both multimodal joint retrieval (64.1%) and single-modality approaches. Current multimodal embeddings still lag behind combining strong unimodal retrievers.
💻 Code: bit.ly/47eIG1V
📊 Data: bit.ly/3LfmNbi
➡️ Work by Xiangyu Peng @beckypeng6, Can Qin @canqin001, Zeyuan Chen @ZeyuanChen, Ran Xu @stanleyran, Caiming Xiong @CaimingXiong, and Chien-Sheng Wu @jasonwu0731.
#FutureOfAI #EnterpriseAI #MultimodalAI #DocumentIntelligence
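The key finding is that fusing a strong text retriever with a strong image retriever beats a single joint multimodal embedding. One common way to merge two ranked lists is reciprocal rank fusion (RRF). The Python sketch below uses RRF purely as an illustration, assuming RRF as the fusion rule and hypothetical page IDs; it is not necessarily the fusion strategy evaluated in UniDoc-Bench.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of page IDs into one ranking.

    Assumption for illustration: RRF is the fusion rule. Each list could come
    from a different unimodal retriever (e.g. text and image). k=60 is the
    smoothing constant commonly used with RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            # Pages ranked highly by any retriever gain more score.
            scores[page_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by page ID for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical retriever outputs (page IDs are made up).
text_hits = ["p12", "p07", "p33"]
image_hits = ["p07", "p41", "p12"]
print(reciprocal_rank_fusion([text_hits, image_hits]))
# → ['p07', 'p12', 'p41', 'p33']
```

A rank-based rule like this avoids calibrating raw similarity scores across modalities, which usually live on different scales.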

Becky Xiangyu Peng @beckypeng6 · Oct 2

@cyjustinchen Soooo happy to work with you this summer ❤️❤️❤️❤️❤️ You are soooo self-motivated!

Justin Chih-Yao Chen @cyjustinchen · Oct 2

@beckypeng6 Thanks Becky for being such a nice mentor!! 😊

Becky Xiangyu Peng @beckypeng6 · Oct 2

🚀 NuRL: Pushing LLM Reasoning to the Next Level! 💡 Hard problems? No problem! GRPO struggles with 0% pass-rate tasks, but NuRL nudges LLMs with self-generated hints → expands the learning zone. 📈 Results across 6 benchmarks & 3 models: +0.8–1.8% over GRPO on pass@1

Becky Xiangyu Peng reposted

Justin Chih-Yao Chen @cyjustinchen · Oct 1

🚨 NuRL: Nudging the Boundaries of LLM Reasoning
GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals.
In NuRL, we show that "nudging" the LLM with self-generated hints effectively expands the model's learning zone 👉 consistent gains in pass@1 on 6 benchmarks w/ 3 models & raises pass@1024 on challenging tasks!
Key takeaways:
1⃣ GRPO can't learn from problems the model never solves correctly, but NuRL uses self-generated "hints" to make hard problems learnable
2⃣ Abstract, high-level hints work best; revealing too much about the answer can actually hurt performance!
3⃣ NuRL improves performance across 6 benchmarks and 3 models (+0.8-1.8% over GRPO), while using fewer rollouts during training
4⃣ NuRL works with self-generated hints (no external model needed) and shows larger gains when combined with test-time scaling
5⃣ NuRL raises the upper limit: it boosts pass@1024 up to +7.6% on challenging datasets (e.g., GPQA, Date Understanding)
🧵
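GRPO scores each rollout relative to the other rollouts sampled for the same prompt, so when every rollout fails (a 0% pass rate) all advantages are zero and the prompt contributes no learning signal. The Python sketch below shows that effect and adds a hypothetical hint fallback in the spirit of NuRL. The callables `rollout`, `reward`, and `make_hint`, the "retry with a hint when the whole group fails" rule, and the toy model are illustrative assumptions, not the paper's actual procedure.

```python
import itertools
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: center and scale each reward within its group.

    Population std is used here; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def rewards_with_hint_fallback(prompt, rollout, reward, make_hint, group_size=8):
    """Hypothetical NuRL-style step: if the whole group fails, retry with a hint.

    rollout(prompt) -> answer, reward(answer) -> float and make_hint(prompt) -> str
    are placeholder callables, not a real API.
    """
    rewards = [reward(rollout(prompt)) for _ in range(group_size)]
    if any(rewards):
        # At least one success: the group already carries a learning signal.
        return prompt, rewards
    # 0% pass rate: every advantage would be zero, so nudge with an abstract hint.
    hinted = f"{prompt}\nHint: {make_hint(prompt)}"
    return hinted, [reward(rollout(hinted)) for _ in range(group_size)]


# Toy stand-ins (hypothetical): unhinted, the "model" always answers "0";
# hinted, it alternates between the correct "42" and a wrong "7".
_hinted_answers = itertools.cycle(["42", "7"])


def toy_rollout(prompt):
    return next(_hinted_answers) if "Hint:" in prompt else "0"


def toy_reward(answer):
    return 1.0 if answer == "42" else 0.0


def toy_hint(prompt):
    return "Break the product into smaller steps."


print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))  # all zeros: no signal
prompt, rewards = rewards_with_hint_fallback(
    "What is 6 * 7?", toy_rollout, toy_reward, toy_hint, group_size=4
)
print(rewards)                             # mixed rewards after the hint
print(group_relative_advantages(rewards))  # non-zero advantages
```

Retrying only the groups that fail entirely keeps ordinary prompts untouched, which loosely matches the idea of nudging only the samples that would otherwise give zero signal.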

Becky Xiangyu Peng @beckypeng6 · Sep 4

🚀 Big step forward for Video LLMs! Strefer brings space-time grounding to AI—unlocking agents that see, track, and respond in the flow of our world.

Salesforce AI Research @SFResearch

(Thread 1/8) 🚨 Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data 🚨
Introducing Strefer: a novel data engine for auto-generating instruction data that enables Video LLMs to excel at spatiotemporal video understanding 🎬🧩⏳
Key Contributions:
▶️ Automated Pipeline: Eliminates dependence on legacy annotations through fully automatic instruction generation
▶️ Fine-grained Spatiotemporal Information: Produces temporally aligned, object-centric metadata with instruction-response pairs and multimodal prompts
▶️ Data-Efficient: Achieves improvements in space-time referring and reasoning with only 545 extra videos and no proprietary model dependencies
📄 Paper: bit.ly/427rVnw
🌐 Project: bit.ly/4gi4Owr
💻 Code: bit.ly/3I5HbdO
🎥 YouTube (10-min video): bit.ly/4lZL1TS
How does Strefer lay the foundation for perceptually grounded, instruction-tuned Video LLMs? Dive into the researchers' walk-through below! 🧵

Becky Xiangyu Peng reposted

Salesforce AI Research @SFResearch · May 22

🧠 RAG systems excel at answering questions—but what happens when there's no answer? We introduce UAEval4RAG, a framework to evaluate how well RAG models handle unanswerable queries. 📄 Paper: bit.ly/3SeY9Ic 🔗 Code: bit.ly/4jejpZw By categorizing unanswerable types and synthesizing challenging queries on your own data, we reveal trade-offs—and help you choose the best RAG components for your data. #RAG #NLP #AIEvaluation #ACL25 📝 Thanks to the UAEval4RAG team: Xiangyu Peng @beckypeng6 Prafulla Kumar Choubey Caiming Xiong @CaimingXiong Chien-Sheng Wu @jasonwu0731

Becky Xiangyu Peng @beckypeng6 · Feb 12

@SFResearch This is solid work that enhances LLM reasoning by self-synthesizing reasoning paths, without any human supervision or task-specific examples.

Becky Xiangyu Peng reposted

Salesforce AI Research @SFResearch · Feb 12

🔉 New advances in LLM reasoning capabilities accepted for oral presentation at #ICLR2025!
📎 Paper: arxiv.org/abs/2410.02108
ReGenesis introduces a novel approach where models self-improve their reasoning through abstraction-to-concrete progression, with no human supervision needed.
Key findings:
▶️ Self-synthesized reasoning paths
▶️ Superior generalization to new tasks
▶️ 6.1% improvement in OOD performance
▶️ Validated across multiple model architectures
Our work opens new possibilities for developing more robust and generalizable AI systems. Stay tuned for the full presentation and see you in Singapore! #AIResearch #AIReasoning @iclr_conf

Becky Xiangyu Peng @beckypeng6 · Oct 10

We evaluate ReGenesis against baseline methods on 6 OOD tasks covering math, logic, common sense, and natural language inference. ReGenesis generalizes well to OOD reasoning tasks, achieving a 6.1% improvement over the model without fine-tuning.

Becky Xiangyu Peng @beckypeng6 · Oct 9

🚨🆕🚨Introducing ReGenesis: Reasoning Generalists via Self-Improvement! Our method self-synthesizes reasoning paths, moving from abstract to concrete. 🔥While others see a 4.6% drop in OOD performance, ReGenesis delivers a 6.1% boost! 🚀 🔗arxiv.org/abs/2410.02108

Becky Xiangyu Peng @beckypeng6 · Oct 10

In the in-domain setting, across five tasks involving math, logic, and common sense, ReGenesis outperforms all five baselines by 7.1% to 18.9%.

Becky Xiangyu Peng @beckypeng6 · Oct 10

@SFResearch In other words, existing methods don't generalize well to OOD tasks and cannot turn a given LLM into a reasoning generalist. To fill this gap, we explore how to self-synthesize reasoning paths as post-training data so that LLMs generalize well across various OOD reasoning tasks.

Becky Xiangyu Peng @beckypeng6 · Oct 10

Existing self-synthesizing methods suffer from poor generalization to out-of-domain (OOD) reasoning tasks. Our results indicate that further post-training on data produced by these methods leads to a 4.6% decline in the LLM's performance on such OOD tasks.

Becky Xiangyu Peng @beckypeng6 · Oct 10

@SFResearch However, acquiring high-quality reasoning-trajectory data for post-training demands meticulous supervision of each reasoning step, either from humans or from stronger models.

Becky Xiangyu Peng @beckypeng6 · Oct 10

Recent research has demonstrated that post-training with explicit intermediate reasoning trajectories can improve the performance of large language models (LLMs) across a wide range of complex reasoning tasks, such as mathematical and commonsense reasoning.

Becky Xiangyu Peng @beckypeng6 · May 3

So happy to graduate as Dr. No. 9 from the best AI lab (@mark_riedl)! Time flies! Thanks sooo much, Mark! I am sooo lucky to have spent these five years with you and so many amazing labmates!

Becky Xiangyu Peng @beckypeng6 · May 3

@mark_riedl Haha, Dr. No. 9 is here 🥳

Becky Xiangyu Peng reposted

Mark Riedl @mark_riedl · May 3

Congratulations to Dr. Ashutosh Baheti, Dr. Zhiyu Lin, and Dr. Xiangyu Peng (and Dr. Peng’s husband, Dr. Yifan Lin, from another lab)

Becky Xiangyu Peng reposted

Upol Ehsan @UpolEhsan · Dec 19

As DJ Khaled would say: "ANOTHER ONE!" ☝️ 🚀 The Human-centered Explainable AI workshop is back at #CHI2024! 🎯 LLMs, LLMs, everywhere. What does it mean for Explainable AI & humans? 📍Submit to #HCXAI & find your community! 🔗 hcxai.jimdosite(.)com Pls RT. Pro tips ⤵️ 1/3
