Haolin Chen

34 posts

@HaolinChen11

Salesforce AI Research @SFResearch, Ph.D. in Applied Mathematics @ucdavis

Palo Alto, CA · Joined May 2022
101 Following · 136 Followers
Haolin Chen retweeted
AK @_akhaliq
xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
huggingface.co/Salesforce/xRo…
4 replies · 21 reposts · 118 likes · 20.6K views
Haolin Chen retweeted
Weiran Yao @iscreamnearby
Stop restarting your long-running agents. Enterprise Deep Research (EDR) lets you steer mid-run—like driving a car. It can save you hours or even days of work. Open-source, enterprise-ready, built by @SFResearch. Try it & drop your use case below 👇 🤖GitHub: github.com/SalesforceAIRe…
4 replies · 5 reposts · 22 likes · 9.2K views
Haolin Chen retweeted
Caiming Xiong @CaimingXiong
🎇🎇🎇Our Webscale dataset is currently the No. 1 trending dataset on @huggingface. huggingface.co/datasets/Sales… @ZhepengCen @iscreamnearby @SFResearch
Caiming Xiong tweet media
Caiming Xiong@CaimingXiong

🚀🚀🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

We at @SFResearch build an automated pipeline that converts raw web text into verifiable QA pairs, filtered and verified by LLMs, then use Group Relative Policy Optimization (GRPO) to train models directly on this reward-driven data.

The result: models trained on Webscale-RL outperform continual pretraining and data-refinement baselines while using up to 100× fewer tokens. The gains are most pronounced in reasoning, math, and factual QA tasks.

Beyond benchmarks, the key shift is conceptual: RL is no longer just a post-training alignment trick; it's becoming a core optimization stage inside the LLM pretraining loop. This points toward a future of mid-training RL, where large-scale synthetic or automatically verified datasets provide structured reward signals long before human feedback fine-tuning.

🧩 Webscale-RL hints at a new pretraining paradigm, one that learns not just from text, but from reward.

Paper: bit.ly/3IFuMhf
Code: bit.ly/42AVpdX
Data: bit.ly/4h5lVBS

0 replies · 10 reposts · 43 likes · 20.3K views
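The GRPO step described above needs little more than a verifiable reward per sampled answer, normalized within a group of samples drawn for the same question. A minimal sketch of that scoring step, assuming an exact-match reward against the verified answer (the sampling and policy-update loop are omitted; this is not the released Webscale-RL code):

import statistics

def exact_match_reward(prediction: str, reference: str) -> float:
    # Verifiable QA reward: 1.0 iff the model's answer matches the verified answer.
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0

def group_relative_advantages(sampled_answers: list[str], reference: str) -> list[float]:
    # GRPO-style normalization: each sample's advantage is its reward
    # standardized against the other samples drawn for the same question.
    rewards = [exact_match_reward(a, reference) for a in sampled_answers]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid division by zero when all rewards tie
    return [(r - mean) / std for r in rewards]

# Example: two of four samples answer correctly and get positive advantages.
print(group_relative_advantages(["Paris", "Lyon", "paris", "Marseille"], "Paris"))

Samples that beat their group's mean reward get positive advantages, which is exactly the signal the policy update amplifies.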
Haolin Chen @HaolinChen11
Check out our latest work on an RL data pipeline 👀
Salesforce AI Research@SFResearch

📣 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels 📣

RL for LLMs faces a critical data bottleneck: existing RL datasets are <10B tokens while pretraining uses >1T tokens. Our Webscale-RL pipeline solves this by automatically converting pretraining documents into 1.2M verifiable QA pairs across 9+ domains.

📄 Paper: bit.ly/3IFuMhf
💻 Code: bit.ly/42AVpdX
📊 Dataset: bit.ly/4h5lVBS

Results: 100× more token-efficient than continual pretraining, with significant performance gains on MMLU-Pro, BigBench, and mathematical reasoning benchmarks 📈

Work by Zhepeng Cen (@zhepengcen), Haolin Chen (@HaolinChen11), Shiyu Wang (@shiyu04490786), Zuxin Liu (@LiuZuxin), Zhiwei Liu, Ding Zhao, Silvio Savarese, Caiming Xiong (@CaimingXiong), Huan Wang (@huan__wang), Weiran Yao (@iscreamnearby)

#FutureOfAI #EnterpriseAI #ReinforcementLearning #MachineLearning

0 replies · 0 reposts · 2 likes · 209 views
Haolin Chen @HaolinChen11
Open-sourcing the secret sauce here 👀
Zhepeng Cen@ZhepengCen

🚀 Scaling RL to Pretraining Levels with Webscale-RL

RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs pretraining (>1T). Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data, scaling RL to pretraining levels! All code and datasets are open-source!

Paper: arxiv.org/abs/2510.06499

✨ Key features:
• Converts a web-scale corpus into millions of verifiable QA pairs
• Preserves pretraining-level diversity across 9 domains
• Trains up to 100× more token-efficiently than continual pretraining
• Powers the Webscale-RL dataset (1.2M examples) for scalable RL

Also special thanks to my colleagues at Salesforce AI Research @SFResearch! @HaolinChen11, Shiyu, @LiuZuxin, @huan__wang, @CaimingXiong, @iscreamnearby

0 replies · 0 reposts · 1 like · 127 views
Haolin Chen retweeted
Zhepeng Cen @ZhepengCen
🚀 Scaling RL to Pretraining Levels with Webscale-RL

RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs pretraining (>1T). Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data, scaling RL to pretraining levels! All code and datasets are open-source!

Paper: arxiv.org/abs/2510.06499

✨ Key features:
• Converts a web-scale corpus into millions of verifiable QA pairs
• Preserves pretraining-level diversity across 9 domains
• Trains up to 100× more token-efficiently than continual pretraining
• Powers the Webscale-RL dataset (1.2M examples) for scalable RL

Also special thanks to my colleagues at Salesforce AI Research @SFResearch! @HaolinChen11, Shiyu, @LiuZuxin, @huan__wang, @CaimingXiong, @iscreamnearby
Zhepeng Cen tweet media
14 replies · 35 reposts · 233 likes · 57.9K views
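The pipeline shape described in the thread is simple to state: an LLM proposes QA pairs grounded in each pretraining document, a second pass verifies them, and only verified pairs are kept. A rough sketch with the two LLM calls injected as plain callables (the function and field names here are illustrative, not the open-sourced implementation):

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class QAPair:
    question: str
    answer: str
    domain: str = ""

def webtext_to_rl_data(docs: Iterable[tuple[str, str]],
                       propose: Callable[[str], list[QAPair]],
                       verify: Callable[[str, QAPair], bool]) -> list[QAPair]:
    # docs: (domain, raw text) pairs drawn from a pretraining corpus.
    kept = []
    for domain, text in docs:
        for pair in propose(text):       # LLM drafts QA pairs grounded in the document
            pair.domain = domain
            if verify(text, pair):       # second LLM pass checks the answer against the source
                kept.append(pair)        # only verified, answerable pairs become RL data
    return kept

# Toy usage with stub callables standing in for the LLM calls.
docs = [("science", "Water boils at 100 degrees Celsius at sea level.")]
propose = lambda text: [QAPair("At what temperature does water boil at sea level?", "100 degrees Celsius")]
verify = lambda text, pair: pair.answer.split()[0] in text
print(webtext_to_rl_data(docs, propose, verify))

Because the reward for each kept pair is checkable against its source document, the resulting data can drive RL at pretraining scale without human labeling.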
Haolin Chen retweeted
Salesforce AI Research @SFResearch
CoDA-1.7B: A diffusion language model for code generation with bidirectional context understanding 🔄

📄 Technical Report: bit.ly/3IBlzGG
🤗 Model: bit.ly/48dX1xN
💻 Code: bit.ly/3VSXKwT

The model achieves 54.3% pass@1 on HumanEval 📊 while matching performance of diffusion models up to 7B parameters. Fully reproducible training pipeline included, from TPU pre-training through instruction tuning.

Key features:
➡️ Discrete diffusion processes enable understanding of both past and future tokens
➡️ Confidence-guided sampling maintains competitive inference latency ⚡
➡️ Complete reproducible training pipeline from pre-training to fine-tuning 🔬

#FutureOfAI #EnterpriseAI #DiffusionModels #OpenScience
Salesforce AI Research tweet media
1 reply · 3 reposts · 16 likes · 2.3K views
Haolin Chen retweeted
Weiran Yao @iscreamnearby
Today my team at @SFResearch drops CoDA-1.7B: a text diffusion coding model that outputs tokens bidirectionally in parallel. ⚡️ Faster inference, 1.7B rivaling 7B.

📊 54.3% HumanEval | 47.6% HumanEval+ | 55.4% EvalPlus

🤗 HF: huggingface.co/Salesforce/CoD…

Any questions, lmk!
Weiran Yao tweet media
11 replies · 58 reposts · 337 likes · 29.4K views
Haolin Chen @HaolinChen11
Today, Salesforce AI Research is releasing CoDA-1.7B, a diffusion-based language model that rethinks how AI understands code.

The breakthrough? While most models generate code left-to-right, token by token, CoDA reads and generates in both directions simultaneously and outputs tokens together in parallel. The result? Dramatically faster inference.

📊 The numbers:
• 54.3% on HumanEval (matching 7B models at 1.7B parameters)
• 47.6% on HumanEval+
• 55.4% on EvalPlus aggregate

1.7B parameters doing the work of 7B. That's the efficiency gain.

💡 What makes it different:
✓ Bidirectional context through discrete diffusion
✓ Confidence-guided sampling for faster inference
✓ Complete open training pipeline (pre-training → fine-tuning → deployment)
✓ Runs on accessible hardware

🔓 Fully open. Code, weights, and pre-training / post-training recipes: everything you need to reproduce, customize, or build on top of CoDA. OpenAI-compatible endpoints are provided on GitHub.

→ 🤗 Hugging Face: huggingface.co/Salesforce/CoD…
→ 🤖 GitHub: github.com/SalesforceAIRe…
→ 📑 Tech Report: github.com/SalesforceAIRe…

Faster models. Big ideas. Open science. Let's see what you build with it. 🚀

#MachineLearning #AI #OpenSource #CodeGeneration #Research
Haolin Chen tweet media
0 replies · 0 reposts · 2 likes · 127 views
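The "tokens together in parallel" claim comes down to how a masked diffusion model decides which positions to commit at each denoising step. A toy sketch of confidence-guided unmasking, assuming only a model callable that returns per-position logits over the vocabulary (illustrative; this is not the CoDA implementation):

import torch

def confidence_guided_decode(model, ids: torch.Tensor, mask_id: int, steps: int = 8) -> torch.Tensor:
    ids = ids.clone()
    for _ in range(steps):
        masked = ids == mask_id
        if not masked.any():
            break
        probs = model(ids).softmax(dim=-1)    # (seq_len, vocab): bidirectional, sees left and right context
        conf, pred = probs.max(dim=-1)        # per-position confidence and most likely token
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        # Commit only the most confident masked positions this step; the rest stay
        # masked so later steps can condition on the freshly revealed tokens.
        k = max(1, int(masked.sum().item()) // steps)
        commit = conf.topk(k).indices
        ids[commit] = pred[commit]
    return ids

Because several positions can clear the confidence bar in a single step, decoding takes far fewer forward passes than strictly left-to-right generation, which is where the latency win comes from.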
Haolin Chen retweeted
Zuxin Liu @LiuZuxin
Agents shouldn’t guess what you want—they should ask when necessary. With UserRL, we built user-centric RL envs so agents clarify first, act second. After RL, even 4B models reliably infer preferences. We trained agents to understand you. Code/envs/data are open—Check it out! 🚀
Cheng Qian@qiancheng1231

🚀 Introducing UserRL: a new framework to train agents that truly assist users through proactive interaction, not just chase static benchmarking scores.

📄 Paper: arxiv.org/pdf/2509.19736
💻 Code: github.com/SalesforceAIRe…

1 reply · 17 reposts · 68 likes · 7.2K views
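The "clarify first, act second" behavior is easy to see in a toy environment where the user's preference is hidden and reward only lands on the final action. A deliberately tiny illustration (not one of the released UserRL environments):

import random

class ClarifyThenActEnv:
    # The user's preference is hidden; the agent may spend one turn asking about it.
    def __init__(self, options=("coffee", "tea")):
        self.options = options
        self.preference = random.choice(options)
        self.asked = False

    def step(self, action: str):
        if action == "ask" and not self.asked:
            self.asked = True
            return f"user: I prefer {self.preference}", 0.0, False   # observation, reward, done
        reward = 1.0 if action == self.preference else 0.0           # reward only on the final act
        return "done", reward, True

# A policy that asks before acting always scores 1.0; blind guessing averages 0.5.
env = ClarifyThenActEnv()
obs, _, _ = env.step("ask")
print(env.step(obs.split()[-1]))   # act on the stated preference

An RL agent trained in environments of this shape learns that a cheap clarification turn raises expected reward, which is the preference-inference behavior the tweet describes.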
Haolin Chen retweeted
Salesforce AI Research @SFResearch
⚡ Introducing MCPEval: the first automated evaluation framework for AI agents built on the Model Context Protocol.

🔗 Paper: bit.ly/3TKXpLR
🔗 Code: bit.ly/44ZnUSN

✅ End-to-end task generation & verification
✅ Deep evaluation across 5 real-world domains
✅ Standardized metrics for reproducible research
✅ Open-source & eliminates manual bottlenecks

Our evaluation of 10+ models (GPT-4o, o3, Qwen3, etc.) reveals surprising insights: smaller tool-enhanced models can match larger ones in specific domains!

Perfect for researchers & developers building reliable AI agents.

#AIAgents #FutureOfAI #EnterpriseAI
1 reply · 5 reposts · 26 likes · 3.3K views
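In practice, "task generation & verification" means each generated task comes with verified reference tool calls that an agent's trajectory can be scored against. A rough sketch of that comparison under an assumed record shape (the ToolCall fields are illustrative, not MCPEval's actual schema):

from dataclasses import dataclass

@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: tuple   # canonicalized (name, value) pairs so equality is well defined

def call_match_rate(predicted: list[ToolCall], reference: list[ToolCall]) -> float:
    # Fraction of reference calls the agent reproduced exactly (tool name and arguments).
    if not reference:
        return 1.0
    return sum(call in predicted for call in reference) / len(reference)

reference = [ToolCall("get_weather", (("city", "Palo Alto"),)),
             ToolCall("send_email", (("to", "team@example.com"),))]
predicted = [ToolCall("get_weather", (("city", "Palo Alto"),))]
print(call_match_rate(predicted, reference))   # 0.5: one of two reference calls matched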
Haolin Chen retweeted
Zuxin Liu @LiuZuxin
We just released 5K high-quality multi-turn agent trajectories synthesized by our pipeline. We ran human evaluation on a subset (200) of them, and more than 99% were judged clearly understandable and successful. Check it out!
Salesforce AI Research@SFResearch

Introducing APIGen-MT: our agentic pipeline for multi-turn synthetic data generation that produces high-quality training data for tuning AI agents! Try our open-sourced dataset today!

📊 Paper: bit.ly/44tORzx
🤗 Dataset: bit.ly/3GHuQM5

We used APIGen-MT to train our xLAM-2 model family, including xLAM-2-70b-fc-r, still #1 on the BFCL leaderboard with 78.2% accuracy, outperforming frontier models like GPT-4o and Claude 3.5 in function-calling tasks, especially in challenging multi-turn scenarios.

🤝 We're open-sourcing 5K high-quality trajectories and trained models to advance AI agent research.

🧠 xLAM Model Family: bit.ly/4jyj2tu
🔍 BFCL: bit.ly/3WIZdY3

3 replies · 4 reposts · 37 likes · 6.1K views
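For a feel of what one of those multi-turn trajectories contains: the user states a goal, the agent alternates between tool calls and replies, and the whole exchange carries a verification flag. The record below is hypothetical; the field names are illustrative only, and the released dataset card defines the actual schema.

# Hypothetical multi-turn function-calling trajectory (illustrative fields only).
trajectory = {
    "task": "Find and book the cheapest nonstop flight from SFO to SEA next Friday",
    "turns": [
        {"role": "user", "content": "I need a nonstop flight to Seattle next Friday."},
        {"role": "assistant", "tool_call": {"name": "search_flights",
                                            "arguments": {"origin": "SFO", "dest": "SEA", "nonstop": True}}},
        {"role": "tool", "content": '[{"flight": "XX123", "price_usd": 129}]'},
        {"role": "assistant", "content": "XX123 at $129 is the cheapest nonstop option. Book it?"},
        {"role": "user", "content": "Yes, go ahead."},
        {"role": "assistant", "tool_call": {"name": "book_flight", "arguments": {"flight": "XX123"}}},
    ],
    "verified": True,   # marks trajectories that passed automated checks (and, for a subset, human review)
}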
Haolin Chen @HaolinChen11
Come and join us at NeurIPS!
Salesforce AI Research@SFResearch

🇨🇦🇨🇦🇨🇦 Welcome to Vancouver! 🇨🇦🇨🇦🇨🇦 13 paper links below! 👇

The @Salesforce AI Research team brought a baker's dozen of AI research advancements to #NeurIPS2024 this year, from revolutionizing multimodal agents and time series forecasting to tackling responsible AI evaluation and deployment!

🎯 Attending? Follow us for poster sessions & presentation schedules!
📚 Can't make it? We've curated our complete research collection being showcased this week. Bookmark and dive into the work that interests you most!

⭐ Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
📄 arxiv.org/pdf/2406.14852
⭐ INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
📄 arxiv.org/pdf/2407.02518
⭐ MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
📄 arxiv.org/pdf/2406.11271
⭐ APIGen: An Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
📄 arxiv.org/pdf/2406.18518
⭐ Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
📄 openreview.net/pdf?id=C2xCLze…
⭐ Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
📄 bit.ly/3ZuDC5N
⭐ ThinK: Thinner Key Cache by Query-Driven Pruning
📄 arxiv.org/pdf/2407.21018
⭐ Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
📄 arxiv.org/pdf/2410.10469
⭐ GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation
📄 arxiv.org/pdf/2410.10393
⭐ UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting
📄 arxiv.org/pdf/2406.04975
⭐ Consent in Crisis: The Rapid Decline of the AI Data Commons
📄 arxiv.org/pdf/2407.14933
⭐ OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
📄 arxiv.org/pdf/2404.07972
⭐ Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
📄 arxiv.org/pdf/2407.10956

0 replies · 0 reposts · 3 likes · 304 views