

Haolin Chen
@HaolinChen11
Salesforce AI Research @SFResearch, Ph.D. in Applied Mathematics @ucdavis


How can we boost LLM agents’ generalizability to OOD tasks and environments? Check out CodeGym, our new framework for synthesizing LLM agent RL training environments. CodeGym automatically converts static coding problems into interactive, verifiable environments for reinforcement learning on multi-turn tool-use tasks. Training in CodeGym leads to strong OOD generalization: for example, a Qwen2.5-32B-Instruct model achieved an 8.7-point absolute accuracy gain on τ-Bench!

We’ve just released the paper, synthesis pipeline, and dataset:
📄 Paper: arxiv.org/abs/2509.17325
💻 Project: github.com/StigLidu/CodeG…
📊 Dataset: huggingface.co/datasets/Vanis…
📷 More details in the thread👇
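To make the conversion idea concrete, here is a minimal sketch of what turning a static coding problem into an interactive, verifiable environment could look like. All names (`CodingProblemEnv`, `step`, `reward`) are illustrative assumptions for this sketch, not the paper's actual API: the point is that helper operations become callable tools and the hidden check becomes a verifiable terminal reward.

```python
# Hedged sketch: a static coding problem re-cast as a multi-turn
# tool-use RL environment. Names here are hypothetical, not CodeGym's API.

class CodingProblemEnv:
    """Exposes a problem's helper operations as tools; rewards via a verifier."""

    def __init__(self, tools, check):
        self.tools = tools      # name -> callable, the agent's action space
        self.check = check      # verifier: True iff the final answer is correct
        self.history = []       # multi-turn interaction log

    def step(self, tool_name, *args):
        """One agent turn: invoke a tool and observe its result."""
        result = self.tools[tool_name](*args)
        self.history.append((tool_name, args, result))
        return result

    def reward(self, answer):
        """Verifiable terminal reward: 1.0 if the answer passes the check."""
        return 1.0 if self.check(answer) else 0.0


# Toy instance: "sum a list", solvable only through tool calls.
env = CodingProblemEnv(
    tools={"read_item": lambda xs, i: xs[i], "add": lambda a, b: a + b},
    check=lambda ans: ans == 6,
)
total = 0
for i in range(3):
    total = env.step("add", total, env.step("read_item", [1, 2, 3], i))
print(env.reward(total))  # 1.0 for a correct multi-turn solution
```

Because the reward comes from an automatic check rather than human labels, environments like this can be synthesized at scale from existing coding problems.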



🚀🚀🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

We at @SFResearch built an automated pipeline that converts raw web text into verifiable QA pairs, filtered and verified by LLMs, then used Group Relative Policy Optimization (GRPO) to train models directly on this reward-driven data.

The result: models trained on Webscale-RL outperform continual-pretraining and data-refinement baselines while using up to 100× fewer tokens. The gains are most pronounced on reasoning, math, and factual QA tasks.

Beyond benchmarks, the key shift is conceptual: RL is no longer just a post-training alignment trick; it is becoming a core optimization stage inside the LLM pretraining loop. This points toward a future of mid-training RL, where large-scale synthetic or automatically verified datasets provide structured reward signals long before human-feedback fine-tuning.

🧩 Webscale-RL hints at a new pretraining paradigm, one that learns not just from text, but from reward.

Paper: bit.ly/3IFuMhf
Code: bit.ly/42AVpdX
Data: bit.ly/4h5lVBS
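The text-to-RL-task conversion described above can be sketched as a small pipeline. This is an illustrative assumption of the shape of such a system, with the LLM generator and LLM judge stubbed out so the sketch runs end to end; function names (`generate_qa`, `verify_qa`, `grpo_reward`) are hypothetical, not the released code's API.

```python
# Hedged sketch of the web-text -> verifiable-QA idea. The real pipeline
# calls LLMs for generation and judging; here they are simple stubs.

def generate_qa(document, llm):
    """Ask an LLM to extract a question with a short, checkable answer."""
    return llm(f"Write one QA pair grounded in this text:\n{document}")

def verify_qa(document, qa, llm_judge):
    """Keep only pairs an LLM judge confirms are answerable from the text."""
    return llm_judge(document, qa)

def webscale_rl_pipeline(corpus, llm, llm_judge):
    """Raw web text -> filtered, verifiable QA pairs for RL training."""
    dataset = []
    for doc in corpus:
        qa = generate_qa(doc, llm)
        if verify_qa(doc, qa, llm_judge):
            dataset.append(qa)
    return dataset

def grpo_reward(answer, reference):
    """Verifiable reward for RL (e.g. GRPO): exact match to the reference."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

# Stub "LLMs" so the sketch is self-contained and runnable.
corpus = ["The Eiffel Tower is 330 metres tall."]
stub_llm = lambda prompt: ("How tall is the Eiffel Tower?", "330 metres")
stub_judge = lambda doc, qa: qa[1] in doc  # answer must appear in the source
dataset = webscale_rl_pipeline(corpus, stub_llm, stub_judge)
print(len(dataset), grpo_reward("330 metres", dataset[0][1]))  # 1 1.0
```

The key property is that every surviving QA pair carries an automatically checkable answer, so web-scale text becomes reward-bearing training data without human annotation.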

This signals a training paradigm shift: compared with continual pretraining, turning the pretraining text into RL tasks is a more effective approach, with up to 100x token savings! Breakthrough work led by @ZhepengCen

Today, my team at @SFResearch released Webscale-RL, a data-synthesis pipeline + 1.1M RL tasks that turn any web text into RL environments.
🎯 Same performance as pretraining using only 1% of tokens. 100x cost savings!
HF🤗: huggingface.co/datasets/Sales…
Any questions, let us know!

📣 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels 📣

RL for LLMs faces a critical data bottleneck: existing RL datasets are <10B tokens while pretraining uses >1T tokens. Our Webscale-RL pipeline solves this by automatically converting pretraining documents into 1.2M verifiable QA pairs across 9+ domains.

📄 Paper: bit.ly/3IFuMhf
💻 Code: bit.ly/42AVpdX
📊 Dataset: bit.ly/4h5lVBS

📈 Results: 100× more token-efficient than continual pretraining, with significant performance gains on MMLU-Pro, BIG-Bench, and mathematical reasoning benchmarks

Work by Zhepeng Cen (@zhepengcen), Haolin Chen (@HaolinChen11), Shiyu Wang (@shiyu04490786), Zuxin Liu (@LiuZuxin), Zhiwei Liu, Ding Zhao, Silvio Savarese, Caiming Xiong (@CaimingXiong), Huan Wang (@huan__wang), Weiran Yao (@iscreamnearby)

#FutureOfAI #EnterpriseAI #ReinforcementLearning #MachineLearning

🚀 Scaling RL to Pretraining Levels with Webscale-RL

RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs. pretraining (>1T). Our Webscale-RL pipeline converts pretraining text into diverse, RL-ready QA data, scaling RL to pretraining levels! All code and datasets are open-source!
Paper: arxiv.org/abs/2510.06499

✨ Key features:
• Converts a web-scale corpus into millions of verifiable QA pairs
• Preserves pretraining-level diversity across 9 domains
• Trains models up to 100× more token-efficiently than continual pretraining
• Powers the Webscale-RL dataset (1.2M examples) for scalable RL

Also special thanks to my colleagues in Salesforce AI Research @SFResearch! @HaolinChen11, Shiyu, @LiuZuxin, @huan__wang, @CaimingXiong, @iscreamnearby

🚀 Introducing UserRL: a new framework to train agents that truly assist users through proactive interaction, not just chase static benchmark scores.
📄 Paper: arxiv.org/pdf/2509.19736
💻 Code: github.com/SalesforceAIRe…


Introducing APIGen-MT: our agentic pipeline for multi-turn synthetic data generation that produces high-quality training data for tuning AI agents! Try our open-sourced dataset today!
📊 Paper: bit.ly/44tORzx
🤗 Dataset: bit.ly/3GHuQM5

We used APIGen-MT to train our xLAM-2 model family, including xLAM-2-70b-fc-r, still #1 on the BFCL leaderboard with 78.2% accuracy, outperforming frontier models like GPT-4o and Claude 3.5 in function-calling tasks, especially in challenging multi-turn scenarios.

🤝 We're open-sourcing 5K high-quality trajectories and trained models to advance AI agent research.
🧠 xLAM Model Family: bit.ly/4jyj2tu
🔍 BFCL: bit.ly/3WIZdY3

🇨🇦🇨🇦🇨🇦 Welcome to Vancouver! 🇨🇦🇨🇦🇨🇦 13 paper links below! 👇

The @Salesforce AI Research team brought a baker's dozen of AI research advancements to #NeurIPS2024 this year, from revolutionizing multimodal agents and time series forecasting to tackling responsible AI evaluation and deployment!

🎯 Attending? Follow us for poster sessions & presentation schedules!
📚 Can't make it? We've curated our complete research collection being showcased this week: bookmark and dive into the work that interests you most!
----
⭐ Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
📄 arxiv.org/pdf/2406.14852
⭐ INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
📄 arxiv.org/pdf/2407.02518
⭐ MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
📄 arxiv.org/pdf/2406.11271
⭐ APIGen: An Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
📄 arxiv.org/pdf/2406.18518
⭐ Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
📄 openreview.net/pdf?id=C2xCLze…
⭐ Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
📄 bit.ly/3ZuDC5N
⭐ ThinK: Thinner Key Cache by Query-Driven Pruning
📄 arxiv.org/pdf/2407.21018
⭐ Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
📄 arxiv.org/pdf/2410.10469
⭐ GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation
📄 arxiv.org/pdf/2410.10393
⭐ UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting
📄 arxiv.org/pdf/2406.04975
⭐ Consent in Crisis: The Rapid Decline of the AI Data Commons
📄 arxiv.org/pdf/2407.14933
⭐ OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
📄 arxiv.org/pdf/2404.07972
⭐ Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
📄 arxiv.org/pdf/2407.10956