verl project
@verl_project

110 posts

Open RL library for LLMs. https://t.co/Xpaq0thhgi Join us on https://t.co/uWI5Zbd6IH

Joined January 2025
7 Following · 1.3K Followers
verl project retweeted
Macaron Official @Macaron0fficial
Mind Lab, together with our community partners NVIDIA @Nvidia @NVIDIAAIDev, VeRL @verl_project, and vLLM @vllm_project, is making LoRA RL practical at trillion-parameter scale. We mapped the failure modes of LoRA RL on trillion-parameter MoE reasoning models and built a hybrid-parallel engine—VeRL for RL orchestration + NVIDIA NeMo’s Megatron-Bridge for MoE parallelism + vLLM for efficient rollouts/inference—that enables stable, cost-efficient RL at ~10% GPU usage. With comparable RL compute, trillion-parameter LoRA RL beats full-parameter RL on smaller models—underscoring that strong priors are crucial for reasoning-level RL. Read the full post: macaron.im/mindlab/resear…
verl project retweeted
Macaron Official @Macaron0fficial
And because we want this capability to scale beyond any one lab, we’re contributing the system back to the ecosystem through major open-source collaborations with @nvidia Megatron-Bridge and Volcengine’s verl. Why RL on trillion-parameter models? Our experiments show a consistent pattern: RL is prior-limited. Under matched RL FLOPs, “large prior + small LoRA” outperforms full-parameter RL on small models (1.5B) on AIME and GPQA. A strong prior generates higher-quality trajectories; RL amplifies signal, not noise. This is why trillion-scale LoRA RL is not indulgence; it is efficiency.
verl project @verl_project
verl highlighted as one of the top 10 open-source AI infra projects. Thank you @github for supporting OSS projects and communities #octoverse2025
Ripu @Ripuhiring

GitHub #Octoverse2025: A New Era for Developers & AI! A must-read for anyone in tech, recruitment, or product strategy. GitHub's latest Octoverse report is a goldmine of insights into how software development is evolving, and it's happening fast! github.blog/news-insights/…

verl project retweeted
Lightning AI ⚡️ @LightningAI
Reinforcement learning is becoming essential for training coding agents beyond what supervised fine-tuning can do. This template uses @verl_project from the @BytedanceTalk Seed team to train LLMs with RL, a flexible architecture that scales from a single GPU to distributed setups. Wrap Python sandboxes as tools, run multi-turn ReAct loops, and train LLMs end-to-end with PPO. Use this tutorial + notebook to build your first RL-powered coding agent → go.lightning.ai/4qsCdZQ
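The workflow Lightning AI describes (a Python sandbox wrapped as a tool inside a multi-turn ReAct loop) can be sketched in miniature. This is a hypothetical toy, not the template's actual code: `python_sandbox`, `react_episode`, and the scripted stand-in policy are illustrative names, and a real setup would plug in an LLM policy and verl's PPO trainer.

```python
import io
import contextlib

def python_sandbox(code: str) -> str:
    """Run code and capture stdout (toy 'sandbox': no real isolation)."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {"__builtins__": __builtins__})
    except Exception as e:
        return f"Error: {e}"
    return buf.getvalue().strip()

def react_episode(policy, task: str, max_turns: int = 4):
    """Alternate Thought/Action/Observation until the policy emits an answer."""
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        step = policy("\n".join(history))      # model proposes the next step
        history.append(step)
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip(), history
        if step.startswith("Action: python"):
            code = step.split("\n", 1)[1]      # code follows the action line
            history.append(f"Observation: {python_sandbox(code)}")
    return None, history

# A scripted stand-in policy that solves a task via the tool.
script = iter([
    "Thought: I should compute this with Python.",
    "Action: python\nprint(3 * 7)",
    "Answer: 21",
])
answer, trace = react_episode(lambda prompt: next(script), "What is 3 * 7?")
```

In the real template, the reward for PPO would come from whether the episode's final answer passes the task's checks.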
verl project retweeted
Ning Ding @stingning
Physics is one of the sharpest testbeds as reasoning models move from puzzle-solving to realistic scientific reasoning. We have trained P1, a series of models that achieve gold medal-level performance in the Physics Olympiad. 🥇
verl project retweeted
Pokee AI @Pokee_AI
Open. Source. SOTA. Deep. Research. 🚀 Today, we're releasing PokeeResearch-7B, a SOTA open-source deep research agent that outperforms all other 7B deep research agents. We are open-sourcing both the weights and inference code on @huggingface! We're also excited to have partnered with vLLM @vllm_project, SGLang @sgl_project, and verl @verl_project on training and inference pipelines. We can't wait to see what you all build with PokeeResearch!
🤗 Hugging Face Model: huggingface.co/PokeeAI/pokee_…
🌐 Webpage: pokee.ai/deepresearch-p…
📋 ArXiv: arxiv.org/pdf/2510.15862
🔗 GitHub Repo: github.com/Pokee-AI/Pokee…
verl project retweeted
Yotta Labs @YottaLabs
We pushed reinforcement learning to new speeds on @AMD's MI300X 🔥 By tuning 3D parallelism with verl, we hit: ⚡ 4.3× faster rollout 🧠 72% shorter training time 💾 192GB HBM3 = zero bottlenecks. Smarter orchestration > more GPUs. Read the full study here: yottalabs.ai/post/performan…
verl project retweeted
Shijie Xia @ShijieX60925
🔥 Announcing our new paper: "SR-Scientist: Scientific Equation Discovery With Agentic AI". Most current work using LLMs for scientific discovery, like AlphaEvolve, follows a rigid "generate → evaluate → refine" loop. We challenge this paradigm for equation discovery. SR-Scientist empowers an LLM to act as an autonomous agent, discovering scientific equations through long-horizon, tool-driven data analysis and equation evaluation, much like a human scientist. We further enhance its capabilities with multi-turn RL.
📈 Key Results:
1️⃣ Consistently outperforms SOTA methods by a 6% to 35% absolute margin.
2️⃣ Achieves significant performance gains after RL training.
3️⃣ Demonstrates robustness to noise and generalization to out-of-domain data.
💡 Key Insights:
1️⃣ Long-horizon exploration is vital for performance.
2️⃣ Enabling agents to conduct their own data analysis is crucial.
3️⃣ An experience buffer is key for continuous optimization.
📄 Paper: arxiv.org/abs/2510.11661
💻 Code: github.com/GAIR-NLP/SR-Sc…
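The propose → evaluate → buffer pattern behind such agents can be illustrated with a toy. This is not SR-Scientist's code: `mse`, `discover`, and the scripted candidate list are invented for illustration, and in the real system an LLM proposes equations and analyzes the data with tools over long horizons.

```python
def mse(f, data):
    """Mean squared error of a candidate equation f on (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

def discover(candidates, data, buffer_size=3):
    """Score each candidate; keep the best-fitting ones in a small experience buffer."""
    buffer = []  # (error, name) pairs, best first
    for name, f in candidates:
        buffer.append((mse(f, data), name))
        buffer.sort(key=lambda t: t[0])
        buffer = buffer[:buffer_size]
    return buffer

# Ground truth is y = 2x + 1; the linear candidate should win.
data = [(x, 2 * x + 1) for x in range(5)]
candidates = [
    ("quadratic", lambda x: x ** 2),
    ("linear", lambda x: 2 * x + 1),
    ("constant", lambda x: 3.0),
]
best = discover(candidates, data)
```

The buffer is what lets an agent reuse its best hypotheses across turns instead of restarting from scratch, which is the "continuous optimization" insight the tweet mentions.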
verl project retweeted
Infini-AI-Lab @InfiniAILab
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and performant even when using data stale by 256 model updates. 🔗 Notion Blog: m2po.notion.site/rl-stale-m2po 📄 Paper: arxiv.org/abs/2510.01161 💻 GitHub: github.com/Infini-AI-Lab/… 🧵 1/4
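For context on what training on stale data means mechanically, here is a minimal sketch of the standard clipped importance-weighted surrogate that asynchronous RL pipelines start from: ratios between the current policy and the stale behavior policy correct for off-policy data, and clipping bounds their influence. This is not M2PO itself (the paper introduces its own objective); the function and variable names are illustrative.

```python
import math

def clipped_pg_loss(logp_new, logp_stale, advantages, eps=0.2):
    """PPO-style surrogate over log-probs from the current vs. stale policy."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_stale, advantages):
        ratio = math.exp(lp_new - lp_old)            # pi_new / pi_stale
        clipped = max(min(ratio, 1 + eps), 1 - eps)  # bound stale-data influence
        total += min(ratio * adv, clipped * adv)     # pessimistic surrogate
    return -total / len(advantages)                  # negate: a loss to minimize

# Two samples generated by a slightly stale policy.
loss = clipped_pg_loss([-1.0, -2.0], [-1.1, -1.9], [1.0, -0.5])
```

The challenge the tweet addresses is that as staleness grows (e.g. 256 model updates), these ratios become extreme and naive clipping discards most of the signal; M2PO is designed to keep such data usable.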
verl project retweeted
Shirley Wu @ShirleyYXWu
With help from the Bytedance @verl_project team, we have integrated CollabLLM as a recipe in verl, an open RL library for LLMs. Now you are only one step away from making your LLM a great collaborator in multi-turn conversations. verl.readthedocs.io/en/latest/algo…
Shirley Wu @ShirleyYXWu

CollabLLM won the #ICML2025 ✨ Outstanding Paper Award along with 6 other works! icml.cc/virtual/2025/a… 🫂 Absolutely honored and grateful for coauthors @MSFTResearch @StanfordAILab and friends who made this happen! 🗣️ Welcome to our presentations about CollabLLM tomorrow (Tuesday):
- Oral 1A (icml.cc/virtual/2025/s…)
- Poster Session 1 East (icml.cc/virtual/2025/s…)
- Multiagent Social (icml.cc/virtual/2025/4…)
Please check out:
Website: aka.ms/CollabLLM
Github: github.com/Wuyxin/collabl…
Paper: arxiv.org/pdf/2502.00640
Blog: wuyxin.github.io/collabllm/#blog
verl project @verl_project
Join us to discuss RL and Agentic AI with core contributors and developers from @verl_project, @sgl_project, Zilliz, and Creao AI, alongside researchers from OpenAI, Anthropic, MSL, Bytedance Seed, and xAI. 9.13 @ SJC lu.ma/bl21t8q4
verl project retweeted
SkyPilot @skypilot_org
VeRL now officially supports launching via SkyPilot! Let SkyPilot handle the infra heavy lifting for @verl_project: 🚀 Spin up VeRL workers on your k8s or clouds 🔧 Set up Ray 🤖 Ignite your agentic RL training. Check out the VeRL doc: verl.readthedocs.io/en/latest/star…
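The flow above (provision nodes, set up Ray, run the trainer) can be sketched as a SkyPilot task YAML. This is a hypothetical fragment, not the official recipe from the linked doc: the accelerator choice, node count, and the `verl.trainer.main_ppo` entrypoint (which needs a full config in practice) are placeholders; `SKYPILOT_NODE_IPS` and `SKYPILOT_NODE_RANK` are environment variables SkyPilot provides to each node.

```yaml
# Hypothetical SkyPilot task for a multi-node verl run (values illustrative).
resources:
  accelerators: A100:8
num_nodes: 2
setup: |
  pip install verl
run: |
  head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  if [ "$SKYPILOT_NODE_RANK" = "0" ]; then
    ray start --head --port=6379
    python3 -m verl.trainer.main_ppo  # placeholder: real runs pass a config
  else
    ray start --address="$head_ip:6379"
  fi
```

Launching would then be a single `sky launch -c verl-cluster task.yaml`, with SkyPilot handling provisioning across k8s or clouds.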
verl project retweeted
Chujie Zheng @ChujieZheng
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
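For readers curious what "group sequence" means here, a sketch of GSPO's central idea as described in the paper: the importance ratio is defined on whole sequences and length-normalized, then used in a PPO-style clipped objective over a group of G responses per prompt. Notation is paraphrased from memory; see the linked paper for the exact formulation.

```latex
% Sequence-level, length-normalized importance ratio for response y_i to prompt x:
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i \mid x)} \right)^{1/|y_i|}
% Clipped surrogate over a group of G responses with advantages \hat{A}_i:
J(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G}
  \min\bigl( s_i(\theta)\, \hat{A}_i,\;
             \mathrm{clip}(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon)\, \hat{A}_i \bigr) \right]
```

The contrast with token-level methods is that clipping acts on one ratio per sequence rather than per token, which the paper argues stabilizes large-scale (notably MoE) RL training.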
verl project retweeted
girish @googrish
To push the open source frontier for RL + LLMs, we need scalable, modular environments with real-world complexity, beyond math benchmarks. Today, we’re releasing *benchmax*. An open-source framework to build, run, & scale useful RL envs for LLM fine-tuning, with integrations to verl & verifiers (more coming soon!).