verl project
@verl_project

110 posts

Open RL library for LLMs. https://t.co/Xpaq0thhgi Join us on https://t.co/uWI5Zbd6IH

Joined January 2025
7 Following · 1.3K Followers
verl project retweeted
Macaron Official @Macaron0fficial
Mind Lab, together with our community partners NVIDIA @Nvidia @NVIDIAAIDev, VeRL @verl_project, and vLLM @vllm_project, is making LoRA RL practical at trillion-parameter scale. We mapped the failure modes of LoRA RL on trillion-parameter MoE reasoning models and built a hybrid-parallel engine—VeRL for RL orchestration + NVIDIA NeMo’s Megatron-Bridge for MoE parallelism + vLLM for efficient rollouts/inference—that enables stable, cost-efficient RL at ~10% GPU usage. With comparable RL compute, trillion-parameter LoRA RL beats full-parameter RL on smaller models—underscoring that strong priors are crucial for reasoning-level RL. Read the full post: macaron.im/mindlab/resear…
verl project retweeted
Macaron Official @Macaron0fficial
And because we want this capability to scale beyond any one lab, we’re contributing the system back to the ecosystem through major open-source collaborations with @nvidia Megatron-Bridge and Volcengine’s verl. Why RL on trillion-parameter models? Our experiments show a consistent pattern: RL is prior-limited. Under matched RL FLOPs, “large prior + small LoRA” outperforms full-parameter RL on small models (1.5B) on AIME and GPQA. A strong prior generates higher-quality trajectories; RL amplifies signal, not noise. This is why trillion-scale LoRA RL is not indulgence; it is efficiency.
verl project @verl_project
verl highlighted as one of the top 10 open-source AI infra projects. Thank you @github for supporting OSS projects and communities #octoverse2025
Ripu @Ripuhiring

GitHub #Octoverse2025: A New Era for Developers & AI! A must-read for anyone in tech, recruitment, or product strategy. GitHub's latest Octoverse report is a goldmine of insights into how software development is evolving, and it's happening fast! github.blog/news-insights/…

verl project retweeted
Lightning AI ⚡️ @LightningAI
Reinforcement learning is becoming essential for training coding agents beyond what supervised fine-tuning can do. This template uses @verl_project from the @BytedanceTalk Seed team to train LLMs with RL, a flexible architecture that scales from a single GPU to distributed setups. Wrap Python sandboxes as tools, run multi-turn ReAct loops, and train LLMs end-to-end with PPO. Use this tutorial + notebook to build your first RL-powered coding agent → go.lightning.ai/4qsCdZQ
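The workflow Lightning AI describes (a Python sandbox wrapped as a tool inside a multi-turn ReAct loop) can be sketched in miniature. This is a hypothetical toy, not the template's actual code: `python_sandbox`, `react_episode`, and the scripted stand-in policy are illustrative names, and a real setup would plug in an LLM policy and verl's PPO trainer.

```python
import io
import contextlib

def python_sandbox(code: str) -> str:
    """Run code and capture stdout (toy 'sandbox': no real isolation)."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {"__builtins__": __builtins__})
    except Exception as e:
        return f"Error: {e}"
    return buf.getvalue().strip()

def react_episode(policy, task: str, max_turns: int = 4):
    """Alternate Thought/Action/Observation until the policy emits an answer."""
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        step = policy("\n".join(history))      # model proposes the next step
        history.append(step)
        if step.startswith("Answer:"):
            return step[len("Answer:"):].strip(), history
        if step.startswith("Action: python"):
            code = step.split("\n", 1)[1]      # code follows the action line
            history.append(f"Observation: {python_sandbox(code)}")
    return None, history

# A scripted stand-in policy that solves a task via the tool.
script = iter([
    "Thought: I should compute this with Python.",
    "Action: python\nprint(3 * 7)",
    "Answer: 21",
])
answer, trace = react_episode(lambda prompt: next(script), "What is 3 * 7?")
```

In the real template, the reward for PPO would come from whether the episode's final answer passes the task's checks.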
verl project retweeted
Ning Ding @stingning
Physics is one of the sharpest testbeds as reasoning models move from puzzle-solving to realistic scientific reasoning. We have trained P1, a series of models that achieve gold medal-level performance in the Physics Olympiad. 🥇
verl project retweeted
Pokee AI @Pokee_AI
Open. Source. SOTA. Deep. Research. 🚀 Today, we're releasing PokeeResearch-7B, a SOTA open-source deep research agent that outperforms all other 7B deep research agents. We are open-sourcing both the weights and inference code on @huggingface! We're also excited to have partnered with vLLM @vllm_project, SGLang @sgl_project, and verl @verl_project on training and inference pipelines. We can't wait to see what you all build with PokeeResearch!
🤗 Hugging Face Model: huggingface.co/PokeeAI/pokee_…
🌐 Webpage: pokee.ai/deepresearch-p…
📋 ArXiv: arxiv.org/pdf/2510.15862
🔗 GitHub Repo: github.com/Pokee-AI/Pokee…
verl project retweeted
Yotta Labs @YottaLabs
We pushed reinforcement learning to new speeds on @AMD's MI300X 🔥 By tuning 3D parallelism with verl, we hit: ⚡ 4.3× faster rollout 🧠 72% shorter training time 💾 192GB HBM3 = zero bottlenecks. Smarter orchestration > more GPUs. Read the full study here: yottalabs.ai/post/performan…
verl project retweeted
Shijie Xia @ShijieX60925
🔥 Announcing our new paper: "SR-Scientist: Scientific Equation Discovery With Agentic AI". Most current work using LLMs for scientific discovery, like AlphaEvolve, follows a rigid "generate → evaluate → refine" loop. We challenge this paradigm for equation discovery. SR-Scientist empowers an LLM to act as an autonomous agent, discovering scientific equations through long-horizon, tool-driven data analysis and equation evaluation, much like a human scientist. We further enhance its capabilities with multi-turn RL.
📈 Key Results:
1️⃣ Consistently outperforms SOTA methods by a 6% to 35% absolute margin.
2️⃣ Achieves significant performance gains after RL training.
3️⃣ Demonstrates robustness to noise and generalization to out-of-domain data.
💡 Key Insights:
1️⃣ Long-horizon exploration is vital for performance.
2️⃣ Enabling agents to conduct their own data analysis is crucial.
3️⃣ An experience buffer is key for continuous optimization.
📄 Paper: arxiv.org/abs/2510.11661
💻 Code: github.com/GAIR-NLP/SR-Sc…
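The propose → evaluate → buffer pattern behind such agents can be illustrated with a toy. This is not SR-Scientist's code: `mse`, `discover`, and the scripted candidate list are invented for illustration, and in the real system an LLM proposes equations and analyzes the data with tools over long horizons.

```python
def mse(f, data):
    """Mean squared error of a candidate equation f on (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

def discover(candidates, data, buffer_size=3):
    """Score each candidate; keep the best-fitting ones in a small experience buffer."""
    buffer = []  # (error, name) pairs, best first
    for name, f in candidates:
        buffer.append((mse(f, data), name))
        buffer.sort(key=lambda t: t[0])
        buffer = buffer[:buffer_size]
    return buffer

# Ground truth is y = 2x + 1; the linear candidate should win.
data = [(x, 2 * x + 1) for x in range(5)]
candidates = [
    ("quadratic", lambda x: x ** 2),
    ("linear", lambda x: 2 * x + 1),
    ("constant", lambda x: 3.0),
]
best = discover(candidates, data)
```

The buffer is what lets an agent reuse its best hypotheses across turns instead of restarting from scratch, which is the "continuous optimization" insight the tweet mentions.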
verl project retweeted
Infini-AI-Lab @InfiniAILab
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and performant even when using data stale by 256 model updates. 🔗 Notion Blog: m2po.notion.site/rl-stale-m2po 📄 Paper: arxiv.org/abs/2510.01161 💻 GitHub: github.com/Infini-AI-Lab/… 🧵 1/4
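For context on what training on stale data means mechanically, here is a minimal sketch of the standard clipped importance-weighted surrogate that asynchronous RL pipelines start from: ratios between the current policy and the stale behavior policy correct for off-policy data, and clipping bounds their influence. This is not M2PO itself (the paper introduces its own objective); the function and variable names are illustrative.

```python
import math

def clipped_pg_loss(logp_new, logp_stale, advantages, eps=0.2):
    """PPO-style surrogate over log-probs from the current vs. stale policy."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_stale, advantages):
        ratio = math.exp(lp_new - lp_old)            # pi_new / pi_stale
        clipped = max(min(ratio, 1 + eps), 1 - eps)  # bound stale-data influence
        total += min(ratio * adv, clipped * adv)     # pessimistic surrogate
    return -total / len(advantages)                  # negate: a loss to minimize

# Two samples generated by a slightly stale policy.
loss = clipped_pg_loss([-1.0, -2.0], [-1.1, -1.9], [1.0, -0.5])
```

The challenge the tweet addresses is that as staleness grows (e.g. 256 model updates), these ratios become extreme and naive clipping discards most of the signal; M2PO is designed to keep such data usable.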
verl project retweeted
Shirley Wu @ShirleyYXWu
With help from the Bytedance @verl_project team, we have integrated CollabLLM as a recipe in verl, an open RL library for LLMs. Now you are only one step away from making your LLM a great collaborator in multi-turn conversations. verl.readthedocs.io/en/latest/algo…
Shirley Wu @ShirleyYXWu

CollabLLM won the #ICML2025 ✨ Outstanding Paper Award along with 6 other works! icml.cc/virtual/2025/a… 🫂 Absolutely honored and grateful for coauthors @MSFTResearch @StanfordAILab and friends who made this happen! 🗣️ Welcome to our presentations about CollabLLM tomorrow (Tuesday):
- Oral 1A (icml.cc/virtual/2025/s…)
- Poster Session 1 East (icml.cc/virtual/2025/s…)
- Multiagent Social (icml.cc/virtual/2025/4…)
Please check out:
Website: aka.ms/CollabLLM
Github: github.com/Wuyxin/collabl…
Paper: arxiv.org/pdf/2502.00640
Blog: wuyxin.github.io/collabllm/#blog
verl project @verl_project
Join us to discuss RL and Agentic AI with core contributors and developers from @verl_project, @sgl_project, Zilliz, and Creao AI, alongside researchers from OpenAI, Anthropic, MSL, Bytedance Seed, and xAI. 9.13 @ SJC lu.ma/bl21t8q4
verl project retweeted
SkyPilot @skypilot_org
VeRL now officially supports launching via SkyPilot! Let SkyPilot handle the infra heavy lifting for @verl_project: 🚀 Spin up VeRL workers on your k8s or clouds 🔧 Set up Ray 🤖 Ignite your agentic RL training. Check out the VeRL doc: verl.readthedocs.io/en/latest/star…
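The flow above (provision nodes, set up Ray, run the trainer) can be sketched as a SkyPilot task YAML. This is a hypothetical fragment, not the official recipe from the linked doc: the accelerator choice, node count, and the `verl.trainer.main_ppo` entrypoint (which needs a full config in practice) are placeholders; `SKYPILOT_NODE_IPS` and `SKYPILOT_NODE_RANK` are environment variables SkyPilot provides to each node.

```yaml
# Hypothetical SkyPilot task for a multi-node verl run (values illustrative).
resources:
  accelerators: A100:8
num_nodes: 2
setup: |
  pip install verl
run: |
  head_ip=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  if [ "$SKYPILOT_NODE_RANK" = "0" ]; then
    ray start --head --port=6379
    python3 -m verl.trainer.main_ppo  # placeholder: real runs pass a config
  else
    ray start --address="$head_ip:6379"
  fi
```

Launching would then be a single `sky launch -c verl-cluster task.yaml`, with SkyPilot handling provisioning across k8s or clouds.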
verl project retweeted
Chujie Zheng @ChujieZheng
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
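For readers curious what "group sequence" means here, a sketch of GSPO's central idea as described in the paper: the importance ratio is defined on whole sequences and length-normalized, then used in a PPO-style clipped objective over a group of G responses per prompt. Notation is paraphrased from memory; see the linked paper for the exact formulation.

```latex
% Sequence-level, length-normalized importance ratio for response y_i to prompt x:
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i \mid x)} \right)^{1/|y_i|}
% Clipped surrogate over a group of G responses with advantages \hat{A}_i:
J(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G}
  \min\bigl( s_i(\theta)\, \hat{A}_i,\;
             \mathrm{clip}(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon)\, \hat{A}_i \bigr) \right]
```

The contrast with token-level methods is that clipping acts on one ratio per sequence rather than per token, which the paper argues stabilizes large-scale (notably MoE) RL training.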
verl project retweeted
girish @googrish
To push the open source frontier for RL + LLMs, we need scalable, modular environments with real-world complexity, beyond math benchmarks. Today, we’re releasing *benchmax*. An open-source framework to build, run, & scale useful RL envs for LLM fine-tuning, with integrations to verl & verifiers (more coming soon!).