slime

90 posts

@slime_framework

The LLM post-training framework for RL Scaling. https://t.co/4ILpx8hfKN

Joined September 2025

12 Following · 2K Followers

Pinned Tweet

slime@slime_framework·Jun 17

Thrilled to share the release of GLM-5.2 🎉 It is the result of tremendous effort from the whole team, and our best model so far — stronger coding, usable 1M context, and further progress on long-horizon / agentic tasks. For slime, we are honored to once again support RL and OPD at a new scale behind GLM-5.2. As the official post says, intelligence should be open, accessible, and ready to build with. We believe post-training capabilities should be the same. That’s why slime merged GLM-5.2 support on day 0, so the community can more easily explore, reproduce, and build with GLM-5.2. PR: github.com/THUDM/slime/pu…

Z.ai@Zai_org

Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1

Tech Blog: z.ai/blog/glm-5.2
Weights: huggingface.co/zai-org/GLM-5.2
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe
Chat: chat.z.ai

slime retweeted

LMSYS Org@lmsysorg·11h

Serving GLM-5.2 NVFP4 Agentic Workload with SGLang: How We Reached 500 TPS on 8xB300 at bs=1

In this deep dive, we explain how SGLang reaches 500+ tok/s/user at bs=1 on 8xB300, with 18 to 34% higher single-user interactivity within two weeks of day 0, and 6 to 11% better peak throughput at high concurrency, benchmarked on a real multi-turn agentic coding workload. Our new TopK-V2 kernel is 2.33x faster at 80K ISL, scaling to 10.17x at 1M ISL, keeping interactivity essentially flat out to 1M tokens. Part of the story is the architecture itself: GLM-5.2 applies IndexShare to its DSA layers and ships a stronger MTP head reusing IndexShare and KVShare. The rest comes from our serving optimizations. Special thanks to @NVIDIAAI for their help with day-0 support of GLM-5.2 NVFP4, and to @Zai_org for IndexShare in SGLang!

slime@slime_framework·2d

@BanghuaZ ❤️

Banghua Zhu@BanghuaZ·3d

@slime_framework 🫡🫡

slime@slime_framework·3d

Amazing work by the Miles team and Humans& team! Congrats!

Banghua Zhu@BanghuaZ

Congrats @humansand on native NVFP4 training recipe, landing in Miles!

slime retweeted

SGLang@sgl_project·4d

🎉 SGLang v0.5.15 is out! We spent this cycle tuning GLM-5.2 NVFP4 for production serving, now hitting 500+ tok/s/user on 8x B300 and 450 on 4x GB300 (bs=1). We'll post the commands to run this in the thread below, with full technical details and instructions in a blog post very soon 🫡

We also have some newly supported models: Hunyuan 3 (Hy3), Hierarchical Reasoning Model (HRM-Text), NVIDIA LocateAnything-3B, Baidu Unlimited-OCR, JoyEcho, and Qwen3.6.

Here are the highlights of this release:
- Breakable CUDA Graph is now the default capture path
- Native web search built in, powered by @ExaAILabs
- Decode context parallelism for MLA models, including DeepSeek V3
- FlashInfer all-to-all for routed MoE
- DeepSeek-V4: FlashMLA sparse prefill now on by default (>10% throughput on long context), plus a non-paged indexer for long-context prefill (>5% e2e)

We welcomed 43 new contributors, and thanks again to our amazing partners and model makers: @NVIDIAAI @AMD @intel @Zai_org @TencentHunyuan @Alibaba_Qwen @deepseek_ai @Sapient_Int

Now. MAX LOAD! MAX OUTPUT! 🚀

slime retweeted

LMSYS Org@lmsysorg·4d

🚀 New blog: Bringing DeepSeek-V4 Flash RL Training to AMD Instinct MI355X GPUs with Miles

DeepSeek-V4 Flash RL now runs end-to-end in Miles on @AMD Instinct MI355X GPUs with ROCm! Together with AMD, we aligned the model behavior across SGLang rollout and Megatron training, validated over 100+ optimizer steps on four MI355X nodes:
1️⃣ Train-rollout log-prob gap bounded at ~0.09 across 100+ steps
2️⃣ AIME-2024 pass@1 rose from 0.39 to 0.49, pass@8 from 0.53 to 0.67
3️⃣ FP8 rollout + BF16 actor with datatype-aware online weight updates
4️⃣ Stable TP1 / PP4 / EP4 layout on ROCm without collective stalls
5️⃣ Hybrid attention, mHC mixing & hash-routed MoE aligned across both engines
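The post reports pass@1 and pass@8 on AIME-2024 but does not say how they were computed. A common choice is the unbiased estimator introduced in the Codex paper (Chen et al., 2021): draw n samples per problem, count the c correct ones, and average 1 - C(n-c, k) / C(n, k) over problems. The sketch below assumes that estimator; whether Miles uses it is not stated in the source, and the per-problem counts are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) counts per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts with n = 8 samples each (not real AIME data).
results = [(8, 2), (8, 0), (8, 8)]
print(mean_pass_at_k(results, k=1))
print(mean_pass_at_k(results, k=8))
```

Computing pass@1 and pass@8 from the same 8 samples per problem this way avoids the high variance of literally checking "is the first sample correct".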

slime@slime_framework·Jul 6

slime now adds --release-train, pushing the inference system during agentic RL training to a new limit. In colocated RL training, we want SGLang to use as much room as possible for inference-side optimizations such as HiCache, instead of being constrained by offloaded Megatron training processes. --release-train makes this possible by releasing the Megatron training process during rollout and reloading it for each training round. This gives SGLang more configuration headroom in colocated RL workloads. PR: github.com/THUDM/slime/pu…
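A toy accounting sketch of the trade-off described above: a long-lived trainer that is only offloaded during rollout still pins some GPU memory, while releasing it gives the inference side the whole budget at the cost of relaunching the trainer every round. `ToyTrainer`, `run`, and both memory figures are hypothetical stand-ins chosen for illustration; they are not slime's code, API, or measurements.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy accounting model of a colocated RL loop. Every name and number here is
# hypothetical; this is not slime's implementation of --release-train.
GPU_MEMORY_GB = 80.0
OFFLOADED_TRAINER_RESIDUAL_GB = 6.0  # assumed memory a live-but-offloaded trainer still pins


@dataclass
class ToyTrainer:
    launches: int = 0
    alive: bool = False

    def launch(self) -> None:
        # Stand-in for starting training processes and reloading weights/optimizer state.
        self.launches += 1
        self.alive = True

    def release(self) -> None:
        # Stand-in for tearing the training processes down completely.
        self.alive = False


def rollout_headroom_gb(trainer: ToyTrainer) -> float:
    """GPU memory the inference engine could claim during rollout."""
    residual = OFFLOADED_TRAINER_RESIDUAL_GB if trainer.alive else 0.0
    return GPU_MEMORY_GB - residual


def run(num_rounds: int, release_train: bool) -> tuple[ToyTrainer, list[float]]:
    trainer = ToyTrainer()
    if not release_train:
        trainer.launch()  # one long-lived trainer, offloaded (but alive) during rollout
    headrooms = []
    for _ in range(num_rounds):
        headrooms.append(rollout_headroom_gb(trainer))  # rollout phase
        if not trainer.alive:
            trainer.launch()  # reload the trainer for this training round
        # ... one training step would run here ...
        if release_train:
            trainer.release()  # free everything before the next rollout
    return trainer, headrooms


released, released_headroom = run(num_rounds=3, release_train=True)
offloaded, offloaded_headroom = run(num_rounds=3, release_train=False)
print(released.launches, released_headroom)
print(offloaded.launches, offloaded_headroom)
```

The toy makes the design choice explicit: extra rollout headroom (for example, for SGLang's HiCache) is paid for with one trainer relaunch per training round.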

slime retweeted

PyTorch@PyTorch·Jun 30

Built on PyTorch, Ray, SGLang, and NVIDIA Megatron-LM, Miles is an open source framework from RadixArk for large-scale LLM reinforcement learning post-training. Miles uses PyTorch for models, numerics, profiling, and extensibility; Ray for orchestration; SGLang for rollout generation; and Megatron-LM for distributed training. The framework supports asynchronous rollout and training, NCCL/RDMA weight synchronization, MoE-aware rollout/training alignment, low-precision recipes, LoRA, fault tolerance, observability, and extension points for custom algorithms and model architectures. 🔗 Read more in our latest blog from the Miles Team: pytorch.org/blog/miles-a-p…

slime@slime_framework·Jun 20

@sbskong Yes — we actually have an upcoming multi-teacher OPD optimization that we’re baking internally, and we’ll open-source it as soon as it’s ready.

JD Kim@sbskong·Jun 20

@slime_framework Any plans to add MOPD support to the roadmap too? Would love to see that!

slime@slime_framework·Jun 19

Thanks for the support! A small note: slime has supported not only OPD, but the full RL + OPD post-training workflow since GLM-4.5. More to come for scalable agentic RL infra.

Didier Lopes@didier_lopes

Incredible how Z.ai literally has their RL infrastructure open source. The entire OPD post-training of GLM-5.2 on this slime platform took ~2 days. github.com/THUDM/slime

slime retweeted

Artificial Analysis@ArtificialAnlys·Jun 17

Z.ai’s GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index, scoring 51, and it sits on the Pareto frontier of Intelligence vs. Cost per Task

@Zai_org’s GLM-5.2 is the same size as GLM-5.1 (744B total / 40B active parameters) but scores 11 points higher on the Intelligence Index v4.1, placing ahead of MiniMax-M3 (44) and DeepSeek V4 Pro (max, 44). On the first-party API it is priced in line with GLM-5.1 at $1.4/$4.4/$0.26 per 1M input/output/cache hit tokens

Key results:
➤ GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43)
➤ Improvements across most evaluations, particularly scientific reasoning: GLM-5.2 gains over GLM-5.1 on most evaluations, led by scientific reasoning on CritPt (+16 points to 21%) and HLE (+12 points to 40%), alongside AA-LCR (+9 points to 71%), tau3 banking (+15 points to 27%) and SciCode (+7 points to 50%). TerminalBench v2.1 also improves (+16 points to 78%) and GPQA Diamond gains 3 points to 89%
➤ Leading open weights model on GDPval-AA v2 and competitive with proprietary models: GLM-5.2 scores 1524 on GDPval-AA v2, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). This impressive result places GLM-5.2 in line with proprietary models including GPT-5.5 (xhigh reasoning). GDPval-AA v2 builds on the original GDPval-AA by baselining Elo to human performance at 1000, introducing a rotating panel of frontier-model judges, and raising the turn limit from 100 to 250 for longer-horizon agent trajectories
➤ GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k)
➤ On the Intelligence vs. Cost per Task Pareto Frontier: GLM-5.2 is on the Pareto frontier of the Intelligence vs. Cost per Task chart, with the lowest cost per task among models at its intelligence level. GLM-5.2 costs ~$0.46 per task, compared to GLM-5.1 ($0.25), Kimi K2.6 ($0.31), MiniMax-M3 ($0.18) and DeepSeek V4 Pro (max, $0.05)

Additional Model Details:
➤ License: MIT
➤ Size: 744B total parameters, 40B active parameters, equivalent to GLM-5.1
➤ Context window: 1M tokens, up from 200K on GLM-5.1
➤ Pricing: $1.4/$0.26/$4.4 per 1M input/cache hit/output tokens
➤ Availability: Alongside Z.ai's first-party API, GLM-5.2 is available across third-party providers including @DeepInfra, @novita_labs, @nebiusai, @parasailnetwork, @SiliconFlowAI, @gmi_cloud, @Baseten and @FireworksAI_HQ

slime@slime_framework·Jun 17

Clean abstractions and partial rollout get a real workout in this deep dive on RL system efficiency. Thanks to SemiAnalysis for putting our async modes through their paces!

SemiAnalysis@SemiAnalysis_

RL Systems Mind the Gap: Matching Trainer and Generator Throughput RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker newsletter.semianalysis.com/p/rl-systems-m…

slime retweeted

Z.ai@Zai_org·Jun 13

Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.

slime@slime_framework·Jun 11

@smellslikeml ❤️

Smells Like ML@smellslikeml·Jun 11

@slime_framework

Smells Like ML@smellslikeml·Jun 11

Outrider opens a PR on a fork of openai-agents-python implementing arxiv.org/abs/2606.06460 github.com/smellslikeml/o…

slime@slime_framework·Jun 10

@BanghuaZ @Apodex_AI @radixark ❤️❤️❤️

Banghua Zhu@BanghuaZ·Jun 10

@slime_framework @Apodex_AI @radixark slime is definitely one of the best RL frameworks out there! So happy we built on top of slime there!

slime@slime_framework·Jun 9

Excited to see @Apodex_AI’s agentic RL work built on Miles, @radixark’s independent fork of slime! We’re happy to see more teams choosing this architecture for industrial-grade agentic RL. This is exactly why we built slime to be clean, extensible, and production-oriented.

Apodex@Apodex_AI

Meet Apodex 1.0 🔭: a heavy-duty agent team for deep research, which sets the SOTA! The team searches the web, reasons over evidence, and writes reports where every claim is backed by an explicit evidence chain, independently audited before delivery. 🌐 apodex.ai

slime@slime_framework·Jun 10

Congrats to the vLLM team on vime! Happy to see slime’s training design inspiring more open RL post-training work. The space benefits from more interoperable systems, more production feedback, and more choices for users. Looking forward to seeing the ecosystem continue to grow.

vLLM@vllm_project

Today we're excited to introduce vime — a simple, stable, and efficient RL framework for LLM post-training in the vLLM ecosystem. Built on slime's proven training design and powered by vLLM inference, vime brings another strong option to the growing vLLM post-training ecosystem. Our goal isn't a one-size-fits-all framework. We want users with different needs to find the right vLLM-ecosystem choice for their workflows—whether that's vime, NeMo RL, OpenRLHF, verl, or others. More choice. More interoperability. More innovation. Learn more: vllm.ai/blog/2026-06-0… #LLM #RLHF #PostTraining #vLLM

slime retweeted

LMSYS Org@lmsysorg·Jun 9

📝 New blog: No Token Left Behind: Demystifying Token-In-Token-Out in Miles

In agentic RL, a rollout is a chain of model calls, tool outputs & resumed turns. Token-In-Token-Out (TITO) ensures the trainer evaluates the exact tokens the inference engine produced; break it, and training silently drifts off-policy.

Why it matters:
📦 One sample per task, not per turn: ~10× less compute on 30–50 turn trajectories
🎯 Keeps every token on-policy

How Miles enforces it:
1️⃣ Inference session server: one append-only token buffer per trajectory
2️⃣ Append-only at 3 levels: messages, template rendering, tokens
3️⃣ Pluggable TITO tokenizer: incremental tokenize + per-model splice patches
4️⃣ TokenSeqComparator: verifies every rollout stays bit-perfect

Supports Qwen3, GLM, Kimi-K2, Nemotron, Minimax & DeepSeek families.
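Why not simply decode the rollout to text and re-tokenize it for training? Because a text round-trip can merge tokens differently, so the trainer would score a sequence the policy never sampled. The sketch below shows that drift with a made-up three-piece vocabulary, plus a hypothetical append-only `TrajectoryBuffer` and a `first_mismatch` check in the spirit of the session server and TokenSeqComparator described above. None of these names are Miles APIs, and the greedy tokenizer is a toy.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy vocabulary: greedy longest-match encoding merges "a" + "b" into "ab".
VOCAB = {"a": 0, "b": 1, "ab": 2, "<tool>": 3}
ID_TO_TEXT = {i: t for t, i in VOCAB.items()}


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenizer over the toy vocabulary."""
    pieces = sorted(VOCAB, key=len, reverse=True)
    ids, i = [], 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                ids.append(VOCAB[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize text at offset {i}")
    return ids


def decode(ids: list[int]) -> str:
    return "".join(ID_TO_TEXT[i] for i in ids)


@dataclass
class TrajectoryBuffer:
    """Hypothetical append-only token record for one trajectory."""
    tokens: list[int] = field(default_factory=list)
    loss_mask: list[int] = field(default_factory=list)

    def append(self, ids: list[int], trainable: bool) -> None:
        # Never rewrite earlier tokens; only extend.
        self.tokens.extend(ids)
        self.loss_mask.extend([1 if trainable else 0] * len(ids))


def first_mismatch(expected: list[int], actual: list[int]) -> int | None:
    """Index of the first differing token, or None if the sequences are identical."""
    for i, (x, y) in enumerate(zip(expected, actual)):
        if x != y:
            return i
    return None if len(expected) == len(actual) else min(len(expected), len(actual))


buf = TrajectoryBuffer()
buf.append(encode("<tool>"), trainable=False)  # tool output fed back as context
buf.append([VOCAB["a"], VOCAB["b"]], trainable=True)  # policy sampled "a", then "b"

retokenized = encode(decode(buf.tokens))  # what a text round-trip would produce
print(buf.tokens, retokenized, first_mismatch(buf.tokens, retokenized))
```

Training on `buf.tokens` with `buf.loss_mask` keeps the sampled ids intact, while the round-tripped sequence silently replaces the two sampled tokens with one the policy never emitted.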

slime@slime_framework·Jun 9

@Apodex_AI @radixark 🚀🚀🚀

Apodex@Apodex_AI·Jun 9

@slime_framework @radixark Built directly on @radixark's Miles, which itself rides on @slime_framework's architecture. Thanks both for shipping infra that production teams can actually fork and build on 🚀
