Lingyan H

28 posts

@ly_h990

Product @radixark | Ex @GoldmanSachs Quant | ICME @Stanford | Math & CS @VanderbiltU

Joined May 2023
114 Following · 19 Followers
Lingyan H reposted
LMSYS Org
LMSYS Org@lmsysorg·
🚀 New blog: Serving DeepSeek-V4 on GB300 with SGLang: 5x Higher Throughput at the Same Interactivity Since Day-0
Together with @nvidia, we achieved 5X higher throughput at the same interactivity, serving DeepSeek-V4 on GB300 with SGLang. Here's how the DeepSeek-V4 serving frontier moved on the public @SemiAnalysis_ InferenceX dashboard:
1️⃣ 5X throughput on GB300 disaggregated: ~2,200 → ~11,200 tok/s/GPU at ~50 tok/s/user
2️⃣ 2.6X more throughput at 80 tok/s/user with MTP. Curves now hold deep into the high-interactivity range deployments actually target
3️⃣ 2.91X on Blackwell Ultra aggregated at 30 tok/s/user, with 6X+ peak no-MTP throughput
4️⃣ W4A4 MegaMoE: activations now quantized to MXFP4 with negligible accuracy loss
5️⃣ A single FP8-einsum fix lifted MTP acceptance 0.57 → 0.70
Huge thanks to @NVIDIAAI @radixark for the deep collaboration on this! SGLang is PyTorch-native, and we're excited to share the full write-up on the @PyTorch blog!
PyTorch@PyTorch

While SGLang provided Day-0 support for DeepSeek-V4, the collaboration between the @lmsysorg and @NVIDIAAI engineering teams has taken its production performance to the next level. According to the public SemiAnalysis InferenceX dashboard, the GB300 disaggregated lane (DeepSeek-V4 Pro, FP4, 8K/1K) saw a 5x throughput increase, surging from ~2,200 to ~11,200 tok/s/GPU at identical interactivity levels. These updates sustain high throughput much deeper into the interactivity ranges most deployments target, while also driving a 2.9x lift on the Blackwell Ultra aggregated lane. Find the full technical breakdown in the comments below:

English
7
13
44
10.1K
Lingyan H reposted
SGLang
SGLang@sgl_project·
Try it with SGLang (your preferred inference😁)!
ollama@ollama

@Altaf_P7 Try it with @sgl_project or if you have a preferred inference! Cookbooks available: huggingface.co/zai-org/GLM-5.2 We are all working together to make open models viable! Let’s go team open! Ollama’s cloud is my preference but you get choice with openness!

English
1
1
20
2K
Lingyan H
Lingyan H@ly_h990·
SGLang-Jax in production with Ling-2.6-1T. Impressive!
LMSYS Org@lmsysorg

🚀 Our new blog: Optimizing Ling-2.6-1T on TPU with SGLang-JAX: Hiding MoE Data Movement Behind Compute with One Pallas Kernel
Ling-2.6-1T, a 1T hybrid MoE model, now serves on TPU v7x with SGLang-JAX. The SGLang-JAX team worked together with @inclusionAI on two fronts: upgrading the fused MoE kernel for deeper compute/comms overlap, and bringing up the full hybrid backbone.
1️⃣ Fused MoE V2: keeps tokens + accumulators VMEM-resident and double-buffers expert weights, hiding routing & prefetch behind compute → MoE prefill −53%
2️⃣ Hybrid memory pools: per-token MLA KV for 10 full-attn layers + per-request recurrent state for 70 GLA layers
3️⃣ GLA linear attention via chunk-wise parallel prefill
4️⃣ Single-controller DP keeps grouped RMSNorm chip-local, no per-layer cross-chip reduce

English
0
0
5
786
Lingyan H
Lingyan H@ly_h990·
@xia_char @Ken_Goldberg @MatthewWon24755 Great overview of the embodied AI landscape! I liked the discussion around model divergence and the reality that convergence may take longer than many expect, and I resonate with this. Thanks for sharing!
English
0
0
0
23
Charlotte Xia
Charlotte Xia@xia_char·
Jim Fan's "Great Parallel" thesis: embodied AI will scale like LLMs did. $5B+ is already betting on #worldmodels. $18B into #robotics. But the field has no shared benchmark, no convergence on architecture, and a 100,000-year data gap (per @Ken_Goldberg). In this blog, Matt and I wrote down some thoughts on why we think the Great Parallel may be harder to realize than it seems, and where we believe the real startup opportunities in the space lie. charlottexia.substack.com/p/would-the-gr…
Charlotte Xia tweet media
English
11
16
47
14.2K
Lingyan H
Lingyan H@ly_h990·
Thank you, New York and thank you to everyone who came out to our SGLang happy hour during NY Tech Week. A moment I'll remember for a long time: as we wrapped up on Wednesday night, the Knicks had just won Game 1 of the NBA Finals, and the city was alive with celebration. Walking out into that energy, I was struck by how much New Yorkers love both life and the future they're building. The same people cheering for the Knicks were the ones who'd spent the evening with us talking about inference, agents, and AI in production. That's the magic of this city, and that's why we wanted to bring @sgl_project here. So grateful to have shared the night with all of you. See you next time, NYC. 🗽
SGLang@sgl_project

🏙️ SGLang NY Tech Week Happy Hour Recap Last Wednesday, SGLang hosted a NY Tech Week Happy Hour in NYC, co-hosted with @HOFCapital, @Cloudflare, @CrusoeAI, and @ArklexAI. 380+ registered, 200+ in the room, and one unforgettable night. 🧡 The room was packed with engineers, researchers, and enthusiasts from quant funds, banks, and trading firms, all there to talk about one thing: where inference is headed as LLMs move into latency-sensitive production across trading, research, compliance, and risk. NYC, you showed up and brought the energy. We loved every minute. Until next time! ☀️ #NYTechWeek @Techweek_

English
0
0
4
168
Lingyan H reposted
SGLang
SGLang@sgl_project·
👋 @sgl_project is back! Welcome to the home of the SGLang community!
While @lmsysorg keeps you posted on technical drops and partner news, this space is for you! Here's what we've got lined up:
🚀 Version releases: every new SGLang drop, unpacked
🎙️ Office Hours: deep dives, live deployments, and team Q&A
📺 Tutorials: short how-tos and the best Office Hour moments
🌟 Community spotlights: the cool stuff you're building with SGLang
📅 Event updates: meetups, workshops, and where to catch us next
And we'd love to hear from you! What do you want to see? Benchmarks? Model deep dives? A topic for the next Office Hour? Drop it below 👇 Every idea gets read!
SGLang tweet media
English
1
10
51
11.8K
Lingyan H reposted
RadixArk
RadixArk@radixark·
Join us in NYC on June 3rd during #NYTechWeek @Techweek_ Liangsheng Yin (@lsyincs) and Mao Cheng (@MCheng89333), both MTS at RadixArk, will present SGLang & Miles, diving into inference infrastructure for finance. RSVP: partiful.com/e/p74X9KDrgoLa…
LMSYS Org@lmsysorg

NYC, we're bringing the inference + finance crowd together for #NYTechWeek @Techweek_!
SGLang Happy Hour: AI Infra in Finance
🕤 Wed, June 3 · 6–9 PM ET
📍 1/2 Bond St, New York
Co-hosted with @HOFCapital, @CrusoeAI, @CloudflareDev, @ArklexAI.
Lightning talks from inference engineers and researchers shipping into trading, research, compliance, and risk, followed by an open happy hour for networking. More surprise speakers to be announced, stay tuned 👀
Expected attendees from leading quant funds, banks, and trading firms, including Jane Street, Citadel, Two Sigma, Goldman Sachs, Bloomberg, among others.
We've also got a bartender on site and a full bar. Come have a drink with us!
Limited space. RSVP 👇 partiful.com/e/p74X9KDrgoLa…

English
0
4
22
10.3K
Lingyan H
Lingyan H@ly_h990·
So exciting to bring SGLang to the east coast! Come hang out with us in NYC on June 3rd 🥂
LMSYS Org@lmsysorg

NYC, we're bringing the inference + finance crowd together for #NYTechWeek @Techweek_!
SGLang Happy Hour: AI Infra in Finance
🕤 Wed, June 3 · 6–9 PM ET
📍 1/2 Bond St, New York
Co-hosted with @HOFCapital, @CrusoeAI, @CloudflareDev, @ArklexAI.
Lightning talks from inference engineers and researchers shipping into trading, research, compliance, and risk, followed by an open happy hour for networking. More surprise speakers to be announced, stay tuned 👀
Expected attendees from leading quant funds, banks, and trading firms, including Jane Street, Citadel, Two Sigma, Goldman Sachs, Bloomberg, among others.
We've also got a bartender on site and a full bar. Come have a drink with us!
Limited space. RSVP 👇 partiful.com/e/p74X9KDrgoLa…

English
0
0
2
78
Lingyan H reposted
RadixArk
RadixArk@radixark·
We've heard the community's feedback. Our intent was to make sure the credits reached the people who supported SGLang along the way, and we couldn't be here without you. We're updating the offer to better reflect that.
RadixArk's platform is open for beta, and we're offering $200 in compute credits to get you started:
→ Sign up at platform.radixark.com and repost this so we can get you set up.
→ Limited spots, first come, first served. Open through May 13, 2026 (AoE).
→ Credits will be granted after we verify the repost. (If you already reposted our earlier announcement, that counts too; no need to do it again.)
And if SGLang has been useful in your work, consider giving it a star on GitHub. It's a small gesture that means a lot to the people maintaining it. We're in this together, and we're grateful to be building it with you 🧡
RadixArk tweet media
English
13
85
172
15K
Lingyan H
Lingyan H@ly_h990·
Let’s goooo!
RadixArk@radixark

Hey everyone, we hear you, and we've updated the post: x.com/radixark/statu… Our original intent was to give back to the people who supported SGLang, the contributors, the early users, the ones who believed in the project. None of this would exist without you, and this was our way of saying thank you. We're sorry for the confusion it caused. Thank you for caring enough to speak up, and we're grateful to be on this journey with you. Let's go SGLang!

English
0
0
1
64
Lingyan H
Lingyan H@ly_h990·
So proud to be part of this team!
RadixArk@radixark

Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital. RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas. RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale. RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI. We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (Venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix among others. Thanks for the exclusive interview with @MeghanBobrowsky at @WSJ about our vision.

English
1
1
8
376
Lingyan H reposted
LMSYS Org
LMSYS Org@lmsysorg·
🚀 We just published a deep technical blog on how SGLang and Miles delivered Day-0 support for DeepSeek-V4. 199 tok/s on B200 (Pro 1.6T), 266 tok/s on H200 (Flash 284B) at 4K context, and throughput stays strong at 900K context (180 and 240 tok/s respectively).
This is the full story behind V4 Pro (1.6T) and Flash (284B): how we built systems for hybrid sparse attention, manifold-constrained hyper-connections (mHC), and FP4 expert weights, plus a full RL training stack that runs at 1.6T scale.
What's covered:
1. Inference (caching and attention): ShadowRadix prefix cache, HiSparse CPU-extended KV, MTP speculative decoding with in-graph metadata, Flash Compressor, Lightning TopK, hierarchical multi-stream overlap.
2. Inference (kernels and deployment): fast kernel integrations (FlashMLA, FlashInfer TRTLLM-Gen MoE, DeepGEMM Mega MoE, TileLang mHC), DP/TP/CP attention, EP MoE on DeepEP, PD disaggregation.
3. RL training: full parallelism (DP/TP/SP/EP/PP/CP), TileLang attention, enhanced stability, FP8 training.
4. Multi-hardware: NVIDIA Hopper, Blackwell, Grace Blackwell, AMD, NPU.
LMSYS Org tweet media
English
7
53
266
59.1K
Lingyan H
Lingyan H@ly_h990·
Brand new design! And finally don’t need to check separate websites for usage and recipe 👌
LMSYS Org@lmsysorg

📚 SGLang Docs just got a major upgrade! We've migrated SGLang Document and SGLang Cookbook to one unified home on Mintlify: 🔗 docs.sglang.io
What's new:
→ Fresh look, cleaner structure, built-in "Ask AI"
→ General User Guide, SGLang Diffusion Guide, and Cookbook all in one place
→ Dedicated hardware page: NVIDIA, AMD, Ascend NPU, TPU, Jetson Orin, XPU, CPU
→ Clearer Cookbook sections: Autoregressive & Diffusion Models, each with Benchmark cards
We rebuilt the docs from the ground up, making it a smoother reading experience for humans, and fundamentally more friendly to the coding agents in your workflow.
Huge thanks to @mintlify for powering this upgrade, and to the dedicated contributors from @ACM_VIT for their work on the migration. Happy building! 🔧

English
0
0
2
58
Lingyan H reposted
LMSYS Org
LMSYS Org@lmsysorg·
🎉 Congrats to @Zai_org on releasing GLM-5.1, SGLang is ready to support on day-0!
GLM-5.1 is a next-gen flagship built for agentic engineering:
🏆 SWE-Bench Pro: #1 open source, #3 globally
🔨 Terminal-Bench 2.0: top-ranked on real-world terminal tasks
⏳ Long-Horizon: runs autonomously for 8 hours through thousands of tool calls
Cookbook: cookbook.sglang.io/autoregressive…
Try it now with SGLang!
LMSYS Org tweet media
Z.ai@Zai_org

Introducing GLM-5.1: The Next Level of Open Source - Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo. - Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations. Blog: z.ai/blog/glm-5.1 Weights: huggingface.co/zai-org/GLM-5.1 API: docs.z.ai/guides/llm/glm… Coding Plan: z.ai/subscribe Coming to chat.z.ai in the next few days.

English
5
11
72
7.3K
Lingyan H reposted
LMSYS Org
LMSYS Org@lmsysorg·
🎬 SGLang at GTC 2026: Full Recap SGLang showed up at GTC 2026, and we didn't hold back. 5 events in 3 days. Keynote feature, open-source AI panel, a happy hour, a hands-on training lab, and a 200-person meetup at LinkedIn HQ. The full recap is live on the newly launched lmsys.org 🎉
LMSYS Org tweet media
English
4
2
14
1.3K
Lingyan H reposted
LMSYS Org
LMSYS Org@lmsysorg·
🎉 Congrats on the Gemma 4 launch from @googlegemma, day-0 support is now live in SGLang!
Gemma 4 is a multimodal family (4 sizes: E2B, E4B, 26B A4B, and 31B) with both Dense and MoE architectures, built for everything from mobile to server-scale:
👁️ Rich multimodal understanding: Text, image, video, and audio (E2B/E4B) all in one model
🧠 Built-in thinking mode: Configurable step-by-step reasoning
📚 Massive context: Up to 256K tokens for the medium models
🔧 Native function calling for agentic workflows
Cookbook: cookbook.sglang.io/autoregressive…
Run it now with SGLang!
LMSYS Org tweet media
Google Gemma@googlegemma

Meet Gemma 4! Purpose-built for advanced reasoning and agentic workflows on the hardware you own, and released under an Apache 2.0 license. We listened to invaluable community feedback in developing these models. Here is what makes Gemma 4 our most capable open models yet: 👇

English
1
9
38
4.7K
Lingyan H reposted
LMSYS Org
LMSYS Org@lmsysorg·
🌐 lmsys.org is live with a full rebuild: faster, easier to read, and now with an open talent pool!
We migrated from a GitHub Pages static blog to a modern Next.js 15 + React 19 site, with the blog build time dropping from ~20min to ~1min.
🎨 What's new:
- Refreshed visual design with responsive layout, fluid typography, and a new project timeline
- New blog categories (Tech Blogs / News) & full-text search: filter and find exactly what you're looking for
- Live scroll tracking on every post, easy to navigate long technical reads
🛠 Better experience for contributors:
- PR preview staging: Live PR previews before merge
- Full SEO overhaul for better discoverability: dynamic sitemap, OpenGraph, Twitter Cards
🌟 We're also launching North Star talent pool, a long-term talent initiative for engineers, researchers, and builders who want their work to run in production. We're looking for people across inference systems, training infrastructure, AI agents, developer tools, and Event Operations & Community Ops experts. Drop by if you're looking to contribute or join.
🌐 Check out the new site: lmsys.org
📩 Interested in North Star? Drop us a note at talent@lmsys.org
LMSYS Org tweet media
LMSYS Org tweet media
English
0
2
23
4K
Lingyan H reposted
LMSYS Org
LMSYS Org@lmsysorg·
🚀 @AMD just dropped a full guide: run OpenClaw🦞 for free on AMD Developer Cloud with Qwen3.5 + SGLang on a single MI300X!
Following SGLang's support in @OpenClaw, here's a free way to get your own stack running:
🆓 $100 AMD Developer Cloud credits (~50 hrs of MI300X)
🧠 Qwen3.5-122B-A10B-FP8 served via SGLang in one Docker command
🔧 SGLang selectable natively in OpenClaw's onboarding CLI
Your own self-hosted agent stack, on enterprise hardware, at zero cost. Happy building! 🫡
👉 Blog link: amd.com/en/developer/r…
LMSYS Org tweet media
English
1
10
40
4.1K