Shuo Yang

60 posts

@Andy_ShuoYang

2nd-year PhD student at Berkeley; Efficient ML Systems

Berkeley · Joined February 2023

96 Following · 97 Followers

Shuo Yang reposted

Chenfeng_X@Chenfeng_X·19 May

Excited that our paper StreamDiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀 Video generation is quickly moving from demos to production-facing workloads. It should no longer be a turn-based pipeline but a streaming pipeline that interacts with users. 📖 Our project page: streamdiffusionv2.github.io and paper: arxiv.org/pdf/2511.07399 👂 Come join the talk if you are interested in streaming video generation. Our talk will be at the Research Track Oral Presentation: Best Paper Session on Tue 8:45AM at #MLSys26, where I will talk about how we attacked the efficiency and quality challenges. Hope to see you there! ❤️ Huge thanks to all authors! This work would not have been possible without the incredible effort from the entire team. Big shout out to Tianrui Feng, Zhi Li, @Andy_ShuoYang, @HaochengXiUCB, @lmxyy1999, @lvminzhang, @xiuyu_l, Keting Yang, @ZiqiPeng, @songhan_mit, @magrawala, @KurtKeutzer, and @cumulo_autumn

211

56.7K

Shuo Yang reposted

Qiuyang Mang@MangQiuyang·15 May

Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks—and even outperform human-expert curation. FrontierCS team is releasing FrontierSmith: a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates, filters, and builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench. Blog: frontier-cs.org/blog/frontiers… Paper: arxiv.org/abs/2605.14445 Code: github.com/FrontierCS/Fro… Model: huggingface.co/runyuanhe/qwen…

330

92.6K

Shuo Yang reposted

YUCHAO GU@YuchaoGu·14 May

🚀 We are excited to announce the release of AnyFlow, the first any-step video diffusion on-policy distillation (OPD) framework. By leveraging Flow Map distillation, AnyFlow significantly improves inference efficiency by reducing the number of sampling steps. (Code, models, and demos are now open-source!) Key Highlights: ⚡ Any-Step Generation: Unlike traditional distilled models tied to fixed step budgets, AnyFlow enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added. 🔀 Multiple Architectures: AnyFlow supports any-step distillation for both causal and bidirectional video diffusion models. 🎬 Multiple Tasks: AnyFlow supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model. 📈 Scalable Performance: AnyFlow is validated from 1.3B up to 14B parameters. 📄 Paper: huggingface.co/papers/2605.13… 💻 Code: github.com/NVlabs/AnyFlow 🎨 Pre-trained Models: huggingface.co/collections/nv… 🎬 Demo: nvlabs.github.io/AnyFlow/demo

176

22.7K

Shuo Yang reposted

Qiuyang Mang@MangQiuyang·12 May

We integrated FrontierCS into Harbor and are releasing a preview long-horizon agent leaderboard (up to 835 turns, ~200K output tokens) with Kimi K2.6 @Kimi_Moonshot (score 46.9) and Claude Code Opus 4.7 @claudeai (43.0) 🚢. The goal: evaluate frontier coding agents in a setting where they iteratively write code, run experiments, read feedback, and improve in an extremely long loop. FrontierCS tasks are open-ended optimization problems. Each task has a continuous score. There is no single accepted output. Agents need to search for better solutions under a step/time/token budget. This makes FrontierCS a natural fit for agentic evaluation. Just plan, code, test, revise, fail, recover, and keep optimizing. Check out our blog: frontier-cs.org/blog/harbor FrontierCS GitHub: github.com/FrontierCS/Fro…

142

28K

Shuo Yang reposted

Melissa Pan@melissapan·30 Apr

Excited to share: MAP has been accepted as 🌟 ICML Spotlight 🌟 We hope MAP can provide data-driven insights that help the community work on various under-explored research directions around agent systems! Huge thanks & congrats to my amazing co-authors. See you all in Seoul! 🫡

232

47.6K

Shuo Yang reposted

Haocheng Xi@HaochengXiUCB·29 Apr

🎥 Video generation is hitting the memory wall. As videos get longer, the KV cache quietly explodes — and long-horizon consistency starts to break. We built Quant VideoGen: a training-free KV cache compression method for auto-regressive video diffusion. Instead of storing every KV in high precision, QVG exploits video’s spatiotemporal redundancy with semantic-aware smoothing + progressive residual quantization. 🚀 Up to 7× KV memory reduction ⚡ <4% overhead ✅ Strong long-video quality 🕹️ Deploy HYWorldPlay on your own RTX 5090 locally KV compression is becoming a core scaling primitive — not just for LLMs, but for video generation too. Paper: arxiv.org/abs/2602.02958 Code: github.com/svg-project/Qu… (1/5)

266

63K

Shuo Yang reposted

Haocheng Xi@HaochengXiUCB·27 Mar

Really exciting to see KV-cache compression getting attention. A similar bottleneck shows up beyond LLMs: for world models and autoregressive long-video generation, KV cache can quickly dominate memory and limit long-horizon consistency. Our recent work, Quant VideoGen, explores training-free 2-bit KV-cache quantization for video diffusion models, achieving up to 7.0× KV memory reduction with <4% latency overhead. Link: arxiv.org/abs/2602.02958

Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

482

53.4K

Shuo Yang@Andy_ShuoYang·19 Mar

@CyrusMaher @HaochengXiUCB Flash-kmeans has been adopted by some sparse attention methods for video generation! Check arxiv.org/pdf/2505.18875

Cyrus Maher@CyrusMaher·17 Mar

@HaochengXiUCB Great work! Do you think this could be used for sparse long context attention?

116

Shuo Yang reposted

Haocheng Xi@HaochengXiUCB·17 Mar

K-means is simple. Making it fast on GPUs isn't. That's why we built Flash-KMeans, an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly, Flash-KMeans achieves a 30x speedup over cuML and a 200x speedup over FAISS, with the same exact algorithm, just engineered for today's hardware. At million scale, Flash-KMeans can complete a k-means iteration in milliseconds. A classic algorithm, redesigned for modern GPUs. Paper: arxiv.org/abs/2603.09229 Code: github.com/svg-project/fl…
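
For readers who want a reminder of what "a k-means iteration" means in the post above, here is a minimal sketch of one iteration of the textbook (Lloyd's) exact k-means algorithm in plain Python. It is only the reference algorithm, not Flash-KMeans' IO-aware GPU implementation, and the names `squared_distance` and `kmeans_iteration` are hypothetical helpers chosen for this illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_iteration(
    points: Sequence[Point], centroids: Sequence[Point]
) -> Tuple[List[int], List[List[float]]]:
    """Run one Lloyd iteration and return (labels, updated centroids)."""
    k = len(centroids)
    dim = len(centroids[0])

    # Assignment step: each point goes to its nearest centroid.
    labels = [
        min(range(k), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]

    # Update step: each centroid becomes the mean of its assigned points.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]

    # Assumption for this sketch: an empty cluster keeps its old centroid.
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return labels, new_centroids


labels, new_centroids = kmeans_iteration(
    [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)],
    [(1.0, 1.0), (9.0, 9.0)],
)
```

The assignment step compares every point with every centroid, so a naive implementation touches on the order of N × K distances per iteration; the post's point is that the cost on GPUs is dominated by how that data moves through memory rather than by the arithmetic itself.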

201

1.8K

306.9K

Shuo Yang reposted

Shu Lynn Liu@shulynnliu·17 Mar

Researchers spend hours and hours hand-crafting the strategies behind LLM-driven optimization systems like AlphaEvolve: deciding which ideas to reuse, when to explore vs exploit, and what mutations to try. 🤖But what if AI could evolve its own evolution process? We introduce EvoX, a meta-evolution pipeline that lets AI evolve the strategy guiding the optimization. It achieves high-quality solutions for <$5, while existing open systems and even Claude Code often cost 3-5× more on some tasks. Across ~200 optimization problems, EvoX delivers the strongest overall results: often outperforming AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on math and systems tasks, exceeding human SOTA, and improving median performance by up to 61% on 172 competitive programming problems. 👇

498

99.2K

Shuo Yang@Andy_ShuoYang·17 Mar

@MangQiuyang @HaochengXiUCB We should find out lol

Qiuyang Mang@MangQiuyang·17 Mar

@HaochengXiUCB Can we discover a faster k-means algo by k-search?

464

Shuo Yang@Andy_ShuoYang·17 Mar

@botir33751732 @HaochengXiUCB We will explore more algorithms soon!

151

Botir Khaltaev@botir33751732·17 Mar

@HaochengXiUCB bisecting-kmeans, mini-batch-kmeans, hdbscan, spectral etc

341

Shuo Yang reposted

Qiuyang Mang@MangQiuyang·16 Mar

🚀 Excited to share our new work, SVG-EAR! Built on SVG2, we study a simple but important setting for accelerating video diffusion transformers: compute only a subset of attention blocks exactly, and recover the rest with a training-free linear compensation. A key takeaway from this project is that, once compensation is introduced, routing should not be treated as a naive top-p selection problem. Instead, routing and compensation need to be co-designed: the real question is not just which blocks have high scores, but which skipped blocks can be accurately recovered and which still need exact computation. Based on this idea, SVG-EAR combines parameter-free linear compensation with error-aware routing and delivers a clear quality gain over SVG2 at similar runtime 📈 On the project page, we show about +2.1 PSNR on Wan 2.2 I2V at the same 1.61× speedup, along with consistent gains on Wan 2.2 T2V and HunyuanVideo. Really excited about this direction for making sparse video generation both faster and more faithful 🎥✨ Huge thanks to all collaborators Xuanyi Zhou @randwalk0 @HaochengXiUCB @Jintao_Zhang_ @HuanzhiMao @profjoeyg @KurtKeutzer @istoica05 @alvinkcheung for making this possible 🙌 See the project page for more details and videos: svg-project.github.io/v3/

1.8K

Shuo Yang reposted

AK@_akhaliq·12 Mar

Flash-KMeans: Fast and Memory-Efficient Exact K-Means. Paper: huggingface.co/papers/2603.09…

43.9K

Shuo Yang reposted

Hanchen Li@lihanc02·10 Mar

x.com/i/article/2031…

13.3K

Shuo Yang reposted

Shu Lynn Liu@shulynnliu·3 Mar

AlphaEvolve is closed-source. We release 🌟SkyDiscover🌟, a flexible, modular open-source framework with two new adaptive algorithms that match or exceed AlphaEvolve on many benchmarks and outperform OpenEvolve, GEPA, and ShinkaEvolve across 200+ optimization tasks. Our new algorithms dynamically adapt their search strategy, and can even let the AI optimize its own optimization process on the fly! Results: 📊 +34% median score improvement on 172 Frontier-CS problems. 🧮 Matches/exceeds AlphaEvolve on many math benchmarks ⚙️ Discovers system optimizations beyond human-designed SOTA 🧵👇

105

582

141.5K

Shuo Yang reposted

Shiyi Cao@shiyi_c98·26 Feb

Introducing our new work K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model, a new paradigm for automated GPU kernel generation, achieving SoTA results. 🔍 Big insight: Traditional methods treat LLMs as stochastic code generators inside heuristic loops, but this misses a key point: LLMs are powerful planners with rich domain priors. 🧠 Core idea: K-Search uses the LLM itself as a co-evolving world model: one that plans, updates beliefs, and guides search decisions based on experience. 📌 This decouples high-level strategy (intent) from low-level code implementation, allowing the optimizer to pursue multi-step transformations even when intermediate implementations don’t immediately improve performance. 📈 Key results: 🔥 Our discovered kernels achieve a ~2.10× average speedup over state-of-the-art evolutionary search across 4 FlashInfer kernels on H100/B200. 🔥 Up to 14.3× gain on complex Mixture-of-Experts (MoE) kernels. 🔥 State-of-the-art performance on the GPUMode TriMul (H100) task, beating both automated and human solutions. 🙏 Acknowledgements: This work is developed in @BerkeleySky, w/ the amazing @ziming_mao, @profjoeyg, and @istoica05. We thank @DachengLi177, @MayankMish98, @randwalk0, @pgasawa, @fangz_zzu, and @tian_xia_ for helpful discussion and feedback. We also thank @databricks, @awscloud, @anyscalecompute, @nvidia, @Google, @LambdaAPI, and @MayfieldFund for their generous compute support. 👨‍💻 GitHub: github.com/caoshiyi/K-Sea… 📄 arXiv: arxiv.org/pdf/2602.19128…

305

96.1K

Shuo Yang reposted

Alvin Cheung@alvinkcheung·3 Feb

We recently released version 1.0 of Frontier CS 🎉🎉🎉 -- a benchmark that aims to measure frontier models' ability to solve 200+ open-ended computer science problems.

5.2K

Shuo Yang reposted

AI-Driven Research for Systems@ai4research_ucb·8 Jan

🎯 We leveraged ADRS to automatically discover a model-placement algorithm that delivers a 17% performance improvement for GPU sharing in multi-LLM serving. [ADRS Blog #9] Starting from a deliberately “bad” random baseline, AI was able to rapidly rediscover an algorithm similar to our manually designed heuristic, and then pushed beyond it with two additional enhancements. This case study further illustrates how ADRS can accelerate algorithm design and reshape the way we approach systems research, moving from handcrafted heuristics to automated discovery and refinement. ✍️ Read the blog: adrs-ucb.notion.site/prism 📖 ADRS Blog Series: ucbskyadrs.github.io 📄 ADRS Paper: arxiv.org/abs/2510.06189 👩‍💻 Code: github.com/UCB-ADRS/ADRS

10.6K

Shuo Yang reposted

Wentao Guo@WentaoGuo7·19 Dec

🚀SonicMoE🚀: a blazingly-fast MoE implementation optimized for NVIDIA Hopper GPUs. SonicMoE reduces activation memory by 45% and is 1.86x faster on H100 than previous SOTA😃 Paper: arxiv.org/abs/2512.14080 Work with @MayankMish98, @XinleC295, @istoica05, @tri_dao

112

639

247.6K
