Shuo Yang

60 posts

@Andy_ShuoYang

2nd-year PhD at Berkeley; Efficient ML Systems

Berkeley · Joined February 2023
96 Following · 97 Followers
Shuo Yang reposted
Chenfeng_X @Chenfeng_X
Excited that our paper StreamDiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀 Video generation is quickly moving from demos to production-facing workloads. It should no longer be a turn-based pipeline but a streaming pipeline that interacts with users.
📖 Our project page: streamdiffusionv2.github.io and paper: arxiv.org/pdf/2511.07399
👂 Come join the talk if you are interested in streaming video generation. It will be at the Research Track Oral Presentation: Best Paper Session on Tue 8:45AM at #MLSys26, where I will talk about how we attacked the efficiency and quality challenges. Hope to see you there!
❤️ Huge thanks to all authors! This work would not have been possible without the incredible effort from the entire team. Big shout out to Tianrui Feng, Zhi Li, @Andy_ShuoYang, @HaochengXiUCB, @lmxyy1999, @lvminzhang, @xiuyu_l, Keting Yang, @ZiqiPeng, @songhan_mit, @magrawala, @KurtKeutzer, and @cumulo_autumn
5
33
211
56.7K
Shuo Yang reposted
Qiuyang Mang @MangQiuyang
Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks and even outperform human-expert curation. The FrontierCS team is releasing FrontierSmith: a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates, filters, and builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench.
Blog: frontier-cs.org/blog/frontiers…
Paper: arxiv.org/abs/2605.14445
Code: github.com/FrontierCS/Fro…
Model: huggingface.co/runyuanhe/qwen…
14
70
330
92.6K
Shuo Yang reposted
YUCHAO GU @YuchaoGu
🚀 We are excited to announce the release of AnyFlow, the first any-step video diffusion on-policy distillation (OPD) framework. By leveraging Flow Map distillation, AnyFlow significantly improves inference efficiency by reducing the number of sampling steps. (Code, models, and demos are now open-source!)
Key Highlights:
⚡ Any-Step Generation: Unlike traditional distilled models tied to fixed step budgets, AnyFlow enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added.
🔀 Multiple Architectures: AnyFlow supports any-step distillation for both causal and bidirectional video diffusion models.
🎬 Multiple Tasks: AnyFlow supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.
📈 Scalable Performance: AnyFlow is validated from 1.3B up to 14B parameters.
📄 Paper: huggingface.co/papers/2605.13…
💻 Code: github.com/NVlabs/AnyFlow
🎨 Pre-trained Models: huggingface.co/collections/nv…
🎬 Demo: nvlabs.github.io/AnyFlow/demo
4
33
176
22.7K
Shuo Yang reposted
Qiuyang Mang @MangQiuyang
We integrated FrontierCS into Harbor and are releasing a preview long-horizon agent leaderboard (up to 835 turns, ~200K output tokens) with Kimi K2.6 @Kimi_Moonshot (score 46.9) and Claude Code Opus 4.7 @claudeai (43.0) 🚢. The goal: evaluate frontier coding agents in a setting where they iteratively write code, run experiments, read feedback, and improve in an extremely long loop. FrontierCS tasks are open-ended optimization problems. Each task has a continuous score. There is no single accepted output. Agents need to search for better solutions under a step/time/token budget. This makes FrontierCS a natural fit for agentic evaluation. Just plan, code, test, revise, fail, recover, and keep optimizing. Check out our blog: frontier-cs.org/blog/harbor FrontierCS GitHub: github.com/FrontierCS/Fro…
5
20
142
28K
Shuo Yang reposted
Melissa Pan @melissapan
Excited to share: MAP has been accepted as an 🌟 ICML Spotlight 🌟 We hope MAP can provide data-driven insights that help the community work on various under-explored research directions around agent systems! Huge thanks & congrats to my amazing co-authors. See you all in Seoul! 🫡
10
30
232
47.6K
Shuo Yang reposted
Haocheng Xi @HaochengXiUCB
🎥 Video generation is hitting the memory wall. As videos get longer, the KV cache quietly explodes, and long-horizon consistency starts to break.
We built Quant VideoGen: a training-free KV cache compression method for auto-regressive video diffusion. Instead of storing every KV in high precision, QVG exploits video’s spatiotemporal redundancy with semantic-aware smoothing + progressive residual quantization.
🚀 Up to 7× KV memory reduction
⚡ <4% overhead
✅ Strong long-video quality
🕹️ Deploy HYWorldPlay on your own RTX 5090 locally
KV compression is becoming a core scaling primitive, not just for LLMs, but for video generation too.
Paper: arxiv.org/abs/2602.02958
Code: github.com/svg-project/Qu…
(1/5)
11
53
266
63K
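As a rough illustration of the "progressive residual quantization" idea named in the Quant VideoGen post above, here is a minimal NumPy sketch. It is not the QVG implementation: the function names, the uniform min-max quantizer, the 2-bit width, and the three stages are all assumptions chosen for illustration, and the semantic-aware smoothing step is left out entirely. The sketch only shows the general pattern of quantizing a tensor coarsely and then quantizing the leftover error again.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniform min-max quantization to 2**bits levels; returns dequantized values."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((x - lo) / scale)  # integer codes in [0, levels]
    return codes * scale + lo

def progressive_residual_quantize(x, bits=2, stages=3):
    """Quantize x, then repeatedly quantize the remaining residual (hypothetical helper)."""
    approx = np.zeros_like(x)
    errors = []
    for _ in range(stages):
        residual = x - approx  # what the current approximation still misses
        approx = approx + quantize_uniform(residual, bits)
        errors.append(float(np.abs(x - approx).max()))
    return approx, errors

# Stand-in for a small block of KV-cache values (random data, not real activations).
rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 64)).astype(np.float32)
approx, errors = progressive_residual_quantize(kv)
print(errors)  # worst-case error after each stage
```

Each stage shrinks the worst-case error because the residual has a narrower range than the tensor it came from. A real system would store the low-bit integer codes and scales rather than the dequantized floats returned here.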
Shuo Yang reposted
Haocheng Xi @HaochengXiUCB
Really exciting to see KV-cache compression getting attention. A similar bottleneck shows up beyond LLMs: for world models and autoregressive long-video generation, KV cache can quickly dominate memory and limit long-horizon consistency. Our recent work, Quant VideoGen, explores training-free 2-bit KV-cache quantization for video diffusion models, achieving up to 7.0× KV memory reduction with <4% latency overhead. Link: arxiv.org/abs/2602.02958
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

16
68
482
53.4K
Cyrus Maher @CyrusMaher
@HaochengXiUCB Great work! Do you think this could be used for sparse long context attention?
1
0
0
116
Shuo Yang reposted
Haocheng Xi @HaochengXiUCB
K-means is simple. Making it fast on GPUs isn’t.
That’s why we built Flash-KMeans, an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly, Flash-KMeans achieves 30x speedup over cuML and 200x speedup over FAISS, with the same exact algorithm, just engineered for today’s hardware. At million scale, Flash-KMeans can complete a k-means iteration in milliseconds.
A classic algorithm, redesigned for modern GPUs.
Paper: arxiv.org/abs/2603.09229
Code: github.com/svg-project/fl…
36
201
1.8K
306.9K
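For readers unfamiliar with what "a k-means iteration" in the Flash-KMeans post above refers to, here is a plain NumPy sketch of one exact Lloyd iteration (assign each point to its nearest centroid, then recompute the means). This is a textbook CPU reference, not the Flash-KMeans code; the function name and toy data are assumptions. It materializes the full points-by-clusters distance matrix, which is presumably the kind of memory traffic an IO-aware GPU implementation tries to avoid.

```python
import numpy as np

def kmeans_iteration(points, centroids):
    """One exact Lloyd iteration: nearest-centroid assignment, then mean update."""
    # Squared distances via ||p||^2 - 2 p.c + ||c||^2, shape (n_points, n_clusters).
    d2 = (
        (points ** 2).sum(axis=1, keepdims=True)
        - 2.0 * points @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    labels = d2.argmin(axis=1)
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        members = points[labels == k]
        if len(members) > 0:  # an empty cluster keeps its previous centroid
            new_centroids[k] = members.mean(axis=0)
    return labels, new_centroids

# Toy data: two well-separated 2-D blobs around (-5, -5) and (5, 5).
rng = np.random.default_rng(0)
points = np.concatenate([
    rng.normal(-5.0, 0.5, (100, 2)),
    rng.normal(5.0, 0.5, (100, 2)),
])
centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])
for _ in range(5):
    labels, centroids = kmeans_iteration(points, centroids)
print(centroids)
```

The result of each iteration is fully determined by the points and the current centroids, so a faster implementation of the same exact algorithm should produce the same assignments up to floating-point ties.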
Shuo Yang reposted
Shu Lynn Liu @shulynnliu
Researchers spend hours and hours hand-crafting the strategies behind LLM-driven optimization systems like AlphaEvolve: deciding which ideas to reuse, when to explore vs exploit, and what mutations to try.
🤖 But what if AI could evolve its own evolution process? We introduce EvoX, a meta-evolution pipeline that lets AI evolve the strategy guiding the optimization. It achieves high-quality solutions for <$5, while existing open systems and even Claude Code often cost 3-5× more on some tasks.
Across ~200 optimization problems, EvoX delivers the strongest overall results: often outperforming AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on math and systems tasks, exceeding human SOTA, and improving median performance by up to 61% on 172 competitive programming problems. 👇
19
85
498
99.2K
Shuo Yang reposted
Qiuyang Mang @MangQiuyang
🚀 Excited to share our new work, SVG-EAR! Built on SVG2, we study a simple but important setting for accelerating video diffusion transformers: compute only a subset of attention blocks exactly, and recover the rest with a training-free linear compensation. A key takeaway from this project is that, once compensation is introduced, routing should not be treated as a naive top-p selection problem. Instead, routing and compensation need to be co-designed: the real question is not just which blocks have high scores, but which skipped blocks can be accurately recovered and which still need exact computation. Based on this idea, SVG-EAR combines parameter-free linear compensation with error-aware routing and delivers a clear quality gain over SVG2 at similar runtime 📈 On the project page, we show about +2.1 PSNR on Wan 2.2 I2V at the same 1.61× speedup, along with consistent gains on Wan 2.2 T2V and HunyuanVideo. Really excited about this direction for making sparse video generation both faster and more faithful 🎥✨ Huge thanks to all collaborators Xuanyi Zhou @randwalk0 @HaochengXiUCB @Jintao_Zhang_ @HuanzhiMao @profjoeyg @KurtKeutzer @istoica05 @alvinkcheung for making this possible 🙌 See the project page for more details and videos: svg-project.github.io/v3/
2
8
22
1.8K
Shuo Yang reposted
Shu Lynn Liu @shulynnliu
AlphaEvolve is closed-source. We release 🌟SkyDiscover🌟, a flexible, modular open-source framework with two new adaptive algorithms that match or exceed AlphaEvolve on many benchmarks and outperform OpenEvolve, GEPA, and ShinkaEvolve across 200+ optimization tasks. Our new algorithms dynamically adapt their search strategy, and can even let the AI optimize its own optimization process on the fly!
Results:
📊 +34% median score improvement on 172 Frontier-CS problems.
🧮 Matches/exceeds AlphaEvolve on many math benchmarks
⚙️ Discovers system optimizations beyond human-designed SOTA
🧵👇
12
105
582
141.5K
Shuo Yang reposted
Shiyi Cao @shiyi_c98
Introducing our new work K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model, a new paradigm for automated GPU kernel generation, achieving SoTA results.
🔍 Big insight: Traditional methods treat LLMs as stochastic code generators inside heuristic loops, but this misses a key point: LLMs are powerful planners with rich domain priors.
🧠 Core idea: K-Search uses the LLM itself as a co-evolving world model: one that plans, updates beliefs, and guides search decisions based on experience.
📌 This decouples high-level strategy (intent) from low-level code implementation, allowing the optimizer to pursue multi-step transformations even when intermediate implementations don’t immediately improve performance.
📈 Key results:
🔥 Our discovered kernels achieve a ~2.10× average speedup over state-of-the-art evolutionary search across 4 FlashInfer kernels on H100/B200.
🔥 Up to 14.3× gain on complex Mixture-of-Experts (MoE) kernels.
🔥 State-of-the-art performance on the GPUMode TriMul (H100) task, beating both automated and human solutions.
🙏 Acknowledgements: This work was developed in @BerkeleySky, w/ the amazing @ziming_mao, @profjoeyg, and @istoica05. We thank @DachengLi177, @MayankMish98, @randwalk0, @pgasawa, @fangz_zzu, and @tian_xia_ for helpful discussion and feedback. We also thank the generous compute support from @databricks, @awscloud, @anyscalecompute, @nvidia, @Google, @LambdaAPI, and @MayfieldFund.
👨‍💻 GitHub: github.com/caoshiyi/K-Sea…
📄 arXiv: arxiv.org/pdf/2602.19128…
12
64
305
96.1K
Shuo Yang reposted
Alvin Cheung @alvinkcheung
We recently released version 1.0 of Frontier CS 🎉🎉🎉, a benchmark that aims to measure frontier models' ability to solve 200+ open-ended computer science problems.
1
8
14
5.2K
Shuo Yang reposted
AI-Driven Research for Systems @ai4research_ucb
🎯 We leveraged ADRS to automatically discover a model-placement algorithm that delivers a 17% performance improvement for GPU sharing in multi-LLM serving. [ADRS Blog #9]
Starting from a deliberately “bad” random baseline, AI was able to rapidly rediscover an algorithm similar to our manually designed heuristic, and then pushed beyond it with two additional enhancements. This case study further illustrates how ADRS can accelerate algorithm design and reshape the way we approach systems research, moving from handcrafted heuristics to automated discovery and refinement.
✍️ Read the blog: adrs-ucb.notion.site/prism
📖 ADRS Blog Series: ucbskyadrs.github.io
📄 ADRS Paper: arxiv.org/abs/2510.06189
👩‍💻 Code: github.com/UCB-ADRS/ADRS
3
7
20
10.6K