
Shuo Yang
@Andy_ShuoYang
2nd-year PhD student at Berkeley; efficient ML systems.





New blog post: The Forgetting Wall in Video and World Models

Long-horizon video generation is not just limited by compute. It is limited by how much of its own past the model can afford to remember.

I wrote about why long videos drift, why KV cache becomes the memory bottleneck, and why compression is a key direction for future video/world models.

haochengxi.github.io/posts/forgetti…
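A rough back-of-the-envelope sketch of why the KV cache grows into a memory bottleneck for long videos. This is not taken from the blog post: the formula is the standard one for a dense-attention transformer (one key and one value vector per token, per layer, per KV head), and every number below (layer count, head count, head dimension, tokens per frame) is a hypothetical configuration chosen only for illustration.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for num_tokens tokens.

    The leading factor of 2 counts keys plus values; bytes_per_elem=2
    assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def video_tokens(num_frames, tokens_per_frame):
    """Total tokens the model must remember for a clip (hypothetical tokenizer)."""
    return num_frames * tokens_per_frame


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 32 KV heads, head_dim 128, 1024 tokens/frame.
    for frames in (16, 160, 1600):
        tokens = video_tokens(frames, tokens_per_frame=1024)
        gib = kv_cache_bytes(tokens, num_layers=32, num_kv_heads=32, head_dim=128) / 2**30
        print(f"{frames:5d} frames -> {tokens:8d} tokens -> {gib:7.1f} GiB of KV cache")
```

The cache grows linearly with the number of frames kept in context, so under these assumptions a tenfold longer clip needs ten times the memory unless the history is compressed or evicted.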


FlashLib update: we now support ANN search with IVF-Flat, up to 6.5× faster than cuVS on real-world vector workloads (SIFT-1M) while matching recall.

LEANN now supports FlashLib as a backend: 26× faster build, 29× faster single-query search, and 298× faster batch search. Huge thanks to @YichuanM for the help!

We’re also opening Discord and Slack channels. Join us to suggest new operators you want to see and hardware backends you want FlashLib to support next!

Slack: join.slack.com/t/flashml/shar…
Discord: discord.gg/ce5Xa5pf
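For readers unfamiliar with IVF-Flat, here is a minimal pure-Python sketch of the general technique. It is not FlashLib's or cuVS's API or implementation; the function names (`build_ivf_flat`, `search`) and parameters (`n_lists`, `nprobe`) are hypothetical and chosen for illustration. The idea: cluster the vectors with k-means, store each vector's index in the inverted list of its nearest centroid, and at query time scan only the `nprobe` closest lists using exact ("flat") distances.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def build_ivf_flat(data, n_lists, n_iters=10, seed=0):
    """Run a few k-means iterations, then bucket vector ids by nearest centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, n_lists)]
    for _ in range(n_iters):
        buckets = [[] for _ in range(n_lists)]
        for v in data:
            buckets[nearest(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(data):
        lists[nearest(v, centroids)].append(i)
    return centroids, lists


def search(query, data, centroids, lists, k, nprobe):
    """Scan the nprobe closest inverted lists and return the k best vector ids."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, data[i]))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    centroids, lists = build_ivf_flat(data, n_lists=16)
    query = [rng.random() for _ in range(8)]
    approx = search(query, data, centroids, lists, k=5, nprobe=4)
    exact = sorted(range(len(data)), key=lambda i: sq_dist(query, data[i]))[:5]
    print("recall@5:", len(set(approx) & set(exact)) / 5)
```

Raising `nprobe` trades speed for recall; with `nprobe` equal to `n_lists` the search scans every vector and matches brute force.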



















