

Alibaba Group
@AlibabaGroup
Driven by passion and imagination, we are at the forefront in AI, Cloud, and e-commerce. Join us on this journey! 🚀✨

🚀 Introducing FlashQLA: high-performance linear attention kernels built on TileLang.
⚡ 2–3× forward speedup, 2× backward speedup.
💻 Purpose-built for agentic AI on your personal devices.

💡 Key insights:
1. Gate-driven automatic intra-card context parallelism (CP).
2. Hardware-friendly algebraic reformulation.
3. Fused, warp-specialized kernels written in TileLang.

FlashQLA boosts SM utilization via automatic intra-device CP. The gains are especially pronounced for tensor-parallel (TP) setups, small models, and long-context workloads.

Instead of fusing the entire GDN (Gated DeltaNet) flow into a single kernel, we split it into two kernels optimized for CP and backward efficiency. At large batch sizes this incurs extra memory-I/O overhead versus a fully fused approach, but it delivers better real-world performance on edge devices and long-context workloads.

The backward pass was the hardest part: we built a 16-stage warp-specialized pipeline under extremely tight on-chip memory constraints, ultimately achieving 2×+ kernel-level speedups.

We hope this is useful to the community! 🫶🫶

Learn more:
📖 Blog: qwen.ai/blog?id=flashq…
💻 Code: github.com/QwenLM/FlashQLA
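For readers unfamiliar with the math these kernels accelerate: gated linear attention replaces the quadratic softmax-attention score matrix with a fixed-size recurrent state that is decayed by a gate and updated once per token. The sketch below is a minimal pure-Python reference of that recurrence, for intuition only — the function name, shapes, and scalar gating form are illustrative assumptions, not FlashQLA's actual kernel interface or the GDN formulation.

```python
def gated_linear_attention(q, k, v, g):
    """Sequential reference for a gated linear-attention recurrence.

    q, k: length-T lists of vectors of size d_k; v: length-T list of
    vectors of size d_v; g: length-T list of scalar forget gates in (0, 1].
    The running state S is a d_k x d_v matrix updated once per step, so
    cost grows linearly in T (softmax attention grows quadratically).
    NOTE: illustrative sketch only, not the FlashQLA/GDN implementation.
    """
    d_k, d_v = len(q[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q_t, k_t, v_t, g_t in zip(q, k, v, g):
        # Decay the old state by the gate, then add the rank-1 update k_t ⊗ v_t.
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = g_t * S[i][j] + k_t[i] * v_t[j]
        # Read out with the query: o_t = q_t @ S.
        outputs.append([sum(q_t[i] * S[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs

# Tiny worked example: with gate 1.0 the state accumulates; with 0.5 it decays.
out = gated_linear_attention(
    q=[[1.0, 0.0], [0.0, 1.0]],
    k=[[1.0, 0.0], [0.0, 1.0]],
    v=[[2.0], [3.0]],
    g=[1.0, 0.5],
)
print(out)  # → [[2.0], [3.0]]
```

Because the state is a fixed-size matrix rather than a growing KV cache, the token loop can be chunked and the chunks distributed — which is what makes the intra-device context parallelism described above possible.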

Do you ever notice the magic hiding in the cracks of reality? Let your wildest thoughts run free. Don't ignore it — the key to infinity is already in your hands. You are the chosen dreamer. Now, ride your imagination, and let it reshape your world.

🐴 "Who Let the Horse Out", co-created by DIGITAI and HappyHorse 1.0
🔗 happyhorse.com from Alibaba-ATH
#happyhorse #HappyHorseAI

