ChonLam Lao retweeted

🚀 Excited to release mKernel: a set of fast multi-node, multi-GPU fused kernels.
💻 Code: github.com/uccl-project/m…
📝 Blog: uccl-project.github.io/posts/mkernel/
mKernel fuses compute + communication into one persistent GPU kernel, covering both intra-node and inter-node cases with GPU-initiated communication.
Amazing team: @yangzhouy, Chon Lam Lao, Costin Raiciu, Scott Shenker, @istoica05
