Ran Levinstein retweeted

Accelerate your transformer model with the new Block-Sparse-Flash-Attention! github.com/Danielohayon/B…
This training-free, drop-in replacement extends FlashAttention-2 with minimal code changes (CUDA kernels included). Paper: arxiv.org/abs/2512.07011
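
To illustrate the idea behind a block-sparse, drop-in attention replacement, here is a minimal dense reference sketch in PyTorch. It is not the repo's CUDA kernel or its actual API; the function name `block_sparse_attention`, the block-mask layout, and the example sparsity pattern are all illustrative assumptions. A fused kernel would skip masked blocks entirely rather than masking after the score computation.

```python
# Minimal reference sketch of block-sparse attention (assumed API, not the repo's).
import torch
import torch.nn.functional as F


def block_sparse_attention(q, k, v, block_mask, block_size=64):
    """Dense reference: expand a per-block mask to token level and run SDPA.

    q, k, v:     (batch, heads, seq_len, head_dim)
    block_mask:  (seq_len // block_size, seq_len // block_size) boolean,
                 True where a query block may attend to a key block.
    """
    # Expand the coarse block mask to a token-level attention mask.
    token_mask = block_mask.repeat_interleave(block_size, dim=0)
    token_mask = token_mask.repeat_interleave(block_size, dim=1)
    # scaled_dot_product_attention masks out disallowed positions; a fused
    # block-sparse kernel would skip those blocks instead of masking them.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=token_mask)


if __name__ == "__main__":
    B, H, S, D, BS = 1, 4, 256, 64, 64
    q = torch.randn(B, H, S, D)
    k = torch.randn(B, H, S, D)
    v = torch.randn(B, H, S, D)
    # Example sparsity pattern (assumed): each query block attends to itself
    # and to the first key block.
    n_blocks = S // BS
    block_mask = torch.eye(n_blocks, dtype=torch.bool)
    block_mask[:, 0] = True
    out = block_sparse_attention(q, k, v, block_mask, BS)
    print(out.shape)  # torch.Size([1, 4, 256, 64])
```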