Felipe Vallejo Uribe retweeted

@AppenResearch independently evaluated @subquadratic's SSA kernel - a learned sparse attention mechanism designed to reduce the quadratic scaling limitations of full attention.
Results at 1M-token context lengths:
- 56.2× wall-clock speedup vs. FA2 (FlashAttention-2)
- 62.8× FLOP reduction
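
The tweet doesn't say how SSA decides which keys each query attends to, so the sketch below is only a generic illustration of the idea behind learned sparse attention, not the actual SSA kernel. Every name, shape, and the block-routing scheme here are assumptions; the mean-pooled block score stands in for whatever learned scorer SSA actually uses.

```python
# Illustrative block-sparse attention sketch (NOT the SSA kernel; all
# parameters and the routing heuristic are assumptions for exposition).
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, topk=4):
    """Each query block attends only to the top-k key blocks picked by a
    cheap block-level score, so cost scales with n * topk * block_size
    rather than n^2."""
    n, d = q.shape
    nb = n // block_size
    qb = q.view(nb, block_size, d)
    kb = k.view(nb, block_size, d)
    vb = v.view(nb, block_size, d)

    # Cheap routing: compare mean query of each query block against mean
    # key of each key block (a stand-in for a learned importance scorer).
    block_scores = qb.mean(dim=1) @ kb.mean(dim=1).T   # (nb, nb)
    sel = block_scores.topk(topk, dim=-1).indices      # (nb, topk)

    out = torch.empty_like(qb)
    scale = d ** -0.5
    for i in range(nb):
        ks = kb[sel[i]].reshape(-1, d)                 # (topk*block_size, d)
        vs = vb[sel[i]].reshape(-1, d)
        attn = F.softmax(qb[i] @ ks.T * scale, dim=-1) # dense only within
        out[i] = attn @ vs                             # the selected blocks
    return out.reshape(n, d)

# Tiny demo: 1,024 tokens, 64-dim head.
q, k, v = (torch.randn(1024, 64) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1024, 64])
```

With nb key blocks but only topk visited per query block, attention FLOPs shrink by roughly nb/topk, and that ratio grows with sequence length; at 1M tokens this is the kind of arithmetic that can underlie a headline figure like a 62.8× FLOP reduction.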
