Appen Research retweeted

We've partnered with Appen to evaluate the benchmarks we published last week.
Results are in and we've actually improved across the board.
Link below to the full report.
Appen Research @AppenResearch
@AppenResearch independently evaluated @subquadratic's SSA kernel - a learned sparse attention mechanism designed to reduce the quadratic scaling limitations of full attention.

Results at 1M-token context lengths:
- 56.2× wall clock speedup vs. FA2
- 62.8× FLOP reduction (validated via torch.profiler, <4% variance from theoretical)
- 95.6% average score across RULER tasks at 128K
- 86.2% average score on the hardest MRCR 8-needle bucket (512K–1M contexts)
- 81.8% SWE-Bench Verified resolved rate

Full report: appen.com/whitepapers/be…



