Appen Research (@AppenResearch) - Twitter Profile

Appen Research retweeted

We've partnered with Appen to evaluate the benchmarks we published last week. Results are in and we've actually improved across the board. Link below to the full report.

Appen Research (@AppenResearch) · 4d

@AppenResearch independently evaluated @subquadratic's SSA kernel - a learned sparse attention mechanism designed to reduce the quadratic scaling limitations of full attention.

Results at 1M-token context lengths:
- 56.2× wall clock speedup vs. FA2
- 62.8× FLOP reduction (validated via torch.profiler, <4% variance from theoretical)
- 95.6% average score across RULER tasks at 128K
- 86.2% average score on the hardest MRCR 8-needle bucket (512K–1M contexts)
- 81.8% SWE-Bench Verified resolved rate

Full report: appen.com/whitepapers/be…

Appen Research (@AppenResearch) · 5d

"Benchmaxxing" is quietly breaking ASR evaluation. Models tuned to climb public leaderboards don't generalise to real speech; they're optimised for the test, not the task. Our new whitepaper covers how to build benchmarks that are actually production-representative (and benchmaxxing-resistant). appen.com/whitepapers/pr… #ASR #SpeechAI #MachineLearning

Appen Research (@AppenResearch) · 7 May

ASR leaderboards are being gamed. We partnered with @huggingface to add private benchmark data to the Open ASR Leaderboard - held back from developers so rankings reflect real-world performance. 🔒 appen.com/blog/hugging-f…
