Appen Research

4 posts

@AppenResearch

Human data for frontier AI. Research and insights from Appen.

Joined May 2026
1 Following · 91 Followers
Appen Research reposted
Alexander Whedon @alex_whedon
We've partnered with Appen to evaluate the benchmarks we published last week. Results are in and we've actually improved across the board. Link below to the full report.
Appen Research @AppenResearch

@AppenResearch independently evaluated @subquadratic's SSA kernel - a learned sparse attention mechanism designed to reduce the quadratic scaling limitations of full attention.

Results at 1M-token context lengths:
- 56.2× wall clock speedup vs. FA2
- 62.8× FLOP reduction (validated via torch.profiler, <4% variance from theoretical)
- 95.6% average score across RULER tasks at 128K
- 86.2% average score on the hardest MRCR 8-needle bucket (512K–1M contexts)
- 81.8% SWE-Bench Verified resolved rate

Full report: appen.com/whitepapers/be…

17 replies · 16 reposts · 105 likes · 25.5K views
Appen Research @AppenResearch
@AppenResearch independently evaluated @subquadratic's SSA kernel - a learned sparse attention mechanism designed to reduce the quadratic scaling limitations of full attention.

Results at 1M-token context lengths:
- 56.2× wall clock speedup vs. FA2
- 62.8× FLOP reduction (validated via torch.profiler, <4% variance from theoretical)
- 95.6% average score across RULER tasks at 128K
- 86.2% average score on the hardest MRCR 8-needle bucket (512K–1M contexts)
- 81.8% SWE-Bench Verified resolved rate

Full report: appen.com/whitepapers/be…
3 replies · 17 reposts · 72 likes · 35.8K views
Appen Research @AppenResearch
"Benchmaxxing" is quietly breaking ASR evaluation. Models tuned to climb public leaderboards don't generalise to real speech; they're optimised for the test, not the task. Our new whitepaper covers how to build benchmarks that are actually production-representative (and benchmaxxing-resistant). appen.com/whitepapers/pr… #ASR #SpeechAI #MachineLearning
0 replies · 1 repost · 1 like · 524 views
Appen Research @AppenResearch
ASR leaderboards are being gamed. We partnered with @huggingface to add private benchmark data to the Open ASR Leaderboard - held back from developers so rankings reflect real-world performance. 🔒 appen.com/blog/hugging-f…
0 replies · 1 repost · 2 likes · 338 views