Meridian retweeted

New paper from @scale_AI & @MeridianAgent: SpreadsheetArena 📄
We evaluated 16 LLMs on end-to-end spreadsheet generation via 4,300+ blind pairwise votes.
Crucially, we move beyond scalar Elo ratings to decompose the latent preference signal into functional, structural, and stylistic components. 🧵
Spreadsheet Arena @sheetarena
Spreadsheets have entered the arena! ⚔️ Announcing Spreadsheet Arena, the first research platform for human preference rankings on LLM-generated spreadsheets. The results? @AnthropicAI Claude Opus is on top, but the gap is tighter than you’d think. w/ @LTIatCMU, @Cornell, and @scale_ai. 🧵



