Michael Becker
4.5K posts

@beckerfuffle
Data Scientist @PennMedicine working to improve medicine by predicting the future! https://t.co/x7INgZwhhN. Also I started DataPhilly https://t.co/JzMnoRgIIg. Amateur #AIArtist

Does style matter over substance in Arena? Can models "game" human preference through lengthy and well-formatted responses?

Today, we're launching style control in our regression model for Chatbot Arena, our first step in separating the impact of style from substance in rankings.

Highlights:
- GPT-4o-mini, Grok-2-mini drop below most frontier models when style is controlled
- Claude 3.5 Sonnet, Opus, and Llama-3.1-405B rise significantly
- In Hard Prompts, Claude 3.5 Sonnet ties for #1 with ChatGPT-4o-latest. Llama-405B climbs to joint #3.

More analysis in the thread below👇




1. Claude 3.5 Sonnet is now #1 in Instruction Following on the SEAL leaderboards (scale.com/leaderboard) 🏆

GraphRAG, a graph-based approach to retrieval-augmented generation (RAG) that significantly improves question-answering over private or previously unseen datasets, is now available on GitHub. Learn more. msft.it/6010l8lew
