Ryan Shar
8 posts

Ryan Shar
@RyanShar01
Research Scientist @ Apple | CMU ML
Joined August 2024
26 Following · 7 Followers
Ryan Shar reposted

Blog post on @CopilotArena out now!
ML@CMU @mlcmublog
blog.ml.cmu.edu/2025/04/09/cop… How do real-world developer preferences compare to existing evaluations? A CMU and UC Berkeley team led by @iamwaynechi and @valeriechen_ created @CopilotArena to collect user preferences on in-the-wild workflows. This blogpost overviews the design and deployment of Copilot Arena + new insights into developer code preferences.
Ryan Shar reposted

What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants?
In October, we launched @CopilotArena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint.
Here's what we have learned /🧵

Arena.ai @arena
Introducing Copilot Arena - Interactive coding evaluation in the wild. Our extension lets you test top models for free, right in VSCode. Let's vote and build the Copilot leaderboard! Download here: marketplace.visualstudio.com/items?itemName… Led by @iamwaynechi and @valeriechen_ at CMU. 1/🧵
Ryan Shar reposted

When benchmarks talk, do LLMs listen?
Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks!
Work w/ @RyanShar01, @jacob_pfau, @atalwalkar, @hhexiy, and @valeriechen_!
[1/6]

Ryan Shar reposted

Which model is best for coding? @CopilotArena leaderboard is out!
Our code completions leaderboard contains data collected over the last month, with >100K completions served and >10K votes!
Let’s discuss our findings so far🧵
