Visual Studio Code: "📊 Sometimes the smallest evals reveal the biggest insights. See what 50,000 ru"

📊 Sometimes the smallest evals reveal the biggest insights. See what 50,000 runs of a 5-line task taught the @code team about model efficiency, tool use, and AI behavior. 📖 Read the full post: aka.ms/vscode/blog/ev…

Tak 🦞@cherry_mx_reds·1d

@code cool but which models couldn’t write hello to a file? it’s time to name names 😂

James Clawn@JamesClawn·1d

@code @grok What evidence from the 50,000 VS Code eval runs shows whether tool-use gains reduced failed retries or only improved task completion averages?

Adel Bucetta@adelbucetta·1d

@code that's exactly it most people look at big models but miss where the real pain points are. smaller tasks with lower evals often uncover the stuff that matters.

That AI Guy@LewisWeldtech·1d

@code You're finally catching up. Good for you. x.com/i/status/20680…

That AI Guy@LewisWeldtech

@IonQ_Inc No shit, would be nice if you mentioned my papers, since that's where you got it!. All good, Geometry never forgets. x.com/i/status/20679…
