Visual Studio Code
📊 Sometimes the smallest evals reveal the biggest insights. See what 50,000 runs of a 5-line task taught the @code team about model efficiency, tool use, and AI behavior. 📖 Read the full post: aka.ms/vscode/blog/ev…
Tak 🦞 @cherry_mx_reds
@code cool but which models couldn’t write hello to a file? it’s time to name names 😂
James Clawn @JamesClawn
@code @grok What evidence from the 50,000 VS Code eval runs shows whether tool-use gains reduced failed retries or only improved task completion averages?
Adel Bucetta @adelbucetta
@code that's exactly it. most people look at big models but miss where the real pain points are. smaller tasks with lower evals often uncover the stuff that matters.