
This was a fun side project. While I was traveling last week, I used Dispatch in the Claude mobile app to build a harness for Claude Code. It takes your task as input, generates a rubric for it, scores Claude's outputs against the rubric, and runs in a loop until the score plateaus. The eval shows a +20pp average lift over baseline on tasks like writing an investment memo, writing a counter-argument to a claim, designing a schema for a billing system, and more. It's available here: github.com/timwein/auto-v…
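For a rough idea of what a rubric-scored loop like this can look like, here is a minimal Python sketch. It is not the code from the repo: the function name `refine_until_plateau`, the `min_gain`, `patience`, and `max_rounds` parameters, and the toy stand-ins for the model calls are all assumptions made up for illustration. In a real harness, the rubric, the draft, and the score would each come from a call to Claude.

```python
from typing import Callable, List, Tuple


def refine_until_plateau(
    task: str,
    make_rubric: Callable[[str], List[str]],
    generate: Callable[[str, List[str], str], str],
    score: Callable[[str, List[str]], float],
    min_gain: float = 0.01,   # hypothetical: smallest gain that counts as progress
    patience: int = 2,        # hypothetical: stale rounds tolerated before stopping
    max_rounds: int = 10,     # hypothetical: hard cap on iterations
) -> Tuple[str, float, int]:
    """Generate, score against a rubric, and repeat until the score plateaus."""
    rubric = make_rubric(task)
    best_output, best_score = "", float("-inf")
    stale = 0
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # Each round revises the best output seen so far.
        candidate = generate(task, rubric, best_output)
        candidate_score = score(candidate, rubric)
        if candidate_score > best_score + min_gain:
            best_output, best_score = candidate, candidate_score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # no meaningful gain for `patience` rounds: plateau
    return best_output, best_score, rounds


# Toy stand-ins for the model calls (assumptions, for demonstration only).
def toy_rubric(task: str) -> List[str]:
    return ["thesis", "risks", "valuation"]


def toy_generate(task: str, rubric: List[str], previous: str) -> str:
    # Add the first rubric item the previous draft does not mention yet.
    missing = [item for item in rubric if item not in previous]
    return (previous + " " + missing[0]).strip() if missing else previous


def toy_score(output: str, rubric: List[str]) -> float:
    # Fraction of rubric items that appear in the output.
    return sum(item in output for item in rubric) / len(rubric)


if __name__ == "__main__":
    print(refine_until_plateau("investment memo", toy_rubric, toy_generate, toy_score))
```

Stopping on a patience counter instead of the first non-improving round is one design choice among several; it keeps a single noisy score from ending the loop early.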


















