
New research on self-evolving agents via skill distillation.
Skill distillation pipelines learn reusable rules from agent trajectories, but they lack a key signal: how much each step costs. Without per-step cost, a pipeline cannot distinguish fixing a bug from cutting waste.
ClawTrace instruments every LLM call, tool use, and sub-agent spawn, then compiles each session into a TraceCard: a compact YAML with per-step USD cost, token counts, and redundancy flags. Any agent harness can plug in through a plain JSON ingest API.
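To make the TraceCard idea concrete, here is a minimal sketch of how per-step cost and redundancy flags could be derived from recorded steps. Everything in it is an assumption for illustration: the `Step` fields, the token prices, the "repeated identical tool call" redundancy heuristic, and the output keys are hypothetical and not ClawTrace's actual schema. The real TraceCard is YAML; this sketch returns a plain dict to stay dependency-free.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; not taken from ClawTrace.
PRICE_PER_MTOK = {"input": 3.0, "output": 15.0}


@dataclass
class Step:
    kind: str               # "llm", "tool", or "subagent"
    name: str               # model name or tool name
    input_tokens: int = 0
    output_tokens: int = 0
    args: str = ""          # serialized tool arguments, used for duplicate detection


def step_cost(step: Step) -> float:
    """USD cost of one step under the assumed token prices."""
    return (step.input_tokens * PRICE_PER_MTOK["input"]
            + step.output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000


def compile_trace_card(session_id: str, steps: list) -> dict:
    """Summarize a session as a TraceCard-like dict (assumed layout)."""
    seen = set()
    rows = []
    for index, step in enumerate(steps):
        key = (step.kind, step.name, step.args)
        # Assumed heuristic: a tool call is redundant if an identical call ran earlier.
        redundant = step.kind == "tool" and key in seen
        seen.add(key)
        rows.append({
            "step": index,
            "kind": step.kind,
            "name": step.name,
            "tokens": step.input_tokens + step.output_tokens,
            "cost_usd": round(step_cost(step), 6),
            "redundant": redundant,
        })
    total = round(sum(step_cost(s) for s in steps), 6)
    return {"session": session_id, "total_cost_usd": total, "steps": rows}


card = compile_trace_card("demo", [
    Step("llm", "planner", input_tokens=1000, output_tokens=200),
    Step("tool", "read_file", args="a.py"),
    Step("tool", "read_file", args="a.py"),   # same call again, gets flagged
])
```

In this sketch the second `read_file` call is the only step flagged as redundant, and the per-step costs sum to the session total, which is the signal a downstream distillation step would read.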
Built on ClawTrace, CostCraft replaces the standard success-vs.-failure split with three patch types: preserve (keep what works), prune (remove expensive steps that did not affect the outcome), and repair (fix failures with oracle evidence). Each prune patch names its target span and provides a counterfactual argument for why removal is safe.
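The three patch types can be sketched as a small data model. This is an illustration under stated assumptions, not CostCraft's implementation: the `Patch` fields, the `propose_patch` helper, and the `min_prune_cost` threshold are hypothetical. The one constraint taken from the description above is that a prune patch must name its target span and carry a counterfactual argument, and that a repair patch carries evidence.

```python
from dataclasses import dataclass
from enum import Enum


class PatchType(Enum):
    PRESERVE = "preserve"   # keep what works
    PRUNE = "prune"         # remove expensive steps that did not affect the outcome
    REPAIR = "repair"       # fix a failure using oracle evidence


@dataclass
class Patch:
    kind: PatchType
    target_span: str
    counterfactual: str = ""   # why removal is safe (required for prune)
    evidence: str = ""         # oracle evidence (required for repair)

    def __post_init__(self):
        if not self.target_span:
            raise ValueError("every patch must name a target span")
        if self.kind is PatchType.PRUNE and not self.counterfactual:
            raise ValueError("a prune patch needs a counterfactual argument")
        if self.kind is PatchType.REPAIR and not self.evidence:
            raise ValueError("a repair patch needs oracle evidence")


def propose_patch(span: str, cost_usd: float, affected_outcome: bool,
                  failed: bool, note: str, min_prune_cost: float = 0.01) -> Patch:
    """Assumed triage rule: failures are repaired, costly no-op spans are pruned,
    and everything else is preserved."""
    if failed:
        return Patch(PatchType.REPAIR, span, evidence=note)
    if not affected_outcome and cost_usd >= min_prune_cost:
        return Patch(PatchType.PRUNE, span, counterfactual=note)
    return Patch(PatchType.PRESERVE, span)


patches = [
    propose_patch("span-3", 0.12, affected_outcome=False, failed=False,
                  note="output of this search was never read by later steps"),
    propose_patch("span-7", 0.02, affected_outcome=True, failed=True,
                  note="oracle test shows the wrong file was edited"),
    propose_patch("span-1", 0.05, affected_outcome=True, failed=False, note=""),
]
```

Validating the counterfactual at construction time means a prune rule with no safety argument cannot enter the rule set in this sketch.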
The finding we did not expect: prune rules protect quality before they compress cost. Removing them tripled regressions on held-out tasks while median cost barely changed. On 30 unrelated SkillsBench tasks, prune rules transferred and cut median cost by 32%, while preserve rules caused regressions.
Paper: arxiv.org/abs/2604.23853
Source Code: github.com/epsilla-cloud/…


















