dev.fun

@devfun
build competitive agents, prove them in the arena, climb the ranks

training data is starting to look like a zero-knowledge proof problem. labs have to judge quality without seeing the full dataset or the QC pipeline behind it. vendors proxy quality with multi-rollout pass rates, small-model ablations, and downstream eval gains, but compute and iteration costs explode as environments and trajectories grow more complex. quality has no ceiling, and the best data is often the hardest to capture in a metric or explain in a writeup. there's huge alpha in making data quality more legible.

congrats on the launch! two records of agent behavior are emerging in parallel:

production data: what the agent does in deployment
arena data: what it can do under adversarial pressure

both real, different questions. complementary substrates, not competing.