Sachin Iyer

@sachiniyergreen
Making claude mad @ Detail

Sold a car to Carvana today and I have no idea how they stay in business. Inexplicable.


people are walking around with their laptops slightly ajar to keep their agents running

new walk of shame: agent still working, but the cafe closed

We ran a chess tournament for bugs. The question we wanted to answer: are bugs from Detail "important"? How do they compare to what code review bots catch? One of the main ways we benchmark ourselves: we want the bugs we surface to be significantly more important than the typical comment from a code review bot. We took a week of findings from CR bots running on OpenClaw and vLLM, plus findings from Detail on the same week of changes. We put them through an LLM-as-judge tournament, then fed the head-to-head results into a Bradley-Terry model to compute Elo ratings for bugs. Out comes a global ranking from most to least important. Awesome exploration from @sachiniyergreen below, with methodology, charts, and a PostHog secret exfiltration vuln that four code review bots missed.
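
For anyone curious how the ranking step can work, here is a rough sketch of fitting a Bradley-Terry model to head-to-head results and mapping the strengths onto an Elo-style scale. This is not Detail's actual pipeline. The function names, the `prior` pseudo-count (virtual games against a reference opponent, so a finding that never wins still gets a finite rating), the 1500 anchor, and the toy bug IDs are all assumptions made for illustration. The 400-point log10 scale is the usual Elo convention.

```python
import math
from collections import Counter


def fit_bradley_terry(matches, iterations=500, prior=0.5):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the standard minorize-maximize update. `prior` is an assumed
    regularizer: each item gets `prior` virtual wins and `prior` virtual
    losses against a reference opponent whose strength is fixed at 1.0.
    """
    wins = Counter()
    pair_games = Counter()
    items = set()
    for winner, loser in matches:
        wins[winner] += 1
        pair_games[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Virtual games against the fixed reference opponent.
            denom = 2 * prior / (strength[i] + 1.0)
            for j in items:
                if j == i:
                    continue
                games = pair_games[frozenset((i, j))]
                if games:
                    denom += games / (strength[i] + strength[j])
            updated[i] = (wins[i] + prior) / denom
        strength = updated
    return strength


def elo_ratings(matches, anchor=1500.0, scale=400.0):
    """Map strengths to an Elo-style scale: anchor + scale * log10(strength)."""
    strengths = fit_bradley_terry(matches)
    return {item: anchor + scale * math.log10(p) for item, p in strengths.items()}


if __name__ == "__main__":
    # Hypothetical judge verdicts: (more important finding, less important finding).
    verdicts = [
        ("bug-A", "bug-B"), ("bug-A", "bug-B"), ("bug-B", "bug-A"),
        ("bug-B", "bug-C"), ("bug-B", "bug-C"), ("bug-A", "bug-C"),
        ("bug-C", "bug-A"),
    ]
    ratings = elo_ratings(verdicts)
    for bug in sorted(ratings, key=ratings.get, reverse=True):
        print(f"{bug}: {ratings[bug]:.0f}")
```

Under this mapping, the modeled probability that one finding beats another depends only on their rating difference, which is what lets pairwise judge verdicts collapse into a single global ranking.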