
We're growing the Claude Code team and I'm looking for someone obsessed with evals. Not just writing them, but designing the right ones, QAing signal vs. noise, and building the infra to run them at scale.
It's deeply meaningful work and honestly the most fun I've ever had professionally. If you're technically curious and high-agency, DM me.
job-boards.greenhouse.io/anthropic/jobs…
