

Jonathan Lebensold

@jonlebensold
Helping you hill-climb your agentic system @ Jetty. AI has an evaluation problem and I’m trying to fix it. PhD in privacy and ML, ex-Meta AI, Google.







Evals are the verifier for the agent-building process. Once you have an eval, you can autonomously hill-climb it with Meta-Harness (agents building agents). @karpathy’s autoresearch (agents building models) is another piece. To complete the full autonomous loop of agent development, we need Meta-Measure / autobenchmark (agents building evaluations). We have started automating the task discrimination (quality control) part of the benchmark development process (in our efforts for Terminal-Bench 3.0), but the real unlock will be cracking task generation. DM me if you are interested in autobenchmarking and let’s jam on some ideas!














Unveiling our new startup Advanced Machine Intelligence (AMI Labs). We just completed our seed round: $1.03B / 890M€, one of the largest seeds ever, probably the largest for a European company. We're hiring! [the background image is the Veil Nebula - a picture I took from my backyard, most appropriate for an unveiling] More details here: techcrunch.com/2026/03/09/yan…





Excited to kick off this year’s Systems Reading Group series with @harborframework and @terminalbench! Top frontier labs, data vendors, and AI cos are moving to Harbor for their RL infra and evals. Come by to learn why, and dive into key components of their architecture with creators @alexgshaw & @ryanmart3n! Sign up below for the event on 3/10 👉 luma.com/wkdfbw17

