samarth @ ICLR
165 posts

@samarth__go
building @harvey // prev. hrt, imc, goldman sachs, avalanche

New @ScaleAILabs Research: Your AI agent just gave you an answer, but did it actually solve the problem, get lucky, or just sound right? Today’s benchmarks can’t tell. We built HiL-Bench (Human-in-Loop Benchmark) to test a critical skill: does your agent know what it’s missing and when to ask for clarification? 🧵

In legal work, outcomes depend on the quality and coverage of your knowledge sources. In this blog post, @samarth__go and Christopher Bello explain how we built the Data Factory to discover authoritative legal sources, validate them for compliance, and test real legal reasoning at scale. The result: Harvey scaled from six jurisdictions to 60+, and from 20 legal data sources to 400+. Full breakdown: harvey.ai/blog/using-age…