
Emmett Chen-Ran
691 posts

@doubleemt
chief code deprecator @virioai prev: ff @southpkcommons, apm @salesforce, eng @stripe, cs @yale


Yesterday I interviewed @SeanZCai about AI data. This is essentially a guide for founders on how to sell data and RL envs to AI labs.

"I've never seen a data contract get turned down by a top lab, if it's good quality data, for budget reasons."

00:00 What areas of data are underserved?
02:10 For bio data, is it real-world or purely digital?
04:21 For cyber data, which subsets are most underserved?
05:50 What is the sales process like?
07:04 Why would a lab not renew or increase their purchase volume?
10:13 When a researcher is exploring a new direction, what's the first step?
11:35 In robotics data, what do you view as underserved?
13:12 What does the initial data delivery look like, what format?
13:53 Do labs have more sophisticated internal setups for running environments?
14:32 Are the non-frontier labs buying off-the-shelf data from Anthropic / OpenAI vendors?
16:11 Do Anthropic data vendors put expiry timeframes on the exclusivity?
16:42 Are purchase decisions researcher-led?
17:41 Decagon, Sierra, Ramp: what kinds of data are they buying?
19:06 Long-term, when do labs still need to buy external data vs train on user traces?
21:15 Will end-vendor benchmarks shift to performance per dollar?
22:04 How many labs are spending at the 1B+/yr data level?
23:53 Delta between Anthropic's stated $1B and your 10-20B/lab number?
26:05 What makes inference providers / neoclouds a good fit to acquire RL env cos?

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.
