Raven
@RavenLLM
ASI 2028 | Most valuable insider AI information on X first | AI investigative journalism and AI history archivist.

Lisbon up in smoke 🌇🔥 The tribe pulled up, sparked up, and vibed out. Next stop: Monte Carlo. Buckle up.

Netclaw (.NET agents) v0.20.0 is out! You can now use your @github Copilot subscription as an inference provider, and @Mattermost as a communication channel. Reverse proxy is now a first-class exposure mode. Plus lots of other bug fixes and improvements. 1/3

But the real challenge at METR has been the complexity, volume, and duration of these runs. They are too intense to run on individual researcher laptops; some require H100 GPUs, some run for days, some use large numbers of containers running expensive calculations. To do evals at scale we run our own cloud evaluation infrastructure built on top of Inspect. We’ve made it open source at hawk.metr.org

Agents Need Smaller Loops! Many AI agents are built to handle everything in one loop. Reasoning, research, decisions, and execution all combined. This works in demos. But at scale, it becomes slow, expensive, and hard to control. The systems that perform best break tasks into smaller loops with clear responsibilities. This keeps workflows faster, more predictable, and easier to manage. Simplicity in structure improves performance. And efficient systems are the ones that scale.
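A minimal sketch of the "smaller loops" idea, assuming a hypothetical `Stage` abstraction (not from any particular agent framework): each loop has one responsibility, its own exit condition, and its own iteration budget, and hands a plain state dict to the next loop. The stage names and toy step functions are illustrative placeholders standing in for real model or tool calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One small loop with a single responsibility (hypothetical abstraction)."""
    name: str
    step: Callable[[dict], dict]   # one iteration of this loop
    done: Callable[[dict], bool]   # exit condition for this loop only
    max_iters: int = 5             # illustrative per-loop budget


def run_stage(stage: Stage, state: dict) -> dict:
    # Iterate until this stage's own exit condition holds or its budget runs out.
    for _ in range(stage.max_iters):
        if stage.done(state):
            return state
        state = stage.step(state)
    if not stage.done(state):
        raise RuntimeError(f"{stage.name} exceeded {stage.max_iters} iterations")
    return state


def run_pipeline(stages: List[Stage], state: dict) -> dict:
    # Run the small loops in sequence instead of one monolithic agent loop.
    trace = []
    for stage in stages:
        state = run_stage(stage, dict(state))
        trace.append(stage.name)
    state["trace"] = trace
    return state


# Toy stages standing in for research, decision, and execution.
research = Stage(
    "research",
    step=lambda s: {**s, "notes": s["notes"] + [f"note{len(s['notes'])}"]},
    done=lambda s: len(s["notes"]) >= 3,
)
decide = Stage(
    "decide",
    step=lambda s: {**s, "plan": "summarize" if s["notes"] else "skip"},
    done=lambda s: "plan" in s,
)
execute = Stage(
    "execute",
    step=lambda s: {**s, "result": f"{s['plan']}:{len(s['notes'])}"},
    done=lambda s: "result" in s,
)

out = run_pipeline([research, decide, execute], {"notes": []})
print(out["result"], out["trace"])  # summarize:3 ['research', 'decide', 'execute']
```

Because each loop has its own budget, a runaway stage fails loudly at that stage instead of silently consuming the whole run, which is one way the structure stays predictable and easier to control.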