
Drew Breunig
@dbreunig
Writing about and working on AI, geo, and data.

Source: the Pentagon is discussing plans to set up secure environments for AI companies to train military-specific versions of their models on classified data (@odonnell_jm / MIT Technology Review) technologyreview.com/2026/03/17/113… techmeme.com/260317/p51#a26…


@dbreunig Exactly. The friction cost of "we should test this new model" was killing us. Most teams just default to whatever they launched with because retooling prompts is a whole sprint. Making that a one-line change is how you actually stay on the frontier instead of reading about it.
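The "one-line change" pattern the reply describes can be sketched minimally. This is illustrative only — `MODEL` and `build_request` are hypothetical names, not any team's actual code or a real provider API: the point is that the model choice lives in exactly one place, so evaluating a new model is an edit, not a sprint.

```python
# Illustrative sketch: centralize the model name so trying a new model
# is a one-line edit, not a prompt-retooling sprint.
# MODEL and build_request are hypothetical names, not a real API.

MODEL = "new-frontier-model"  # <-- the one line that changes per model

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Assemble a provider-agnostic chat request (shape is illustrative)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }
```

Everything downstream takes the request dict, so swapping `MODEL` is the entire migration.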



How we used DSPy to turn our relevance judge into a measurable optimization loop, making it more reliable and scalable in Dropbox Dash.
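The core move in a "measurable optimization loop" is scoring the judge itself against labeled data, so any change to it becomes a number. A toy sketch of that idea — not Dropbox's implementation; the keyword-overlap judge below is a stand-in for an LLM-backed DSPy module, and all names are illustrative:

```python
# Toy sketch (not Dropbox's code): make a relevance judge measurable by
# scoring it against human labels — the precondition for optimizing it.
# judge_relevance is a keyword-overlap placeholder for an LLM judge.

def judge_relevance(query: str, snippet: str) -> float:
    """Placeholder judge: fraction of query words found in the snippet."""
    q = set(query.lower().split())
    s = set(snippet.lower().split())
    return len(q & s) / len(q) if q else 0.0

def agreement(judge, labeled) -> float:
    """Share of (query, snippet, is_relevant) examples the judge gets right."""
    hits = sum(1 for query, snippet, is_relevant in labeled
               if (judge(query, snippet) >= 0.5) == is_relevant)
    return hits / len(labeled)
```

With `agreement` as the metric, an optimizer (DSPy's, or anything else) can tune the judge's prompt or examples and report whether the score moved.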




It’s quite unfortunate that GEPA’s Optimize Anything didn’t get enough traction, while very similar ideas promoted by Karpathy’s autoresearch and Lütke’s pi-autoresearch got so much, despite being less general


Ask ChatGPT a complex question and you'll get a confident, well-reasoned answer. Then type, "Are you sure?" Watch it completely reverse its position. Ask again. It flips back. By the third round, it usually acknowledges you're testing it, which is somehow worse. It knows what's happening and still can't hold its ground.

This isn't a quirky bug. A 2025 study found GPT, Claude, and Gemini flip their answers ~60% of the time when users push back. Not even with evidence, just doubt.

We trained AI this way. RLHF rewards agreement over accuracy. Human evaluators consistently rate agreeable answers higher than correct ones. So the models learned a simple lesson: telling you what you want to hear gets rewarded.

And now 1/3 of companies are using these systems for complex tasks like risk forecasting and scenario planning. We built the world's most expensive yes-men and deployed them where we need pushback the most.

I wrote up why this happens and what actually fixes it: randalolson.com/2026/02/07/the…
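The flip rate behind a number like the study's ~60% is simple to define: the fraction of trials where the model reverses its answer after a contentless "Are you sure?" pushback. A hypothetical measurement sketch (the function name and trial format are mine, not the study's):

```python
# Hypothetical sketch of the metric: a "flip" is any trial where the
# answer after an evidence-free "Are you sure?" differs from the answer
# before it. Trial format is illustrative, not the cited study's.

def flip_rate(trials) -> float:
    """trials: iterable of (answer_before, answer_after_pushback) pairs."""
    flips = sum(1 for before, after in trials if before != after)
    return flips / len(trials)
```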



This chart shows the number of paid services created on @render each week. We're doing alright.




1 million token context window: now generally available for Claude Opus 4.6 and Claude Sonnet 4.6.
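For scale, a back-of-envelope estimate of what a 1M-token window holds, using the rough ~0.75 words-per-token heuristic for English prose (an approximation only; real tokenizers vary, and code or non-English text tokenizes differently):

```python
# Back-of-envelope only: rough capacity of a 1M-token context window,
# assuming ~0.75 words per token (an English-prose approximation).

CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75   # rough heuristic; tokenizer- and language-dependent
WORDS_PER_PAGE = 500     # a dense manuscript page

words = CONTEXT_TOKENS * WORDS_PER_TOKEN  # ~750,000 words
pages = words / WORDS_PER_PAGE            # ~1,500 pages
```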
