
another day of wondering whether the efficiency gains from coding agents really justify the amount of time I spend undoing their bullshit design choices
Jack Friedson
125 posts

@JackFriedson
building something new · prev infra/product eng @haizelabs, applied AI @datadog

A million people have rightly dunked on this guy, & I don’t care, I’m going to do it too, bc these people should have their catastrophic and massively consequential failures in judgement shoved in their faces forever. Sometimes a dog doesn’t learn unless you rub its face in it.

Your RL post-training may be sabotaging your LLM’s test-time scaling! Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*. We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test-time search performance, even on the original scalar.

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

Great example of why you should:
1. Run your agent on a separate machine from the sandbox it uses (e.g. sandbox as a tool)
2. Never set env vars in your sandbox. Instead, use something like LangSmith’s sandbox proxy auth (reqs are intercepted as they leave the sandbox and secrets are injected there, so the secret never enters the sandbox)
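A rough sketch of the proxy-injection idea, in case it helps. This is not LangSmith's API: `OutboundRequest`, `PROXY_SECRETS`, and `inject_secret` are made-up names, and the "proxy" here is just a function that would run on the host side. The point is only that the request leaves the sandbox with no credentials and picks them up afterwards.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OutboundRequest:
    """Hypothetical stand-in for an HTTP request leaving the sandbox."""
    host: str
    path: str
    headers: dict = field(default_factory=dict)


# Assumption: secrets live only on the proxy host, never in the sandbox env.
PROXY_SECRETS = {"api.example.com": "sk-demo-not-a-real-key"}


def inject_secret(request: OutboundRequest, secrets: dict) -> OutboundRequest:
    """Return a copy of the request with credentials added for known hosts.

    Requests to hosts without a configured secret pass through unchanged.
    """
    token = secrets.get(request.host)
    if token is None:
        return request
    headers = {**request.headers, "Authorization": f"Bearer {token}"}
    return OutboundRequest(request.host, request.path, headers)


# What the sandboxed code sends: no credentials attached.
sandbox_request = OutboundRequest("api.example.com", "/v1/items")

# What the proxy forwards upstream: same request plus the injected secret.
forwarded = inject_secret(sandbox_request, PROXY_SECRETS)
```

Code running inside the sandbox only ever sees `sandbox_request`, so dumping its environment or its outgoing headers yields nothing worth stealing.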


The big problem with getting any kind of substantive work done right now is that it feels so much more rational to work on the factory that produces the substantive work, instead of the work itself. It’s like the woodsman who was once asked: “What would you do if you had just five minutes to chop down a tree?” The woodsman: “I would spend the first two and a half minutes sharpening my axe.” Well, what if you only had 30 years left to achieve greatness? “You should probably spend 15 years building the most effectively calibrated and defensible, recursively self-improving systems for the production of great work.” AI psychosis or a perfectly rational strategy?

How tf is this even possible