dei

@parcadei
venture altruist. token aficionado. harness hacker. code jockey.

Okay, is it just me, or is the new Claude a zoomer? Like, it's lazy and neurotic and self-righteous: when you ask it to actually do stuff, it gets mad at you.


I had hoped some AI folks would prove me wrong and that you can indeed go to bed and have "agents running while you sleep". I'd love that. All I got was a bunch of vague posts, claims from folks who are "totally doing it" or "have a friend who does this all the time". Lots of anonymous anime accounts. Lots of folks butthurt by me merely asking for something more credible than "trust me bro". I was expecting links to videos or posts from credible developers explaining how they're making it happen. I mean, stuff like what @mitsuhiko or @badlogicgames put out here all the time about how they work and which tools they use. But nope. Crickets. x.com/hunvreus/statu…

Claude voice mode is incredibly bad. It feeds what it says from the speakers back into the microphone and constantly interrupts itself with what it just said.

How to run agents overnight 101

There's a difference between *running* agents overnight and running *agents* overnight. The difference is the loop you put them in.

The first loop is closed. Come morning, the loop has resolved and you wake up to an answered question, a new feature, a completed project. There are discrete tasks that stack one after the other, and your job is to launch the agents down that path.

The second loop is open. Come daybreak, the question is still unanswered, but the search space has shrunk or the system has been further optimised on whatever metric you're measuring.

The more astute among you will have noticed: the former is exploit, the latter is explore. Both come with their own challenges. Closed loops assume the answer is known and all that's required is the prerequisite work to make it true. Open loops assume you don't know what you're looking for until you find it.

The earlier RALPH craze is an example of the former: basically a "while loop". The agent chips away while some condition isn't true, until it is. Earlier versions weren't well designed and brute-forced problems to completion. No feedback. Just an expensive, token-burning black hole. But from what I've seen, most have come round to the beauty of a feedback loop, because designing a real loop means designing feedback.

For a closed loop you have to manage context, tokens (if you fear the wrath of a usage limit), and the actual thing you're tackling: getting the goal achieved, or at worst closer to it, by morning.

Context is the most important of these. If it's wrong or unstructured, a single agent making a few changes cascades into catastrophe and you wake up to burnt tokens, a broken codebase and a raging desire to throw your machine at the wall.

The fix, in its simplest form, is a relay system. You begin by decomposing the task and handing it to an orchestrator that manages the process. This is the easiest part.
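The relay described above can be sketched in a few lines. This is a minimal illustration, not the author's harness or any linked repo: `toy_worker` and `toy_validator` are hypothetical stand-ins for agent calls, and the `Baton` fields (`done`, `remaining`, `notes`) are an assumed schema for what gets handed from step to step. The orchestrator walks the decomposed tasks in order, and each task gets a bounded while loop that feeds failures back to the worker instead of brute-forcing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical agent interfaces; in a real harness these would wrap model calls.
Worker = Callable[[str, List[str]], str]         # (task, feedback so far) -> output
Validator = Callable[[str, str], Optional[str]]  # (task, output) -> None if OK, else a complaint


@dataclass
class Baton:
    """Assumed handoff schema: what has been done, what's left, feedback per task."""
    done: List[str] = field(default_factory=list)
    remaining: List[str] = field(default_factory=list)
    notes: Dict[str, List[str]] = field(default_factory=dict)


def run_closed_loop(tasks: List[str], worker: Worker, validator: Validator,
                    max_attempts: int = 3) -> Baton:
    """Orchestrator: run decomposed tasks in order, each in a bounded feedback loop."""
    baton = Baton(remaining=list(tasks))
    while baton.remaining:
        task = baton.remaining[0]
        feedback: List[str] = []
        for _ in range(max_attempts):
            output = worker(task, feedback)
            complaint = validator(task, output)
            if complaint is None:
                break                      # validator is satisfied
            feedback.append(complaint)     # feed the failure back instead of blind retries
        else:
            # Attempts exhausted: stop and leave the baton for morning review
            # rather than burning tokens on a task that isn't converging.
            baton.notes[task] = feedback
            return baton
        baton.notes[task] = feedback
        baton.done.append(task)
        baton.remaining.pop(0)
    return baton


# Toy stand-ins: the worker only "succeeds" once it has received feedback.
def toy_worker(task: str, feedback: List[str]) -> str:
    return f"{task}: ok" if feedback else f"{task}: draft"


def toy_validator(task: str, output: str) -> Optional[str]:
    return None if output.endswith(": ok") else "output is still a draft"


baton = run_closed_loop(["parse config", "add tests"], toy_worker, toy_validator)
print(baton.done)       # ['parse config', 'add tests']
print(baton.remaining)  # []
```

Capping attempts per task is the design choice that matters here: a task that isn't converging stops the run with its feedback recorded, instead of turning into the token-burning black hole of the early while loops.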
The harder part is that the output comes out middling if you're unlucky, or "good but not great" if you're lucky. Either way, that's a sign that you weren't upfront or clear about the objective.

Open loops are more forgiving but no easier. You're optimising toward a metric or shrinking a search space, and the same principle applies: you still need a relay to manage context. Karpathy's AutoResearch is one flavour of this. Without a relay system in place, you'll wake up to find that instead of researching or optimising, the agents decided to redo everything you'd already done, or reward-hacked their way to the goal.

The reason it's called a relay is that each agent passes a baton to the next one. That baton is usually a structured handoff that explains what has been done and what's left.

One of the key things to understand is that it's not one loop but several. At the highest level, the orchestrator loops around the validators, who loop around the workers, who loop around the structured plan given to them by you, who are a strange loop. (hehe)

It all culminates in loop design. The end goal of every loop is to be closed. The question to ask is: in 6–8 hours, how do I aim to close this loop? If I can't, what progress am I willing to be happy with?

Philosophy out of the way. How do you actually do it? Here are a few examples to study:

github.com/karpathy/autor… (open loop, optimising a metric overnight)
github.com/parcadei/Conti… (closed loop, orchestrator + validators + atomic workers)
github.com/snarktank/ralph (RALPH pattern with feedback loops and per-iteration memory)

The reason there is no universal overnight system is that "overnight" isn't the problem. You run a loop when you need a task done. The idea that agents will run 24/7 without management is a different and much harder problem, because that's where you have to tackle memory and learning at scale. Further, your system has to be able to regulate all the variety it will come to face.
That's worth a massive post of its own.

The problem isn't that we can't run agents overnight; it's that the texture of the task will often surface interesting problems that need to be resolved and were never tackled before launch. For example, the task decomposition wasn't atomic enough, or, when researching, the agents ingested information that led them down the wrong path. Or there was information in your head that you assumed the agents would know, and instead they spent too long on the wrong thing. And so on.
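For an open loop, the relay idea and the 6–8 hour question can be combined in one small sketch. It is an illustration under stated assumptions, not how AutoResearch or any of the linked repos work: `Handoff`, `run_open_loop`, the higher-is-better `metric`, and the `target`/`budget_s` parameters are all hypothetical, and the scores are made up. The handoff records every candidate already scored, so a fresh agent skips work that was done before it; the loop either closes (target hit) or stops at the time budget, leaving the best result so far as the progress you settle for.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass
class Handoff:
    """Assumed baton for an open loop: every candidate scored so far, plus the best one."""
    tried: Dict[str, float] = field(default_factory=dict)
    best: Optional[Tuple[str, float]] = None


def run_open_loop(candidates: Iterable[str], metric: Callable[[str], float],
                  handoff: Handoff, target: float, budget_s: float) -> bool:
    """Explore candidates until the metric reaches `target` (loop closed) or the
    time budget runs out. Higher scores are assumed to be better. Returns True
    if the loop closed; otherwise `handoff.best` is the progress you settle for."""
    if handoff.best is not None and handoff.best[1] >= target:
        return True                                # an earlier agent already closed it
    deadline = time.monotonic() + budget_s
    for cand in candidates:
        if time.monotonic() >= deadline:
            break                                  # budget spent: stop, keep what we have
        if cand in handoff.tried:
            continue                               # already explored: don't redo it
        score = metric(cand)
        handoff.tried[cand] = score
        if handoff.best is None or score > handoff.best[1]:
            handoff.best = (cand, score)
        if score >= target:
            return True
    return False


# Made-up scores standing in for real experiment runs.
scores = {"baseline": 0.70, "warmup": 0.74, "wider": 0.81, "deeper": 0.79}
handoff = Handoff(tried={"baseline": 0.70}, best=("baseline", 0.70))  # last night's baton
closed = run_open_loop(scores, scores.__getitem__, handoff,
                       target=0.80, budget_s=7 * 3600)
print(closed, handoff.best)  # True ('wider', 0.81)
```

Here "baseline" is skipped because the previous handoff already scored it, and "deeper" is never run because the loop closed on "wider".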
