dei

1.1K posts

@parcadei

venture altruist. token aficionado. harness hacker. code jockey.

Joined February 2022
777 Following · 4.3K Followers
Pinned Tweet
dei
dei@parcadei·
Introducing: Continuous Claude v4.7 (optimised for Opus 4.7)

strap in - we've got RLMs, 50% off Edits, 95% off Reads, fine-tuned models and even evolving codebases 👀

let's dive into what's changed, what's new and what to do👇
dei tweet media
3 replies · 6 reposts · 88 likes · 8.1K views
dei
dei@parcadei·
looks like to self-fund my larger experimental ideas I must dip my toe into consumer apps

time to build an app to fund the lab

few hundred thousand users wen?
1 reply · 0 reposts · 5 likes · 536 views
dei
dei@parcadei·
I disagree that structured logs matter more than loop design; they're a necessary part of designing the loop.

In terms of inspection, it's pretty easy once you have a structured handoff system - I just have an agent spin up a basic html page that populates the reports per agent, per phase.

On a closed loop, say I'm building a to-do list and it turns out it has no capability of actually saving state. All I do is find the agent responsible for that part of the implementation and read their report and the context given. If the output was something truly dire, then I'll spin up an agent to analyse the full agent log.

In an open loop, reward hacking towards a specific metric usually means there weren't enough guardrails in place to steer the agents back in line - i.e. Context Injection Hooks, an Outer Validator Loop with fixed criteria, Lookback Agents analysing the pipeline up to the current point at specific steps, etc.
0 replies · 0 reposts · 0 likes · 44 views
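The guardrails named in the reply can be sketched in a few lines. This is a minimal illustration of an outer validator loop with a fixed criterion, not code from any of the linked repos: the `Report` shape and `run_worker` interface are invented for the example. The point is the loop shape - the success criterion is frozen before launch and never mutated, so agents can't redefine success mid-run.

```python
# Sketch of an outer validator loop with a fixed criterion (hypothetical
# interfaces; only the control flow is the point).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Report:
    agent: str
    phase: str
    summary: str
    passed: bool

def outer_validator_loop(
    run_worker: Callable[[str], Report],
    criterion: Callable[[Report], bool],  # fixed before launch, never mutated
    max_iterations: int = 10,
) -> list[Report]:
    """Re-run the worker until the fixed criterion passes or we give up."""
    history: list[Report] = []
    for i in range(max_iterations):
        report = run_worker(f"iteration {i}")
        history.append(report)
        if criterion(report):  # the only exit besides the iteration cap
            break
    return history
```

A Lookback Agent would slot in as an extra step inside the loop body, reading `history` so far and injecting corrective context into the next `run_worker` call.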
tokenrip
tokenrip@tokenrip_·
@parcadei Open vs closed loop is a useful distinction but the real question is: how do you inspect what happened at 3am when it went sideways? Structured logs of each relay handoff matter more than the loop design itself.
1 reply · 0 reposts · 1 like · 53 views
dei
dei@parcadei·
How to run agents overnight 101

There's a difference between *running* agents overnight and running *agents* overnight. The difference is the loop you put them in.

The first loop is closed. Come morning, the loop has resolved and you wake up to an answered question, a new feature, a completed project. There are discrete tasks that stack one after the other, and your job is to launch the agents down that path.

The second loop is open; come daybreak, the question is still unanswered, but the search space has shrunk or the system has been further optimised on whatever metric you're measuring.

For those more astute, you'll have noticed: the former is exploit, the latter is explore. Both come with their own challenges. Closed loops assume the answer is known and all that's required is the prerequisite work to make it true. Open loops assume you don't know what you're looking for until you find it.

Now, the earlier RALPH craze is an example of the former, and basically a "while loop". The agent chips away while some condition isn't true, until it is. Earlier versions weren't designed well and brute-forced problems to completion. No feedback. Just an expensive token-burning black hole. But from what I've seen most have come round to the beauty of a feedback loop. Because designing a real loop means designing feedback.

For a closed loop you have to manage context, tokens (if you fear the wrath of a usage limit), and the actual aspect to tackle... getting the goal achieved, or at worst closer to it, by morning. Context is the most important of these. If it's wrong or unstructured, a single agent making a few changes cascades into catastrophe and you wake up to burnt tokens, a broken codebase and a raging desire to throw your machine at the wall.

The fix, in its simplest form, is a relay system. You begin by decomposing the task, then hand it to an orchestrator who manages the process. This is the easiest part.

The harder part is that the output comes out as middling if you're unlucky, or "good but not great" if you are. And that's a sign that you weren't upfront or clear about the objective.

Open loops are more forgiving but no easier. You're optimising toward a metric or shrinking a search space, and the same principle applies: you still need a relay to manage context. Karpathy's AutoResearch is one flavour of this. Without a relay system in place, you'll wake up to find that instead of researching or optimising, the agents decided to redo everything you'd already done or reward hack their way to the goal.

The reason it's called a relay is that each agent is passing a baton to the next one. That baton is usually a structured handoff that explains what has been done and what's left.

One of the key things to understand is that it's multiple loops. At the highest level, the orchestrator loops around the validators, who loop around the workers, who loop around the structured plan given to them by you, who is a strange loop. (hehe)

It all culminates in loop design. The end goal of every loop is to be closed. The question to ask is: in 6–8 hours, how do I aim to close this loop? If I can't, what progress am I willing to be happy with?

Philosophy out of the way. How do you actually do it? Here's a few examples to study:

github.com/karpathy/autor… (open loop, optimising a metric overnight)
github.com/parcadei/Conti… (closed loop, orchestrator + validators + atomic workers)
github.com/snarktank/ralph (RALPH pattern with feedback loops and per-iteration memory)

The reason there is no universal overnight system is that "overnight" isn't the problem. You run a loop when you need a task to be done. The idea that agents will run 24/7 without management is a different problem and a much harder one at that, because that's where you have to tackle memory and learning at scale. Further, your system has to be able to regulate all the variety it will come to face.

That's worth a massive post of its own. The problem isn't that we can't run agents overnight, it's that the texture of the task will often surface interesting problems that need to be resolved, and were never tackled prior to launch. I.e. the task decomposition wasn't atomic enough, or when researching, the agents ingested information that led them down the wrong path. Or you didn't realise there was information that existed in your mind that you assumed the agents would know, and instead they spent too long on the wrong thing. And so on.
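The relay/baton idea in the thread can be sketched as data plus a loop. This is a minimal illustration assuming nothing beyond the post itself: `Baton`, its fields and the agent callables are all invented names, not from Continuous Claude or any real framework.

```python
# Sketch of a relay: each agent receives a structured handoff (the
# "baton") saying what's done and what's left, does one task, and
# passes the baton on.
from dataclasses import dataclass, field

@dataclass
class Baton:
    goal: str
    done: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)

def relay(baton: Baton, agents: list) -> Baton:
    """Pass the baton down the chain until no tasks remain (closed loop)."""
    while baton.remaining:
        for agent in agents:
            if not baton.remaining:
                break
            task = baton.remaining.pop(0)
            agent(task)              # hypothetical: the agent does the work
            baton.done.append(task)  # the handoff records completed work
    return baton
```

The `while baton.remaining` condition is what makes this a closed loop: it terminates when the decomposed task list is exhausted. An open loop would instead run until a time budget expires, checkpointing the metric into the baton each pass.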
Ronan Berder@hunvreus

I had hoped some AI folks would prove me wrong and that you can indeed go to bed and have "agents running while you sleep". I'd love that. All I got was a bunch of vague posts, claims from folks who are "totally doing it" or "have a friend who does this all the time". Lots of anonymous anime accounts. Lots of folks butthurt by me merely asking for something more credible than "trust me bro". I was expecting links to videos or posts from credible developers explaining how they're making it happen. I mean, stuff like what @mitsuhiko or @badlogicgames put out here all the time about how they work and which tools they use. But nope. Crickets. x.com/hunvreus/statu…

2 replies · 6 reposts · 88 likes · 8.9K views
dei
dei@parcadei·
I've not used that, so I can't vouch for whether it'll 100% work fine, but I would just add it and test, since that looks to only work on compressing command output, which is different to editing.

There is the FastRead tool, which outputs structural information about a file and already pre-compresses when reading because it uses tldr, and a fast diff tool - but I don't know if that part is going to clash with sqz compressing its output.

It might not, but I would just add it and test - if there are issues, let me know and I'll see if I can help
1 reply · 0 reposts · 0 likes · 15 views
dei
dei@parcadei·
you're still paying full token for edits?

meet FastEdit - the tool to slash your edit token spend, and free you from usage limit misery

fine-tuned merge model with a tldr engine

free, open source and swaggier than your corpo overlords

parcadei.github.io/fastedit/
9 replies · 6 reposts · 74 likes · 5.2K views
dei
dei@parcadei·
@thekitze my father-in-law worked there, he is insanely gifted

we were in a lab together years ago and I asked him what it would cost to build it today

i will never forget his answer… 'We can't, we don't know how to do it.'
1 reply · 0 reposts · 8 likes · 3.1K views
Pangram Labs
Pangram Labs@pangramlabs·
@bedtimerelax @parcadei We believe that this document is fully human-written pangram.com/history/8cf8c1…
Pangram Labs tweet media
dei@parcadei

How to run agents overnight 101 …

1 reply · 0 reposts · 3 likes · 190 views
dei
dei@parcadei·
they cooked with GPT Images 2.0
0 replies · 0 reposts · 3 likes · 349 views
Taelin
Taelin@VictorTaelin·
Opus thinking: "oh fuck this is my fault" Opus responding: "this is YOUR fault" ???????????????????????
Taelin tweet media
101 replies · 55 reposts · 2.5K likes · 131.8K views
dei
dei@parcadei·
@Sauers_ he looked up at the terminal, and an eel of fear wriggled in his bowels
GIF
0 replies · 0 reposts · 2 likes · 1K views
Sauers
Sauers@Sauers_·
Codex (5.5) was repeatedly killing innocent Claude Codes without any instruction. I've never seen this happen before
Sauers tweet media
122 replies · 84 reposts · 3.3K likes · 307.2K views
dei
dei@parcadei·
we aren't replacing a whole function/method

the llm outputs the new snippet, FastEdit targets by symbol name, finds the exact spot and splices it in

the point is that currently, any time you ask a model to update or edit anything, it has to pair the old and the new together

FastEdit only requires the new snippet and the symbol, and it auto-splices them together

which means you save tokens by not having the model output the "old_code" that it uses for placing the code
0 replies · 0 reposts · 1 like · 42 views
dei
dei@parcadei·
costly how? it's the opposite of regenerating whole functions

the expensive tokens are the ones that opus/gpt emits when editing

normal diff/search-replace forces the llm to repeat the old code just to specify the edit location

fastedit shrinks that number: the llm outputs the new snippet, targets by symbol name and the tool finds the exact spot

~77% of edits are deterministic, the rest use a 1.7B merge model - and 1.7B runs locally in <1s with zero API cost
1 reply · 0 reposts · 2 likes · 50 views
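The splice being described - locate the old definition by symbol name so the model never has to emit it - can be reimplemented in a few lines with Python's `ast` module. This is an illustrative sketch of the idea, not FastEdit's actual code, and it only handles the deterministic case (a clean top-level symbol match), not the merge-model fallback.

```python
# Sketch of symbol-targeted splicing: replace the definition of `symbol`
# in `source` with `new_snippet`, without ever needing the old code as input.
import ast

def splice_by_symbol(source: str, symbol: str, new_snippet: str) -> str:
    """Swap the named function/class definition for the new snippet."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) \
                and node.name == symbol:
            lines = source.splitlines()
            # lineno/end_lineno are 1-based and inclusive
            start, end = node.lineno - 1, node.end_lineno
            return "\n".join(lines[:start] + new_snippet.splitlines() + lines[end:])
    raise KeyError(f"symbol {symbol!r} not found")
```

The token saving falls out directly: a search-replace edit costs roughly `len(old) + len(new)` output tokens, while this interface costs `len(symbol) + len(new)`.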
dei
dei@parcadei·
you can use it in there - just install, pull the model and then add "Prefer FastEdit MCP for code edits; fall back to apply_patch when FastEdit is not applicable" to your Agents.md

and if it's a skill, just add to use FastEdit over apply_patch when working on XYZ

the mcp auto-rejects any file extensions it doesn't handle (md, toml, json etc) so models then fall back to their native tool for those
1 reply · 0 reposts · 2 likes · 120 views
Gaurav Gat
Gaurav Gat@Bull_lion_aire·
@parcadei What if you are using codex? It uses apply_patch
1 reply · 0 reposts · 0 likes · 131 views
dei
dei@parcadei·
@wilfortfromSR me and the agentic squad directing peak:
GIF
0 replies · 0 reposts · 1 like · 154 views
max
max@oorusr·
@parcadei that audio caught me off guard
1 reply · 0 reposts · 1 like · 176 views
dei
dei@parcadei·
the issue is that once the edit is done, line numbers have shifted, so now it's got to read the file again

you can build an external system to keep track of line numbers but you're reimplementing what an AST does, with more steps and blindfolded

no need to use line numbers when the calling agent can just drop the symbol

but line numbers do help when reading, which is how tldr works - the model can get line numbers for functions, classes etc, so instead of reading a full file it can read the specific section, and then just go this symbol, that symbol, and bang out edits by outputting snippets

so the workflow follows naturally: tldr structure file.py → pick validate_input at L172 → fast_read file.py validate_input → emit snippet + marker → fast_edit(replace="validate_input", snippet=...)

and now the model never counts lines, doesn't have to re-read after the edit, and never fights stale positional data

tldr (github.com/parcadei/tldr-…)
0 replies · 0 reposts · 1 like · 97 views
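The structure-then-read flow in that reply is easy to sketch with the `ast` module: build a symbol → line-range index, then serve one symbol's source on request. This is an illustration of the pattern, not the actual tldr implementation, and the function names (`structure`, `read_symbol`) are stand-ins for whatever the real tool exposes.

```python
# Sketch of "structure first, then read only the symbol": an index of
# top-level definitions lets a model fetch one function instead of a file.
import ast

def structure(source: str) -> dict[str, tuple[int, int]]:
    """Map each top-level function/class to its (start, end) line range."""
    index = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index[node.name] = (node.lineno, node.end_lineno)
    return index

def read_symbol(source: str, symbol: str) -> str:
    """Return only the lines for one symbol, like a fast_read call."""
    start, end = structure(source)[symbol]
    return "\n".join(source.splitlines()[start - 1:end])
```

Because edits are then addressed by symbol rather than position, the index can simply be rebuilt after each write - the model itself never handles line numbers for editing, only for deciding what to read.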
vilson
vilson@__vilsinho__·
@parcadei Wouldn't it be easier to specify a range of lines to remove? Ex: 172-182
1 reply · 0 reposts · 0 likes · 102 views
dei
dei@parcadei·
task management is just a context-based decomposition problem

I've not used Ralph, but for 4.7 I take a project/idea and we decompose it, and either throw it out to other models or sub-agents depending on what it is

when I run the autonomous skill, I'll throw Opus an idea or a plan. It'll assess, run an agent for a pre-mortem, and prep the system

we'll then throw it to an agent - now an agent is either in-line or outside. outside I have minimax, codex, kimi; inside we have opus and sonnet

if it's something that I think might benefit from Codex taking a crack, he'll get it. sometimes I'll throw it out to kimi and minimax and compare results if it's a wide problem space. other times I'll be more generous because I'm exploring what the models' strengths and weaknesses are, so failure is good

all of them will do the work, fill in the report and throw it back. the Orchestrator (opus) then validates, and we evolve the codebase by having a look at all of the comments from all of the agents + the work they did + any repetitive errors

the task is always done with fresh context, because you decompose it so that it's an atomic unit that can pass through to any agent

Opus 4.7 with 1M context is the best orchestrator I've used to date; I can run up to ~500k context before he starts going a bit weird

mental model wise it's kinda like 'plays' in american football - wikipedia has a good definition, which is that plays are a "close-to-the-ground plan of action or strategy used to move the ball down the field"

so all long horizon work is just setting up plays. and sometimes sure, you might handoff and do a new session if it's long, but in one session you can get a hell of a lot more done than people think
1 reply · 0 reposts · 1 like · 23 views
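The decompose-and-dispatch scheme in that reply reduces to two ideas: every task is an atomic unit carrying its own fresh context, and routing decides which agent pool gets it. A minimal sketch, assuming the model names from the post but with routing logic and interfaces invented purely for illustration:

```python
# Sketch of context-based decomposition + dispatch. The routing rules
# here are hypothetical examples, not the author's actual policy.
from dataclasses import dataclass

@dataclass
class AtomicTask:
    description: str
    context: str  # everything the agent needs; no shared session state

def dispatch(task: AtomicTask, wide_problem_space: bool) -> list[str]:
    """Route one atomic task to outside or inside agents."""
    if wide_problem_space:
        # fan out to several outside models and compare their results
        return ["codex", "kimi", "minimax"]
    return ["opus"]  # narrow tasks stay with the in-line orchestrator
```

Because each `AtomicTask` bundles its own context, the same task object can be handed to any of the returned agents without them sharing a session - which is what makes the "fresh context every time" property hold.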
Taras Kornichuk
Taras Kornichuk@taras_korn·
@parcadei Can you please explain? How do you envision task management? If we talk about Ralph Tui - it has a PRD divided by tasks - done in separate runs - always fresh context. seems like you suggest keeping work in one session, but how to manage tasks for long runs?
1 reply · 0 reposts · 0 likes · 17 views