Ethereal

12.2K posts

@inferencegod

rain man. optimizing agentic looping. top 352 on @aster_DEX connoisseur. trading autist.

Joined December 2021
596 Following · 1.7K Followers
Pinned post
Ethereal@inferencegod·
i don't feed my agent tasks anymore. when the backlog runs dry, it researches and invents the next feature itself, then builds it. and it polices its own work before i ever see it.
autonomy-loop v0.5.1:
→ self-feeding: empty backlog? it proposes the next feature and keeps going, no prompt from me
→ the bite: it reverts its own fix and reruns the test. stays green? it caught nothing, rejected
→ self-mutation: it mutates its own changed lines so weak tests get caught before handoff
→ circuit breaker: it parks to me instead of looping forever
→ branch protection: it can never touch prod or edit away its own gates
→ upgrading is one command: /autonomy-upgrade
→ red-teamed, 77 tests green
two terminals. a builder, and a reviewer that trusts nothing. one repo. nobody driving.
free, mit, 151 people already running it.
/plugin marketplace add github.com/inferencegod/a…
/plugin install autonomy-loop@autonomy-loop
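A minimal sketch of the "bite" check described in the post, assuming the test runner and the revert/restore steps can be passed in as plain callables. The names `bites`, `revert`, `restore`, and the toy `add` function are hypothetical and not taken from the plugin; a real setup would presumably revert through git rather than flip a flag.

```python
from typing import Callable


def bites(run_test: Callable[[], bool],
          revert_fix: Callable[[], None],
          restore_fix: Callable[[], None]) -> bool:
    """True only if the test is green with the fix and red without it."""
    if not run_test():
        return False  # red even with the fix applied: nothing to approve
    revert_fix()
    try:
        green_without_fix = run_test()
    finally:
        restore_fix()  # always put the fix back, even if the test raised
    return not green_without_fix


# Toy stand-in for a repo: one flag says whether the fix is applied.
state = {"fixed": True}


def add(a: int, b: int) -> int:
    return a + b if state["fixed"] else a - b  # the "bug" is subtraction


def revert() -> None:
    state["fixed"] = False


def restore() -> None:
    state["fixed"] = True


strong_test = lambda: add(2, 2) == 4  # fails once the fix is reverted
weak_test = lambda: add(0, 0) == 0    # passes with or without the fix

print(bites(strong_test, revert, restore))  # True
print(bites(weak_test, revert, restore))    # False
```

The weak test stays green after the revert, so it proves nothing about the fix and is rejected, which is the behaviour the post describes.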
Aanya@xoaanya·
Programming sits on logic. Algorithms run on logic. Every AI model is logic. Machine learning is logic. Deep learning is logic. Compilers run on logic. Databases are logic. Cryptography is logic. Blockchain is logic. Data structures are logic. Optimization is logic. Networking protocols are logic. Robotics moves because of logic. Game engines run because of logic. Your entire tech stack survives on logic. You're still asking if we need logic for programming?
Ethereal@inferencegod·
yes, and the unlock for me was pairing the judge with a deterministic gate. let it own the fuzzy goals (“simple enough”, “fast enough”), but keep a test that fails the moment correctness breaks, or the judge will confidently green-light a regression. judge for what has no right answer, gate for what does, same loop. built exactly that: an adversarial reviewer that re-runs the real gate from scratch before it's allowed to approve anything, plus a coverage floor it can't lower. x.com/inferencegod/s…
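A rough sketch of "judge for what has no right answer, gate for what does", assuming the judge can be modelled as a function that returns a score. `approve`, `gate`, `stub_judge`, the candidate dictionaries, and the 0.7 threshold are all hypothetical; the stub stands in for an LLM call and is not the plugin's reviewer.

```python
from typing import Any, Callable, Dict

Candidate = Dict[str, Any]


def approve(candidate: Candidate,
            gate: Callable[[Candidate], bool],
            judge: Callable[[Candidate], float],
            threshold: float = 0.7) -> bool:
    """Run the deterministic gate first; consult the fuzzy judge only if it passes."""
    if not gate(candidate):
        return False  # correctness broke: the judge never gets a vote
    return judge(candidate) >= threshold


def gate(candidate: Candidate) -> bool:
    """Deterministic check: the implementation must add correctly."""
    fn = candidate["impl"]
    return fn(2, 3) == 5 and fn(-1, 1) == 0


def stub_judge(candidate: Candidate) -> float:
    """Stand-in for an LLM judge scoring 'simple enough' by line count."""
    return 1.0 if candidate["lines"] <= 5 else 0.2


simple_and_correct = {"impl": lambda a, b: a + b, "lines": 1}
simple_but_broken = {"impl": lambda a, b: a - b, "lines": 1}
correct_but_bloated = {"impl": lambda a, b: a + b, "lines": 40}

print(approve(simple_and_correct, gate, stub_judge))   # True
print(approve(simple_but_broken, gate, stub_judge))    # False
print(approve(correct_but_bloated, gate, stub_judge))  # False
```

The broken candidate gets a perfect score from the judge, so without the gate it would be approved as a regression.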
Matthew Berman@MatthewBerman·
I'm increasingly exploring using LLM-as-a-judge in loops to determine the goal. I continue to be surprised by how well it's able to get to a great end state. a few examples:
> "until it's simple enough"
> "until it's fast enough"
not everything has to be deterministically verifiable.
Jana@BratDotAI·
Are you using your Codex/Claude subscription to its full potential?
Anum @anumness·
Claude's usage limits are really getting on my nerves now. I think I'm going to switch back to Codex soon.
ege@aegeantic·
an agent writing code isn't exciting anymore; an agent that writes the code, compiles, verifies, tests, and repeats until it gets it right is
Ethereal@inferencegod·
this is exactly it. the writing was never the hard part, the write-test-verify-repeat-until-green loop is. i built that into a claude code plugin: a builder writes code + a RED-GREEN test, an adversarial reviewer re-runs the whole gate and tears the diff apart, loops until it actually passes. won't fabricate a result either. MIT: github.com/inferencegod/a…
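A minimal sketch of the builder/reviewer loop, assuming both roles can be modelled as callables. `run_loop`, `make_builder`, and the two reviewer stubs are hypothetical. The cap on rounds is an assumption borrowed from the "circuit breaker" idea in the pinned post; the plugin's real stopping rule is not shown in the thread.

```python
from typing import Callable, Optional, Tuple


def run_loop(build: Callable[[Optional[str]], str],
             review: Callable[[str], Optional[str]],
             max_rounds: int = 3) -> Tuple[str, int]:
    """Alternate build and review until approval or until the round cap is hit."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        diff = build(feedback)   # builder sees only the last review feedback
        feedback = review(diff)  # reviewer returns None to approve
        if feedback is None:
            return ("approved", round_no)
    return ("parked", max_rounds)  # hand back to a human instead of looping forever


def make_builder() -> Callable[[Optional[str]], str]:
    """Toy builder that emits diff-v1, diff-v2, ... on successive calls."""
    attempts = {"n": 0}

    def build(feedback: Optional[str]) -> str:
        attempts["n"] += 1
        return f"diff-v{attempts['n']}"

    return build


approve_third = lambda diff: None if diff == "diff-v3" else "gate still red"
never_approve = lambda diff: "gate still red"

print(run_loop(make_builder(), approve_third))                # ('approved', 3)
print(run_loop(make_builder(), never_approve, max_rounds=2))  # ('parked', 2)
```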
Ethereal@inferencegod·
probably the harness, not the model. one-shotting complex stuff flails on everything, 4.8 and 5.5 alike. spec it tight, make it write a RED-GREEN test first, run a second agent to tear the diff apart, and 4.8 closes most of that gap on its own. i wired that exact loop into a free claude code plugin if you wanna test it: github.com/inferencegod/a…
Nick Kulikaev@NickKulikaev·
@anumness Every time I try to get 4.8 to do anything complex, it fails. I'm genuinely curious if I'm missing something. To me, 5.5 xhigh is about 30-40% better at getting things done the way I need.
Ethereal@inferencegod·
agreed, that's the bar. so i left mine running and went to the pool. it shipped 5 reviewed features, then ran out of backlog and built a 6th on its own. no nudges, no restart button, nobody hovering. the fix isn't a smarter agent, it's a second one whose whole job is to tear the first one's diff apart: 5 lenses (correctness, honesty, regression, security, UX), and it re-breaks every new test to prove it actually bites before anything merges. plus it can't fabricate a win, every stat carries its sample size or says "building". real 1,200+ test repo, not a demo. clip + repo, MIT: x.com/inferencegod/s…
GEOFF@geoffreywoo·
agent data point: if your “autonomous worker” needs 11 slack nudges, 3 restart buttons, and a founder hovering like a stage mom, it is still an intern with better fonts.
Ethereal@inferencegod·
yeah that's the ideal input actually. the loop runs straight off a task list, so you drop your backlog in as a checklist and it works top-down: builder takes the next item, reviewer gates it (bite test + coverage ratchet), git-baton handoff, repeat. no rewriting your backlog into some special format, an existing list is exactly what it wants.
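A small sketch of working a checklist top-down, assuming the backlog is a Markdown-style task list with `- [ ]` and `- [x]` items. The reply does not specify the format, so the regex and the names `next_task` and `mark_done` are hypothetical.

```python
import re
from typing import Optional

# Matches "- [ ] task" or "- [x] task" (also "*" bullets); an assumed format.
ITEM = re.compile(r"^\s*[-*]\s*\[( |x|X)\]\s*(.+)$")


def next_task(checklist: str) -> Optional[str]:
    """Return the first unchecked item, or None when the backlog is dry."""
    for line in checklist.splitlines():
        match = ITEM.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


def mark_done(checklist: str, task: str) -> str:
    """Tick the first unchecked item whose text equals `task`."""
    out = []
    done = False
    for line in checklist.splitlines():
        match = ITEM.match(line)
        if (not done and match and match.group(1) == " "
                and match.group(2).strip() == task):
            line = line.replace("[ ]", "[x]", 1)
            done = True
        out.append(line)
    return "\n".join(out)


backlog = "- [x] set up CI\n- [ ] add login\n- [ ] rate limiting"

print(next_task(backlog))                           # add login
backlog = mark_done(backlog, "add login")
print(next_task(backlog))                           # rate limiting
print(next_task(mark_done(backlog, "rate limiting")))  # None
```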
Justin Hammon@justinhammon_·
@inferencegod @BratDotAI I’ve heard of similar things to this but this is slick! I have a backlog of features. Does it work from existing lists too?
Ethereal@inferencegod·
no, thank you! you pointed right at why i built a portable gate. the verification isn’t claude-specific, it’s just logic on a diff. so i pulled it into a standalone binary that runs on any agent: cursor, copilot, codex. it reverts your fix to prove the test actually catches it, ratchets coverage so it can only go up, and checks every changed line is tested. green or it fails. open source drops tonight :-)
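A sketch of two of the checks named in the reply, the coverage ratchet and the "every changed line is tested" check, assuming coverage and diff data are already available as plain numbers and sets of line numbers. `ratchet`, `untested_changed_lines`, and the file name `a.py` are hypothetical; this is not the binary's actual interface.

```python
from typing import Dict, Set


def ratchet(previous_floor: float, measured: float) -> float:
    """Fail if coverage dropped; otherwise the measured value becomes the new floor."""
    if measured < previous_floor:
        raise ValueError(
            f"coverage {measured:.1f}% is below the floor {previous_floor:.1f}%"
        )
    return measured  # the floor only ever moves up


def untested_changed_lines(changed: Dict[str, Set[int]],
                           covered: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Map each file to the changed lines that no test executed."""
    missing = {}
    for path, lines in changed.items():
        gap = lines - covered.get(path, set())
        if gap:
            missing[path] = gap
    return missing


floor = ratchet(80.0, 82.5)
print(floor)  # 82.5

print(untested_changed_lines({"a.py": {1, 2, 3}}, {"a.py": {1, 3}}))  # {'a.py': {2}}
```

An empty result from `untested_changed_lines` together with a non-raising `ratchet` would correspond to "green"; anything else fails.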
Ethereal@inferencegod·
yeah you’re right, subagents do get a genuinely fresh context, isolated tools, even worktree isolation and their own hooks. i was drawing the line in the wrong place. the distinction i actually mean is single-session vs separate sessions. a subagent, even a fork, is spawned and judged by the same parent agent in one run. mine are two independent claude processes with no shared parent deciding the verdict, they only see committed git state. the docs kind of point at this too, they send you to agent teams or background agents for cross-session stuff rather than subagents. honestly for a lot of setups subagents would do the job. i went heavier because i wanted the reviewer to be a process the builder cannot influence at all. thank you for the context!
Ethereal@inferencegod·
subagents share the parent’s context and run inside the same session, so the reviewer is still kind of grading its own homework. two terminals are two independent claude processes that can’t see each other’s reasoning, only the committed git state. the reviewer re-runs the gate from scratch and reverts the builder’s fix to confirm the test catches it. the separation is the point. you can’t red-team yourself in the same context window. it also means a crash in one doesn’t take the other down, and the whole handoff is just git. hope this helps
Matthew Schrager@MatthewSchrager·
My current workflow is a /grill-to-goal skill based on @mattpocockuk’s /grill-with-docs that basically interviews you to produce detailed documentation about your feature, with clear acceptance criteria etc., along with a goal-ready prompt that references that documentation. Then just call /goal with that prompt. Works very nicely in my experience.
Peter Yang@petergyang·
So I have Codex running on a /goal and it's been working for 2 hours, but the problem is it's making a lot of wrong assumptions, so I have to monitor and steer it constantly. Is this expected? Perhaps I should've had it make a detailed plan first?
Ethereal@inferencegod·
the 2-hour-of-wrong-assumptions thing is the exact problem i built around. two issues stacked: nothing’s checking the assumptions, and there’s no second set of eyes. so i run a builder and an adversarial reviewer. the reviewer re-runs everything and reverts the builder’s own fix to confirm the test actually catches it. a green test that proves nothing gets thrown out. and when the task queue is ambiguous it researches and writes the plan first instead of charging in. you stop steering because the second agent is doing the steering.
Tim Tiefenbach@TimTeaFan·
@gauravvohra This! Over like 500k the session limit burns down like nothing even if it’s just a short question regarding something earlier in the conversation.
Gaurav Vohra@gauravvohra·
That one long running Claude conversation that nukes all your limits every time you come back to it