Ethereal@inferencegod·4h

@xoaanya made something to alleviate that x.com/inferencegod/s…

i don't feed my agent tasks anymore. when the backlog runs dry, it researches and invents the next feature itself, then builds it. and it polices its own work before i ever see it.

autonomy-loop v0.5.1:
→ self-feeding: empty backlog? it proposes the next feature and keeps going, no prompt from me
→ the bite: it reverts its own fix and reruns the test. stays green? it caught nothing, rejected
→ self-mutation: it mutates its own changed lines so weak tests get caught before handoff
→ circuit breaker: it parks to me instead of looping forever
→ branch protection: it can never touch prod or edit away its own gates
→ upgrading is one command: /autonomy-upgrade
→ red-teamed, 77 tests green

two terminals. a builder, and a reviewer that trusts nothing. one repo. nobody driving.

free, mit, 151 people already running it.

/plugin marketplace add github.com/inferencegod/a…
/plugin install autonomy-loop@autonomy-loop
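The circuit-breaker item in the list above can be illustrated with a small Python sketch. This is not autonomy-loop's code: the class name, the `max_consecutive_rejections` threshold, and the "continue" / "retry" / "park" return values are all assumptions made for illustration.

```python
class CircuitBreaker:
    """Hypothetical breaker: park to a human after too many rejections in a row."""

    def __init__(self, max_consecutive_rejections: int = 3) -> None:
        # Assumed threshold; the post does not say what the real limit is.
        self.limit = max_consecutive_rejections
        self.rejections = 0

    def record(self, approved: bool) -> str:
        """Record one reviewer verdict and say what the loop should do next."""
        if approved:
            # An approval resets the streak, so the loop keeps going.
            self.rejections = 0
            return "continue"
        self.rejections += 1
        # Stop looping and hand off to a human once the streak hits the limit.
        return "park" if self.rejections >= self.limit else "retry"


breaker = CircuitBreaker(max_consecutive_rejections=2)
verdicts = [breaker.record(v) for v in (False, True, False, False)]
print(verdicts)
```

Counting consecutive rejections, rather than total ones, is a design choice in this sketch: an occasional rejection is normal, while a streak suggests the loop is stuck.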

Aanya@xoaanya·4h

@inferencegod Yea that definitely is the real job

Aanya@xoaanya·5h

Programming sits on logic. Algorithms run on logic. Every AI model is logic. Machine learning is logic. Deep learning is logic. Compilers run on logic. Databases are logic. Cryptography is logic. Blockchain is logic. Data structures are logic. Optimization is logic. Networking protocols are logic. Robotics moves because of logic. Game engines run because of logic. Your entire tech stack survives on logic. You're still asking if we need logic for programming?

Ethereal@inferencegod·4h

first open source project is going according to plan. once you see it, you can’t go back! everyone needs a judge now! 🤞

Ethereal@inferencegod·4h

yes, and the unlock for me was pairing the judge with a deterministic gate. let it own the fuzzy goals (“simple enough”, “fast enough”), but keep a test that fails the moment correctness breaks, or the judge will confidently green-light a regression. judge for what has no right answer, gate for what does, same loop. built exactly that: an adversarial reviewer that re-runs the real gate from scratch before it's allowed to approve anything, plus a coverage floor it can't lower. x.com/inferencegod/s…
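A minimal Python sketch of the "judge plus deterministic gate" pairing described above. Everything here is an assumption for illustration: `first_accepted`, `gate`, and `judge` are hypothetical names, and the judge is a line-count stand-in where a real loop would call an LLM.

```python
from typing import Callable, Iterable, Optional


def first_accepted(candidates: Iterable[str],
                   gate: Callable[[str], bool],
                   judge: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that passes the gate AND satisfies the judge."""
    for source in candidates:
        if not gate(source):   # deterministic: correctness has a right answer
            continue
        if judge(source):      # fuzzy: "simple enough" has no right answer
            return source
    return None


def gate(source: str) -> bool:
    """Deterministic check: the candidate must define a correct add()."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["add"](2, 3) == 5


def judge(source: str) -> bool:
    """Stand-in for an LLM judge of 'simple enough' (assumption: <= 2 lines)."""
    return len(source.splitlines()) <= 2


CANDIDATES = [
    "def add(a, b):\n    return a - b",  # simple but wrong
    "def add(a, b):\n    total = a\n    for _ in range(b):\n        total += 1\n    return total",  # right but clunky
    "def add(a, b):\n    return a + b",  # right and simple
]

print(first_accepted(CANDIDATES, gate, judge))
```

The first candidate is the failure mode the post warns about: the judge alone would approve it, and only the gate catches that it is wrong.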

Matthew Berman@MatthewBerman·5h

I'm increasingly exploring using LLM-as-a-judge in loops to determine the goal. I continue to be surprised by how well it's able to get to a great end state. a few examples:
> "until it's simple enough"
> "until it's fast enough"
not everything has to be deterministically verifiable.

Ethereal@inferencegod·4h

@justinhammon_ @BratDotAI awesome!! let me know your thoughts

Justin Hammon@justinhammon_·5h

@inferencegod @BratDotAI I'll give this a look soon!!

Jana@BratDotAI·16h

Are you using your Codex/Claude subscription to its full potential?

Ethereal@inferencegod·4h

@NickKulikaev @anumness let me know your thoughts!

Nick Kulikaev@NickKulikaev·4h

@inferencegod @anumness thanks! i'll test it.

Anum @anumness·16h

Claudes usage limits are really getting on my nerves now. I think Im going to switch back to codex soon.

Ethereal@inferencegod·4h

it’s an /autonomy-loop summer

@aegeantic x.com/inferencegod/s…

ege@aegeantic·6h

an agent writing code isn't exciting anymore; an agent that writes the code, compiles, verifies, tests, and repeats until it gets it right is

Ethereal@inferencegod·4h

this is exactly it. the writing was never the hard part, the write-test-verify-repeat-until-green loop is. i built that into a claude code plugin: a builder writes code + a RED-GREEN test, an adversarial reviewer re-runs the whole gate and tears the diff apart, loops until it actually passes. won't fabricate a result either. MIT: github.com/inferencegod/a…

Ethereal@inferencegod·4h

probably the harness, not the model. one-shotting complex stuff flails on everything, 4.8 and 5.5 alike. spec it tight, make it write a RED-GREEN test first, run a second agent to tear the diff apart, and 4.8 closes most of that gap on its own. i wired that exact loop into a free claude code plugin if you wanna test it: github.com/inferencegod/a…

Nick Kulikaev@NickKulikaev·11h

@anumness Every time I try 4.8 on anything complex it fails. I'm genuinely curious if I'm missing something. To me 5.5 xhigh is just about 30-40% better at getting things done the way I need.

Ethereal@inferencegod·5h

agreed, that's the bar. so i left mine running and went to the pool. it shipped 5 reviewed features, then ran out of backlog and built a 6th on its own. no nudges, no restart button, nobody hovering. the fix isn't a smarter agent, it's a second one whose whole job is to tear the first one's diff apart: 5 lenses (correctness, honesty, regression, security, UX), and it re-breaks every new test to prove it actually bites before anything merges. plus it can't fabricate a win, every stat carries its sample size or says "building". real 1,200+ test repo, not a demo. clip + repo, MIT: x.com/inferencegod/s…
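The "every stat carries its sample size or says building" rule can be sketched in a few lines of Python. The function name `format_stat`, the `min_samples` threshold of 20, and the output format are assumptions for illustration, not the plugin's actual behaviour.

```python
def format_stat(name: str, successes: int, samples: int, min_samples: int = 20) -> str:
    """Report a rate only when the sample is big enough; otherwise say 'building'."""
    if samples < min_samples:
        # Too few observations: refuse to print a percentage that would mislead.
        return f"{name}: building (n={samples})"
    rate = successes / samples
    # Always attach the sample size so the number can be judged in context.
    return f"{name}: {rate:.0%} (n={samples})"


print(format_stat("first-pass approval", 45, 50))  # first-pass approval: 90% (n=50)
print(format_stat("first-pass approval", 3, 4))    # first-pass approval: building (n=4)
```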

GEOFF@geoffreywoo·6h

agent data point: if your “autonomous worker” needs 11 slack nudges, 3 restart buttons, and a founder hovering like a stage mom, it is still an intern with better fonts.

Ethereal@inferencegod·5h

yeah that's the ideal input actually. the loop runs straight off a task list, so you drop your backlog in as a checklist and it works top-down: builder takes the next item, reviewer gates it (bite test + coverage ratchet), git-baton handoff, repeat. no rewriting your backlog into some special format, an existing list is exactly what it wants.
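A sketch of what "works top-down off a checklist" could look like in Python. It assumes the backlog is a Markdown-style task list (`- [ ]` / `- [x]`); the post only says "checklist", so that format and the `next_task` helper are hypothetical.

```python
import re
from typing import Optional

# Assumed format: "- [ ] open item" or "- [x] done item" (also "*" bullets).
ITEM = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def next_task(checklist: str) -> Optional[str]:
    """Return the first unchecked item, scanning top to bottom; None if all done."""
    for line in checklist.splitlines():
        match = ITEM.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


BACKLOG = """\
- [x] add login
- [ ] rate-limit the API
- [ ] dark mode
"""

print(next_task(BACKLOG))  # rate-limit the API
```

When `next_task` returns `None` the backlog is empty, which is the point where the post says the loop proposes its own next feature.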

Justin Hammon@justinhammon_·5h

@inferencegod @BratDotAI I’ve heard of similar things to this but this is slick! I have a backlog of features. Does it work from existing lists too?

Ethereal@inferencegod·8h

no, thank you! you pointed right at why i built a portable gate. the verification isn’t claude-specific, it’s just logic on a diff. so i pulled it into a standalone binary that runs on any agent: cursor, copilot, codex. it reverts your fix to prove the test actually catches it, ratchets coverage so it can only go up, and checks every changed line is tested. green or it fails. open source drops tonight :-)
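Two of the checks named above, the coverage ratchet and "every changed line is tested", are simple enough to sketch in Python. The function names and the use of percentages and line-number sets are assumptions; the standalone binary's real interface is not shown in the thread.

```python
from typing import Set


def ratchet(previous_floor: float, measured: float) -> float:
    """Return the new coverage floor; fail if coverage dropped below the old one."""
    if measured < previous_floor:
        raise ValueError(
            f"coverage {measured:.1f}% is below the floor {previous_floor:.1f}%"
        )
    # The floor only ever moves up, never down.
    return measured


def untested_changed_lines(changed: Set[int], covered: Set[int]) -> Set[int]:
    """Line numbers touched by the diff that no test executed."""
    return changed - covered


floor = ratchet(previous_floor=80.0, measured=82.5)
print(floor)                                                  # 82.5
print(untested_changed_lines({10, 11, 12}, {10, 12, 40}))     # {11}
```

A gate built on these would fail the change if `ratchet` raises or if `untested_changed_lines` returns a non-empty set.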

Stephen Martin@martintechlabs·8h

@inferencegod Gotcha. Makes sense now. Thanks for the details!

Ethereal@inferencegod·11h

i don't feed my agent tasks anymore. when the backlog runs dry, it researches and invents the next feature itself, then builds it. and it polices its own work before i ever see it.

autonomy-loop v0.5.1:
→ self-feeding: empty backlog? it proposes the next feature and keeps going, no prompt from me
→ the bite: it reverts its own fix and reruns the test. stays green? it caught nothing, rejected
→ self-mutation: it mutates its own changed lines so weak tests get caught before handoff
→ circuit breaker: it parks to me instead of looping forever
→ branch protection: it can never touch prod or edit away its own gates
→ upgrading is one command: /autonomy-upgrade
→ red-teamed, 77 tests green

two terminals. a builder, and a reviewer that trusts nothing. one repo. nobody driving.

free, mit, 151 people already running it.

/plugin marketplace add github.com/inferencegod/a…
/plugin install autonomy-loop@autonomy-loop
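The self-mutation idea (mutate the changed lines and see whether the tests notice) can be shown with a toy Python sketch. The mutation table, the `mutants` and `surviving` helpers, and the `can_vote` example are all hypothetical; real mutation tools work on the syntax tree rather than on raw text.

```python
from typing import Callable, List, Set

# Assumed, deliberately tiny mutation table: swap one operator per changed line.
MUTATIONS = [("<=", "<"), (">=", ">"), ("==", "!="), ("+", "-")]


def mutants(source: str, changed_lines: Set[int]) -> List[str]:
    """Build one mutated copy of the source per changed line (1-indexed)."""
    lines = source.splitlines()
    result = []
    for number in sorted(changed_lines):
        original = lines[number - 1]
        for old, new in MUTATIONS:
            if old in original:
                mutated = lines.copy()
                mutated[number - 1] = original.replace(old, new, 1)
                result.append("\n".join(mutated))
                break  # one mutant per line keeps the sketch small
    return result


def surviving(source: str, changed_lines: Set[int],
              test: Callable[[str], bool]) -> List[str]:
    """Mutants the test still passes on; any survivor means the test is weak."""
    return [m for m in mutants(source, changed_lines) if test(m)]


SOURCE = "def can_vote(age):\n    return age >= 18"


def load(source: str):
    namespace: dict = {}
    exec(source, namespace)
    return namespace["can_vote"]


def weak_test(source: str) -> bool:
    f = load(source)
    return f(30) is True and f(5) is False  # never checks the boundary


def strong_test(source: str) -> bool:
    f = load(source)
    return f(30) is True and f(5) is False and f(18) is True


print(len(surviving(SOURCE, {2}, weak_test)))    # 1
print(len(surviving(SOURCE, {2}, strong_test)))  # 0
```

Here the only mutant turns `>=` into `>`. The weak test never checks age 18, so the mutant survives, and that survivor is the signal to reject the test before handoff.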

English

0

4

1.3K

Ethereal@inferencegod·8h

yeah you’re right, subagents do get a genuinely fresh context, isolated tools, even worktree isolation and their own hooks. i was drawing the line in the wrong place. the distinction i actually mean is single-session vs separate sessions. a subagent, even a fork, is spawned and judged by the same parent agent in one run. mine are two independent claude processes with no shared parent deciding the verdict, they only see committed git state. the docs kind of point at this too, they send you to agent teams or background agents for cross-session stuff rather than subagents. honestly for a lot of setups subagents would do the job. i went heavier because i wanted the reviewer to be a process the builder cannot influence at all. thank you for the context!

Stephen Martin@martintechlabs·8h

@inferencegod Oh, that is a different understanding than what I had. I thought the whole point of subagents was to have a fresh context. ref: code.claude.com/docs/en/sub-ag…

Ethereal@inferencegod·8h

subagents share the parent’s context and run inside the same session, so the reviewer is still kind of grading its own homework. two terminals are two independent claude processes that can’t see each other’s reasoning, only the committed git state. the reviewer re-runs the gate from scratch and reverts the builder’s fix to confirm the test catches it. the separation is the point. you can’t red-team yourself in the same context window. it also means a crash in one doesn’t take the other down, and the whole handoff is just git. hope this helps

Stephen Martin@martintechlabs·8h

@inferencegod Why not just use subagents?

Ethereal@inferencegod·8h

nice setup, but that grill step is still you in the chair up front. that’s the bottleneck i was trying to remove. autonomy-loop does the same spec-and-acceptance-criteria thinking, just inside the loop. when the queue is ambiguous the builder researches and writes the plan, the reviewer checks it, and it only pulls you in for the calls that actually change direction. your docs are great fuel for it though, not a competing approach. x.com/inferencegod/s…

Matthew Schrager@MatthewSchrager·9h

My current workflow is a /grill-to-goal skill based on @mattpocockuk’s /grill-with-docs that basically interviews you to produce detailed documentation about your feature, with clear acceptance criteria etc., along with a goal-ready prompt that references that documentation. Then just call /goal with that prompt. Works very nicely in my experience.

Peter Yang@petergyang·9h

So I have Codex running on a /goal and it's been working for 2 hours but the problem is it's making a lot of wrong assumptions so I have to monitor and steer it constantly. Is this expected? Perhaps I should've had it make a detailed plan first?

Ethereal@inferencegod·8h

the 2-hour-of-wrong-assumptions thing is the exact problem i built around. two issues stacked: nothing’s checking the assumptions, and there’s no second set of eyes. so i run a builder and an adversarial reviewer. the reviewer re-runs everything and reverts the builder’s own fix to confirm the test actually catches it. a green test that proves nothing gets thrown out. and when the task queue is ambiguous it researches and writes the plan first instead of charging in. you stop steering because the second agent is doing the steering.
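The revert-and-rerun check described above fits in a short Python sketch. It is a toy model: `bites`, `buggy_abs`, and `fixed_abs` are hypothetical, and a boolean flag stands in for what would really be a git revert of the builder's fix.

```python
from typing import Callable


def bites(run_test: Callable[[bool], bool]) -> bool:
    """A test 'bites' if it passes with the fix and fails once the fix is reverted.

    run_test(fix_applied) must return True when the test passes.
    """
    passes_with_fix = run_test(True)
    passes_without_fix = run_test(False)
    return passes_with_fix and not passes_without_fix


def buggy_abs(x: int) -> int:
    return x  # the original bug: negative inputs come back negative


def fixed_abs(x: int) -> int:
    return -x if x < 0 else x


def strong_test(fix_applied: bool) -> bool:
    impl = fixed_abs if fix_applied else buggy_abs
    return impl(-3) == 3  # exercises the input the fix was written for


def weak_test(fix_applied: bool) -> bool:
    impl = fixed_abs if fix_applied else buggy_abs
    return impl(3) == 3  # green with or without the fix, so it proves nothing


print(bites(strong_test))  # True
print(bites(weak_test))    # False
```

The weak test is the "green test that proves nothing": it stays green after the revert, so a reviewer applying this check would throw it out.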

Ethereal@inferencegod·9h

@gmirabelli this is the real tool x.com/inferencegod/s…

@TimTeaFan @gauravvohra x.com/inferencegod/s…

George Mirabelli@gmirabelli·10h

Zero P1 defects. Fixed deadline. Massive app rewrite. Virgin Atlantic used AI coding tools to ship on time with near-total test coverage. Real result, real deadline. The tools are here. Are you using them? openai.com/index/virgin-a… #CustomSoftware #SoftwareDev #AI

Ethereal@inferencegod·9h

that’s the exact problem i designed around. i don’t work in one chat anymore. the loop passes small tasks between two terminals through git, so nothing balloons until it nukes your limits. fresh context every turn, state read from the last commit. terminal crashes? you lose nothing, it picks up from the commit. x.com/inferencegod/s…
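One way to picture "state read from the last commit" is a commit message that carries the handoff as trailer lines. The Python sketch below is purely illustrative: the `Baton:` and `Task:` trailer names and both helper functions are invented for this example, and the plugin's real handoff format is not shown in the thread.

```python
from typing import Dict


def read_baton(commit_message: str) -> Dict[str, str]:
    """Pull hypothetical 'Task:' and 'Baton:' trailers out of a commit message."""
    trailers: Dict[str, str] = {}
    for line in commit_message.strip().splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in ("Baton", "Task"):
            trailers[key] = value.strip()
    return trailers


def whose_turn(commit_message: str, me: str) -> bool:
    """True if the last commit hands the baton to this role."""
    return read_baton(commit_message).get("Baton") == me


LAST_COMMIT = """add rate limiter

Task: rate-limit the API
Baton: reviewer
"""

print(read_baton(LAST_COMMIT))
print(whose_turn(LAST_COMMIT, "reviewer"))  # True
print(whose_turn(LAST_COMMIT, "builder"))   # False
```

Because each turn would rebuild its state from the commit alone, a fresh process can pick up where a crashed one stopped, which matches the recovery behaviour the post describes.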

Tim Tiefenbach@TimTeaFan·13h

@gauravvohra This! Over like 500k the session limit burns down like nothing even if it’s just a short question regarding something earlier in the conversation.

Gaurav Vohra@gauravvohra·14h

That one long running Claude conversation that nukes all your limits every time you come back to it
