Greg Wedow

1.8K posts

@gregwedow

https://t.co/dGaMsQiwEc Task Management for Agents

Canada · Joined October 2009
270 Following · 318 Followers
Pinned Tweet
Greg Wedow@gregwedow·
Once a programmer becomes used to a complex solution to a problem, simple solutions to the same problem feel incomplete and uncomfortable — Doug Hoyte
Greg Wedow@gregwedow·
full agreement here. this is why i'm working to make harness the most easily extensible agent runtime. most of what i do with harness is not in the main repo, because it's so trivial to customize it from within my project repos. still a handful of things i need to clean up in the core (like tearing out the TUI) but then it will be basically "finished" software that anyone can reshape to fit whatever work they're doing
Viv@Vtrivedy10

ok hot take, who (dis)agrees?

The general purpose agent/harness doesn’t exist. the best harnesses are deeply Task specific, and when we use a “default harness” out-of-the-box, we’re just making a tradeoff between
- acceptable task performance
- time+money spent designing around our task(s)
that’s a totally fair tradeoff to make, maybe we’re happy with the out of box perf!

what we call a “general purpose” harness is just one that’s reasonably good at a relatively large portion of tasks. but there’s a reason why teams that want top 1% agent performance obsessively tweak the harness per Task+Model: you can squeeze out a lot by building bespoke harness tooling for a Task. For a high value task, it’s totally worth the investment. Your entire company might be predicated on that investment.

this effect is pretty clear when we try to swap models. “models are non-fungible in their harness” - so they suck if we just drop codex into the Claude Code harness, but if you use the models together in a joint harness and design around the specific problem, you can get great perf.

I’ve mentioned before, but i think the most exciting future is just-in-time harness creation per task. idk if that’s a very popular take vs “one model will do everything”, but it’s a current mental model and an exciting thing i’m messing around with

Greg Wedow@gregwedow·
looking very slick! i need to go look at your fork some more. what are you using for the GUI?
i'm on the fence about shipping something like this tbh. i run 90% of my agents headless these days and really just want a solid planning interface rather than the normal chat thing. at minimum, the current `agent` script that runs the TUI needs to be torn apart so the UI is divorced from the actual agent processing. ideally, that would make interfaces like this much more straightforward to build
going to find some time tonight to see how you've built this and how it can be integrated though. seems so much nicer than pretty well all the other agent interfaces i've used
Greg Wedow@gregwedow·
@communicating thanks! it fits really naturally in lower resource environments. zero cpu utilization when idle means i don't care if it's left running
started using it on my phone in the Termux app yesterday and it's quite nice. need to add some web search tools to be properly useful though
Christopher@communicating·
@gregwedow Still haven’t had time to play w/ your awesome harness but I’m so fascinated. A near 0 dependency harness must have so many use cases where this structure is superior. Things like bare metal setup & ci/cd ephemeral runners come to mind, but there must be so many more. Keep at it👍
Greg Wedow@gregwedow·
another fun small model here. dumb as a rock but a very capable tool user when given direct instructions. seems to work quite fluently in harness with a more focused tool set so i'm experimenting with using it for some context management tasks
Liquid AI@liquidai

Today, we release LFM2.5-350M. Agentic loops at 350M parameters. A 350M model trained for reliable data extraction and tool use, where models at this scale typically struggle. <500MB when quantized, built for environments where compute, memory, and latency are constrained. 🧵

Greg Wedow@gregwedow·
@chinar_amrutkar the ticket repo really needs some love these days. i have been meaning to tag a release and merge a bunch of things for a month now. going to try to bring it up the priority list soon. sorry for the delay!
Chinar Amrutkar@chinar_amrutkar·
@gregwedow Btw I created a couple PRs for ticket a few weeks ago. Let me know your thoughts!
Greg Wedow@gregwedow·
seems like this actually has a decent way to measure slop. i've been saying the combo of ticket and my skills repo has been generating slop-free code for a while now. next week i'll integrate those with harness and put it to the test
Gabe Orlanski@GOrlanski

We found that agents generate progressively worse code with each iteration. Real developers do not. SlopCodeBench is the only eval that faithfully measures quality degradation on iterative, long-horizon coding tasks. arxiv.org/abs/2603.24755 scbench.ai 🧵

Greg Wedow@gregwedow·
Cool little model here. Spun up Kimi K2.5 in `harness` to set things up on my laptop since they said it runs on CPU. Load up the UI, gibberish output at ~0.65t/s. One line fix for the gibberish but speed is still terrible. Turns out they didn't ship AVX2 support for their model in their llama.cpp fork. ~3 minutes later and AVX2 is implemented and running and I'm getting 15t/s. Not bad Kimi, not bad at all.
Jacob Miller@pwnies

Played around with PrismML's 1bit model. prismml.com It uses 1 bit per parameter, and a FP16 scale factor for each group of 128 params. Cool demo - runs crazy fast. It's able to handle basic tool usage via cursor, but it's nowhere near usable. I rate it neat / 10

Greg Wedow retweeted
Greg Wedow@gregwedow·
touring through the claude code source makes me very sure i've made the right design choices with harness. maximizing separation of concerns makes every feature orthogonal, hot-swappable, and trivial to add github.com/wedow/harness
Greg Wedow@gregwedow·
big thanks to @screenfluent for taking harness for a spin and catching a bunch of bugs. harness is now working with full ChatGPT and Claude subscription support on both mac and linux. still want to add a few quality of life improvements, but it's now feature-complete enough that i've switched off of claude code for personal projects. feeling good about this thing
Szymon Rączka@screenfluent

@gregwedow @chinar_amrutkar That was the one thing I ran into. No worries though already having fun with it. Built some stuff like context compression inspired by Mastra's observational memory and nicer tool display. Great architecture btw, really easy to extend.

Greg Wedow@gregwedow·
nice! looks like the core of the fix was just the curl user agent hitting a bot fence? going to throw in a small fix for that, and another for an issue with refresh token handling that i saw. i like the addition of the proper oauth flow but would rather not depend on python. should be straightforward to adapt the netcat handler from the chatgpt provider though. will push that shortly as well
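For context, the netcat-style handler he mentions boils down to listening for the browser's OAuth redirect and pulling the `code` query parameter out of the raw HTTP request line. A minimal sketch of the parsing half (the `extract_code` helper, port, and callback path are illustrative, not harness's actual code):

```shell
#!/bin/sh
# Sketch of a python-free OAuth callback handler: netcat accepts the
# browser's redirect, and we parse the authorization code out of the
# first request line, which looks like:
#   GET /callback?code=abc123&state=xyz HTTP/1.1
# Helper name and wiring below are illustrative, not harness's real code.

extract_code() {
    # Match "?code=" or "&code=" and capture up to the next "&" or space.
    printf '%s\n' "$1" | sed -n 's/.*[?&]code=\([^& ]*\).*/\1/p'
}

# Wiring it to netcat would look roughly like (GNU nc shown):
#   request=$(printf 'HTTP/1.1 200 OK\r\n\r\nYou can close this tab.' \
#       | nc -l -p 8765 | head -n 1)
#   code=$(extract_code "$request")
```

One portability wrinkle: BSD netcat listens with `nc -l 8765` while GNU netcat wants `nc -l -p 8765`, so a handler like this usually has to probe which variant is installed.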
Szymon Rączka@screenfluent·
@gregwedow Woo hoo! 🎉
received authorization via browser callback
exchanging code for tokens...
logged in successfully
saved credential for claude
Greg Wedow@gregwedow·
harness now includes an auth plugin for api key management, with hooks to customize selection logic. makes it easy to round-robin keys and avoid rate limits. also includes a chatgpt oauth flow to use your subscription instead of an api key. should have one for claude tomorrow. it's being a bit finicky.
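Round-robin key selection of the kind he describes can be sketched in a few lines of POSIX shell. The file locations and the `select_key` name here are made up for illustration, not harness's actual plugin API:

```shell
#!/bin/sh
# Round-robin API key selection: each call prints the next key from a
# newline-separated key file, wrapping back to the first key when the
# list is exhausted. File names are illustrative, not harness's paths.
KEY_FILE="${KEY_FILE:-keys.txt}"
IDX_FILE="${IDX_FILE:-.key_idx}"

select_key() {
    total=$(wc -l < "$KEY_FILE")
    idx=$(cat "$IDX_FILE" 2>/dev/null || echo 0)
    # Wrap the index so requests cycle through keys, spreading rate limits.
    idx=$(( idx % total ))
    sed -n "$(( idx + 1 ))p" "$KEY_FILE"
    echo $(( idx + 1 )) > "$IDX_FILE"
}
```

A hook-based design would presumably let you swap `select_key` for least-recently-used or per-provider selection without touching the rest of the auth plugin.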
Greg Wedow@gregwedow·
@screenfluent @chinar_amrutkar Thanks for the PRs by the way! It's all very pre-alpha quality at the moment. Was netcat really the only macOS compat issue? I honestly thought it'd need more work than that.
Szymon Rączka@screenfluent·
@gregwedow @chinar_amrutkar Hey, thanks for your work! Happy user of Pi, but got tempted to go another level of complexity down haha (BTW just submitted smol PR). However I'm confused why CLAUDE.md not AGENTS.md in your skills repo?
Greg Wedow@gregwedow·
ah good callout — claude code was my main agent when i made all that and just never changed it. my system is littered with symlinks as i try out new things. it was all written specifically to get better results out of claude models. i also find gpt hyperfocuses on some of the language in that file and figured i should make a dedicated version of the content for that model family
Greg Wedow@gregwedow·
@screenfluent ah yes there's a bug during the code exchange that I haven't had a chance to work out this weekend. good workaround though! oauth in bash is a new fun adventure for sure
Szymon Rączka@screenfluent·
@gregwedow FYI I was getting a rate limiting error from the CC sub, so I got around it by copying the token: ~/.pi/agent/auth.json → .harness/.auth.json
Greg Wedow@gregwedow·
@chinar_amrutkar let me know how that goes. i'm sure the planning phase especially will need some adaptations. the rest is very hands-off and should work regardless of who or what is driving
Chinar Amrutkar@chinar_amrutkar·
@gregwedow Doing this on Openclaw btw, so I need to adapt a few things. Kidding, the agent will adapt it for me 😝