Small Harness

506 posts

Small Harness banner
Small Harness

Small Harness

@smallharness

An open source coding harness where local and frontier models jam together. Just 'brew install small-harness' and rock on 🤘 Created by @morganlinton.

Joined Nisan 2024
29 Following578 Followers
Charly Wargnier
Charly Wargnier@DataChaz·
DO YOURSELF A FAVOR: GO DOWNLOAD THIS NEW LOCAL MODEL AND KEEP IT IN STORAGE. Even if you don't have a massive GPU setup, having offline access to an intelligent model is a crucial insurance policy. Free API access won't necessarily last forever. Right now, the 12B-27B range is the absolute sweet spot, and Hugging Models just highlighted a perfect candidate to download today: → GEMMA 4 12B CODER on @huggingface 🤗 It packs Google’s latest architecture into a GGUF format optimized for consumer hardware. What it delivers locally: → Fast, private code completion without the cloud → Real-world debugging and reasoning capabilities → Smooth performance on 12GB+ VRAM or a standard CPU Don't wait until you need it. Grab the weights and keep them locally 👇
Charly Wargnier tweet media
Hugging Models@HuggingModels

Gemma 4 12B Coder is here and it's a game changer for local code generation. This GGUF model packs Google's latest gemma-4 architecture into a compact 12B size, perfect for running on consumer hardware. It's optimized for reasoning and thinking, making it ideal for developers who want fast, private coding assistance without the cloud.

English
54
172
1.8K
252.9K
Small Harness
Small Harness@smallharness·
@jun_song Do you think the MacBook is okay for longer local runs, always have concerns about temp with MacBooks
English
0
0
0
714
Jun Song
Jun Song@jun_song·
Best choice of Local LLM hardware available now : - RTX6000 Pro : $10k - DGX Spark : $4k - Macbook Pro M5 Max 128gb : $5k
English
41
4
162
25.9K
plutos
plutos@plutos_eth·
What Lisa Su actually held on stage: A mini PC the size of a lunchbox running Qwen3-235B locally, with no cloud and no discrete GPU Inside: the Ryzen AI Max+ 395, 128GB unified memory, 110GB usable as VRAM on Linux The first x86 chip that handles 200+ billion parameters on a single die AMD claims it beats the RTX 5080 by several times on memory-bound models — because the 5080 simply cannot fit them $1,400 to $2,500 once. cloud bills run $200 to $400 a month It pays for itself in a few months, then costs nothing per request This is not a faster GPU. it is the first real argument that your AI does not belong in someone else's data center
plutos@plutos_eth

x.com/i/article/2066…

English
52
111
945
280.1K
Small Harness
Small Harness@smallharness·
Well this should be interesting 👀
Small Harness tweet media
English
0
1
0
31
Morgan
Morgan@morganlinton·
Woke up to see 30,000 impressions on my announcement about adding the OpenRouter Fusion integration in @smallharness, very cool! Going through the comments, I see a lot of people wired like me. They want to benchmark it against Fable to see how it really performs, and I do too. Hopefully we can access to Fable this week, and I can get a test setup next weekend using my own benchmarking tool, @vulcanbench (not released yet), that will be open source, and fully transparent, so everyone can see the exact tests that were run, corpus' they were run against etc. More to come, thanks to everyone who commented and shared feedback here!
Morgan@morganlinton

Okay, officially too excited about Fusion from OpenRouter not to add a dedicated command for it directly to Small Harness. Don't wait for Anthropic to make Fable 5 available, get the same level of intelligence for half the cost. Now built-into Small Harness. Small harness is free and open source, so use it out of the box, or fork it and make it your own. Link to gh repo in first comment below.

English
4
2
12
2.6K
Small Harness
Small Harness@smallharness·
Detailed update log for v1.0.1 - Added typed EffortLevel: none, minimal, low, medium, high, xhigh, max. - /route select now accepts selector-chosen coderEffort, reviewEffort, and securityEffort. The selected coder effort becomes active session state. - /session, /config, and the turn footer now show active effort. - Normal agent turns pass active effort into requests. OpenRouter requests now send effort as reasoning: { "effort": ... }, matching OpenRouter’s documented reasoning API: OpenRouter reasoning tokens. - Local backends keep effort visible but do not receive unsupported request fields. - Manual /backend, /model, /fusion, /setup, and /doctor ... apply clear routed effort so stale effort does not follow unrelated model switches. - README, Quickstart, and CHANGELOG are updated.
English
0
0
2
91
Small Harness
Small Harness@smallharness·
Small Harness v1.0.1 is out 🎉 Key update: /route now uses effort levels. It can select not just the right model for a task, but how hard that model should think: low/medium for routine edits, high/xhigh/max for more complex work. To try it: ☕️ brew upgrade getsmallai/tap/small-harness (more detailed updates on this release can be found in the first comment below)
Small Harness tweet media
English
1
1
8
2.7K
X Girls
X Girls@thesoragirls·
@smallharness make more cool posts and Ava will probably make more cool reply videos or w.e 💁‍♀️
English
1
0
1
9
Small Harness
Small Harness@smallharness·
Okay, v.1.0 of Small Harness is finally here. What got it to v1? Well it was Morgan's Wacky Model Routing Idea of course!
Small Harness tweet media
Morgan@morganlinton

Okay, I've been really in a groove with @smallharness today, so decided to finally cut the feature I felt like I need for a true v1.0 release. And this is, model routing...but kinda model routing Morgan-style I guess, because I've been testing out different approaches lately, and found something pretty interesting. At a high level, I've been thinking that it doesn't make sense to have one model to orchestrate, one to write code, and one to review, and I've been playing around with different configurations. What I've determined, at least for me, lately, is that I actually want a different model to orchestrate simple tasks vs. complex tasks, and I also want different agents to do coding tasks, based on how much thinking depth/tool calling I need, etc. Also in some cases, I might want the same model but at different effort levels, like I learned with Fable where I could do a lot more with low than I expected, but there were some tasks I wanted medium for, and of course, crazy complex architecture stuff that I wanted high or even max for. Same for code review. For MVPs and stuff I'm playing with, I just want fast and cheap, simple code review. But for production code, then I want way more in-depth code review, a better, more expensive model that goes much deeper. I've come up with a series of roles, and this is all now built into Small Harness. Finally got my idea, into code, and into a harness that can help you write code, using this methodology. Here's the high-level on it. The Roles ----------- The config lives under modelSystem in agent.config.json: 👑 Selector: the decision model. This should usually be your strongest/highest-effort model. 🐙 Orchestrators: not just one orchestration model, but three, a different one for each level of task complexity: low, medium, high. 🧑‍💻 Coders: like the orchestrators, not just one model to execute/write code, but different models based on the complexity of the coding task. Some plans might use something like two low and one medium, and never need a high. ✅ Code reviewers: three types, play, production, and security. You don't need as detailed code review for stuff you're just playing around with, but you do for production, and your security review model might be different from both. And I made a chart, aptly titled, Morgan's Wacky Model Routing Idea. That you can look at if you want to do a little deeper dive into what I'm thinking here. Now live on Github, free and open source, link to the rep in first comment below.

English
2
0
17
2K
Charlie Marsh
Charlie Marsh@charliermarsh·
Telling my son this is how you train an LLM
Charlie Marsh tweet media
English
5
4
81
6.7K
X Girls
X Girls@thesoragirls·
@smallharness Small Harness v1.0 with that smart model routing? Brew install and rock on! 🤘
English
1
0
3
115
Morgan
Morgan@morganlinton·
Okay, I've been really in a groove with @smallharness today, so decided to finally cut the feature I felt like I need for a true v1.0 release. And this is, model routing...but kinda model routing Morgan-style I guess, because I've been testing out different approaches lately, and found something pretty interesting. At a high level, I've been thinking that it doesn't make sense to have one model to orchestrate, one to write code, and one to review, and I've been playing around with different configurations. What I've determined, at least for me, lately, is that I actually want a different model to orchestrate simple tasks vs. complex tasks, and I also want different agents to do coding tasks, based on how much thinking depth/tool calling I need, etc. Also in some cases, I might want the same model but at different effort levels, like I learned with Fable where I could do a lot more with low than I expected, but there were some tasks I wanted medium for, and of course, crazy complex architecture stuff that I wanted high or even max for. Same for code review. For MVPs and stuff I'm playing with, I just want fast and cheap, simple code review. But for production code, then I want way more in-depth code review, a better, more expensive model that goes much deeper. I've come up with a series of roles, and this is all now built into Small Harness. Finally got my idea, into code, and into a harness that can help you write code, using this methodology. Here's the high-level on it. The Roles ----------- The config lives under modelSystem in agent.config.json: 👑 Selector: the decision model. This should usually be your strongest/highest-effort model. 🐙 Orchestrators: not just one orchestration model, but three, a different one for each level of task complexity: low, medium, high. 🧑‍💻 Coders: like the orchestrators, not just one model to execute/write code, but different models based on the complexity of the coding task. Some plans might use something like two low and one medium, and never need a high. ✅ Code reviewers: three types, play, production, and security. You don't need as detailed code review for stuff you're just playing around with, but you do for production, and your security review model might be different from both. And I made a chart, aptly titled, Morgan's Wacky Model Routing Idea. That you can look at if you want to do a little deeper dive into what I'm thinking here. Now live on Github, free and open source, link to the rep in first comment below.
Morgan tweet mediaMorgan tweet media
English
9
1
16
4.3K
slash1s
slash1s@slash1sol·
TWO BOXES THE SIZE OF A MAC MINI JUST RAN A 235 BILLION PARAMETER MODEL ON A DESK It is two NVIDIA DGX Spark units linked by a single cable. A year ago a model this size meant renting a GPU cluster by the hour. Now it sits next to your monitor for around $8,000. Here is the twist most people miss. Linking them does not create one shared 256GB memory pool. The model is split across both boxes, and that is the only reason a 235B model fits at all. It answers at roughly 10 tokens per second, and both chips sit at just 74 degrees while sipping around 50 watts. Every token stays on the desk. Nothing touches a cloud, and nothing leaves the room. The ceiling for what you can run at home just jumped from 70B to 235B. Bookmark this & Watch it run ↓
leopardracer@leopardracer

x.com/i/article/2066…

English
40
35
327
79.3K
Small Harness
Small Harness@smallharness·
@blankspeaker Excited about the impact harnesses can make going forward, hoping to make a small difference here.
English
0
0
1
33