Small Harness

Hugging Models@HuggingModels

37

Charly Wargnier@DataChaz·13h

DO YOURSELF A FAVOR: GO DOWNLOAD THIS NEW LOCAL MODEL AND KEEP IT IN STORAGE. Even if you don't have a massive GPU setup, having offline access to an intelligent model is a crucial insurance policy. Free API access won't necessarily last forever. Right now, the 12B-27B range is the absolute sweet spot, and Hugging Models just highlighted a perfect candidate to download today: → GEMMA 4 12B CODER on @huggingface 🤗 It packs Google’s latest architecture into a GGUF format optimized for consumer hardware. What it delivers locally: → Fast, private code completion without the cloud → Real-world debugging and reasoning capabilities → Smooth performance on 12GB+ VRAM or a standard CPU Don't wait until you need it. Grab the weights and keep them locally 👇

Gemma 4 12B Coder is here and it's a game changer for local code generation. This GGUF model packs Google's latest gemma-4 architecture into a compact 12B size, perfect for running on consumer hardware. It's optimized for reasoning and thinking, making it ideal for developers who want fast, private coding assistance without the cloud.

English

54

172

1.8K

252.9K

Small Harness@smallharness·1h

@jun_song Do you think the MacBook is okay for longer local runs, always have concerns about temp with MacBooks

English

714

Jun Song@jun_song·3h

Best choice of Local LLM hardware available now : - RTX6000 Pro : $10k - DGX Spark : $4k - Macbook Pro M5 Max 128gb : $5k

English

41

4

162

25.9K

Small Harness@smallharness·1h

@mr_r0b0t 🤩

QME

1

21

mr-r0b0t@mr_r0b0t·5h

Looks like 8x PCIE 16x 👀 What comes next?

Bokiko@bokiko

So that happened 😁

English

0

15

597

Small Harness@smallharness·1h

@plutos_eth It’s beautiful 🥹

English

208

plutos@plutos_eth·10h

What Lisa Su actually held on stage: A mini PC the size of a lunchbox running Qwen3-235B locally, with no cloud and no discrete GPU Inside: the Ryzen AI Max+ 395, 128GB unified memory, 110GB usable as VRAM on Linux The first x86 chip that handles 200+ billion parameters on a single die AMD claims it beats the RTX 5080 by several times on memory-bound models — because the 5080 simply cannot fit them $1,400 to $2,500 once. cloud bills run $200 to $400 a month It pays for itself in a few months, then costs nothing per request This is not a faster GPU. it is the first real argument that your AI does not belong in someone else's data center

plutos@plutos_eth

x.com/i/article/2066…

English

52

111

945

280.1K

Small Harness@smallharness·1h

Well this should be interesting 👀

English

1

0

31

Small Harness@smallharness·9h

@rodydavis @GoogleDeepMind Congrats Rody!

English

Jacky Chou (buying online businesses up to $1m)@indexsy

1

24

Rody Davis@rodydavis·13h

Career Update: I am joining @GoogleDeepMind DevRel 🚀

English

71

11

427

20.1K

Small Harness@smallharness·9h

Fable 5 has been distilled.

Holy crap they already distilled Fable 5

English

1

230

Small Harness@smallharness·9h

@onusoz Totally wild.

English

You can now run Kimi K2.7 Code locally! 🌘 We shrank the 1T model to 325GB (-48%) via Dynamic 2-bit where important layers are upcasted. Run at >40 tok/s on 330GB RAM/VRAM setups. Run full precision on 610 GB. Guide: unsloth.ai/docs/models/ki… GGUF: huggingface.co/unsloth/Kimi-K…

260

Onur Solmaz@onusoz·14h

On your own local device 325 GB VRAM 💀

Unsloth AI@UnslothAI

English

82

63

2.7K

253.1K

Small Harness@smallharness·9h

@AbuKhadeejah Yes Arsalan!

English

Small Harness@smallharness

0

1

21

Arsalan Shaikh أرسلان@AbuKhadeejah·13h

Imagine automation of the models effort level based on a task. Saves tokens sweet alabama.

Small Harness v1.0.1 is out 🎉 Key update: /route now uses effort levels. It can select not just the right model for a task, but how hard that model should think: low/medium for routine edits, high/xhigh/max for more complex work. To try it: ☕️ brew upgrade getsmallai/tap/small-harness (more detailed updates on this release can be found in the first comment below)

English

0

1

39

Small Harness@smallharness·9h

@ben_ai_eng @morganlinton You might be the first person to use Small Harness during a vacation 🧘‍♂️

English

2

8

Ben Newell@ben_ai_eng·12h

@morganlinton @smallharness I’ll be experimenting with small harness a bit this week while I’m on vacation

English

0

2

9

Morgan@morganlinton·13h

Woke up to see 30,000 impressions on my announcement about adding the OpenRouter Fusion integration in @smallharness, very cool! Going through the comments, I see a lot of people wired like me. They want to benchmark it against Fable to see how it really performs, and I do too. Hopefully we can access to Fable this week, and I can get a test setup next weekend using my own benchmarking tool, @vulcanbench (not released yet), that will be open source, and fully transparent, so everyone can see the exact tests that were run, corpus' they were run against etc. More to come, thanks to everyone who commented and shared feedback here!

Morgan@morganlinton

Okay, officially too excited about Fusion from OpenRouter not to add a dedicated command for it directly to Small Harness. Don't wait for Anthropic to make Fable 5 available, get the same level of intelligence for half the cost. Now built-into Small Harness. Small harness is free and open source, so use it out of the box, or fork it and make it your own. Link to gh repo in first comment below.

English

4

2

12

2.6K

Small Harness@smallharness·13h

Detailed update log for v1.0.1 - Added typed EffortLevel: none, minimal, low, medium, high, xhigh, max. - /route select now accepts selector-chosen coderEffort, reviewEffort, and securityEffort. The selected coder effort becomes active session state. - /session, /config, and the turn footer now show active effort. - Normal agent turns pass active effort into requests. OpenRouter requests now send effort as reasoning: { "effort": ... }, matching OpenRouter’s documented reasoning API: OpenRouter reasoning tokens. - Local backends keep effort visible but do not receive unsupported request fields. - Manual /backend, /model, /fusion, /setup, and /doctor ... apply clear routed effort so stale effort does not follow unrelated model switches. - README, Quickstart, and CHANGELOG are updated.

English

2

91

Small Harness@smallharness·13h

Small Harness v1.0.1 is out 🎉 Key update: /route now uses effort levels. It can select not just the right model for a task, but how hard that model should think: low/medium for routine edits, high/xhigh/max for more complex work. To try it: ☕️ brew upgrade getsmallai/tap/small-harness (more detailed updates on this release can be found in the first comment below)

English

8

2.7K

Small Harness@smallharness·1d

@thesoragirls Challenge accepted 🫡

English

1

11

X Girls@thesoragirls·1d

@smallharness make more cool posts and Ava will probably make more cool reply videos or w.e 💁‍♀️

English

0

1

9

Small Harness@smallharness·1d

Okay, v.1.0 of Small Harness is finally here. What got it to v1? Well it was Morgan's Wacky Model Routing Idea of course!

Morgan@morganlinton

Okay, I've been really in a groove with @smallharness today, so decided to finally cut the feature I felt like I need for a true v1.0 release. And this is, model routing...but kinda model routing Morgan-style I guess, because I've been testing out different approaches lately, and found something pretty interesting. At a high level, I've been thinking that it doesn't make sense to have one model to orchestrate, one to write code, and one to review, and I've been playing around with different configurations. What I've determined, at least for me, lately, is that I actually want a different model to orchestrate simple tasks vs. complex tasks, and I also want different agents to do coding tasks, based on how much thinking depth/tool calling I need, etc. Also in some cases, I might want the same model but at different effort levels, like I learned with Fable where I could do a lot more with low than I expected, but there were some tasks I wanted medium for, and of course, crazy complex architecture stuff that I wanted high or even max for. Same for code review. For MVPs and stuff I'm playing with, I just want fast and cheap, simple code review. But for production code, then I want way more in-depth code review, a better, more expensive model that goes much deeper. I've come up with a series of roles, and this is all now built into Small Harness. Finally got my idea, into code, and into a harness that can help you write code, using this methodology. Here's the high-level on it. The Roles ----------- The config lives under modelSystem in agent.config.json: 👑 Selector: the decision model. This should usually be your strongest/highest-effort model. 🐙 Orchestrators: not just one orchestration model, but three, a different one for each level of task complexity: low, medium, high. 🧑‍💻 Coders: like the orchestrators, not just one model to execute/write code, but different models based on the complexity of the coding task. Some plans might use something like two low and one medium, and never need a high. ✅ Code reviewers: three types, play, production, and security. You don't need as detailed code review for stuff you're just playing around with, but you do for production, and your security review model might be different from both. And I made a chart, aptly titled, Morgan's Wacky Model Routing Idea. That you can look at if you want to do a little deeper dive into what I'm thinking here. Now live on Github, free and open source, link to the rep in first comment below.

English

0

17

2K

Small Harness@smallharness·1d

@charliermarsh Brings back some great memories

English

2

86

Charlie Marsh@charliermarsh·1d

Telling my son this is how you train an LLM

English

5

4

81

6.7K

Small Harness@smallharness·1d

@thesoragirls Ha, I’m finally cool enough to get my own video 🥳

English

0

1

34

X Girls@thesoragirls·1d

@smallharness Small Harness v1.0 with that smart model routing? Brew install and rock on! 🤘

English

0

3

115

Small Harness@smallharness·1d

@dedene @morganlinton Do it Peter, would be an honor to have a PR from you! 🤗

English

1

9

Peter Dedene@dedene·1d

@morganlinton @smallharness 🙏 awesome! If you’re open to it, I’m happy to help and see if I can make a PR?

English

0

2

13

Morgan@morganlinton·1d

Okay, I've been really in a groove with @smallharness today, so decided to finally cut the feature I felt like I need for a true v1.0 release. And this is, model routing...but kinda model routing Morgan-style I guess, because I've been testing out different approaches lately, and found something pretty interesting. At a high level, I've been thinking that it doesn't make sense to have one model to orchestrate, one to write code, and one to review, and I've been playing around with different configurations. What I've determined, at least for me, lately, is that I actually want a different model to orchestrate simple tasks vs. complex tasks, and I also want different agents to do coding tasks, based on how much thinking depth/tool calling I need, etc. Also in some cases, I might want the same model but at different effort levels, like I learned with Fable where I could do a lot more with low than I expected, but there were some tasks I wanted medium for, and of course, crazy complex architecture stuff that I wanted high or even max for. Same for code review. For MVPs and stuff I'm playing with, I just want fast and cheap, simple code review. But for production code, then I want way more in-depth code review, a better, more expensive model that goes much deeper. I've come up with a series of roles, and this is all now built into Small Harness. Finally got my idea, into code, and into a harness that can help you write code, using this methodology. Here's the high-level on it. The Roles ----------- The config lives under modelSystem in agent.config.json: 👑 Selector: the decision model. This should usually be your strongest/highest-effort model. 🐙 Orchestrators: not just one orchestration model, but three, a different one for each level of task complexity: low, medium, high. 🧑‍💻 Coders: like the orchestrators, not just one model to execute/write code, but different models based on the complexity of the coding task. Some plans might use something like two low and one medium, and never need a high. ✅ Code reviewers: three types, play, production, and security. You don't need as detailed code review for stuff you're just playing around with, but you do for production, and your security review model might be different from both. And I made a chart, aptly titled, Morgan's Wacky Model Routing Idea. That you can look at if you want to do a little deeper dive into what I'm thinking here. Now live on Github, free and open source, link to the rep in first comment below.

English

9

1

16

4.3K

Small Harness@smallharness·1d

@slash1sol Totally wild.

English

leopardracer@leopardracer

99

slash1s@slash1sol·1d

TWO BOXES THE SIZE OF A MAC MINI JUST RAN A 235 BILLION PARAMETER MODEL ON A DESK It is two NVIDIA DGX Spark units linked by a single cable. A year ago a model this size meant renting a GPU cluster by the hour. Now it sits next to your monitor for around $8,000. Here is the twist most people miss. Linking them does not create one shared 256GB memory pool. The model is split across both boxes, and that is the only reason a 235B model fits at all. It answers at roughly 10 tokens per second, and both chips sit at just 74 degrees while sipping around 50 watts. Every token stays on the desk. Nothing touches a cloud, and nothing leaves the room. The ceiling for what you can run at home just jumped from 70B to 235B. Bookmark this & Watch it run ↓

x.com/i/article/2066…

English

40

35

327

79.3K

Small Harness@smallharness·1d

@mr_r0b0t Ohhh 👀

NVIDIA AI Infrastructure@NVIDIAAIInfra

1

6

mr-r0b0t@mr_r0b0t·3d

This looks promising 😁

📣 There's now a benchmark for agentic AI workloads. AgentPerf, from @ArtificialAnlys, is the industry's first open hardware benchmark that measures how many concurrent AI agents an inference system can support while hitting real-world performance targets. Here's what it measures — and what NVIDIA results show. 🧵

English

3

28

4K

Small Harness@smallharness·1d

@blankspeaker Excited about the impact harnesses can make going forward, hoping to make a small difference here.

English