BuildLocalAI

8 posts

BuildLocalAI

@buildlocalai

Build it. Run it. Own it.

Joined March 2026
9 Following · 0 Followers
BuildLocalAI reposted
imran
imran@imranye·
been running hermes agent for three days and i have not had to restart it a single time….
17 replies · 3 reposts · 157 likes · 45.5K views
Sudo su
Sudo su@sudoingX·
i keep saying local ai not because i am against cloud. i use cloud every day. but i do it in a way where i orchestrate things without letting it understand the full and final scope. thinking one step ahead. and that is only possible when you have been fucking around with your own hardware long enough to know the tradeoffs. the more you fuck around the more you get it. every day i wake up and get blown away by what a 9B model can actually do on real tasks. not benchmarks. not demos. actual work. a perfect free automation and cognition tool sitting on a card that costs less than 2 months of pro. if i even told you what i crank out of these little probabilistic machines you would not believe me. so i will just keep iterating the game and show you in a way you can see. because that is how most of you work. you want to see it first. then decide. then maybe give it one try. fail. hit cloud again. post "local ai is not there yet." it is there. you just haven't tried hard enough.
3 replies · 0 reposts · 25 likes · 2.3K views
Sudo su
Sudo su@sudoingX·
this is the worst local ai will ever be. it only gets better from here. if you are not expanding your mind with these small models you are missing what's happening right now: 99 percent tool call success rate. when steered well with the right skills and a framework like hermes agent the node becomes a cognition layer. not a chatbot. not a toy. an extension of how you think. i was cranking this node at 35 to 50 tok/s all day on personal experiments and now after all the work is done qwen 3.5 9B is iterating on its own code. the game it created. fixing its own bugs autonomously. and the part you should probably not miss is that all of this is happening on an RTX 3060. not an H100. not an A100. the card most of you have sitting in a drawer right now. if you just open that drawer and put that intelligence to work every tensor core on that card should be running for you. your work. your experiments. your thinking. you all have it but because nobody told you what this hardware can actually do in 2026 you never tried. the day it unlocks is the day you test your workload, understand the tradeoffs, debug the loops, and then decide if you need to scale the hardware. there is no point buying 3 mac studios when, done well, you can squeeze a similar level of intelligence from 9B as from 70B. but only when you create the right environment for your model through the right harness. and let me tell you i have tried claude code as a local harness. i have tried opencode. i have tried various others. somehow i landed on hermes agent and never left. there is something magical going on at @NousResearch. the tool call parsers, the skills system, the way it handles small models natively. nothing else comes close for local inference. own your cognition. your AI. your agent. your prompts. your experiments. why give them away for free. they are who you are and they don't belong on someone else's servers being monitored. just give it a shot with your existing hardware. if you run into a problem the community will help you. and if you are migrating from openclaw to hermes i will personally help you make the switch.
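the "tool call parsers" the post credits hermes agent with follow a pattern worth seeing concretely: Hermes-family models emit a JSON object inside <tool_call> tags and the harness extracts and validates it. a minimal sketch, assuming that published tag convention; the tool name and arguments in the example output are made up, not part of any real run:

```python
import json
import re

# Hermes-style models wrap tool invocations in <tool_call> tags
# with a JSON body of the form {"name": ..., "arguments": {...}}.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(model_output: str) -> list[dict]:
    """Extract every well-formed tool call from raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(model_output):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # small models occasionally emit broken JSON; skip it
        if isinstance(call, dict) and "name" in call and "arguments" in call:
            calls.append(call)
    return calls

# hypothetical model output for illustration:
output = (
    "let me check that file.\n"
    '<tool_call>{"name": "read_file", "arguments": {"path": "game.js"}}</tool_call>'
)
calls = parse_tool_calls(output)
print(calls[0]["name"])  # read_file
```

the skip-on-bad-JSON branch is where a forgiving parser earns its keep with small quantized models: one malformed call gets dropped instead of crashing the whole agent loop.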
48 replies · 60 reposts · 681 likes · 57.9K views
BuildLocalAI reposted
DailyPapers
DailyPapers@HuggingPapers·
Black Forest Labs just released FLUX.2 klein on Hugging Face. Sub-second image generation and editing with state-of-the-art quality. Runs on consumer GPUs with just 13GB VRAM.
4 replies · 14 reposts · 133 likes · 19.1K views
BuildLocalAI reposted
God of Prompt
God of Prompt@godofprompt·
Alibaba just introduced the Qwen 3.5 Small Model Series. Four models. 0.8B to 9B parameters. Natively multimodal. Built for edge devices, mobile, and real-world deployment. More intelligence, less compute. Here's what this release actually means:
6 replies · 8 reposts · 127 likes · 24.2K views
BuildLocalAI reposted
Samuel Wong
Samuel Wong@samuel_wong_·
Qwen3.5: How to Run Locally Guide. Run the new Qwen3.5 LLMs including Medium (Qwen3.5-35B-A3B, 27B, 122B-A10B), Small (Qwen3.5-0.8B, 2B, 4B, 9B) and 397B-A17B on your local device! unsloth.ai/docs/models/qw…
2 replies · 25 reposts · 160 likes · 8.4K views
BuildLocalAI reposted
Sudo su
Sudo su@sudoingX·
cancel your chatgpt subscription and delete your openclaw slop. i'm serious. go on ebay and buy a used RTX 3060 for the price of two months of pro. or check your drawer because half of you already own one and forgot about it. install hermes agent from @NousResearch. one framework, 31 tools, file operations, terminal, browser, code execution. connect it to your local llama.cpp server running qwen 3.5 9B Q4. total download is 5.3 gigs. that's it. that's the whole setup. every experiment you hesitated to run on API. every project you shelved because you didn't want your data on someone else's server. every late night idea you didn't test because you hit your rate limit. all of that is gone. runs 24/7 on your electricity. your machine. your data never leaves your house. connect it to telegram if you want it on your phone. hook up whatever tools you need. the model thinks at 29 tok/s with 128K context and it never bills you. qwen 3.5 9B and one RTX 3060 is the setup most people will never try because they've been trained to believe intelligence has to come from a datacenter. it doesn't. it runs on 12 gigs of VRAM under your desk right now. stop giving your thinking away for free.
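the "total download is 5.3 gigs" figure checks out with back-of-the-envelope quantization math. a rough sketch, assuming roughly 4.8 bits per weight as the average for a Q4_K_M-style GGUF quant (real files vary with the exact quant mix) and assuming about 1 GB of runtime overhead:

```python
# Rough size estimate for a 4-bit-quantized 9B model, and how much of a
# 12 GB card is left over for KV cache once the weights are loaded.
params = 9e9                  # 9 billion weights
bits_per_weight = 4.8         # assumed average for a Q4_K_M-style quant

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")  # in the ballpark of the 5.3 GB download

vram_gb = 12.0
overhead_gb = 1.0  # assumed CUDA/runtime overhead
kv_cache_budget_gb = vram_gb - weights_gb - overhead_gb
print(f"left for KV cache: ~{kv_cache_budget_gb:.1f} GB")
```

that leftover few gigabytes is what makes the long context workable on this card; a fatter quant or a bigger model eats straight into the KV cache budget.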
99 replies · 190 reposts · 2.1K likes · 145.9K views
Sudo su
Sudo su@sudoingX·
this is what 12 gigs of VRAM built in 2026. a 9 billion parameter model running on a 5 year old RTX 3060 wrote a full space shooter from a single prompt. blank screen on first try. i came back with a bug list and the same model on the same card fixed every issue across 11 files without touching a single line myself. enemies still looked wrong so i pushed another iteration and now the game has pixel art octopi, particle effects, screen shake, projectile physics and a combo system. all running locally on a card that was designed to play fortnite. three iterations. zero cloud. zero API calls. every token generated on hardware sitting under my desk. the model reads its own code, finds what's broken, patches it, validates syntax and restarts the server. i just describe what's wrong and it handles the rest. people are paying monthly subscriptions to type into a browser tab and wait for a server farm to respond. meanwhile a GPU you can find used on ebay is running a full autonomous hermes agent framework with 31 tools, 128K context window and thinking mode generating at 29 tokens per second nonstop. the game still needs work. level upgrades don't trigger and boss fights need tuning. but the fact that i'm iterating on gameplay balance instead of debugging whether the code runs at all tells you where this is headed. every iteration the game gets better on the same hardware. same 12 gigs. same 9 billion parameters. same RTX 3060 from 5 years ago. your GPU is not a gaming card anymore. it's a local AI lab that never sends your data anywhere.
Sudo su@sudoingX

i run every model through octopus invaders. same prompt, same game spec. if a model can build this autonomously on a single GPU it passes. if it can't it doesn't. qwen 3.5 9B Q4 on an RTX 3060. first attempt was a blank screen. it built 2,699 lines across 11 files and nothing rendered. i wrote it off as a ceiling. then last night i came back with a precise bug list and the same model on the same card fixed every single one surgically. game came to life. enemies spawning, background rendering, collisions working. but bullets didn't fire and the enemies looked like colored squares instead of octopi. today i pushed again. listed 9 more bugs. the agent read every file, patched across 4 modules, validated syntax and restarted the server on its own. bullets fire. enemies look like actual pixel art. screen shake works. the game is playable and i genuinely enjoyed it. level upgrades still don't trigger and there's more to fix but i'm iterating on a single 12GB card running everything locally. every file, every prompt, every output stays on my machine. 29 tok/s generation, 417 tok/s prefill, 128K context window on a card that most people bought to play warzone. if you use AI in any part of your life and you have a computer with a GPU in it you should not be sleeping on this. the model weights are free. the hermes agent framework is free. your data never leaves your house. own your cognition.
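the throughput figures quoted above imply concrete wall-clock costs for an agent turn, which a small sketch makes explicit. the 417 tok/s prefill and 29 tok/s generation rates are the numbers from the post; the 100K-token prompt and 1,000-token reply are made-up workloads for illustration:

```python
# Wall-clock estimates from the quoted throughput figures.
prefill_tps = 417     # prompt-processing speed, tokens/sec (from the post)
generate_tps = 29     # generation speed, tokens/sec (from the post)

prompt_tokens = 100_000   # hypothetical: a big chunk of the 128K context filled
reply_tokens = 1_000      # hypothetical: one agent turn of output

prefill_s = prompt_tokens / prefill_tps
generate_s = reply_tokens / generate_tps
print(f"prefill: {prefill_s:.0f} s (~{prefill_s / 60:.1f} min)")
print(f"generation: {generate_s:.0f} s")
```

the asymmetry is the practical takeaway: reloading a huge context costs minutes while each generated turn costs seconds, which is why agent harnesses that keep the KV cache warm between turns matter so much at these speeds.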

37 replies · 53 reposts · 678 likes · 167.8K views