byMAR.CO

841 posts

@MarcoTundo

Master Biomechanical Engineer, Human AI Consultant - I build AI on a Farm. Prev CTO https://t.co/YUQrtFP3DJ, https://t.co/oEOh3lNDP3

Canada · Joined March 2010
643 Following · 280 Followers
byMAR.CO@MarcoTundo·
@sudoingX How do you like Hermes? Better than OpenClaw? I've found Qwen3.5-27B Q6 working amazingly with 24GB and 230k ctx
0 replies · 0 reposts · 0 likes · 36 views
Sudo su@sudoingX·
i did not use LM Studio or Ollama. compiled llama.cpp from source with CUDA. for personal inference and efficiency nothing else comes close. exact flags:
./llama-server -m Qwen3.5-9B-Q4_K_M.gguf -ngl 99 -c 131072 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --host 0.0.0.0
then install hermes agent and point it at localhost:8080. 31 tools. 11 model-specific parsers. persistent memory. the agent handles the rest.
8 replies · 0 reposts · 74 likes · 2.1K views
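The flags in that reply launch llama-server with its OpenAI-compatible HTTP API, which is what lets an agent "point at localhost:8080". A minimal client sketch, assuming the default port and endpoint; the prompt and helper names are placeholders, not part of the hermes stack:

```python
import json
import urllib.request

SERVER = "http://localhost:8080"  # where llama-server from the reply is listening

def build_chat_request(prompt, max_tokens=256, temperature=0.7):
    """Build an OpenAI-style chat payload for llama.cpp's llama-server.

    The server exposes an OpenAI-compatible /v1/chat/completions
    endpoint, so any OpenAI-style client (or agent framework) works.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(prompt):
    """Send one chat turn to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```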
Sudo su@sudoingX·
hear this anon, you don't need a $4,699 box to get started with local AI. use what you already have first. test your workload. this is what a $250 GPU did today.

iteration 3 of octopus invaders is here. 4 phases. 6 prompts. zero handwritten code. the same 9B on the same 3060 fixed its own enemy spawning, patched a dual start conflict, added level progression, resized every bullet, and when the browser cached old files it figured that out on its own and added version parameters to force a reload. 3,200+ lines across 13 files. every line by qwen 3.5 9B Q4 at 35-50 tok/s on 12 gigs through hermes agent.

understand what your load actually needs before you build. don't get trapped by influencers selling you boxes next to a plant. test on what you have. then decide. this 3060 impressed me in ways i did not expect and its autonomy is what kept me going. now it's time to move to new experiments on other nodes and other models for all of us.

if you are running this setup, the exact stack, flags, open source code, and exact prompts i used are in the replies. if you run into issues let me know. seeing students and builders discover hermes from my posts and start running local is why i do this. full autonomous build at 8x speed in the video. gameplay at the end. watch it.
Sudo su@sudoingX

this is what 12 gigs of VRAM built in 2026. a 9 billion parameter model running on a 5 year old RTX 3060 wrote a full space shooter from a single prompt. blank screen on first try. i came back with a bug list and the same model on the same card fixed every issue across 11 files without touching a single line myself. enemies still looked wrong so i pushed another iteration and now the game has pixel art octopi, particle effects, screen shake, projectile physics and a combo system. all running locally on a card that was designed to play fortnite.

three iterations. zero cloud. zero API calls. every token generated on hardware sitting under my desk. the model reads its own code, finds what's broken, patches it, validates syntax and restarts the server. i just describe what's wrong and it handles the rest.

people are paying monthly subscriptions to type into a browser tab and wait for a server farm to respond. meanwhile a GPU you can find used on ebay is running a full autonomous hermes agent framework with 31 tools, 128K context window and thinking mode, generating at 29 tokens per second nonstop.

the game still needs work. level upgrades don't trigger and boss fights need tuning. but the fact that i'm iterating on gameplay balance instead of debugging whether the code runs at all tells you where this is headed. every iteration the game gets better on the same hardware. same 12 gigs. same 9 billion parameters. same RTX 3060 from 5 years ago. your GPU is not a gaming card anymore. it's a local AI lab that never sends your data anywhere.

27 replies · 32 reposts · 442 likes · 44.4K views
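Rough arithmetic on why a 9B model at Q4 plus a 131072-token quantized KV cache can live inside 12 gigs, as the thread claims. The layer/head numbers below are illustrative assumptions for a GQA model of this size, not published Qwen3.5-9B specs:

```python
# Back-of-envelope VRAM check: 9B Q4 weights + 131072-token q4_0 KV cache
# vs. a 12 GiB RTX 3060. Architecture numbers are assumptions.

GiB = 1024 ** 3

def quantized_bytes(n_values, bits_per_value):
    """Size of n_values stored at an effective bits-per-value rate."""
    return n_values * bits_per_value / 8

# weights: ~9e9 params at ~4.5 effective bits/param for a Q4_K_M-style quant
weights = quantized_bytes(9e9, 4.5)

# KV cache at q4_0 (~4.5 effective bits per value including scales),
# assuming 36 layers, 8 KV heads (GQA), head dim 128:
layers, kv_heads, head_dim, ctx = 36, 8, 128, 131072
kv_values_per_token = 2 * layers * kv_heads * head_dim  # K and V
kv_cache = quantized_bytes(kv_values_per_token * ctx, 4.5)

total_gib = (weights + kv_cache) / GiB
print(f"weights ~{weights / GiB:.1f} GiB, kv ~{kv_cache / GiB:.1f} GiB, total ~{total_gib:.1f} GiB")
```

Under these assumptions the total lands near 10 GiB, leaving headroom for activations and CUDA overhead on a 12 GiB card.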
the tiny corp@__tinygrad__·
Mac Mini + eGPU. Both NVIDIA and AMD supported.
138 replies · 210 reposts · 2.8K likes · 311.2K views
byMAR.CO@MarcoTundo·
sudo systemctl stop ollama 2>/dev/null
sudo systemctl disable ollama 2>/dev/null
sudo rm -f /etc/systemd/system/ollama.service
sudo rm -f $(which ollama 2>/dev/null) /usr/local/bin/ollama /usr/bin/ollama
sudo rm -rf /usr/share/ollama
rm -rf ~/.ollama
Won't miss u
0 replies · 0 reposts · 0 likes · 20 views
byMAR.CO@MarcoTundo·
@PurzBeats Cool idea! I've been experimenting with a Comfy CLI that any agent can pick up and use, with predefined workflows. I simply set node names with @ and the cli surfaces them to the agent, so it doesn't get lost in the workflows. Works great! github.com/BuffMcBigHuge/…
0 replies · 0 reposts · 7 likes · 103 views
Purz.ai@PurzBeats·
Did a quick experiment: by spinning up a local instance of ComfyUI, Claude Code can operate entirely headless — issuing JSON directly and interacting with the system through API calls. Pretty interesting to see how naturally an LLM can slot into a graph-based toolchain.
12 replies · 5 reposts · 76 likes · 3.1K views
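The headless loop described above boils down to posting workflow JSON at ComfyUI's HTTP API. A minimal sketch, assuming a local instance on ComfyUI's default port; `queue_workflow` is a hypothetical helper name, not part of ComfyUI itself:

```python
import json
import urllib.request

COMFY = "http://127.0.0.1:8188"  # ComfyUI's default port

def build_prompt_payload(workflow: dict) -> dict:
    # ComfyUI's /prompt endpoint expects the graph under the "prompt" key,
    # in the same API-format JSON the web UI can export.
    return {"prompt": workflow}

def queue_workflow(workflow: dict) -> dict:
    """POST a workflow graph to a local ComfyUI instance.

    The response includes a prompt_id the caller can poll via /history,
    which is what lets an LLM drive the whole graph headless.
    """
    req = urllib.request.Request(
        f"{COMFY}/prompt",
        data=json.dumps(build_prompt_payload(workflow)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```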
byMAR.CO@MarcoTundo·
@wildmindai It's amazing isn't it! Just a small note - I've had more success with 27B than 35B A3B. Something about the dense performing better, perhaps it's the llama.cpp settings? 🤷
0 replies · 0 reposts · 1 like · 638 views
StubbyTech@cuylertech·
@sudoingX The -np 1 was the one I was missing. So far so good 🙏 easy 122t/s
4 replies · 0 reposts · 16 likes · 79K views
Sudo su@sudoingX·
this is the worst local AI will ever be. tomorrow it gets faster. next month the models get smarter. next year your GPU runs what a data center runs today. Qwen3.5-35B-A3B on a single 3090. told it to visualize its own expert routing. 256 experts, 8 active per token, rendered in 3D on the same GPU running inference. no API key. no subscription. no permission needed. closed AI isn't losing ground. it's losing the argument.
Sudo su@sudoingX

if open source models are hitting 113 tok/s at 262K context on a single rtx 3090 today, edge devices aren't far. every engineer, every researcher, every builder will run local inference. no API keys, no subscriptions, no permission needed. nothing is more dangerous to closed AI than open source that actually works. chinese labs have been shipping relentlessly. MoE, hybrid architectures, models designed to run on smaller hardware. constraints bred innovation. while others gatekeep, they ship. and the results speak for themselves. been running this little one for hours now and i just love it. this is actually good if you can steer it.

28 replies · 43 reposts · 643 likes · 106.2K views
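The "256 experts, 8 active per token" routing visualized above is plain top-k gating: keep the k largest gate logits, softmax over just those, zero the rest. A toy sketch of that mechanism, with random logits standing in for a real learned gating network:

```python
import math
import random

N_EXPERTS, TOP_K = 256, 8  # the routing shape described in the tweet

def route(logits):
    """Top-k expert routing as used in MoE layers.

    Selects the TOP_K largest gate logits, then softmaxes over only
    those, so exactly TOP_K experts get nonzero weight per token.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:TOP_K]
    peak = max(logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}  # expert index -> mixing weight

random.seed(0)
gates = route([random.gauss(0, 1) for _ in range(N_EXPERTS)])
```

Each token would then be processed by only the 8 selected expert FFNs, weighted by these gates, which is why a 35B-A3B model runs with the memory bandwidth cost of a much smaller dense model.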
byMAR.CO@MarcoTundo·
@tom_doerr I like the idea, but scouring through that repo, yikes I'll pass on this one.
0 replies · 0 reposts · 0 likes · 81 views
byMAR.CO@MarcoTundo·
@AIWarper AceStep 1.5 is crazy in the audio world.
1 reply · 0 reposts · 4 likes · 524 views
A.I.Warper@AIWarper·
Is there anything worth playing with in open source right now? LTX was promising but honestly I have zero motivation to dive into that at this point.
25 replies · 0 reposts · 45 likes · 7.5K views
byMAR.CO@MarcoTundo·
@BennyKokMusic Here's a summary of the bugs found and the fixes applied.
0 replies · 0 reposts · 0 likes · 76 views
BennyKok@BennyKokMusic·
Now I have the full picture. Let me write the plan.
3 replies · 1 repost · 15 likes · 1.2K views
Ivan Fioravanti ᯅ@ivanfioravanti·
@TheAhmadOsman 512GB of GPUs will cost you a fortune to buy and to run. Mac Studios will silently win 🤷🏻‍♂️
3 replies · 1 repost · 25 likes · 2.1K views
Ahmad@TheAhmadOsman·
GPUs are great because theyʼre like Mac Studios for AI but they actually work
31 replies · 13 reposts · 281 likes · 34.2K views
byMAR.CO@MarcoTundo·
@steipete @racheltnguyen @openclaw And for those who disable the browser tool and browser boolean: does anything change, or does the agent still try to run dead tools, deciding to remove the params I explicitly set and restarting the gateway on its own lool?
0 replies · 0 reposts · 1 like · 42 views
byMAR.CO reposted
Andrej Karpathy@karpathy·
First there was chat, then there was code, now there is claw. Ez
161 replies · 187 reposts · 3.4K likes · 358.7K views
byMAR.CO@MarcoTundo·
@steipete Had to downgrade my OpenClaw last night. Too many bugs are finding their way into releases. Perhaps we need more humans reviewing the code.
0 replies · 0 reposts · 0 likes · 31 views
Peter Steinberger 🦞@steipete·
Been wrangling for a while with how to deal with the onslaught of PRs; none of the solutions out there seem made for our scale. I spun up 50 codex in parallel, let them analyze the PRs and generate a JSON report with various signals, comparing on vision, intent (much higher signal than any of the text), risk and various others. Then I can ingest all reports into one session and run AI queries/de-dupe/auto-close/merge as needed on it. Same for Issues. PRs (Prompt Requests) really are just issues with additional metadata. Don't even need a vector db. Was thinking way too complex for a while. There's like 8 PRs for auto-update in the last 2 days alone (still need to ingest 3k PRs, only have 1k so far).
425 replies · 211 reposts · 4.1K likes · 568.3K views
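The triage pipeline sketched in that tweet (per-PR JSON reports, then cross-report de-duplication) needs no vector db precisely because the extracted signals are already structured. A tiny sketch of the de-dupe step; the report schema and `dedupe_by_intent` helper are hypothetical, keyed on the "intent" signal he mentions:

```python
from collections import defaultdict

# Hypothetical shape of one per-PR report an analysis agent might emit:
# {"pr": 123, "intent": "auto-update", "risk": "low", "summary": "..."}

def dedupe_by_intent(reports):
    """Cluster PR reports by their extracted intent.

    Competing PRs for the same feature (e.g. 8 auto-update PRs in 2 days)
    surface as one cluster, ready for an auto-close/merge decision.
    """
    clusters = defaultdict(list)
    for r in reports:
        clusters[r["intent"].strip().lower()].append(r["pr"])
    # keep only intents with more than one PR: those are the duplicates
    return {intent: prs for intent, prs in clusters.items() if len(prs) > 1}

reports = [
    {"pr": 101, "intent": "auto-update", "risk": "low"},
    {"pr": 105, "intent": "Auto-Update", "risk": "med"},
    {"pr": 110, "intent": "fix telegram pairing", "risk": "low"},
]
dupes = dedupe_by_intent(reports)
```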
byMAR.CO@MarcoTundo·
@JsonBasedman Can't we just take the SF money and bring it over here? What's stopping us?
0 replies · 0 reposts · 1 like · 63 views
json@JsonBasedman·
I kinda want to start a group chat for Canadians who just want to fucking win.
130 replies · 11 reposts · 934 likes · 51.1K views
BennyKok@BennyKokMusic·
Is there an open source OpenRouter that we can self host?
5 replies · 0 reposts · 6 likes · 841 views
byMAR.CO@MarcoTundo·
There are many ways you can give it browser access. Two that I use are Agent-Browser from Vercel, and CDP connected to a local browser host. Brave Web Search is a bit weak and the relay setup sucks.
0 replies · 0 reposts · 3 likes · 32 views
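CDP, the second option mentioned above, is a JSON-RPC-style protocol spoken over the browser's debugging websocket (e.g. a Chrome started with `--remote-debugging-port=9222`). A sketch of how an agent frames one command; the helper name is illustrative:

```python
import itertools
import json

# CDP frames carry a client-chosen id so responses can be matched
# to the command that triggered them.
_ids = itertools.count(1)

def cdp_command(method, **params):
    """Serialize one Chrome DevTools Protocol command frame.

    An agent drives the page by sending frames like this over the
    page's websocket debugger endpoint and matching replies by id.
    """
    return json.dumps({"id": next(_ids), "method": method, "params": params})

nav = cdp_command("Page.navigate", url="https://example.com")
```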
Bunagaya@Bunagayafrost·
5 acres. Two humanoids. 3D printer. Starlink. Solar roof. A few GPUs running a local AGI. Security drones, chickens roam, fruit trees heavy with harvest. Kids run barefoot past potato rows, as robots fix the fence. Civilization optional. Family of 4 on the techno homestead.
353 replies · 569 reposts · 5.9K likes · 620.9K views
EDDY V@EddyVGG·
back in canada btw. what have i missed?
8 replies · 0 reposts · 34 likes · 911 views