
byMAR.CO
@MarcoTundo
Master Biomechanical Engineer, Human AI Consultant - I build AI on a Farm. Prev CTO https://t.co/YUQrtFP3DJ, https://t.co/oEOh3lNDP3


this is what 12 gigs of VRAM built in 2026. a 9 billion parameter model running on a 5 year old RTX 3060 wrote a full space shooter from a single prompt. blank screen on first try. i came back with a bug list and the same model on the same card fixed every issue across 11 files without touching a single line myself. enemies still looked wrong so i pushed another iteration, and now the game has pixel art octopi, particle effects, screen shake, projectile physics and a combo system. all running locally on a card that was designed to play fortnite.

three iterations. zero cloud. zero API calls. every token generated on hardware sitting under my desk. the model reads its own code, finds what's broken, patches it, validates syntax and restarts the server. i just describe what's wrong and it handles the rest.

people are paying monthly subscriptions to type into a browser tab and wait for a server farm to respond. meanwhile a GPU you can find used on ebay is running a full autonomous hermes agent framework with 31 tools, a 128K context window and thinking mode, generating at 29 tokens per second nonstop.

the game still needs work. level upgrades don't trigger and boss fights need tuning. but the fact that i'm iterating on gameplay balance instead of debugging whether the code runs at all tells you where this is headed. every iteration the game gets better on the same hardware. same 12 gigs. same 9 billion parameters. same RTX 3060 from 5 years ago.

your GPU is not a gaming card anymore. it's a local AI lab that never sends your data anywhere.
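for anyone curious what that read-patch-validate loop could look like, here's a minimal python sketch against a local OpenAI-compatible server (llama.cpp, Ollama, etc). the endpoint, the "hermes-9b" model tag, the file path and the assumption that the game files are python (so ast can check them) are all mine for illustration, not the exact setup above:

```python
# minimal sketch of a local read -> patch -> validate loop.
# assumptions: an OpenAI-compatible server on localhost:11434,
# a model tagged "hermes-9b", and python source files.
import ast
from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="local")

def patch_file(path: str, bug_report: str) -> bool:
    """Ask the local model to rewrite one file; keep it only if it parses."""
    source = Path(path).read_text()
    resp = client.chat.completions.create(
        model="hermes-9b",  # hypothetical local model tag
        messages=[
            {"role": "system",
             "content": "You fix bugs. Reply with the full corrected file, nothing else."},
            {"role": "user",
             "content": f"Bug report:\n{bug_report}\n\nFile {path}:\n{source}"},
        ],
    )
    patched = resp.choices[0].message.content
    try:
        ast.parse(patched)          # validate syntax before touching disk
    except SyntaxError:
        return False                # reject the patch, keep the original
    Path(path).write_text(patched)
    return True

if patch_file("game/enemies.py", "octopus sprites render at the wrong scale"):
    print("patched; restart the dev server to pick up changes")
```

the syntax check before writing is the whole trick: a bad generation gets thrown away instead of breaking the game, so the loop can run unattended.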


Gemini Embedding 2 is out. It’s a natively multimodal embedding model that maps text, images, video, audio and documents into a single embedding space:
- text, up to 8192 input tokens
- images, up to 6 images per request
- videos, up to 120 seconds
- audio, natively ingests and embeds audio data without any intermediate text transcription
- documents, directly embed PDFs up to 6 pages long
blog.google/innovation-and…
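A hedged sketch of what calling it might look like through the google-genai Python SDK, assuming the new model keeps the existing embed_content interface; the model id "gemini-embedding-2" is inferred from the announcement, not a confirmed identifier:

```python
# assumes the google-genai SDK's existing embed_content call;
# the model id below is a guess based on the announcement.
from google import genai
import numpy as np

client = genai.Client()  # reads GEMINI_API_KEY from the environment

result = client.models.embed_content(
    model="gemini-embedding-2",  # assumed model id
    contents=["a pixel-art space shooter", "an octopus boss fight"],
)
vecs = [np.array(e.values) for e in result.embeddings]

# cosine similarity between two items in the shared embedding space
sim = vecs[0] @ vecs[1] / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))
print(f"similarity: {sim:.3f}")
```

The single-space claim is the interesting part: if images, audio and PDFs land in the same space, the same cosine-similarity comparison should work across modalities, not just text-to-text.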




🚀 Introducing the Qwen 3.5 Small Model Series
Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B

✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models

And yes — we’re also releasing the Base models. We hope this better supports research, experimentation, and real-world industrial innovation.

Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
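A minimal sketch of running one of the small checkpoints locally with transformers, assuming the collection follows the usual Qwen hub naming; "Qwen/Qwen3.5-4B" is inferred from the announcement, not a verified repo id:

```python
# assumes standard Qwen-style chat checkpoints on the Hugging Face hub;
# the repo id below is inferred from the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-4B"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "Summarize why small models matter on edge devices."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)
# decode only the newly generated tokens, not the prompt
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With device_map="auto" the 4B checkpoint should fit comfortably in consumer VRAM, which is presumably the point of shipping a series this small.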


if open source models are hitting 113 tok/s at 262K context on a single rtx 3090 today, edge devices aren't far behind. every engineer, every researcher, every builder will run local inference. no API keys, no subscriptions, no permission needed. nothing is more dangerous to closed AI than open source that actually works.

chinese labs have been shipping relentlessly. MoE, hybrid architectures, models designed to run on smaller hardware. constraints bred innovation. while others gatekeep, they ship. and the results speak for themselves.

been running this little one for hours now and i just love it. this is actually good if you can steer it.
