Sudo su
@sudoingX
GPU/local LLM. more RAM and OSS... everywhere

We gave people tiny computers at Code with Claude. Here are some of the small, delightful things they built:



On-Premise Business AI Center

After my posts on the 2-GPU and 4-GPU builds, people reached out asking how to build an 8-GPU box for their businesses. Why?
- Protect their IP
- Protect customer data
- Save on inference costs
- Train their own models

Here's how to build one: 🧵
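a rough way to reason about the "save on inference costs" point: compare the up-front hardware spend against what the same token volume would cost through an API. every number below is an illustrative assumption of mine, not a figure from the thread, so plug in your own workload before drawing conclusions.

```python
# hypothetical break-even sketch: api inference vs an owned multi-gpu box.
# all dollar figures and token volumes are illustrative assumptions.

API_COST_PER_MTOK = 3.00      # $ per million tokens via a cloud api (assumed)
BOX_COST = 60_000.0           # up-front cost of an 8-gpu server (assumed)
POWER_COST_PER_MONTH = 400.0  # electricity + hosting per month (assumed)
TOKENS_PER_MONTH = 2_000      # workload in millions of tokens (assumed)

api_monthly = API_COST_PER_MTOK * TOKENS_PER_MONTH   # what the api would bill
net_saving = api_monthly - POWER_COST_PER_MONTH      # monthly saving once owned
breakeven_months = BOX_COST / net_saving             # months to recoup hardware

print(f"break-even after ~{breakeven_months:.1f} months")
```

with these made-up inputs the box pays for itself in under a year; at low token volumes the api wins, which is the real decision point for a business.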




my dgx spark just woke up and chose violence. step-3.5-flash-REAP-121B loaded: 121 billion parameters, 11B active per token, running locally on unified memory. via hermes agent it asked whether i wanted to install dependencies and fire off the test suite. i said yes from my phone while stuck in bangkok traffic. by the time i got home it was already executing code autonomously. this model is becoming my new favorite on spark: fast enough for agentic loops, smart enough for real work. if you have a spark and you're not running step-3.5 REAP, you're leaving performance on the table.



See you today at AI DEMO DAY! Updated agenda for today | May 12, 2026
14:30 | The Living Fair Opens: Check-in via AI Passport and start your mission across the Showcase and Booth Zones.
15:00 | Opening Remarks by Abhisit Vejjajiva @Abhisit_DP, former Prime Minister, Leader of the Democrat Party and MP
15:10 | Forum 1: Built in Bangkok — Why Builders Are Choosing This City (Typhoon LLM, Cysmiq, OnlyFounders)
16:00 | BKK Showcase Batch #1
16:40 | Forum 2: Scaling the Engine — Capital, Infrastructure, and Bangkok's AI Future (Tiwa York, @KornGoThailand)
17:20 | The Hero Workshop: Live Build - Vibe Coding Session by Claw Collective
18:20 | BKK Showcase Batch #2
19:00 | Closing Remarks by Korn Chatikavanij, former Finance Minister, Deputy Leader of the Democrat Party and MP

look anon, those of you who kept saying local AI is not there yet, who said open source can't compete, who said you need cloud APIs to get anything serious done: watch this gameplay for one minute.

every pixel on this screen was written by one model, in one shot, on a single rtx 3090 with 24gb of vram. the model is qwen 3.6 27b dense q4. the harness is hermes agent. the hardware is a single consumer card you can buy used for 900 dollars. the prompt is open source on github. every claim is verifiable on your own desk.

if your local AI take is from 2024, update it. the consumer tier is shipping work that was supposed to need 8 gpus and an api key. open source moved the floor while the rest of the field was busy explaining why it cannot. 24gb tier owners are eating ramen with half boiled egg and double chocolate.
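the "27b dense on a 24gb card" claim checks out on the back of an envelope: at 4-bit quantization the weights alone are well under 24gb. a minimal sketch, assuming ~4.5 effective bits per parameter for a q4-style quant (real quants mix precisions) plus a flat allowance for kv cache and runtime buffers:

```python
# rough vram estimate for a dense model quantized to ~4 bits per weight.
# assumptions: ~4.5 effective bits/param for a q4-style quant, and a
# flat overhead figure for kv cache and runtime buffers. both are my
# own illustrative numbers, not measurements from the post.

def q4_vram_gb(params_b: float, bits_per_param: float = 4.5,
               overhead_gb: float = 4.0) -> float:
    """Estimate GPU memory in GB for a dense model at ~4-bit quantization."""
    weights_gb = params_b * 1e9 * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
    return weights_gb + overhead_gb

# a 27B dense model: ~15.2 GB of weights plus overhead, comfortably
# inside a 24 GB rtx 3090.
print(round(q4_vram_gb(27), 1))
```

the same arithmetic shows why the full-precision model would not fit: at 16 bits per weight, 27B parameters need ~54 GB for weights alone.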



update: qwen 3.6 27b dense q4 just one-shotted the octopus invaders game on a single 3090. hermes agent drove the whole thing: ~41 tok/s generation, 21gb vram at the full 262k context, thinking mode on. one prompt in, the canonical multi-file space shooter benchmark out, the same exact prompt i ran on qwen 3.5 27b dense back in march on the same card.

3.5 needed one external scope-bug fix before the game would even load on first play. 3.6 needed nothing: 11 of 11 files written, 2411 lines of code, zero steering interventions, zero external fixes, playable on first load. 16 minutes 41 seconds wall clock from prompt to playable.

consumer tier king on a single 3090 is locked tonight, and the silicon under my desk did not change between march and now. the open source ecosystem just moved the floor. watch it ship itself: the full 16 minutes 41 seconds sped up to 3 minutes 45, no human touching the keyboard between the first prompt and the final frame.
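the throughput and wall-clock numbers hang together if you do the arithmetic. a quick sanity sketch, assuming roughly 10 tokens per line of code (my guess, not a figure from the post) and that visible generation is only part of the run:

```python
# sanity-check the benchmark numbers: 2411 lines at ~41 tok/s.
# TOK_PER_LINE is a hypothetical average; everything else is from the post.

TOK_PER_SEC = 41        # reported generation speed
LINES = 2411            # reported lines of code written
TOK_PER_LINE = 10       # assumed average tokens per code line

gen_seconds = LINES * TOK_PER_LINE / TOK_PER_SEC
minutes, seconds = divmod(round(gen_seconds), 60)
print(f"~{minutes}m {seconds}s of pure code generation")
```

under these assumptions pure code generation accounts for roughly 9-10 of the 16 minutes 41 seconds, with the remainder plausibly spent on thinking tokens, prompt processing, and agent-loop overhead.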



let me say this out loud here: there is absolutely zero reason to use openclaw in may 2026. a general agent exists. hermes agent does coding, video editing, marketing design, research, browser automation, terminal work. one tool, all under your roof.


@sudoingX What is the reason for using Openclaw at this point? I have had zero issues with Hermes.

@sudoingX wait qwen 3.6 27B runs on a single 3090?

i declare qwen 3.6 27b dense q4 the king of the single rtx 3090 card. not even close. this model is an absolute beast on local ai, ruthless on agentic loops, owns its own thinking. anyone can run it on a single 3090: the weights are open, the stack is reproducible, the prompt is canonical, every claim is verifiable on your own hardware.

the octopus invaders one-shot you are seeing is the visible test. i run these models on workloads you wouldn't think to ask for and couldn't show you if i wanted to, and qwen 3.6 27b dense q4 quietly does the heavy lifting on a single consumer card while the rest of the field is busy explaining why it cannot.

if you think a different model is king on a single 3090 right now, name it. drop your card, drop your model, drop your numbers. the throne is not crowded.



