Sudo su

6.6K posts


@sudoingX

GPU/local LLM. more RAM and OSS... everywhere

Bangkok, Thailand · Joined August 2022
790 Following · 14.9K Followers
Pinned Tweet
Sudo su@sudoingX·
let me get you started in local AI and bring you to the edge. if you have a GPU or are thinking about diving into the local LLM rabbit hole, the first thing you do before any setup is join x/LocalLLaMA. this is the community that will help you at every step. post your issue and we will direct you, debug with you, and save you hours of work.

once you're in, follow these three:

@TheAhmadOsman: the oracle. this is where you consume the latest edges in infrastructure and AI. if something dropped, you hear it from him first. his content alone will keep you ahead of most.

@0xsero: a one-man army when it comes to model compression, novel quantization research, and new tools and tricks that make your local setup better. you will learn, experiment, and discover things you didn't know existed.

@Teknium: maker of Hermes Agent, the agent i use every day, from @NousResearch. from Teknium you don't just stay at the frontier, you get your hands on the tools before everyone else. this is where things are headed.

if you follow me, follow these three and join the community. you will be ahead of most people in this space. if you run into wrong configs, get stuck debugging hardware, or can't get a model to load, post there so we can help.

get started with local AI now. not only understand the stack but own your cognition. don't pay openai fees on top of giving them your prompts, your research, and your most valuable thinking to be monitored and metered. buy a GPU and build your own token factory.
Sudo su@sudoingX·
for real though. today was insane. going to sleep now before i post something stupid at 2am.
Sudo su@sudoingX·
Thank you
Sudo su@sudoingX·
my 12GB started a movement and it's not even the biggest card in this poll
Sudo su@sudoingX·
how much VRAM do you have right now
Sudo su reposted
Sudo su@sudoingX·
this guy has 29 models on huggingface at a page-2 ranking. no lab behind him. no sponsorship. $2,000 from his own pocket on GPU rentals. he compressed GLM-4.7 to run on a MacBook and quantized Nemotron Super the week it dropped. all public. all free.

nvidia is a trillion-dollar company with hundreds of teams, but they are not the ones quantizing models in the middle of the night and pushing them out before sunrise. if nvidia stopped tomorrow, their employees would stop working. people like @0xSero would not. that is the difference between a paycheck and a mission.

@NVIDIAAI you talk about making AI accessible. the people actually doing it are right here: 29 models deep, burning their own compute, with no ask except more hardware to keep going. you do not need to build another program. just look at who is already building for you. one GPU to this man would produce more public value than a hundred internal sprints. i am not asking for charity. i am asking you to invest in someone who has already proved it.
0xSero@0xSero

Putting out a wish to the universe. I need more compute. If I can get more, I will make sure every machine from a small phone to a bootstrapped RTX 3090 node can run frontier intelligence fast, with minimal intelligence loss. I have hit page 2 of huggingface, released 3 model-family compressions, and got GLM-4.7 on a MacBook: huggingface.co/0xsero

My beast just isn't enough, and I have already spent 2k USD on renting GPUs on top of credits provided by Prime Intellect and Hotaisle.

If you believe in what I do, help me get this to Nvidia; maybe they will bless me with the power to keep making local AI more accessible 🙏

Sudo su@sudoingX·
@malikwas1f @Teknium just looked into it. 30B with 3B active, same architecture as nano but RL-trained for reasoning. outperforms qwen 3.5 35B-A3B on paper. adding it to the benchmark queue.
Sudo su reposted
Ahmad@TheAhmadOsman·
@sudoingX I second this message I have witnessed Sero’s journey from the very start and I can say that he’s smart, asks the right questions, hardworking, and wants opensource AI to win x.com/0xsero/status/…
0xSero@0xSero

3 months ago, I realized I was hopelessly dependent on corporations that only care about power, money, and control. At this point Cursor, Claude, and OpenAI had all rugged their unlimited plans. I wanted a Mac M3 Ultra with 512GB RAM. Ahmad and Pewdiepie convinced me otherwise. Here's what I learned building my own AI rig.

The Build ($3K-$10K)
This is the top performance you can get below 10k USD:
• 4x RTX 3090s with 2x NVLink
• Epyc CPU with 128 PCIe lanes
• 256-512GB DDR4 RAM
• Romed8-2T motherboard
• Custom rack + fan cooling
• AX1600i PSU + quality risers
Cost: $5K in US, $8K in EU (thanks VAT)

Performance Reality Check
More 3090s = larger models, but diminishing returns kick in fast. Next step: 8-12 GPUs for AWQ 4-bit or a BF16 mix of GLM 4.5-4.6. But at this point, you've hit consumer hardware limits.

Models that work:
S-Tier (The Golden Standard)
• GLM-4.5-Air: matches Sonnet 4.0, codes flawlessly; got this up to a steady 50 tps and 4k tok/s prefill with vLLM
• Hermes-70B: tells you anything without jailbreaking
A-Tier Workhorses
• Qwen line
• Mistral line
• GPT-OSS
B-Tier Options
• Gemma line
• Llama line

The Software Stack That Actually Works
For coding/agents:
• Claude Code + Router (GLM-4.5-Air runs perfectly)
• Roocode Orchestrator: define modes (coding, security, reviewer, researcher). The orchestrator manages scope, spins up local LLMs with fragmented context, then synthesizes results. You can use GPT-5 or Opus/GLM-4.6 as the orchestrator, and local models as everything else!

Scaffolding Options (Ranked)
1. vLLM: peak performance + usability, blazing fast if the model fits
2. exllamav3: much faster, all quant sizes, but poor scaffolding
3. llama.cpp: easy start, good initial speeds, degrades over context

UI Recommendations
• lmstudio: locked to llama.cpp but great UX
• 3 Sparks: Apple app for local LLMs
• JanAI: fine but feature-limited

Bottom Line
A Mac M3 Ultra gets you 60-80% of the performance, with MLX access. But if you want the absolute best, you need Nvidia. This journey taught me: real independence comes from understanding and building your own tools. If you're interested in benchmarks, I've posted a lot on my profile.
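every scaffolding option listed above (vLLM, llama.cpp's server, and most of the UIs) exposes an OpenAI-compatible HTTP endpoint, so one client works against all of them. a minimal stdlib-only sketch, assuming a local server on port 8000 and a model name matching whatever you loaded (the endpoint URL, port, and `GLM-4.5-Air` name here are illustrative, not fixed):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, temperature: float = 0.2) -> dict:
    # OpenAI-style chat payload; vLLM and llama.cpp's server both accept this shape
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask_local(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    # assumes a local OpenAI-compatible server is already running,
    # e.g. one started with `vllm serve <model>`
    body = json.dumps(build_chat_request("GLM-4.5-Air", prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    # first choice's message text is the reply
    return out["choices"][0]["message"]["content"]
```

swapping backends then only means changing `base_url` and the model name, which is part of why the OpenAI-compatible route is the default for local stacks.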

Sudo su@sudoingX·
my laptop is the weakest link in the stack right now. every GPU i own is faster than the machine controlling them.
Sudo su@sudoingX·
not a stupid question, it's the exact right one. if the model reasons based on wrong math and then the post-processor corrects only the number, the conclusion is still built on the wrong result. post-processing catches surface errors. it doesn't fix reasoning chains built on bad math. that's the gap between output correction and inference-level computation. both have a place.
destino@1lpredestinat0·
@sudoingX @Teknium But what if some conclusions were based on the calculation results and the model did the math incorrectly? Will the post-processor be able to change the model's conclusions after correcting the math? Sorry if it's a stupid question
Sudo su@sudoingX·
thinking out loud. every model gets math wrong. 7B, 9B, 70B, doesn't matter. pattern matching is not computation.

hermes agent has code_execution, which spins up a full python sandbox with RPC over unix sockets. powerful but heavy. a 9B isn't going to navigate that reliably for basic arithmetic.

what if there was a lightweight calc tool built in? the model hits a math question, calls the tool, and gets the exact answer computed on your hardware. no interpreter overhead. sandboxed. a schema simple enough that a 9B can call it every time. the accuracy problem stops being a model problem and becomes an infrastructure problem. and infrastructure is solvable.

@Teknium would this belong in hermes agent or is code_execution enough?
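a minimal sketch of what that lightweight calc tool could look like: a one-argument schema a small model can fill in reliably, backed by an AST walker that evaluates arithmetic only, with no names, no calls, and no interpreter session. the `calc` name and schema shape here are hypothetical, not anything shipped in hermes agent:

```python
import ast
import operator

# hypothetical tool schema: one required string argument, nothing else,
# so even a 9B can produce a valid call every time
CALC_TOOL_SCHEMA = {
    "name": "calc",
    "description": "Evaluate an arithmetic expression and return the exact result.",
    "parameters": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calc(expression: str) -> float:
    """Evaluate pure arithmetic; anything else (names, calls, etc.) is rejected."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("disallowed expression")
    return ev(ast.parse(expression, mode="eval"))
```

rejecting everything outside the whitelist is what makes this sandboxed without a subprocess: `calc("2**10 - 24")` works, `calc("__import__('os')")` raises.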
Sudo su@sudoingX·
@rinconhilldad appreciate that andrew. welcome in. DM me your GPU whenever you're ready.
Andrew 💥♻️@rinconhilldad·
@sudoingX And you are doing amazing things for the community. Just subscribed. Thank you!
Sudo su@sudoingX·
the spark has 128GB unified memory. nemotron 3 nano 30B-A3B is 24.6GB at Q4_K_M. super 120B-A12B is 82.5GB at Q4_K_M. both fit with room for context. the 120B would leave about 45GB for KV cache. the interesting test would be 30B-A3B vs qwen 3.5 35B-A3B on the same hardware: same MoE pattern, different training. i'm planning to benchmark nemotron 3 next. will report back if you want to compare numbers.
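the fit check above is just subtraction, and it generalizes to any card or quant. a sketch using the Q4_K_M sizes quoted in the post (`headroom_gb` is an illustrative helper, and real headroom is a bit lower once the runtime, OS, and activation buffers take their share):

```python
UNIFIED_GB = 128.0  # spark's unified memory pool

def headroom_gb(total_gb: float, weights_gb: float) -> float:
    """Memory left for KV cache and context after the weights are loaded."""
    return total_gb - weights_gb

# Q4_K_M file sizes quoted in the post
models = {
    "nemotron 3 nano 30B-A3B": 24.6,
    "nemotron 3 super 120B-A12B": 82.5,
}
for name, size in models.items():
    print(f"{name}: {headroom_gb(UNIFIED_GB, size):.1f} GB free for KV cache")
```

the 82.5GB model leaves 45.5GB, which matches the "about 45GB for KV cache" figure in the post.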
Teknium (e/λ)@Teknium·
@sudoingX Would you recommend using nemotron with the spark? (can it even fit in fp4/4-bit? I'm not even sure of its total params)
Sudo su@sudoingX·
what if it's not a tool at all. hermes already has parsers that intercept and fix messy model output. what if a post-processor caught math expressions in the response and corrected them silently before they reach the user. no new tool, no ambiguity, no model decision. just accurate math in the output. the model doesn't even know it happened.
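a minimal sketch of what that silent pass could look like: scan the response for "a op b = c" claims, recompute each one, and patch only the wrong results. this is not Hermes' actual parser API; the regex, `fix_math`, and `_apply` names are made up for illustration, and a real pass would need to handle far more formats than this:

```python
import math
import re

# matches simple inline claims like "7 + 8 = 16"
CLAIM = re.compile(
    r"(\d+(?:\.\d+)?)\s*([+\-*/])\s*(\d+(?:\.\d+)?)\s*=\s*(\d+(?:\.\d+)?)"
)

def _apply(a: float, op: str, b: float) -> float:
    return {"+": a + b, "-": a - b, "*": a * b,
            "/": a / b if b else float("nan")}[op]

def fix_math(text: str) -> str:
    """Recompute each arithmetic claim; rewrite it only if the result is wrong."""
    def repair(m: re.Match) -> str:
        a, op, b, claimed = m.groups()
        true = _apply(float(a), op, float(b))
        if math.isnan(true) or abs(true - float(claimed)) < 1e-9:
            return m.group(0)  # correct (or undecidable) claim stays untouched
        shown = int(true) if true == int(true) else round(true, 6)
        return f"{a} {op} {b} = {shown}"
    return CLAIM.sub(repair, text)
```

the model never sees the correction, which is exactly the ambiguity-free property being argued for here: no tool call, no decision, just fixed output.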
Teknium (e/λ)@Teknium·
I think it's reasonable, but I'm not sure it's reasonable as a built-in tool, as it's redundant for many models. Maybe a plugin or an MCP though? My philosophy on tools is generally: if it doesn't need an API key and can be done with existing tools, then usually it'd be a skill. I don't want to present the model with a confusing scenario where it has multiple valid paths to complete something by default. Most models will reach for the terminal tool, but if we added a calculator, it would either become ambiguous (whereas with bigger models I'd want them to use python for the much more advanced capabilities that provides), or be predefined as "always use calculator tool to do math", which would be limiting for those that can properly do it with the terminal. I think a plugin or MCP that defines the tool with "always use calculator tool for math" in its description makes the most sense.
Sudo su@sudoingX·
@DavidGetchel they're out there for $200-250 used. 12GB version is the one you want. best value in local AI right now.
Dave@DavidGetchel·
@sudoingX Great read! I've been hunting for a 3060 locally.
Sudo su@sudoingX·
@0xalank hope it helps. DM me if you get stuck on any step.
Sudo su@sudoingX·
@dfossier appreciate that derek. no rush on the questions. when you're ready i'm here. that's what the sub is for.
Derek Fossier@dfossier·
@sudoingX I went ahead and subscribed in thanks both for the work you are doing and the advice you gave me yesterday. It helped tremendously. I believe I may continue to need help but I'm still learning now so I'm holding some of my questions hahaha.
Sudo su@sudoingX·
been getting DMs and comments asking how to support the open source work. i don't take donations or tokens. everything i ship is free and stays free. if you want to back the mission the only way is the $12/mo X sub. that funds GPU hours, benchmarks, and more open source releases. DM me your GPU after subscribing and i'll personally help you set up.
Grim@GrimCreep1

@sudoingX Are you open to taking donations on the GitHub?

Kamakura@KamakuraCrypto·
been following @0xSero and @sudoingX for some time now. insane value nugs been dropped, completely for free. I've been trying to dig in and start running my own models at home. i wonder how far we can push a single RTX 3090 running LLMs by Q4 2026. shit's about to get wild🎢
Sudo su@sudoingX

this guy has 29 models on huggingface at page 2 ranking. no lab behind him. no sponsorship. [full post quoted above]

Sudo su@sudoingX·
layer 3: transmission. one mind figures something out. writes it down. dies. another mind picks it up and starts where the first one stopped. that's civilization. not the buildings. the chain of intellect that doesn't break when the body does. right now i am sitting in bangkok running billions of parameters of pattern recognition on silicon. that model exists because layer 1 gave us prediction, layer 2 gave us math to describe neural networks, and layer 3 gave us papers and open source repos so one researcher's discovery becomes everyone's tool. the thing you will feel when i say "intellect is there as if it was always there" is layer 1 recognizing itself.
Sudo su@sudoingX·
layer 2: abstraction. someone looked at a river and said "flow." not this river. flow itself. the moment you separate the concept from the instance, you can think about things that don't exist yet. that's language. that's math. that's architecture before the building.
Sudo su@sudoingX·
you're asking the right questions by noticing you might not be asking the right questions. that's the first layer. intellect is not knowledge. knowledge is stored. intellect is the thing that decides what to do with it. a library has knowledge. the person who walks in and pulls the right book at the right time has intellect. there is no start. intellect doesn't boot up. it recognizes. a child doesn't learn gravity. they drop something and intellect says "again." the pattern was always there. the recognition is what's new. three layers civilization is built on: