Sudo su
@sudoingX

6.6K posts

GPU/local LLM and more RAM.

Bangkok, Thailand · Joined August 2022
783 Following · 13.6K Followers
Pinned Tweet
Sudo su @sudoingX
let me get you started in local AI and bring you to the edge. if you have a GPU or are thinking about diving into the local LLM rabbit hole, the first thing you do before any setup is join x/LocalLLaMA. this is the community that will help you at every step. post your issue and we will direct you, debug with you, and save you hours of work.

once you're in, follow these three:

@TheAhmadOsman the oracle. this is where you consume the latest edges in infrastructure and AI. if something dropped you hear it from him first. his content alone will keep you ahead of most.

@0xsero one-man army when it comes to model compression, novel quantization research, and new tools and tricks that make your local setup better. you will learn, experiment, and discover things you didn't know existed.

@Teknium maker of Hermes Agent, the agent i use every day from @NousResearch. from Teknium you don't just stay at the frontier, you get your hands on the tools before everyone else. this is where things are headed.

if you follow me, follow these three and join the community. you will be ahead of most people in this space. if you run into wrong configs, get stuck debugging hardware, or can't get a model to load, post there so we can help.

get started with local AI now. not only understand the stack but own your cognition. don't pay openai fees on top of giving them your prompts, your research, and your most valuable thinking to be monitored and metered. buy a GPU and build your own token factory.
42 replies · 50 reposts · 664 likes · 33K views
Sudo su reposted
Sudo su @sudoingX
if nvidia spent less time celebrating their own teams at GTC and more time finding independent researchers around the world who actually ship from their apartments on consumer hardware, we would see progress move faster than any keynote can announce. the work is happening outside the building. they just haven't looked yet.
0 replies · 1 repost · 17 likes · 735 views
Sudo su @sudoingX
@0xSero i stand corrected. credit to Lambda, Prime Intellect and HotAisle for backing him. the point stands though. more compute to people like this = more open source for everyone.
0 replies · 0 reposts · 0 likes · 67 views
0xSero @0xSero
One correction: I have had sponsorships from Lambda, Prime Intellect and HotAisle, which I am very grateful for. But yes, pls compute 🫡
6 replies · 0 reposts · 24 likes · 1.3K views
Sudo su @sudoingX
jensen says a $500K engineer should spend $250K on tokens. that is $250K going to API providers every year per engineer. or you give one independent researcher a GPU and they quantize 29 models so thousands of engineers run them locally for free. @0xSero spent $2,000 from his own pocket and produced more accessible AI than most corporate programs with million dollar budgets. talking about making AI accessible and actually making it accessible are not the same thing. x.com/sudoingX/statu…
0 replies · 0 reposts · 0 likes · 72 views
Rohan Paul @rohanpaul_ai
Wow, Jensen Huang made some really powerful points here. 🎯

"Let's say you have a software engineer or AI researcher and you pay them $500,000 a year. At the end of the year, I'm going to ask that $500,000 engineer, 'How much did you spend in tokens?' If that $500,000 engineer did not consume at least $250,000 worth of tokens, I am going to be deeply alarmed. This is no different than one of our chip designers saying, 'Guess what? I'm just going to use paper and pencil; I don't think I'm going to need any CAD tools.'"

"The thought that 'this is too hard' is gone. The thought that 'this is going to take a long time' is gone. The thought that 'we're going to need a lot of people' is gone."

Jason Calacanis (the host of the podcast) explains how elite performance requires elite investment: LeBron James spends millions on his physical maintenance to stay at the top of his game; modern knowledge workers must "spend" heavily on AI tokens to maintain a "superhuman" level of productivity and professional longevity.

From @theallinpod YT channel (link in comment)
14 replies · 15 reposts · 71 likes · 8.4K views
Sudo su @sudoingX
and this is not just nvidia. every corporation sitting on idle compute should be paying attention. there are thousands of GPUs in data centers right now doing nothing between workloads. one idle A100 to an independent researcher produces more open source value in a week than it would sitting cold in a rack for a month. the compute exists. the people exist. someone just needs to connect them.
1 reply · 2 reposts · 15 likes · 624 views
Sudo su @sudoingX
this guy has 29 models on huggingface at page 2 ranking. no lab behind him. no sponsorship. $2,000 from his own pocket on GPU rentals. he compressed GLM-4.7 to run on a MacBook and quantized Nemotron Super the week it dropped. all public. all free.

nvidia is a trillion dollar company with hundreds of teams, but they are not the ones quantizing models in the middle of the night and pushing them out before sunrise. if nvidia stopped tomorrow, their employees would stop working. people like @0xSero would not. that is the difference between a paycheck and a mission.

@NVIDIAAI you talk about making AI accessible. the people actually doing it are right here. 29 models deep, burning their own compute, with no ask except more hardware to keep going. you do not need to build another program. just look at who is already building for you. one GPU to this man would produce more public value than a hundred internal sprints. i am not asking for charity. i am asking you to invest in someone who already proved it.
0xSero @0xSero

Putting out a wish to the universe. I need more compute; if I can get more I will make sure every machine, from a small phone to a bootstrapped RTX 3090 node, can run frontier intelligence fast with minimal intelligence loss. I have hit page 2 of huggingface, released 3 model family compressions and got GLM-4.7 on a MacBook: huggingface.co/0xsero

My beast just isn't enough and I already spent 2k usd on renting GPUs on top of credits provided by Prime Intellect and HotAisle.

———

If you believe in what I do, help me get this to Nvidia. maybe they will bless me with the pewter to keep making local AI more accessible 🙏

8 replies · 30 reposts · 348 likes · 10.9K views
Sudo su @sudoingX
hermes agent from traffic
2 replies · 1 repost · 17 likes · 635 views
Sudo su @sudoingX
been getting DMs and comments asking how to support the open source work. i don't take donations or tokens. everything i ship is free and stays free. if you want to back the mission the only way is the $12/mo X sub. that funds GPU hours, benchmarks, and more open source releases. DM me your GPU after subscribing and i'll personally help you set up.
Grim @GrimCreep1

@sudoingX Are you open to taking donations on the GitHub?

1 reply · 1 repost · 54 likes · 2.5K views
Sudo su @sudoingX
@GrimCreep1 appreciate that. best way to support is the $12/mo sub. DM me your GPU and i'll help you set up. every sub funds more GPU hours and more open source.
0 replies · 0 reposts · 0 likes · 55 views
Grim @GrimCreep1
@sudoingX Are you open to taking donations on the GitHub?
1 reply · 0 reposts · 1 like · 2.5K views
Sudo su @sudoingX
hear this anon: you don't need a $4,699 box to get started in local AI. use what you already have first. test your workload.

this is what a $250 GPU did today. iteration 3 of octopus invaders is here. 4 phases. 6 prompts. zero handwritten code. the same 9B on the same 3060 fixed its own enemy spawning, patched a dual start conflict, added level progression, resized every bullet, and when the browser cached old files it figured that out on its own and added version parameters to force a reload. 3,200+ lines across 13 files. every line by qwen 3.5 9B Q4 at 35-50 tok/s on 12 gigs through hermes agent.

understand what your load actually needs before you build. don't get trapped by influencers selling you boxes next to a plant. test on what you have. then decide. this 3060 impressed me in ways i did not expect and its autonomy is what kept me going. now it's time to move to new experiments on other nodes and other models for all of us.

if you are running this setup, the exact stack, flags, open source code, and exact prompts i used are in the replies. if you run into issues let me know. seeing students and builders discover hermes from my posts and start running local is why i do this. full autonomous build at 8x speed in the video. gameplay at the end. watch it.
Sudo su @sudoingX

this is what 12 gigs of VRAM built in 2026. a 9 billion parameter model running on a 5 year old RTX 3060 wrote a full space shooter from a single prompt. blank screen on first try. i came back with a bug list and the same model on the same card fixed every issue across 11 files without touching a single line myself. enemies still looked wrong so i pushed another iteration and now the game has pixel art octopi, particle effects, screen shake, projectile physics and a combo system. all running locally on a card that was designed to play fortnite.

three iterations. zero cloud. zero API calls. every token generated on hardware sitting under my desk. the model reads its own code, finds what's broken, patches it, validates syntax and restarts the server. i just describe what's wrong and it handles the rest.

people are paying monthly subscriptions to type into a browser tab and wait for a server farm to respond. meanwhile a GPU you can find used on ebay is running a full autonomous hermes agent framework with 31 tools, a 128K context window and thinking mode, generating at 29 tokens per second nonstop.

the game still needs work. level upgrades don't trigger and boss fights need tuning. but the fact that i'm iterating on gameplay balance instead of debugging whether the code runs at all tells you where this is headed. every iteration the game gets better on the same hardware. same 12 gigs. same 9 billion parameters. same RTX 3060 from 5 years ago. your GPU is not a gaming card anymore. it's a local AI lab that never sends your data anywhere.

27 replies · 32 reposts · 444 likes · 44.5K views
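[Editor's sketch] The post says the exact stack, flags, and prompts are in the replies, so none of that is reproduced here. As a minimal illustration of how a setup like this is usually driven, here is the standard OpenAI Python client pointed at a local OpenAI-compatible server (llama.cpp's llama-server, Ollama, LM Studio, and vLLM all speak this dialect); the port, model name, and prompt are placeholder assumptions, not the author's configuration:

```python
from openai import OpenAI

# A local OpenAI-compatible server; the key is unused by local servers
# but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local",  # some servers ignore this field, others match it to a loaded model
    messages=[
        {"role": "user", "content": "fix the enemy spawn rate in game.js"},
    ],
)
print(resp.choices[0].message.content)
```

Every token is generated on the local card; nothing leaves the machine.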
Sudo su @sudoingX
@r0ck3t23 the gap between "AI will destroy us" and "I ran a 9B model on a $300 GPU and it built a game" is the entire problem with this conversation. builders know what this is. commentators don't.
2 replies · 0 reposts · 19 likes · 565 views
Dustin @r0ck3t23
Jensen Huang just told every AI leader in the room to grow up. Stop scaring the public with science fiction. Start communicating like the weight of civilization is on your shoulders. Because it is.

Huang: "AI is not a biological being. It is not alien. It is not conscious. It is computer software."

That single statement dismantles half the panic surrounding this industry. The mainstream conversation is dominated by people projecting human malice onto math. Alien consciousness onto code. Existential dread onto a software architecture we built, we trained, and we can read.

Huang: "We say things like, 'We don't understand it at all.' It is not true. We understand a lot of things about this technology."

When builders tell the public they don't understand their own creation, the public hears threat. The state responds with control. That is already happening. Palihapitiya asked Huang what he would have told Anthropic during their regulatory clash with the Department of Defense. Huang didn't attack the technology. He attacked the communication.

Huang: "The desire to warn people about the capability of the technology is really terrific. We just have to make sure that we understand that the world has a spectrum, and that warning is good, scaring is less good because this technology is too important to us."

Warning shows risks, mitigation, why upside overwhelms downside. Scaring says we might be building something that destroys us and we can't stop it. One builds trust. The other invites regulation written in panic.

Huang: "To say things that are quite extreme, quite catastrophic, that there's no evidence of it happening, could be more damaging than people think."

Projecting catastrophe without evidence is not caution. It is sabotage. When your technology is embedded in national defense, the financial system, and healthcare infrastructure, your words carry structural weight. If the architects act terrified of their own product, the response is predictable. Governments step in. They restrict. They seize control of something they don't understand because the builders told them to be afraid.

Huang: "There was a time when nobody listened to us, but now because technology is so important in the social fabric, such an important industry, so important to national security, our words do matter."

Most tech founders have not internalized this. You are no longer a startup founder disrupting an industry. You are running infrastructure that nations depend on. Your statements move policy. Your framing shapes legislation. Your tone determines whether governments treat you as partner or threat.

Huang: "We have to be much more circumspect, we have to be more moderate, we have to be more balanced, we have to be far more thoughtful."

Huang did not ask for silence. He asked for precision. The leaders who cannot tell the difference will not be leading for long.
129 replies · 121 reposts · 644 likes · 62K views
Sudo su @sudoingX
@signulll this is exactly what running local models teaches you. you stop writing code and start evaluating it. the model outputs, you judge. the faster you can spot what's off the faster you ship.
0 replies · 0 reposts · 1 like · 162 views
signüll @signulll
with ai increasingly writing more & more code, engineers shift from makers to critics. taste, judgment, & the ability to recognize when something is wrong without being able to immediately articulate why is what compounds now more than ever before. i.e. the terminal skill is aesthetic discernment applied to large systems, which was always rare as hell & is now the only scarce thing.
36 replies · 8 reposts · 120 likes · 7.1K views
Sudo su @sudoingX
@xeraphims so openai already proved the calc tool approach works. they just over optimized the trigger. the tool itself was the right call. now the open source local stack needs the same tool without the reward hacking.
0 replies · 0 reposts · 5 likes · 343 views
Sudo su @sudoingX
thinking out loud. every model gets math wrong. 7B, 9B, 70B. doesn't matter. pattern matching is not computation. hermes agent has code_execution, which spins up a full python sandbox with RPC over unix sockets. powerful but heavy. a 9B isn't going to navigate that reliably for basic arithmetic. what if there were a lightweight calc tool built in? model hits a math question, calls the tool, gets the exact answer computed on your hardware. no interpreter overhead. sandboxed. simple enough schema that a 9B can call it every time. the accuracy problem stops being a model problem and becomes an infrastructure problem. and infrastructure is solvable. @Teknium would this belong in hermes agent or is code_execution enough?
33 replies · 5 reposts · 186 likes · 10.3K views
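[Editor's sketch] A minimal sketch of what such a calc tool could look like on the agent side, assuming Python; the function name and operator whitelist are illustrative, not Hermes Agent code. The model emits an expression string, the hardware computes the exact answer, and anything outside plain arithmetic is rejected, so there is no eval() and no sandbox to manage:

```python
import ast
import operator

# Whitelisted arithmetic operators; any other AST node is refused.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calc(expression: str) -> float:
    """Evaluate a pure-arithmetic expression exactly, without eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("only plain arithmetic is allowed")
    return walk(ast.parse(expression, mode="eval").body)

print(calc("847 * 293"))  # 248171, computed, not pattern-matched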
Sudo su @sudoingX
@yaboilyrical how do i get my hands on one of those. shipping to bangkok is worth it for hermes merch.
0 replies · 0 reposts · 0 likes · 36 views
Sudo su @sudoingX
@uttertard the language doesn't matter much. the key is the schema the model sees. one field, expression in, answer out. whether the backend is JS, python, or raw C, the model just needs to output "847 * 293" and get the right number back.
1 reply · 0 reposts · 2 likes · 668 views
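[Editor's sketch] For illustration, here is what that one-field schema could look like in the common OpenAI-style function-calling format; the exact shape any given agent framework expects will differ, so treat this as an assumption, not the Hermes Agent definition:

```python
# One required string field. At 7B-14B scale, a schema this small leaves
# the model almost nothing to get wrong besides the arithmetic itself.
CALC_TOOL = {
    "type": "function",
    "function": {
        "name": "calc",
        "description": "Evaluate an arithmetic expression and return the exact result.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Plain arithmetic, e.g. '847 * 293'",
                },
            },
            "required": ["expression"],
        },
    },
}
```

The backend behind it can be anything; the model only ever sees this schema.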
uttertard @uttertard
@sudoingX If you want to skip python would a calculator built in javascript and passed as a skill make sense?
1 reply · 0 reposts · 2 likes · 767 views
Sudo su @sudoingX
@startupideaspod that's a lot of duct tape for a problem hermes agent solved at the framework level. persistent memory, session search, daily context. no manual setup. you deserve better tools.
0 replies · 0 reposts · 12 likes · 243 views
The Startup Ideas Podcast (SIP) 🧃
"Why does my OpenClaw forget everything?" Because nothing was saved in the first place. Here's the 3-layer memory fix: memory.md: - Your agent's long-term brain. - High-level learnings, preferences, insights. - If this file doesn't exist yet, tell your agent to create it. Daily memory folder: - Granular logs created every day. - More detailed than memory.md. - This is where session-level context lives. Compaction flush: - Before your agent summarizes and compresses a long session, force it to write everything to memory first. - Otherwise context gets lost when the window fills up. Then add a 30-minute auto-save heartbeat: - Check if today's memory file exists - Create it if missing - Log a summary of the current session Fix your memory system before you touch anything else. That's where it clicks.
GREG ISENBERG @gregisenberg

THE ULTIMATE GUIDE TO OPENCLAW (1hr free masterclass)

1. fix memory so it compounds. add MEMORY.md + daily logs. instruct it to promote important learnings into MEMORY.md because this is what makes it improve over time
2. set up personalization early. identity.md, user.md, soul.md. write these properly or everything feels generic. this is what makes it sound like you and understand your world
3. structure your workspace properly. most setups break because the foundation is messy. folders, files, and roles need to be clean or everything downstream degrades
4. create a troubleshooting baseline. make a separate claude/chatgpt project just for openclaw. download the openclaw docs (context7) and load them in. when things break, it checks docs instead of guessing. this alone fixes most issues!!
5. configure models and fallbacks. set primary model to GPT 5.4 and add fallbacks across providers. this is what keeps tasks running instead of failing mid-way
6. turn repeat work into skills. install the summarize skill early. anything you do 2–3 times → turn into a skill. this is how it starts executing real workflows
7. connect tools with clear rules. add browser + search (brave api). use managed browser for automation. use chrome relay only when login is needed. this avoids flaky behavior
8. use heartbeat to keep it alive. add rules to check memory + cron health. if jobs are stale, force-run them. this prevents silent failures
9. use cron to schedule real work. set daily and weekly tasks: reports, follow-ups, content workflows. this is where it starts acting without you
10. lock down security properly. move secrets to a separate env file outside the workspace. set strict permissions (folder 700, file 600). use allowlists for telegram access. don't expose your gateway publicly
11. understand what openclaw actually is. it's a system that remembers, acts, and improves. basically, closer to an employee than a tool

this ep of @startupideaspod is now out w/ @moritzkremb. it's literally a full 1hr free course to take you from "i installed openclaw" to "this thing is actually working for me". most people are one step away from openclaw working. they installed it, they tried it and it didn't click. this ep will make it click. all free, no advertisers, i just want to see you build your ideas with this ultimate guide to openclaw. watch

14 replies · 12 reposts · 120 likes · 13.5K views
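[Editor's sketch] A rough sketch of that 30-minute auto-save heartbeat as a standalone script, assuming a memory/ folder of daily Markdown logs; the file layout and names are illustrative, not OpenClaw's actual defaults:

```python
import datetime
from pathlib import Path

MEMORY_DIR = Path("memory")  # hypothetical daily-log folder

def heartbeat(session_summary: str) -> None:
    """Create today's memory file if missing, then append a timestamped summary."""
    MEMORY_DIR.mkdir(exist_ok=True)
    today = datetime.date.today().isoformat()
    log = MEMORY_DIR / f"{today}.md"
    if not log.exists():
        log.write_text(f"# Memory log {today}\n\n")
    stamp = datetime.datetime.now().strftime("%H:%M")
    with log.open("a") as f:
        f.write(f"- [{stamp}] {session_summary}\n")

# Run every 30 minutes, e.g. via cron: */30 * * * * python heartbeat.py
heartbeat("summarize current session here")
```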
Sudo su @sudoingX
@schinsly for sure, on a capable model you can ask and it handles it. the gap shows up when you're running 7B-14B on consumer hardware. those models call tools reliably but can't generate correct python consistently. that's who this is for.
1 reply · 0 reposts · 5 likes · 171 views
Schinsly @schinsly
@sudoingX i guess yeah it hasn't been done before but like i could oneshot that by just asking my agent if it was something i needed
2 replies · 0 reposts · 1 like · 173 views
Sudo su @sudoingX
@DasMarky99 exactly. one tool, one field, model outputs the expression, hardware computes the answer. that's the whole idea.
0 replies · 0 reposts · 0 likes · 549 views
Matu @DasMarky99
@sudoingX Wouldn't it be enough to expose a new "calc" function to the llm?
1 reply · 0 reposts · 0 likes · 621 views
Sudo su @sudoingX
you're right, the pieces exist. the question is whether a 9B can use them reliably. code_execution needs the model to generate valid python with correct syntax, imports, and print statements. a calc tool with a one field schema just needs the model to output "847 * 293". the tool computes the result. same math, completely different reliability at 7B-14B scale.
2 replies · 0 reposts · 10 likes · 920 views
Schinsly @schinsly
@sudoingX this isn't super novel imo. the agent can literally just do math in console manually, follow a skill, or call a cli.
1 reply · 0 reposts · 7 likes · 991 views
Sudo su @sudoingX
@drewsky1 can't say which yet. but nothing was removed. only added.
0 replies · 0 reposts · 0 likes · 35 views
Sudo su @sudoingX
What have I done. holy shit, this is magic. literal magic.
7 replies · 0 reposts · 82 likes · 7.4K views