TechnoItch 🧙🏽‍♂️

6.5K posts

TechnoItch 🧙🏽‍♂️ banner
TechnoItch 🧙🏽‍♂️

TechnoItch 🧙🏽‍♂️

@TechnoItch

MAX BIDDING

Katılım Ocak 2018
983 Takip Edilen142 Takipçiler
Sudo su
Sudo su@sudoingX·
for the last few weeks i kept dodging the "is the dgx spark worth it?" question because i did not have one to test. now i do, and i can give you the real answer. it is an absolute workhorse. cranks 56 tok/s on a 30b model at q8 with full multimodal + tool calls on hermes agent, eats long prompts at 1,300 tok/s prefill, holds 256k context without breaking a sweat. 128gb unified memory unlocks model classes nothing else in the consumer tier even tries. low maintenance, silent, sits on the desk and works. if you can afford it and you are serious about local ai at the frontier consumer tier, this deserves a slot on your desk. one of the rare pieces of hardware that earns its price the moment it powers on
Sudo su@sudoingX

a week with the dgx spark, here is what is on it and what i have measured so far. nobody is really talking about this machine and it is quietly becoming the workhorse of my whole stack. hardware: nvidia gb10 sm_121, 124 gb unified lpddr5x at 273 gb/s, cuda 13.0 models on disk (305 gb total, 9 ggufs): > qwen 3.6 27b q4_k_m / q5_k_m / q8_0 / ud-q4_k_xl > nemotron 3 omni 30b-a3b q4_k_m / q8_0 / ud-q6_k / ud-q6_k_xl > deepseek v4-flash 158b q4_k_m (112 gb, flagship 128gb-tier test) terminal + shell environment: > zsh + oh-my-zsh + powerlevel10k theme > modern cli stack: bat, eza, ripgrep, fd, git-delta, tldr, neovim, fzf, autojump > 6 tmux sessions actively running for parallel agent work ml + agent stack: > llama.cpp built sm_121 against cuda 13 > uv + venv ml stack with pytorch 2.11.0+cu130 (aarch64) + transformers + diffusers + accelerate > hermes agent v0.11 with codex auth bridge > opencode for free-model overnight research > telegram gateway routing to nemotron q8 right now speeds verified so far: - nemotron 30b-a3b q8: 56 tok/s gen, 1,300 tok/s prefill, 96% gpu, 33gb in unified - qwen 27b dense q4: 40 tok/s consistent 90+ gb of unified memory still free. deepseek v4-flash 158b loading next as the real flagship test, multimodal omni testing once mmproj pulls, comfyui install in flight for the diffusion lane. honestly curious what the actual limit is on this box, i have not hit it yet.

English
48
14
294
36.2K
Sudo su
Sudo su@sudoingX·
three tools changed how i work forever. >tmux keeps sessions alive. >tailscale connects every machine. >termius puts it all in my pocket. this screenshot is me ssh'd into my dgx spark from my phone right now. four tmux sessions running agent, main, monitor, server. all alive while i walk, eat, ride, sleep. i have three machines on my tailscale mesh. dgx spark for heavy inference. rog scar 18 5090 for dev and creative work. old nodes for overnight local experiments that save me api costs while i sleep. i wake up to results. i check from my phone before my feet hit the floor. if you're building across multiple machines and not running this stack you're working harder than you need to. own your compute. orchestrate from anywhere.
Sudo su tweet media
English
57
24
514
28.5K
Sama Hoole
Sama Hoole@SamaHoole·
Chips fried in beef dripping were a different object to what passes for a chip today. Walk into a Whitby chippy in 1978. The fryer has been on since 11am. The fat in it is beef dripping, held at 180 degrees by a man in a white apron who has been frying chips since he was fifteen. There are no seed oils in the building. The idea would not occur to anyone. Thick-cut Maris Pipers, ninety seconds in the dripping. Dark gold at the edges, fluffy inside, crisp in a way that sets your teeth against them. Salt. Vinegar. Paper. Two bob. You eat them walking home along the harbour wall. The chip tastes of the chip and also of something underneath the chip, something deeper, something you don't have a name for because you are nine and nobody names it, it is just what chips taste like. That taste was beef dripping. By 2002, 90% of British chippies had switched to rapeseed, palm, or sunflower oil, on the advice of public health officials citing research since quietly retracted. A stable saturated fat used for ten thousand years, swapped for an industrial oil invented in 1911, oxidised at fryer temperatures for twelve hours a day. A seed-oil chip is lighter, flatter. The crust doesn't hold. The flavour stops at the potato. No deeper note. No roast beef on a Friday. Ask a British person under thirty what chips are supposed to taste like and they will describe, with complete sincerity, the chip they have always eaten. A chip their great-grandfather would have considered a practical joke. They cannot miss it, because the reference point was removed from the national palate before they were born. A handful of chippies still fry in dripping. The Magpie in Whitby. A few survivors in Yorkshire, Lancashire, the Black Country. Go. Drive. Queue. Eat them standing up, out of the paper. You will understand, in one bite, what was taken. The cow is still in the field. The suet is still at the butcher. The fryer could be switched back tomorrow. A whole country forgot what a chip was.
Sama Hoole tweet media
English
532
1.7K
7.9K
597.6K
Jordan Ross
Jordan Ross@jordan_ross_8F·
The agency owners who figure out Hermes in the next 90 days are going to look like geniuses in 2027. The problem is most agency owners don't have time to figure out the install, where to start, or what to actually hand it first. So my team built an 83-page playbook that does it for you. Inside: — The 5 daily prompts that turn it into a second brain — Plain English setup for Mac, Linux, and Android — How to lock it down without torching client data — 8 copy-paste workflows across reporting, outreach, sales, and ops — The cron trick that drops token spend by 90% Your competitors are sleeping on this. Comment HERMES and I'll send it.
GREG ISENBERG@gregisenberg

how to set up hermes agent step by step. built-in memory, 40+ tools, works on your phone, and what to think of hermes vs openclaw: 1. hermes is a personal AI agent that runs in your terminal. think of it like open claw but with built-in memory, 40+ tools out of the box, and 90% cheaper token costs. you install it with one command. 2. the 3 problems with open claw that hermes solves: no memory (you keep repeating yourself), constant gateway restarts, and zero visibility into what you're spending on tokens. 3. hermes remembers everything. every completed task gets saved to memory. it searches through past logs to find solutions. over time it literally gets smarter at your specific workflows. 4. connect it to open router. you see exact costs per model per task. free models rotate weekly. one founder went from $130 every five days on open claw to $10 on hermes. same output. 5. it comes preloaded with skills. apple notes, imessage, find my, browser, web search, image generation, cron jobs. no hunting for plugins. 6. connect it to obsidian so it reads your entire vault. connect it to gstack for your dev environment. create custom skills for your specific workflows. 7. the biggest money saver: have it write code once for recurring tasks. then it runs without burning tokens every time. stop paying an LLM to do the same scrape or report daily. 8. run it on android via telegram. name your agents. talk to them like coworkers. in this episode imran shows you how to set this up. 9. you can run it bare metal, in docker, or serverless on modal. pick your risk level. i begged @imranye to come on @startupideaspod and walk through the full installation live. he made it impossibly clear. if you've heard of Hermes Agent and want the clearest explanation of how to get set up like a pro let me know what you want me to cover on the next ep this is the best personal agent setup video on the internet right now. watch

English
815
92
1.1K
191.7K
Mass
Mass@MemoryReboot_·
@TheAhmadOsman Dual 3090 is a perfect starter pack into local LLMs rabbit hole
English
1
0
2
557
Ahmad
Ahmad@TheAhmadOsman·
There has never been a better time to build a 2x RTX 3090s machine & get an LLM up and running locally
English
40
9
347
15.3K
Sudo su
Sudo su@sudoingX·
published benchmarks tell you how a model did on someone else's tests. that's not what you need. every new model i consider hits the same battery on my machine before i post anything about it. same set of checks, every time. first i want to know if it runs fast enough at the context size i actually use, not at 128 tokens where everyone cherry picks their numbers. then i check if it emits tool calls reliably when the prompt needs one, or if it drifts to generic knowledge answers. then i push it past 50% context and see if instruction compliance holds up or falls apart. then the quant i want to run gets tested for both memory fit AND quality, because one without the other is a half-win. finally i check if it respects the system prompt or leaks it into user-facing text. binary pass/fail on every check. if the model fails two, i kill the post draft. if it fails one, i flag the failure mode publicly. if it fails none, it earns the full write-up. 30-45 minutes per model. the filter is worth way more than the time. most people run a model for five minutes, feel the vibe, then pick a side. that's why the noise is so loud. if you're running models locally and your results matter to you, build your own battery. five checks, same every time, binary outcomes. published numbers tell you what someone else tested. your tests tell you what you'll actually get. the difference shows up the moment you deploy.
English
5
2
50
4.3K
Nous Research
Nous Research@NousResearch·
We have partnered with @Xiaomi to bring their excellent MiMo V2 Pro model to Hermes Agent via the Nous Portal - completely free to use for the next 2 weeks! Access now on the latest version of Hermes Agent: 'hermes update'
English
168
173
1.9K
712.8K
Jaynit
Jaynit@jaynitx·
Michael Phelps won 23 Olympic gold medals using a mental technique most athletes ignore: "The biggest thing that really separated me through my career was my mental game. Everything that was in between my ears." Michael explains how he used visualization: "When I would visualize, I'd visualize every single thing getting up to a meet, probably a month or so in advance. What could happen. What I want to happen. And what I don't want to happen. Because when it happened, I was prepared for it." He describes the goal: "When I got to a swim meet, there's nothing I can control at that point except what I do. I can't control what anybody else does. So I want to know how the race could go, how I don't want the race to go, and in a perfect world, how the race should go. So I could get behind the block and not have to think about anything." His coach Bob Bowman reveals how they trained this skill: "When Michael was young, I gave his mom a book of progressive relaxation. Before he'd go to bed at night, she would read this progression of things: clench your fists, work through your whole body. He got so good she'd just open the book, say two things, and he'd be asleep." Bowman explains why visualization works: "The brain cannot distinguish between something that's vividly visualized and something that's real. By the time Michael steps up on the block at the Olympics, he's swum that race hundreds of times in his mind. All he has to do is shut everything down and it goes on autopilot." Michael adds the key detail most miss: "When I would visualize, it would be what you want it to be, what you don't want it to be, what it could be. So you're always ready for anything. If I have a suit rip, fine, I need another suit, put it on. Any small thing that could go wrong, I'm ready for."
Jaynit@jaynitx

x.com/i/article/2044…

English
48
1.1K
6.7K
1.2M
TechnoItch 🧙🏽‍♂️
TechnoItch 🧙🏽‍♂️@TechnoItch·
@sudoingX Hey Sudo, had a v quick question but x telling me I need to verify my account to DM 😤. Could you follow me and I won't have to do this. Or can post Q here?
English
0
0
0
8
Sudo su
Sudo su@sudoingX·
this is the migration pattern in one post. i see it every day. openclaw bloat burns tokens that are literally money burning for free. you deserve better tools for your cognition. drop the bloat. stop getting frustrated by api bills. make the move before your monthly budget is torched in days. hermes agent is the answer that's been under your nose the whole time.
Lotto@LottoLabs

First time with Openclaw Spending too much on tokens Buy a 3090 + install Hermes

English
10
2
58
6.6K
Jake Gilman
Jake Gilman@jakeglmn·
The simple swaps that actually work: - Filter or distill your tap water - Glass or ceramic food containers - Natural fibers (cotton, linen, hemp) - Cast iron or steel instead of non-stick - Ditch scented candles & air fresheners - French press instead of plastic coffee machine
English
11
32
618
205.3K
Jake Gilman
Jake Gilman@jakeglmn·
This is reproductive scientist Dr. Shanna Swan. After 30 years studying the global fertility collapse... She just went on Joe Rogan & revealed why men today have half the testosterone of their grandfathers. Here are the 7 most shocking findings: 1. It's not just men
English
81
340
4.1K
3.7M
Joel - coffee/acc
Joel - coffee/acc@JoelDeTeves·
My first impressions of Carnice-27B-GGUF by @kaiostephens It’s a solid model! Even more interesting is that it fits on a 24 GB card @ Q6_K with @spiritbuun’s TurboQuant fork AND image recognition works with the Unsloth mmproj file. My first task was to have it build a self learning system for managing my emails via SQLite and a structured dataset with semantic scoring system - it chewed through it without failing a single tool call. Config: -m Carnice-27b-Q6_K.gguf --mmproj mmproj-F16.gguf --n-gpu-layers 99 --ctx-size 65536 --cache-type-k turbo4 --cache-type-v turbo4 --fit on --jinja --reasoning-format auto --flash-attn on VRAM usage: 96% (23.1 GB) Speed: 24 tokens/second Verdict: it’s a good model, sir!
Joel - coffee/acc tweet media
English
8
5
84
4.7K
TechnoItch 🧙🏽‍♂️ retweetledi
Sari Arho Havrén
Sari Arho Havrén@SariArhoHavren·
U.S.-South Korea Relations Are at Breaking Point: “What good is the U.S. security guarantee against China when the United States cannot even handle a middle power such as Iran? What was the point of suffering through China’s economic retaliation to deploy THAAD in the name of upholding the U.S.-South Korea alliance when the United States makes a mockery of that suffering by unilaterally pulling THAAD to a different corner of the world? What good is a guarantor that asks you to spend your resources to cover the liability it created?” 1/2
English
181
1.2K
7.4K
749.5K
TechnoItch 🧙🏽‍♂️
@sudoingX Get the space invaders one-shot as benchmark, but would really love to hear more of what a 3090 with Qwen 27 Dense can achieve in terms of agentic workflows, coding etc.
English
0
0
0
139
Sudo su
Sudo su@sudoingX·
so 66% of you are on 24GB Vram or less. 414 votes in and the data is clear. 8-12GB: 31.6% 24GB: 34.5% 48GB+: 21.5% the benchmarks i run next are for the 66%. if you're on a 3060, 3090, or 4090 i'm testing the best model, quant, and config for your exact card. no enterprise content you can't use. data for hardware you actually own. this is what i'm building the article series around. your GPU, your config, your numbers.
Sudo su tweet media
Sudo su@sudoingX

what VRAM are you working with? i'm planning my next round of benchmarks and i want to test what matters for YOUR hardware. drop your exact GPU below. model, quant, what you're running. i'll tell you if there's something better for your setup.

English
51
21
435
26.4K
DonAlt
DonAlt@DonAlt·
Cred and I have partnered with @krakenfx They'll be sponsoring our Youtube channel from now on We've assured them we'd do one video a week and they've assured us they'll keep us to the promise Good way to make the show more regular Thanks Kraken ❤️ @TechnicalRoundup" target="_blank" rel="nofollow noopener">youtube.com/@TechnicalRoun
English
141
36
1.8K
148.7K
TechnoItch 🧙🏽‍♂️
@NeoAIForecast I meant this: “By default, Hermes uses @plasticlabs cloud and Neuromancer models for memory, extracting observations, recalling context, consolidating what it knows about you over time. Great service, but your data lives on their servers” is that true?
English
1
0
0
47
Neo
Neo@NeoAIForecast·
@TechnoItch Built-in memory is local, Honcho is optional memory layer, flexible between local or cloud depending on your deployment but needs to be configured.
English
1
0
0
32
TechnoItch 🧙🏽‍♂️
Is this true re: memory?
Elkim@ElkimXOC

I love Hermes Agent by @NousResearch but wanted to keep my memory data private. so I made a self-hosting setup for @honchodotdev, the memory layer that powers its cross-session learning. By default, Hermes uses @plastic_labs cloud and Neuromancer models for memory, extracting observations, recalling context, consolidating what it knows about you over time. Great service, but your data lives on their servers. This drops the full stack onto your own machine. No fork, just config files on top of upstream Honcho. - Any OpenAI-compatible LLM (OpenRouter, Venice, Together, or local via Ollama) - Primary + backup provider with automatic failover - Optional MCP server for Claude Code / Claude Desktop - One setup script, ~3 minutes github.com/elkimek/honcho…

English
1
0
3
302
TechnoItch 🧙🏽‍♂️
TechnoItch 🧙🏽‍♂️@TechnoItch·
Does this ComfyUI thing help us?
am.will@LLMJunky

Two incredible innovations in the local AI space in a span of three days. I am so excited. ComfyUI just shipped "Dynamic VRAM" and it seems like a big deal for anyone running models locally. The problem: large AI models can have many GB of weights. If your system lacks the necessary RAM, you'd normally hit memory crashes or grind to a halt on the page file. Instead of loading the entire model into memory at once, ComfyUI now reads the model file piece by piece directly from your SSD. Only the specific parts needed for the current step get pulled into memory. Everything else stays on disk until it's actually called for. On the GPU side, they built a smart system that loads weight data at the exact moment it's needed. If your GPU runs out of space, it doesn't crash. It uses a temporary workaround to finish the calculation, then cleans up after itself. It also keeps track of what didn't fit so it doesn't waste time trying to reload things that won't fit again. The other big improvement is for workflows that use multiple models. Previously, swapping between models would pile everything into system memory and bog your machine down. Now when a model gets swapped out of the GPU, it just goes back to the "read from disk when needed" state instead of sitting in RAM. The result: a 56GB model can now run on a machine with only 32GB of memory. No crashes, no slowdowns from swap. Available now for Nvidia GPUs on Windows and Linux, with AMD support on the way. No idea how fast this is, but this seems incredible. Cannot wait to get my workstation going.

English
0
0
2
238