

Did some half-baked experiments with GPU power limits to see how they affect inference performance on Minimax M2.5.

TL;DR:
- The unconstrained 350W/GPU limit on 6x RTX 3090 gave the best performance and, perhaps counterintuitively, was also the most efficient
- Minimax doesn't use all the power I give it. I attribute that to MoE requiring fewer operations per token, but idk
- Nerfing your system in the name of a lower power bill might not actually help you

Blog: llmgarage.ai/power-limit-to…
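For anyone who wants to poke at this themselves: the post doesn't show the exact commands, but per-GPU power caps are normally set with nvidia-smi, so treat the following as a sketch (the 350 W and 275 W values and the 6-GPU layout are just illustrative, matching the box described above):

```shell
# Enable persistence mode so the configured limit survives between CUDA contexts
sudo nvidia-smi -pm 1

# Cap all GPUs at 350 W (the card's stock limit on an RTX 3090)
sudo nvidia-smi -pl 350

# Or cap a single GPU, e.g. nerf GPU 0 to 275 W for an A/B comparison
sudo nvidia-smi -i 0 -pl 275

# Watch actual draw vs. the configured limit while inference runs,
# refreshing every second -- this is where you'd see the model not
# using all the power you give it
nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv -l 1
```

Comparing power.draw against power.limit during generation is the quick way to check the second bullet: if draw sits well below the cap, lowering the cap mostly just clips the bursts rather than saving steady-state watts.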





















