Joe Muller
10.3K posts

Joe Muller
@BosonJoe
Local AI enthusiast, part time philosopher


We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.


Gemma 4: Now up to 3x Faster. ⚡ Same quality, way more speed. Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.





dude! the new qwen 3.6-27b dense is hammering my single 3090 at 100% gpu utilization. the spiky pattern on nvtop is the hermes agent autonomously thinking, calling tools, reading results, thinking again. this model is so cool to talk to. waits for tool outputs, reads them, selfcorrects, keeps going. no stalls, no loops, no hand holding. anyone running a single 3090 or any 24gb tier card should try this. same llama.cpp flags from last sweep, same hermes agent install. three commands and you are watching your own hardware think.




Tax season is here and a connector is all it takes to make @claudeai way more useful. Checkout what we just shipped: Connect TurboTax or Aiwyn Tax (formerly Column Tax) to Claude to estimate your refund, see what you may owe, and get a better understanding on the forms before you file.







nvidia's 3B mamba destroyed alibaba's 3B deltanet on the same RTX 3090. only 24 days between releases. same active parameters, same VRAM tier, completely different architectures. nemotron cascade 2: 187 tok/s. flat from 4K to 625K context. zero speed loss. flags: -ngl 99 -np 1. that's it. no context flags, no KV cache tricks. auto-allocates 625K. qwen 3.5 35B-A3B: 112 tok/s. flat from 4K to 262K context. zero speed loss. flags: -ngl 99 -np 1 -c 262144 --cache-type-k q8_0 --cache-type-v q8_0. needed KV cache quantization to fit 262K. both models held a flat line across every context level. both architectures are context-independent. but nvidia's mamba2 is 67% faster at generating tokens on the exact same hardware and needs fewer flags to get there. same node, same GPU, same everything. the only variable is the model. gold medal math olympiad winner running at 187 tokens per second on single RTX 3090 a card from 6 years ago. nvidia cooked.















