FuzzyHat
@FuzzyHatGG

4.4K posts

Husband. Father. Views and thoughts are my own!

Chicago, IL · Joined July 2013
1.4K Following · 478 Followers
FuzzyHat@FuzzyHatGG·
@ESEA @FACEIT @FACEITSupport @FACEITcs 40-50 mins late, we call 6 admins, and we're forced to play the match cold while this other team just got done playing in another league. I'm sorry, but the rules aren't being upheld, admins are okay with it, and teams just get screwed over. 4/4
FuzzyHat@FuzzyHatGG·
@ESEA @FACEIT @FACEITSupport @FACEITcs and ready up in time. That's on them. Period. If my friend's team hadn't played, they would've gotten the FFL automatically without anything. This happened to our team a few seasons back during playoffs, where a team didn't show up and we got the FFW, but then they showed up 3/4
FuzzyHat@FuzzyHatGG·
I'm sorry a friend's team in @ESEA @FACEIT @FACEITSupport @FACEITcs got screwed over. Their opponent failed to ready up in time, so they got the FFW. They called an admin to rehost, and the admin required it, without agreement from my friend's team. They play, and then they submit a 1/4
FuzzyHat@FuzzyHatGG·
@dzamsgaglo If you find out, let me know. I’m interested in updating this to run my OpenClaw instance with 31B on my GX10
James Kokou GAGLO@dzamsgaglo·
Bought a DGX Spark. NVIDIA lists Gemma 4 models as "supported" on their vLLM page but ships an NGC container (26.03) with Transformers < 5.0. Gemma 4 requires Transformers >= 5.5. So... supported where exactly? 🤔
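The mismatch described above is easy to catch before launch. A minimal sketch that compares version strings; the ">= 5.5 vs < 5.0" numbers are taken from the post itself, not from any official support matrix:

```python
# Check whether the installed Transformers release meets a model's
# stated minimum before trying to serve it.
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(pkg, minimum, installed=None):
    """True if `installed` (or the locally installed version of `pkg`)
    is at least the `minimum` version tuple, e.g. (5, 5)."""
    if installed is None:
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            return False
    # Compare only as many dotted components as the minimum specifies.
    parts = tuple(int(p) for p in installed.split(".")[:len(minimum)] if p.isdigit())
    return parts >= minimum

# A container shipping 4.9.2 fails a ">= 5.5" requirement:
print(meets_minimum("transformers", (5, 5), installed="4.9.2"))  # False
```

Running this inside the NGC container before pulling model weights would surface the problem immediately instead of at load time.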
FuzzyHat@FuzzyHatGG·
Went through my monthly subscriptions and really analyzed what I actually use. I was able to cancel ~$300 worth that I can do without. Kinda nuts. Review your stuff
FuzzyHat@FuzzyHatGG·
@bridgebench @bridgemindai Can you share your setup? vLLM or Ollama, any params or docker containers you’re running? I’m running Qwen3-Coder-Next-FP8 and I’m only seeing ~42 tok/s
Bridgebench@bridgebench·
Qwen3 Coder 30B just took #1 on the DGX Spark Bench speed rankings. Nearly double the next fastest model. 193ms time to first token. 82.3 tokens per second. Running locally on an NVIDIA DGX Spark. This is a coding model running on a $5,000 machine sitting on my desk. 82 tokens per second locally is getting dangerously close to usable for real vibe coding workflows. bridgebench.ai
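The two headline numbers in this post, time to first token and tokens per second, fall out of per-token arrival timestamps. A minimal sketch of that arithmetic (the example stream is illustrative, not Bridgebench's data or methodology):

```python
def ttft_and_tps(request_start, arrivals):
    """Time to first token in seconds, and decode throughput in
    tokens/sec measured over the stream after the first token."""
    ttft = arrivals[0] - request_start
    decode_time = arrivals[-1] - arrivals[0]
    # Count only decode-phase tokens: everything after the first one.
    tps = (len(arrivals) - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps

# First token at 193 ms, then one token every 10 ms for 100 more:
ttft, tps = ttft_and_tps(0.0, [0.193 + 0.01 * i for i in range(101)])
```

In a real measurement the timestamps would come from a streaming client recording when each chunk arrives, so prompt-processing and decode speed are reported separately.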
FuzzyHat@FuzzyHatGG·
@LocalsOnlyAI @bridgebench I’m not seeing 82 tok/s but I’m willing to give that a try. I am running Qwen3-Coder-Next-FP8 and getting around 42 tok/s after an auto-tuning sweep.
localsonly@LocalsOnlyAI·
@bridgebench Just ordered the GX10. Anyone have any real comparison on running that vs the DGX Spark? For $1300 cheaper seemed like a decent deal.
TechMD@TechMDAI·
@FuzzyHatGG @TheAhmadOsman TBH I haven’t been having a lot of luck with vLLM. I have to circle back and integrate and test. I have been using llama.cpp and LM Studio.
Ahmad@TheAhmadOsman·
Which model to use locally with the Hermes agent?

On unified memory hardware* > Gemma 4 26B-A4B
On GPUs > Qwen 3.5 27B

* Mac Studio, DGX Spark, MacBook, etc.
TechMD@TechMDAI·
@TheAhmadOsman Experimenting with Gemma 4 26B on Spark currently
BridgeMind@bridgemindai·
Claude Code rate limited me so hard I bought a $5,000 NVIDIA DGX Spark. Arriving tomorrow. A personal AI supercomputer.

Anthropic cut off OpenClaw users. Slashed Claude Opus 4.6 rate limits. Told $200/month Max plan customers to use less. Then gave us a credit as an apology.

This is what happens when AI companies have too much power over your workflow. One update and your entire stack breaks. Local models are the only infrastructure no one can throttle. No rate limits. No 529 errors. No surprise policy changes.

Tomorrow I'm testing the DGX Spark live on stream. Running local models through real vibe coding workflows. The goal is simple. Never depend on a single provider again.
FuzzyHat@FuzzyHatGG·
Running an auto-tuning sweep on the GX10 (GB10) for running @Alibaba_Qwen Qwen3-Coder-Next-FP8 locally. Will report back after it’s up and running.
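For anyone wondering what a sweep like this amounts to: try a grid of server settings, benchmark each combination, and keep the fastest. A hedged sketch; the parameter names are illustrative, and `benchmark` stands in for whatever tok/s measurement you actually run against the server:

```python
from itertools import product

def sweep(benchmark, batch_sizes=(1, 4, 8), ctx_lens=(8192, 16384)):
    """Return (best_tok_per_s, best_settings) over the parameter grid."""
    best = None
    for bs, ctx in product(batch_sizes, ctx_lens):
        tps = benchmark(batch_size=bs, max_model_len=ctx)  # measured tok/s
        if best is None or tps > best[0]:
            best = (tps, {"batch_size": bs, "max_model_len": ctx})
    return best
```

In practice each `benchmark` call restarts the server with those settings and times a fixed prompt set, which is why a full sweep takes a while before there is anything to report.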
FuzzyHat@FuzzyHatGG·
@FreddieMorra @btctickr @bridgemindai Care to share any tuning settings? I’m new to setting up local models, though I’ve done Docker stuff in my normal and side projects. Running with vLLM or TensorRT?
Freddie Morra@FreddieMorra·
@FuzzyHatGG @btctickr @bridgemindai For me the best 2 I have found are Nemotron 3 Super 120B (or Nano, to run quickly with less accuracy) and Qwen3.5 122B, both as 4-bit quants with tuning settings, for agentic coding. I don't care about much else right now.
FuzzyHat@FuzzyHatGG·
@btctickr @bridgemindai I’m interested in the setup you did. I also purchased a GX10 and am looking for an all-around model to use with my OpenClaw instance to remove the API dependency
btctickr@btctickr·
@bridgemindai Hermes + Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF + TurboQuant = Freedom.
FuzzyHat@FuzzyHatGG·
@bridgemindai I’ve been toying with different models locally on my GX10, similar to the Spark. What are the models you’re looking at running?
FuzzyHat@FuzzyHatGG·
MCO + United + Clear + TSA PreCheck: made it through security in 6 mins
FuzzyHat@FuzzyHatGG·
With @Google TurboQuant, is the move still to get an M3 Ultra with 256GB to run as many models as you can locally on a single Mac Studio?
FuzzyHat@FuzzyHatGG·
@gunkcs2 I’m sorry to hear that man. Wish you the best in your search!