FuzzyHat
@FuzzyHatGG

4.4K posts

Husband. Father. Views and thoughts are my own!

Chicago, IL · Joined July 2013
1.4K Following · 478 Followers
FuzzyHat@FuzzyHatGG·
@ESEA @FACEIT @FACEITSupport @FACEITcs 40-50 mins late, we call 6 admins, and we're forced to play the match cold while this other team just got done playing in another league. I'm sorry, but the rules aren't being upheld, admins are okay with it, and teams just get screwed over. 4/4
FuzzyHat@FuzzyHatGG·
@ESEA @FACEIT @FACEITSupport @FACEITcs and ready up in time. That's on them. Period. If my friend's team hadn't played, they would've gotten the FFL automatically without anything. This happened to our team a few seasons back during playoffs, where a team didn't show up and we got the FFW, but then they showed up 3/4
FuzzyHat@FuzzyHatGG·
I'm sorry a friend's team in @ESEA @FACEIT @FACEITSupport @FACEITcs got screwed over. Their opponent failed to ready up in time, so they got the FFW. They called an admin to rehost, and the admin required it, without agreement from my friend's team. They play, and then they submit a 1/4
FuzzyHat@FuzzyHatGG·
@dzamsgaglo If you find out, let me know. I’m interested in updating this to run my OpenClaw instance with 31B on my GX10
James Kokou GAGLO@dzamsgaglo·
Bought a DGX Spark. NVIDIA lists Gemma 4 models as "supported" on their vLLM page but ships an NGC container (26.03) with Transformers < 5.0. Gemma 4 requires Transformers >= 5.5. So... supported where exactly? 🤔
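The mismatch described above is easy to catch before launch. A minimal sketch that compares version strings; the ">= 5.5 vs < 5.0" numbers are taken from the post itself, not from any official support matrix:

```python
# Check whether the installed Transformers release meets a model's
# stated minimum before trying to serve it.
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(pkg, minimum, installed=None):
    """True if `installed` (or the locally installed version of `pkg`)
    is at least the `minimum` version tuple, e.g. (5, 5)."""
    if installed is None:
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            return False
    # Compare only as many dotted components as the minimum specifies.
    parts = tuple(int(p) for p in installed.split(".")[:len(minimum)] if p.isdigit())
    return parts >= minimum

# A container shipping 4.9.2 fails a ">= 5.5" requirement:
print(meets_minimum("transformers", (5, 5), installed="4.9.2"))  # False
```

Running this inside the NGC container before pulling model weights would surface the problem immediately instead of at load time.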
FuzzyHat@FuzzyHatGG·
Went through my monthly subscriptions and really analyzed what I actually use. I was able to cancel ~$300 worth that I can do without. Kinda nuts. Review your stuff
FuzzyHat@FuzzyHatGG·
@bridgebench @bridgemindai Can you share your setup? vLLM or Ollama, any params or docker containers you’re running? I’m running Qwen3-Coder-Next-FP8 and I’m only seeing ~42 tok/s
Bridgebench@bridgebench·
Qwen3 Coder 30B just took #1 on the DGX Spark Bench speed rankings. Nearly double the next fastest model. 193ms time to first token. 82.3 tokens per second. Running locally on an NVIDIA DGX Spark. This is a coding model running on a $5,000 machine sitting on my desk. 82 tokens per second locally is getting dangerously close to usable for real vibe coding workflows. bridgebench.ai
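The two headline numbers in this post, time to first token and tokens per second, fall out of per-token arrival timestamps. A minimal sketch of that arithmetic (the example stream is illustrative, not Bridgebench's data or methodology):

```python
def ttft_and_tps(request_start, arrivals):
    """Time to first token in seconds, and decode throughput in
    tokens/sec measured over the stream after the first token."""
    ttft = arrivals[0] - request_start
    decode_time = arrivals[-1] - arrivals[0]
    # Count only decode-phase tokens: everything after the first one.
    tps = (len(arrivals) - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps

# First token at 193 ms, then one token every 10 ms for 100 more:
ttft, tps = ttft_and_tps(0.0, [0.193 + 0.01 * i for i in range(101)])
```

In a real measurement the timestamps would come from a streaming client recording when each chunk arrives, so prompt-processing and decode speed are reported separately.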
FuzzyHat@FuzzyHatGG·
@LocalsOnlyAI @bridgebench I’m not seeing 82 tok/s but I’m willing to give that a try. I am running Qwen3-Coder-Next-FP8 and getting around 42 tok/s after an auto-tuning sweep.
localsonly@LocalsOnlyAI·
@bridgebench Just ordered the GX10. Anyone have any real comparison on running that vs the DGX Spark? For $1300 cheaper seemed like a decent deal.
TechMD@TechMDAI·
@FuzzyHatGG @TheAhmadOsman TBH I haven’t been having a lot of luck with vLLM. I have to circle back and integrate and test. I have been using llama.cpp and LM Studio.
Ahmad@TheAhmadOsman·
Which model to use locally with the Hermes agent?

On unified memory hardware* > Gemma 4 26B-A4B
On GPUs > Qwen 3.5 27B

* Mac Studio, DGX Spark, MacBook, etc.
TechMD@TechMDAI·
@TheAhmadOsman Experimenting with Gemma 4 26B on Spark currently
BridgeMind@bridgemindai·
Claude Code rate limited me so hard I bought a $5,000 NVIDIA DGX Spark. Arriving tomorrow. A personal AI supercomputer.

Anthropic cut off OpenClaw users. Slashed Claude Opus 4.6 rate limits. Told $200/month Max plan customers to use less. Then gave us a credit as an apology.

This is what happens when AI companies have too much power over your workflow. One update and your entire stack breaks. Local models are the only infrastructure no one can throttle. No rate limits. No 529 errors. No surprise policy changes.

Tomorrow I'm testing the DGX Spark live on stream. Running local models through real vibe coding workflows. The goal is simple. Never depend on a single provider again.
FuzzyHat@FuzzyHatGG·
Running an auto-tuning sweep on the GX10 (GB10) for running @Alibaba_Qwen Qwen3-Coder-Next-FP8 locally. Will report back after it’s up and running.
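For anyone wondering what a sweep like this amounts to: try a grid of server settings, benchmark each combination, and keep the fastest. A hedged sketch; the parameter names are illustrative, and `benchmark` stands in for whatever tok/s measurement you actually run against the server:

```python
from itertools import product

def sweep(benchmark, batch_sizes=(1, 4, 8), ctx_lens=(8192, 16384)):
    """Return (best_tok_per_s, best_settings) over the parameter grid."""
    best = None
    for bs, ctx in product(batch_sizes, ctx_lens):
        tps = benchmark(batch_size=bs, max_model_len=ctx)  # measured tok/s
        if best is None or tps > best[0]:
            best = (tps, {"batch_size": bs, "max_model_len": ctx})
    return best
```

In practice each `benchmark` call restarts the server with those settings and times a fixed prompt set, which is why a full sweep takes a while before there is anything to report.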
FuzzyHat@FuzzyHatGG·
@FreddieMorra @btctickr @bridgemindai Care to share any tuning settings? I’m new to setting up local models, though I’ve done Docker stuff in my normal and side projects. Running with vLLM or TensorRT?
Freddie Morra@FreddieMorra·
@FuzzyHatGG @btctickr @bridgemindai For me the best 2 I have found are Nemotron 3 Super 120B (or Nano, to run quickly with less accuracy) and Qwen3.5 122B, both as 4-bit quants with tuning settings, for agentic coding. I don't care about much else right now.
FuzzyHat@FuzzyHatGG·
@btctickr @bridgemindai I’m interested in the setup you did. I also purchased a GX10 and am looking for an all-around model to use with my OpenClaw instance to remove the API dependency
btctickr@btctickr·
@bridgemindai Hermes + Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF + TurboQuant = Freedom.
FuzzyHat@FuzzyHatGG·
@bridgemindai I’ve been toying with different models locally on my GX10, similar to the Spark. What are the models you’re looking at running?
FuzzyHat@FuzzyHatGG·
MCO + United + Clear + TSA PreCheck: made it through security in 6 mins
FuzzyHat@FuzzyHatGG·
With @Google TurboQuant, is the move still to get an M3 Ultra with 256GB to run as many models as you can locally on a single Mac Studio?
FuzzyHat@FuzzyHatGG·
@gunkcs2 I’m sorry to hear that man. Wish you the best in your search!