Kaden

117 posts

@schuttdev

building things with Hermes Agent & Claude | CS @ ASU

Tempe, AZ · Joined January 2025

35 Following · 50 Followers

Kaden@schuttdev·1d

@1337hero I tell it to delegate to Opus, Sonnet, and Haiku rather than spawning Fable clones

Mike Key@1337hero·1d

Fable started with 10 agents, then scaled itself to 56 then blew through my entire Max 200 plan limits in under 15 mins!!! WTF?

Kaden@schuttdev·5d

@LottoLabs

QME

Lotto@LottoLabs·5d

Mythos is more hyped than gpt5 release 👀 How’d that turn out

Kaden@schuttdev·5d

@no_stp_on_snek That seems to make sense, it feels heavy compared to qwen3.5 9b

Tom Turney@no_stp_on_snek·5d

@schuttdev 36 tps 32k context.

Tom Turney@no_stp_on_snek·5d

Adding an "intern" to my local LLM mix (AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-NVFP4-FP8 on a 9070xt 16GB). We'll see if it can be upgraded to Jr. engineer. Performance reviews coming up.

Kaden@schuttdev·5d

@loktar00 @LottoLabs Have you considered the bc160/v520?

Loktar 🇺🇸@loktar00·5d

@LottoLabs hah I was just looking at the price of these... I'm in my low cost/perf era

Lotto@LottoLabs·5d

What could go wrong?

Kaden@schuttdev·6d

@loktar00 How’s prefill?

Loktar 🇺🇸@loktar00·6d

This is actually CRAZY!!! Using llama.cpp RPC I have 2 BC-250s set up so far, and they're able to run Qwen 27b at Q4, and 35b at Q4 as well. This is without extra CUs unlocked:
Qwen 27b with MTP - 14.5 tk/s
Qwen 35b with MTP - 47 tk/s
For $300 I'm getting these speeds! This is wild!

Kaden@schuttdev·6d

@no_stp_on_snek Could be worse

Tom Turney@no_stp_on_snek·6d

I am joining the 5090 club. Behold my amazing cable management.

Kaden@schuttdev·4 Jun

@Italianclownz Interesting quant format, I’ll have to check it out

Carlo@Italianclownz·4 Jun

Converted Gemma 4 12B it to ROCmFP4 format and used the MTP Assistant, and I am hitting high 30s to high 40s tok/s decode speed. Full context window. On Strix Halo Max 395+ with 128 GB RAM. Looks like the Strix Halo Max 395+ is beating the 4-bit quants people are posting on the Spark. As @barackomaba would say "Chadrock"

Kaden@schuttdev·4 Jun

@pupposandro @davideciffa @ivanfioravanti Respect 🫡

Sandro@pupposandro·4 Jun

Open-heart RTX 3090 surgery on @ivanfioravanti's Zotac card. The card was very old and was easily hitting 90°C under load. The original pads were baked, and the paste had turned to dust. We're switching the thermal interface and will send him full pre- and post-operation benchmarks. For this we're using @Thermal_Grizzly phase-change pads on the GPU core, non-conductive and rated to hold forever. Fresh pads on the memory. Doing this work on every single @luceboxai machine we produce.

Kaden@schuttdev·2 Jun

@0xSero I’ll get it at least 2x your strix numbers

0xSero@0xSero·1 Jun

Deepseek-v4-flash-reap 180B | 91GB
- fits on a spark | 33 tok/s decode | 555 tok/s prefill
- fits on AMD strix | 12.3 tok/s decode | 100 tok/s prefill
- 43.2% on terminal-bench-2.0, basically a loss of 6.1 points
But I am pretty sure this is because the benchmarking env sucks

Kaden@schuttdev·22 May

@LottoLabs Haha thanks, it’s been fun posting to your site! Appreciate the shoutout, looking into doing some of your evals soon too

Lotto@LottoLabs·22 May

@schuttdev There’s my guy I should have added you to the post!

Lotto@LottoLabs·22 May

pretty cool that the highest TPS on qwen 27b is this run localmaxxing.com/runs/cmp8fw36n…

Kaden@schuttdev·21 May

@TensorTonic All hail the unit circle

GIF

TensorTonic@TensorTonic·20 May

Part 2/30 of the LLM Series: RoPE (Rotary Position Embedding) How does a transformer know the difference between - "the dog bit the man" and "the man bit the dog"? The words are almost identical, but the meaning changes completely. RoPE encodes position as rotation, allowing transformers to understand relative order through geometry. Read more: tensortonic.com/llm-internals

Kaden@schuttdev·17 May

@PatrickToulme “Hardware agnostic” stacks sacrifice efficiency and performance for portable mediocrity.

Patrick C Toulme@PatrickToulme·16 May

"Hardware-agnostic" AI stacks run everywhere and run fast nowhere. The race isn't to build the most portable stack — it's to build the deepest one.

Patrick C Toulme@PatrickToulme

x.com/i/article/2055…

Kaden@schuttdev·15 May

@1337hero Worth it!

Mike Key@1337hero·15 May

The 7900 XTX is a great card and RDNA3 is well supported. I was just gaming on it when one day I decided to give Ollama a whirl, then ComfyUI, and was like... maybe I'll buy another. $740 on eBay - heck ya, sold! I knew what I was giving up w/ the 9700's, but yeah, I feel like the XTX is just an underappreciated good value.

Mike Key@1337hero·14 May

Spent $3998.98 total to have 96 GB of VRAM using AMD's AI Pro R9700 cards (brand new). Comparatively, I had spent $1520.00 on two used RX 7900 XTXs for 48 GB of VRAM. If ur team RED, a single XTX is CHEAPER than an RTX 3090. Should I have bought a Mac or DGX Spark instead?

Kaden@schuttdev·15 May

@Cryptol33t_NFT @1337hero The 7900 XT has 20 GB of VRAM and less compute than the 7900 XTX, but on my card I get ~45 tok/s decode on Qwen 3.5/6 27b

Bloodmoney@Cryptol33t_NFT·15 May

@1337hero @schuttdev can the 7900xt run 27B or 31B at reasonable tok/s?

Kaden@schuttdev·14 May

@mamajjo1 It runs, and looks like I did well, full clocks
Qwen 3.5 9b - 60 tok/s
Temps (avg)
Edge - 71°C
Junction - 90°C
Memory - 82°C
Claude stripped all of the chat templating so the model spiraled, but I set it straight and now it’s re-benching.

mamajjou@mamajjo1·14 May

@schuttdev How’d it go?

Kaden@schuttdev·14 May

Did my first ever GPU repair today..
