Kaden

117 posts

@schuttdev

building things with Hermes Agent & Claude | CS @ ASU

Tempe, AZ · Joined January 2025
35 Following · 50 Followers
Kaden
Kaden@schuttdev·
@1337hero I tell it to delegate to Opus, Sonnet, and Haiku rather than spawning Fable clones
1
0
1
43
Mike Key
Mike Key@1337hero·
Fable started with 10 agents, then scaled itself to 56 then blew through my entire Max 200 plan limits in under 15 mins!!! WTF?
5
1
10
766
Lotto
Lotto@LottoLabs·
Mythos is more hyped than the GPT-5 release 👀 How’d that turn out?
5
0
21
1.3K
Kaden
Kaden@schuttdev·
@no_stp_on_snek That seems to make sense, it feels heavy compared to qwen3.5 9b
0
0
0
17
Tom Turney
Tom Turney@no_stp_on_snek·
Adding an "intern" to my local LLM mix (AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-NVFP4-FP8 on a 9070xt 16GB). We'll see if it can be upgraded to JR engineer. Performance reviews coming up.
2
0
18
1K
Loktar 🇺🇸
Loktar 🇺🇸@loktar00·
@LottoLabs hah I was just looking at the price of these... I'm in my low cost/perf era
3
0
10
668
Lotto
Lotto@LottoLabs·
What could go wrong?
10
1
44
5.7K
Loktar 🇺🇸
Loktar 🇺🇸@loktar00·
This is actually CRAZY!!! Using llama.cpp RPC I have 2 BC-250s set up so far, and they're able to run Qwen 27b at Q4, and 35b at Q4 as well. This is without extra CUs unlocked:
Qwen 27b with MTP - 14.5 tk/s
Qwen 35b with MTP - 47 tk/s
For $300 I'm getting these speeds! This is wild!
39
14
201
67.7K
Tom Turney
Tom Turney@no_stp_on_snek·
I am joining the 5090 club. Behold my amazing cable management.
6
0
32
1.3K
Kaden
Kaden@schuttdev·
@Italianclownz Interesting quant format, I’ll have to check it out
0
1
1
254
Carlo
Carlo@Italianclownz·
Converted Gemma 4 12B it to ROCmFP4 format and used the MTP Assistant, and I am hitting high 30s to high 40s tok/s decode speed. Full context window. On Strix Halo Max 395+ with 128 GB RAM. Looks like the Strix Halo Max 395+ is beating the 4-bit quants people are posting on the Spark. As @barackomaba would say, "Chadrock"
9
4
74
9K
Sandro
Sandro@pupposandro·
Open heart RTX 3090 surgery on @ivanfioravanti's Zotac card. The card was very old and was easily hitting 90 °C under load. The original pads were baked, and the paste had turned to dust. We're switching the thermal interface and will send him full pre and post benchmarks after the operation. For this we're using @Thermal_Grizzly phase-change pads on the GPU core, non-conductive and rated to hold forever. Fresh pads on the memory chips. We're doing this work on every single @luceboxai machine we produce.
23
8
166
12.1K
Kaden
Kaden@schuttdev·
@0xSero I’ll get it at least 2x your strix numbers
0
0
1
50
0xSero
0xSero@0xSero·
Deepseek-v4-flash-reap 180B | 91GB
- fits on a spark | 33 tok/s decode | 555 tok/s prefill
- fits on AMD strix | 12.3 tok/s decode | 100 tok/s prefill
- 43.2% on terminal-bench-2.0, basically a loss of 6.1 points
But I am pretty sure this is because the benchmarking env sucks
12
6
179
13K
Kaden
Kaden@schuttdev·
@LottoLabs Haha thanks, it’s been fun posting to your site! Appreciate the shoutout, looking into doing some of your evals soon too
0
0
1
44
Lotto
Lotto@LottoLabs·
@schuttdev There’s my guy I should have added you to the post!
1
0
5
359
Kaden
Kaden@schuttdev·
Oh
0
0
0
55
TensorTonic
TensorTonic@TensorTonic·
Part 2/30 of the LLM Series: RoPE (Rotary Position Embedding)
How does a transformer know the difference between "the dog bit the man" and "the man bit the dog"? The words are almost identical, but the meaning changes completely.
RoPE encodes position as rotation, allowing transformers to understand relative order through geometry.
Read more: tensortonic.com/llm-internals
6
32
235
11.8K
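A minimal sketch of the "position as rotation" idea from the post above, for anyone who wants to poke at it. This is not taken from the linked series; it assumes the commonly used RoPE formulation (consecutive dimension pairs rotated by `pos * base^(-2k/d)` with `base = 10000`), and the vectors `q` and `k` are made-up toy values. It shows that the dot product of a rotated query and key depends only on their relative offset, not on their absolute positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair (vec[i], vec[i+1]) by the angle pos * theta,
    where theta = base ** (-i / d) for even i (assumed standard RoPE frequencies)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (hypothetical values, for illustration only)
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Same relative offset (3) at two different absolute positions
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))

# Different relative offset (1)
score_other = dot(rope(q, 5), rope(k, 4))

print(score_near, score_far, score_other)
```

The first two scores should agree up to floating-point error, while the third differs, which is the sense in which relative order is recoverable from the geometry.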
Kaden
Kaden@schuttdev·
@PatrickToulme “Hardware agnostic” stacks sacrifice efficiency and performance for portable mediocrity.
0
0
1
117
Mike Key
Mike Key@1337hero·
The 7900 XTX is a great card and RDNA3 is well supported. I was just gaming on it when one day I decided to give Ollama a whirl, then ComfyUI, and was like... maybe I'll buy another. $740 on eBay - heck ya, sold! I knew what I was giving up w/ the 9700s, but yeah, I feel like the XTX is just an underappreciated good value.
2
0
3
2K
Mike Key
Mike Key@1337hero·
Spent $3998.98 total to have 96gb of VRAM using AMD's AI Pro R9700 Cards. (brand new) Comparatively I had spent $1520.00 on two used RX 7900 XTX's for 48gb of VRAM. If ur team RED, a single XTX is CHEAPER than a RTX 3090. Should I have bought a Mac or DGX Spark instead?
31
2
121
18.1K
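For a quick sanity check on the numbers in the post above, here is a small sketch that turns the two quoted totals into cost per GB of VRAM. Only the dollar amounts and VRAM sizes come from the post; the function name is made up for illustration, and the comparison ignores everything other than price and capacity (new vs. used, compute, power, software support).

```python
def cost_per_gb(total_usd, vram_gb):
    """Dollars spent per GB of VRAM (hypothetical helper for illustration)."""
    return total_usd / vram_gb

# Figures as quoted in the post
r9700_per_gb = cost_per_gb(3998.98, 96)  # AI Pro R9700 cards, brand new
xtx_per_gb = cost_per_gb(1520.00, 48)    # two used RX 7900 XTXs

print(f"R9700 setup: ${r9700_per_gb:.2f}/GB")
print(f"XTX setup:   ${xtx_per_gb:.2f}/GB")
```

That works out to roughly $41.66 per GB for the R9700 build versus $31.67 per GB for the used XTX pair.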
Kaden
Kaden@schuttdev·
@Cryptol33t_NFT @1337hero The 7900xt has 20gb vram and less compute than the 7900xtx, but on my card I get ~45 tok/s decode on Qwen 3.5/6 27b
1
0
2
279
Kaden
Kaden@schuttdev·
@mamajjo1 It runs, and looks like I did well, full clocks
Qwen 3.5 9b - 60 tok/s
Temps (avg)
Edge - 71°C
Junction - 90°C
Memory - 82°C
Claude stripped all of the chat templating so the model spiraled, but I set it straight and now it’s re-benching.
0
0
1
72
Kaden
Kaden@schuttdev·
Did my first ever GPU repair today..
2
0
6
2.4K