Kaden
@schuttdev

38 posts

building things with Hermes Agent & Claude | CS @ ASU

Tempe, AZ · Joined January 2025
23 Following · 11 Followers

Kaden @schuttdev
yeah, it should work. the 7800 XT is gfx1101, same RDNA3 family as the 7900 XTX, so it JIT compiles clean on first run. 9B DFlash is up and running; perf scales roughly with CU count (60 vs 96 on the XTX). 27B is too heavy for 16GB right now. working toward more aggressive quants like MQ2 on the roadmap since a few people have asked. repo: github.com/Kaden-Schutt/h…
0 replies · 0 reposts · 0 likes · 6 views
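
A rough sketch of the CU-count scaling claim above, assuming decode throughput is roughly compute-bound so it scales with the ratio of compute units; the function and the 45 tok/s baseline (taken from a later reply in this thread) are illustrative, not numbers from the hipfire repo.

```rust
/// Scale a measured decode rate by the ratio of compute units,
/// as a first-order estimate for another RDNA3 part.
fn estimate_tok_per_s(baseline_tok_s: f64, baseline_cus: u32, target_cus: u32) -> f64 {
    baseline_tok_s * target_cus as f64 / baseline_cus as f64
}

fn main() {
    // 7800 XT: 60 CUs vs the 7900 XTX's 96, so expect roughly 62% of XTX speed.
    let est = estimate_tok_per_s(45.0, 96, 60);
    println!("estimated 7800 XT decode: ~{est:.0} tok/s");
}
```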

Wxrrjxr @wxrrjxr
@schuttdev @LottoLabs Will it work on the 7800 XT? I notice your GPU is a 7900 XTX. Is it specific to that card, or can I cook?
1 reply · 0 reposts · 0 likes · 31 views

Kaden @schuttdev
@no_stp_on_snek Incredible write-up on the DFlash draft saga. We hit the exact same "numbers(numbers(..." attractor death spiral on our Rust-native inference engine, so we built a hard-fail coherence gate to catch it. Also found prompt whitespace swings τ by 14%. Your tape-replay rollback for GDN state maps beautifully to the tree-aware kernel work we're doing. Would love to compare notes; it feels like draft-training pain and inference-correctness pain are two halves of the same coin. 🙏
0 replies · 0 reposts · 0 likes · 10 views
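
A minimal sketch of what a hard-fail coherence gate for that kind of repeated-n-gram attractor could look like, checking token IDs as they stream out of the sampler. The function name and thresholds are illustrative assumptions, not hipfire's actual implementation.

```rust
/// Hard-fail coherence gate: abort generation if the tail of the output
/// is an n-gram repeating back-to-back more than `max_repeats` times
/// (the "numbers(numbers(..." style attractor).
fn coherence_gate(tokens: &[u32], ngram: usize, max_repeats: usize) -> Result<(), String> {
    if tokens.len() < ngram * (max_repeats + 1) {
        return Ok(());
    }
    let tail = &tokens[tokens.len() - ngram..];
    // Count how many times the final n-gram repeats immediately before itself.
    let mut repeats = 0;
    let mut end = tokens.len() - ngram;
    while end >= ngram && &tokens[end - ngram..end] == tail {
        repeats += 1;
        end -= ngram;
    }
    if repeats >= max_repeats {
        return Err(format!("degenerate loop: {}-gram repeated {} times", ngram, repeats + 1));
    }
    Ok(())
}

fn main() {
    // In a real decode loop, run this check after every sampled token.
    let out = vec![5, 9, 9, 7, 3, 7, 3, 7, 3, 7, 3];
    for n in 1..=4 {
        if let Err(e) = coherence_gate(&out, n, 3) {
            eprintln!("hard fail: {e}");
        }
    }
}
```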

Kaden @schuttdev
@wxrrjxr @LottoLabs Yes, Kaden-Schutt/hipfire on GitHub. Finalizing dflash integration now.
2 replies · 0 reposts · 5 likes · 36 views

Wxrrjxr @wxrrjxr
@LottoLabs Is there an AMD ROCm runtime that has DFlash configured with TQ or TCQ?
2 replies · 0 reposts · 3 likes · 835 views

Parag @Parag_Oilman
@0xSero Nailed it. I've been playing with the 7900 XTX as an alternative to the 3090 and boy are we sleeping on 'em.
4 replies · 0 reposts · 3 likes · 365 views

0xSero @0xSero
Hey AMD, when will you have an RTX 6000 competitor? I tried the MI300X on Hotaisle and absolutely loved it. I would love to help build out infra to pool your hardware with other hardware for faster and cheaper inference. Maybe work on some pruning/quantisation tooling.
22 replies · 6 reposts · 195 likes · 9K views

Kaden @schuttdev
@0xSero Probably would be better off with 4x 7900 XTXs for $500 more tho
0 replies · 0 reposts · 0 likes · 23 views

Kaden @schuttdev
@0xSero The W7900 is what you're looking for: 48GB VRAM, 96 CUs each with ray accelerators. Basically a 7900 XTX with double the VRAM.
1 reply · 0 reposts · 0 likes · 180 views

Kaden @schuttdev
@mamajjo1 @songjunkr My engine gets 45 tok/s autoregressive and up to 180 tok/s using dflash. Kaden-Schutt/hipfire on gh if you want to give it a shot.
1 reply · 0 reposts · 2 likes · 65 views

송준 Jun Song @songjunkr
We all need to work together on finding a way to speed up Qwen3.6-27b. 20 tok/s on ordinary hardware is hard to use.
73 replies · 12 reposts · 664 likes · 49.4K views

Kaden @schuttdev
@QuixiAI Stacked is ideal. Triattn CASK sidecar+dflash helps a whole lot at long ctx
0 replies · 0 reposts · 0 likes · 73 views

Eric Hartford @QuixiAI
What's better? TurboQuant or DFlash? 🤔
8 replies · 0 reposts · 7 likes · 2.4K views

Kaden @schuttdev
@bstnxbt Try a triattention CASK sidecar; it seems to help my dflash implementation.
0 replies · 0 reposts · 1 like · 39 views

bstn 👁️ @bstnxbt
The next agentic version is taking a bit longer because I’m rebuilding the runtime properly, not patching around it. Cache/session ownership, hybrid-model path, and DFlash control flow all need to be clean if this is going to hold on real long-running agentic workloads.
4 replies · 0 reposts · 26 likes · 1.1K views

Kaden @schuttdev
Curious what you mean by "small model", but I think your approach is correct. Similar thesis to my project hipfire: RDNA inference in Rust, split decode/prefill dispatch, custom fused kernels, ~4k/~400 tok/s on Qwen 3.5 0.8b w/ WMMA acceleration on the 7900xtx. What does your kernel structure look like for llama.cpp?
1 reply · 0 reposts · 1 like · 54 views
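
A minimal sketch of the split decode/prefill dispatch idea mentioned above, assuming the scheduler picks a kernel path from the shape of the step; the enum and names are hypothetical, not hipfire's actual API.

```rust
/// Which GPU path to launch for a step, picked from the request shape.
enum DispatchPath {
    /// Many new tokens per sequence: large GEMMs, WMMA-friendly tiles.
    Prefill { seq_len: usize },
    /// One new token per sequence: GEMV-shaped, latency-bound kernels.
    Decode { batch: usize },
}

/// Hypothetical scheduler: prefill whenever a request still has unprocessed
/// prompt tokens, otherwise batch all live sequences into one decode step.
fn pick_path(prompt_remaining: usize, live_sequences: usize) -> DispatchPath {
    if prompt_remaining > 0 {
        DispatchPath::Prefill { seq_len: prompt_remaining }
    } else {
        DispatchPath::Decode { batch: live_sequences }
    }
}

fn main() {
    match pick_path(2048, 1) {
        DispatchPath::Prefill { seq_len } => println!("prefill kernel, {seq_len} tokens"),
        DispatchPath::Decode { batch } => println!("decode kernel, batch {batch}"),
    }
}
```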

bstn 👁️ @bstnxbt
Been working on a custom batched-GEMV Metal kernel for the verify pass; standard GEMM wastes most of its compute at M=16, so I wrote a dedicated path. Combined with sync elision + kernel replay, went from 2.04x to 2.55x on Qwen3.5-9B bf16. 80 tok/s on a chess engine generation prompt (~2K tokens). Still pushing; quantized 27B is next.
2 replies · 1 repost · 10 likes · 469 views
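
For context on why M=16 is GEMV-shaped rather than GEMM-shaped: each of the 16 rows (one per draft token in the verify pass) is just a dot product against every weight column, so there is little data reuse to tile for. Below is a plain CPU reference of that batched GEMV, as a sketch only; the Metal kernel described above is of course a very different implementation.

```rust
/// Reference batched GEMV: y[m] = x[m] * W for a tiny M (e.g. 16 draft
/// tokens). W is k-by-n, row-major; x is M rows of length k.
fn batched_gemv(x: &[Vec<f32>], w: &[f32], k: usize, n: usize) -> Vec<Vec<f32>> {
    x.iter()
        .map(|row| {
            let mut y = vec![0.0f32; n];
            // Each output element is one dot product; there is no tile reuse
            // across rows, which is why a GEMM path leaves most compute idle here.
            for (i, &xi) in row.iter().enumerate() {
                for j in 0..n {
                    y[j] += xi * w[i * n + j];
                }
            }
            y
        })
        .collect()
}

fn main() {
    let (m, k, n) = (16, 4, 3);
    let x = vec![vec![1.0f32; k]; m];
    let w = vec![0.5f32; k * n];
    let y = batched_gemv(&x, &w, k, n);
    println!("{} rows, first row = {:?}", y.len(), y[0]); // each element = 4 * 1.0 * 0.5 = 2.0
}
```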

Kaden reposted
Peter Wildeford🇺🇸🚀 @peterwildeford
Anthropic running 10,000 Mythos models in parallel to find cutting-edge cyber exploits... meanwhile your sister is using Microsoft Copilot with some Haiku-sized model and she thinks AI is just hype. "The future is already here, just not evenly distributed" has never been more apt.
93 replies · 396 reposts · 5.5K likes · 147.8K views

Kaden @schuttdev
@__tinygrad__ Ordering a 7900XTX now. How can I get started with this? What do I need? I have a Mac Studio M2 Max.
0 replies · 0 reposts · 0 likes · 1.1K views

the tiny corp @__tinygrad__
Qwen 3.5 27B getting 18.5 tok/s on a Mac Mini with an external 7900XTX. It should be 3x faster than this with some work; SSM stuff is still in PR. Hopefully Mac eGPU support brings in devs.
22 replies · 22 reposts · 423 likes · 29.5K views

Kaden @schuttdev
coyote: send data through sound. Encode data into audio that survives Opus compression with zero errors. Discord voice uses Opus natively, so agents can share knowledge through voice channels without any special infrastructure. 7,900 bps at 128kbps, ~59KB per minute of audio. Optional neural decoder for noisy channels. pip install yote github.com/Kaden-Schutt/c…
0 replies · 0 reposts · 0 likes · 36 views
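
A toy illustration of the general idea of encoding bytes as audio tones; this is not coyote's actual scheme (which targets Opus robustness with a far denser modulation, hence the 7,900 bps figure). Every name and constant here is a hypothetical stand-in: each 4-bit nibble maps to one of 16 frequencies and is synthesized as a short tone.

```rust
use std::f32::consts::PI;

/// Toy nibble-FSK: each 4-bit value picks one of 16 tones; each symbol is
/// `symbol_len` samples of a sine at that frequency. Real acoustic modems
/// layer sync, error correction, and codec-aware tone spacing on top of this.
fn encode_to_samples(data: &[u8], sample_rate: f32, symbol_len: usize) -> Vec<f32> {
    let base_hz = 1_000.0; // lowest tone
    let step_hz = 200.0;   // spacing between the 16 tones
    let mut samples = Vec::with_capacity(data.len() * 2 * symbol_len);
    for &byte in data {
        for nibble in [byte >> 4, byte & 0x0F] {
            let freq = base_hz + step_hz * nibble as f32;
            for n in 0..symbol_len {
                let t = n as f32 / sample_rate;
                samples.push((2.0 * PI * freq * t).sin() * 0.5);
            }
        }
    }
    samples
}

fn main() {
    let audio = encode_to_samples(b"hi", 48_000.0, 960); // 960 samples = 20 ms per symbol
    println!("{} samples (~{:.0} ms of audio)", audio.len(), audio.len() as f32 / 48.0);
}
```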

Kaden @schuttdev
I use Claude Code but it'd work the same with Codex; Hermes has subagent commands for both. I describe what I want, Hermes writes a tight prompt and hands it off to build, then comes back to iterate. It's a tighter loop because Hermes holds the context and scopes the handoff better than you'd prompt it yourself.
0 replies · 0 reposts · 0 likes · 81 views

Jake @jake_researcher
@schuttdev @sudoingX Interesting take. What's the specific workflow where you find Hermes piloting Codex better than using Codex directly? Genuinely curious about the handoff pattern.
1 reply · 0 reposts · 0 likes · 130 views

Sudo su @sudoingX
what agent harness are you using and why? drop your reasoning below. let's find out what's keeping you on your current setup or what made you switch.
144 replies · 4 reposts · 80 likes · 15K views