Ryan D. Hatch
5.3K posts

Ryan D. Hatch
@rdkhatch
AI Privacy. Product @ Hydra. Family man. Saved by Grace.










this is what my setup looks like today. about to test qwen 3.6 27b dense q4 on a single rtx 3090 at ~41 tok/s gen, hermes agent driving. predecessor model qwen 3.5 dense q4 made it work in one iteration when i ran the same agentic build on the same card. i've been daily driving qwen 3.6 27b dense for weeks now, the model i keep coming back to. if 3.6 oneshots too, this becomes the best model that runs on a single rtx 3090. consumer tier king. firing the test now will report back soon.



Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens - Less than 5% the cost of Opus Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.







So, don't take my opinion on LaunchApp Animus CLI. This is what grok says, if that sounds like something you want repo in my profile. If you want to get stuck in the AI slow lane, then do whatever it is you're doing.








