
Taylor
@TheEntrepeNerd
32, PropTech Entrepreneur, 100% Certified Nerd.


@sudoingX Been testing on an AMD 7900 XTX with Qwen 3.5:
9B (regular ollama pull), 32k ctx -> 70 tps
27B (also regular ollama pull), 16k ctx -> 30 tps
Seems similar to a 3090. More tests pending. Considering a 2nd XTX.
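A tokens-per-second figure like the ones above can be derived from the timing fields Ollama reports. This sketch assumes the numbers come from the final response of Ollama's `/api/generate` endpoint, where `eval_count` is the number of generated tokens and `eval_duration` is the time spent generating them in nanoseconds. The post does not say how the rates were actually measured, and the sample values below are hypothetical.

```python
def tokens_per_second(response: dict) -> float:
    """Decode throughput from an Ollama /api/generate final response.

    Assumes the standard fields: eval_count (tokens generated) and
    eval_duration (nanoseconds spent generating them).
    """
    tokens = response["eval_count"]
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return tokens / (duration_ns / 1e9)


# Hypothetical response fragment: 700 tokens generated in 10 seconds.
sample = {"eval_count": 700, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
```

This counts only the generation phase. Prompt processing is reported separately (`prompt_eval_count`, `prompt_eval_duration`), so folding it in would give a lower number, especially with long contexts like 32k.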


Qwen3.5-35B-A3B tested on a single RTX 3090 and it flew. 112 tokens per second. zero tuning. default config. all 41 layers on GPU with 4GB VRAM to spare.

for context: the 80B coder-next did 1.3 tok/s on this same card and needed two 3090s to hit 46 tok/s. this model just did 112 on one. same 3B active params, half the total weight: 19.7GB on disk instead of 45. the math was obvious but the result still caught me off guard.

flash attention enabled itself automatically. KV cache quantization, expert offloading, thread tuning: none of that applied yet. this is baseline. full optimization breakdown and benchmark results dropping soon. if default settings do 112, i want to see where the ceiling is.

exact hardware specs in the image below.
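The "obvious math" here is largely a memory-fit question: 19.7GB of weights fits inside a single RTX 3090's 24GB of VRAM, while 45GB does not and has to be split across two cards or partly offloaded to system RAM, which is the likely reason for the 1.3 tok/s single-card figure. A minimal sketch of that check, assuming weights are the only thing that must fit and treating disk size and VRAM in the same units (it ignores KV cache, context length, and runtime overhead, all of which shrink the real headroom):

```python
def vram_headroom_gb(model_gb: float, gpus: int, vram_per_gpu_gb: float = 24.0) -> float:
    """Return VRAM left after loading the weights (negative means it won't fit).

    Simplification: weights only. KV cache, context length, and framework
    overhead are ignored, so real headroom will be smaller.
    """
    return gpus * vram_per_gpu_gb - model_gb


# Sizes on disk quoted in the post; 24 GB is the RTX 3090's VRAM.
for name, size_gb, gpus in [
    ("Qwen3.5-35B-A3B", 19.7, 1),
    ("80B coder-next", 45.0, 1),
    ("80B coder-next", 45.0, 2),
]:
    headroom = vram_headroom_gb(size_gb, gpus)
    verdict = "fits" if headroom >= 0 else "does not fit"
    print(f"{name} on {gpus}x 3090: {verdict} ({headroom:+.1f} GB)")
```

Under this simplification the 35B model leaves roughly 4.3 GB free on one card and the 80B model only fits across two, which matches the shape of the numbers in the post.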



WebAuth Wallet now supports XRP Ledger (XRPL). XRPL accounts can now be created and managed directly within WebAuth Wallet.

xAI’s Colossus 2: The World’s First Gigawatt-Scale Datacenter
