
michael_jay

















the 5090 just woke up. gemma 4 31b dense loaded, 128k context, llama-server on port 8080, hermes agent ready on the other side. this laptop has two gpus, the intel i9 integrated for everyday work and the rtx 5090 mobile 24gb for ai. the 5090 sits idle most hours. right now it's spinning up hard, fans sucking air from every direction, my fingers getting cold from the airflow, the entire machine feels awake. next up: speed sweeps across every context size, then autonomous agentic tasks on hermes agent. then direct comparison against the qwen 3.5-27b dense numbers i ran on a 3090 earlier. then qwen 27b dense on this same 5090 after gemma is done. 24gb vs 24gb, different models, same room. and someone anon gave me this laptop. running verified benchmark data for every builder on a machine the internet bought. this is what 2026 looks like when you build in public.




- model: Qwen3.6-35B-A3B-UD-IQ4_XS.gguf
- GPU: RTX 4090
- CUDA, f16 KV, flash attention on
- n_gpu_layers=999, threads=8, batch=256, ubatch=256
- Prompt-only, 512 tokens: about 4995 tok/s
- Generation-only, 128 tokens: about 180 tok/s
- Mixed, 4096 prompt + 128 gen: about 2700 tok/s effective combined throughput
- 512,0: 4976.8 to 4994.8 tok/s
- 0,128: 179.36 to 179.95 tok/s
- 4096,128: 2700.06 tok/s

x.com/ErdalToprak/st…
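A rough sanity check on the mixed figure, as a sketch rather than how the benchmark tool itself computes it: if combined throughput is total tokens divided by total time, it can be estimated from the two single-phase rates. The function name `combined_throughput` is hypothetical, and the estimate assumes the 512-token prompt rate (about 4995 tok/s) also holds for a 4096-token prompt, which the post does not state.

```python
def combined_throughput(n_prompt, n_gen, prompt_tps, gen_tps):
    """Estimate overall tok/s when prompt processing and generation run back to back."""
    # Time spent in each phase, in seconds.
    prompt_seconds = n_prompt / prompt_tps
    gen_seconds = n_gen / gen_tps
    # Total tokens handled divided by total wall time.
    return (n_prompt + n_gen) / (prompt_seconds + gen_seconds)


# Assumption: the 512-token prompt rate also applies at 4096 tokens.
estimate = combined_throughput(4096, 128, prompt_tps=4995.0, gen_tps=180.0)
print(f"estimated combined throughput: {estimate:.0f} tok/s")
```

Under that assumption the estimate lands slightly above the measured 2700 tok/s. One plausible reason is that prompt processing slows somewhat at longer context, so the real 4096-token prompt rate is a bit below the 512-token one.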




i want to grow my ideas like a garden. a conceptual prototype inspired by this thread of tweets.





Introducing Claude Code Security, now in limited research preview. It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. Learn more: anthropic.com/news/claude-co…





