Sunny @sunnypause · 9.7K posts
I build the employees that never sleep, complain, or quit. AI agents for real businesses.

last time this qwen 3.5 MoE one shotted a full space shooter game. 3,483 lines across 10 files. ran on first load. zero steering. 112 tok/s on a single 3090.

then i ran the same prompt on hermes 4.3 36B dense. similar size model, completely different architecture. it wrote 1,249 lines, declared done with empty files, needed three steering interventions, and the game didn't work. used 22% of available context and quit.

nine posts and two GPU configs later the conclusion was clear: the bottleneck wasn't hardware. but that leaves a question. was that a dense architecture problem or a hermes 4.3 problem?

qwen is the only family that ships both. 35B MoE with 3B active per token. and a 27B dense with all 27B active per token. same team. same training pipeline. different architecture.

downloading qwen 3.5 27B dense now. Q4_K_M. same quant. same single RTX 3090. same octopus invaders prompt. if it finishes the game clean, hermes was the problem. if it fails the same way, dense architecture doesn't have the endurance for autonomous coding on consumer hardware regardless of who builds it. the tiebreaker.
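a back-of-envelope sketch of why the speed gap above is expected. the parameter counts are the ones from the post (3B active for the MoE, all 27B active for the dense model); the ~2 FLOPs per active parameter per token figure is the standard rough estimate, not a measurement:

```python
# Why a 35B MoE can out-pace a similar-size dense model on one GPU:
# per-token compute scales with ACTIVE parameters, not total parameters.
# Both models still have to hold all weights in VRAM, so at the same
# quant their memory footprints are similar; the throughput gap comes
# from how much of the model fires per token.

def flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params

moe_active = 3e9      # MoE: 3B parameters active per token (of 35B total)
dense_active = 27e9   # dense: all 27B parameters active per token

ratio = flops_per_token(dense_active) / flops_per_token(moe_active)
print(f"dense does ~{ratio:.0f}x more compute per token than the MoE")
# -> dense does ~9x more compute per token than the MoE
```

a 9x compute gap per token won't map one-to-one onto tok/s (memory bandwidth and expert routing overhead both matter), but it's the right order of magnitude for why the MoE feels so much faster on a single 3090.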


LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and self-replicate. link below
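the reason a `.pth` file is such an effective payload: Python's `site` module processes every `.pth` file in site-packages at interpreter startup, and any line beginning with `import` is executed as code. a quick local audit, assuming only the filename from the report (everything else here is a generic check, not the actual indicator of compromise):

```python
# List .pth files in site-packages and flag ones that execute code at
# interpreter startup. Python's site module runs any .pth line that
# starts with "import " -- the hook a malicious litellm_init.pth would
# abuse. Review anything flagged by hand before deleting.
import site
from pathlib import Path

def suspicious_pth_files():
    hits = []
    dirs = site.getsitepackages() + [site.getusersitepackages()]
    for sp in dirs:
        d = Path(sp)
        if not d.is_dir():
            continue
        for pth in d.glob("*.pth"):
            try:
                text = pth.read_text(errors="ignore")
            except OSError:
                continue
            # lines beginning with "import" are executed at startup;
            # legitimate uses exist (e.g. editable installs), so inspect
            exec_lines = [l for l in text.splitlines()
                          if l.startswith("import ")]
            if exec_lines:
                hits.append((pth, exec_lines))
    return hits

for path, lines in suspicious_pth_files():
    print(path)
    for l in lines:
        print("   ", l[:120])
```

note that some flagged files will be benign (setuptools and editable installs use the same mechanism), so this is a triage list, not a verdict; anything named after a package you didn't expect, or containing a long base64 blob, deserves a close look.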