allegedly!
607 posts

@januarycomputer
student + photographer

BREAKING: OpenAI co-founder Andrej Karpathy joins Anthropic.



I'm not Muslim but this is part of a bigger issue that needs to be talked about :/

My entire job is now codex and managing codex threads, I’m genuinely curious what the software engineering job even is anymore. The value of my understanding of any system goes down every single day. Very weird times

Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.



MoE vs dense offload on 8GB VRAM

MoE offload is 10.8x faster than dense offload on 8GB VRAM. here's the proof.

I tested Qwen3.6 35B A3B (MoE, 3B active) vs Qwen3.6 27B (dense, 27B active) on my RTX 4060 Ti 8GB.

the numbers:
>MoE (-ncmoe 30): 35.4 tok/s
>dense (-ngl 20): 3.28 tok/s
ratio: 10.8x

it gets worse at longer context. at 24K tokens, the gap is 16.7x. MoE has zero context degradation (SSM layers), dense loses -35.4%.

why:
MoE expert offload keeps the hot path (3B active params) entirely in VRAM. only inactive experts move to CPU when selected.
dense layer offload splits every layer across GPU and CPU. every token bounces through PCIe for all 64 layers. the bandwidth bottleneck is fatal.

quality is slightly better on dense (5/6 vs 4/6). the 27B model has the best hallucination resistance of all 9 models I tested.

if you have 8GB VRAM and a model that doesn't fit: MoE with expert offload, not dense with layer offload.
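The two ratios in the post can be rechecked from its own figures. The sketch below is only an arithmetic check, not a benchmark. It assumes that the 35.4% dense slowdown applies to the 3.28 tok/s baseline and that MoE throughput stays at 35.4 tok/s at 24K tokens, as the post states. The variable and function names are illustrative.

```python
# Throughput figures quoted in the post (RTX 4060 Ti 8GB).
moe_tok_s = 35.4       # MoE with expert offload
dense_tok_s = 3.28     # dense with layer offload

# Context degradation at 24K tokens, as quoted in the post.
# Assumption: each loss is relative to the short-context figure above.
moe_loss_24k = 0.0
dense_loss_24k = 0.354


def speedup(fast: float, slow: float) -> float:
    """Return how many times faster `fast` is than `slow`."""
    return fast / slow


short_ratio = speedup(moe_tok_s, dense_tok_s)

moe_24k = moe_tok_s * (1 - moe_loss_24k)
dense_24k = dense_tok_s * (1 - dense_loss_24k)
long_ratio = speedup(moe_24k, dense_24k)

print(f"short context: {short_ratio:.1f}x")  # prints "short context: 10.8x"
print(f"24K context:   {long_ratio:.1f}x")   # prints "24K context:   16.7x"
```

Under those assumptions both quoted ratios (10.8x and 16.7x) follow from the quoted throughput numbers.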





Mixtape, the industry plant game accused of review inflation, received A$90,000 in public funding from VicScreen, an agency of the Victorian state government in Australia. Despite its short playtime and flop Steam player count, it has earned widespread high review scores including IGN’s 10/10, along with praise from many journalists and influencers.
















