Steven Luscher
897 posts

@steveluscher
that one american dude with hair like shaggy from scooby doo. i like to create things with my fingers on my computer





One Model. One Giant Megakernel. Zero handwritten code.

My fully custom TPU now runs a Llama model. The compiler takes the entire model (attention, FFN, norms, everything) and fuses it into a single megakernel binary. No op-by-op dispatch. No kernel launch overhead. One model, one kernel.

Since my last post, I've also built an ISA simulator to debug RTL without staring at waveforms all day, and implemented DMA/compute overlap so the machine isn't sitting idle waiting on memory.

Same stack as before: JAX → HLO → MLIR → ASM → VLIW → Binary. Still no CUDA. Still no handwritten kernels. Still running on my custom Verilog RTL. Pure compiler codegen top to bottom.

What should I target next?














Hi, I work at Solana Foundation. Memecoins are good.

A wealth tax treats capital as a static hoard instead of a reproductive process. By taxing unrealized gains or productive assets, it forces the sale of the means of production to pay claims on future labor.


"we need IBRL" says the guy who keeps entertaining proposals that batch transactions and add multiple proposers. pick a lane, brother: the physics of these ideas are mutually exclusive.













