
alex wortega




Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6+ and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
pub.sakana.ai/diffusionblocks

What if we didn’t have to hold an entire neural network in memory to train it?

Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.

With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block.

How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.

We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers

In each case, performance is competitive with end-to-end training while using a fraction of the memory.

This perspective also extends naturally to recurrent-depth (looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.

Read our paper and code to learn more.
Paper: arxiv.org/abs/2506.14202
GitHub: github.com/SakanaAI/Diffu… 🐟
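To make the "one block at a time" idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. Everything in it is an assumption made for illustration: the task (learn y = 2x), the linear "blocks", the noise ranges in `NOISE_RANGES`, and the helper names `train_block` and `predict`. The only thing it tries to mirror from the post is the structure: each block gets its own denoising objective over its own slice of noise levels, and is trained without touching any other block.

```python
import random

# Hypothetical toy task (not from the paper): learn y = 2 * x.
def target(x):
    return 2.0 * x

# Assumed split of noise levels, one range per block, highest noise first.
NOISE_RANGES = [(1.0, 2.0), (0.3, 1.0), (0.05, 0.3)]

def train_block(sigma_lo, sigma_hi, steps=4000, lr=0.01, seed=0):
    """Train ONE block in isolation with a denoising objective.

    The block sees the input x and a noisy copy z of the target, with the
    noise level drawn from this block's own range, and learns to predict
    the clean target. Only this block's three parameters are held and
    updated here; no other block is involved.
    """
    rng = random.Random(seed)
    w_z, w_x, b = 0.0, 0.0, 0.0  # a linear "block", purely illustrative
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = target(x)
        sigma = rng.uniform(sigma_lo, sigma_hi)
        z = y + sigma * rng.gauss(0.0, 1.0)   # noisy version of the target
        err = (w_z * z + w_x * x + b) - y     # gradient of 0.5 * err**2
        w_z -= lr * err * z
        w_x -= lr * err * x
        b -= lr * err
    return (w_z, w_x, b)

def predict(blocks, x, z_init):
    """Chain the independently trained blocks, from high noise to low noise."""
    z = z_init
    for w_z, w_x, b in blocks:
        z = w_z * z + w_x * x + b
    return z

# Each block is trained separately; they only meet at inference time.
blocks = [train_block(lo, hi, seed=i) for i, (lo, hi) in enumerate(NOISE_RANGES)]

if __name__ == "__main__":
    # Start from an arbitrary noisy value and let the blocks refine it.
    print(predict(blocks, x=0.5, z_init=1.5))
```

In this toy, the peak training state is one block's parameters at a time, which is the memory argument the post makes. Real blocks would be groups of network layers, and how the noise levels are partitioned across blocks is a design choice the paper covers and this sketch does not reproduce.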









WebGPU support in llama.cpp is here! Check out our blog post introducing it: reeselevine.github.io/llamas-on-the-…

Run local models in your browser, with GPU acceleration. No data leaves your computer!

Thanks to everyone who's made this possible, especially @ggerganov











