
bycloud
1.6K posts

bycloud
@bycloudai
I make youtube videos on cool AI research /// AI papers newsletter https://t.co/Xn7GMDbQSd /// paper recap @TheAITimeline /// https://t.co/yigZMs32sO


Huawei says that the upcoming Mate 90, set to be released in September this year, will feature a “Tao’s Law” chip that achieves performance comparable to traditional 3nm chips. This is not something they can brag, as many industry experts will surely buy the Mate 90 for testing. If this proves to be true, it would be reasonable to expect Huawei to reach the equivalent of traditional 1.4nm chip performance by around 2031, as they have claimed days ago.




Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation pub.sakana.ai/diffusionblocks What if we didn’t have to hold an entire neural network in memory to train it? Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance. With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently. We validated this across five different architectures: • ViT • DiT • Masked diffusion • Autoregressive transformers • Recurrent-depth transformers In each case, performance is competitive with end-to-end training while using a fraction of the memory. This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training. Read our paper and code, to learn more. Paper: arxiv.org/abs/2506.14202 GitHub: github.com/SakanaAI/Diffu… 🐟














