Rowland Oti 🅨

@rowlandoti
Memoirs of a fool. I post not for your edification, but for my re-education.

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
pub.sakana.ai/diffusionblocks

What if we didn’t have to hold an entire neural network in memory to train it?

Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network.

In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.

With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block.

How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.

We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers

In each case, performance is competitive with end-to-end training while using a fraction of the memory.

This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.

Read our paper and code to learn more.
Paper: arxiv.org/abs/2506.14202
GitHub: github.com/SakanaAI/Diffu… 🐟
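A toy sketch of the "one role per block" idea, under assumptions that are not from the paper. The noise-level range [0.01, 10] is split into intervals of equal width in log-space, one interval per block. Each "block" is a single scalar gain a that turns a noised input x = y + σε into a denoised estimate a·x. Each block is fit by closed-form least squares using samples from its own noise interval only. In the real DiffusionBlocks, the blocks are transformer layers trained with the paper's own objective and noise partitioning. The names `block_noise_ranges` and `fit_block` are hypothetical. The sketch only shows that each block's fit touches its own parameter and data, so the blocks can be trained one at a time.

```python
import numpy as np


def block_noise_ranges(num_blocks, sigma_min=0.01, sigma_max=10.0):
    """Hypothetical partition: split [sigma_min, sigma_max] into num_blocks
    intervals of equal width in log-space. Block 0 owns the noisiest interval,
    the last block owns the cleanest one (the order of the forward pass)."""
    edges = np.geomspace(sigma_max, sigma_min, num_blocks + 1)
    return [(edges[i + 1], edges[i]) for i in range(num_blocks)]


def fit_block(sigma_lo, sigma_hi, num_samples=200_000, seed=0):
    """Fit one toy 'block' (a scalar gain a, estimate = a * x) as a denoiser
    for noise levels in [sigma_lo, sigma_hi]. Only this block's data and
    parameter are involved, so no other block needs to be in memory."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(num_samples)  # clean targets ~ N(0, 1)
    # noise levels sampled log-uniformly within this block's interval only
    sigma = np.exp(rng.uniform(np.log(sigma_lo), np.log(sigma_hi), num_samples))
    x = y + sigma * rng.standard_normal(num_samples)  # noised inputs
    # closed-form least-squares solution of min_a mean((a * x - y) ** 2)
    return float(np.dot(x, y) / np.dot(x, x))


# Train the blocks one after another; each call is independent of the others.
ranges = block_noise_ranges(4)
gains = [fit_block(lo, hi, seed=i) for i, (lo, hi) in enumerate(ranges)]
for (lo, hi), a in zip(ranges, gains):
    print(f"sigma in [{lo:.3f}, {hi:.3f}] -> gain {a:.3f}")
```

The toy target y is standard normal. Its best linear denoiser at a single noise level σ is therefore 1/(1 + σ²). Each fitted gain lands between 1/(1 + σ_hi²) and 1/(1 + σ_lo²) for its interval. The gains rise from near 0 for the noisiest block toward 1 for the cleanest.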

Microsoft and Safaricom are taking over the SaaS market. The other day Safaricom launched a school management system. A whole Safaricom. That's one more reason for me to keep taking my plumbing course seriously.



Morning bathrobe rant: tubes on pegs.

🚨Data Breach Alert ‼️
TeamPCP Claims Sale of GitHub Internal Source Code

The TeamPCP hacking group claimed the compromise and sale of GitHub internal data, allegedly including around 4,000 private repositories containing source code related to GitHub’s main platform and internal organizations.

Threat actor: TeamPCP
Sector: ICT
Data exposure (claimed): Approximately 4,000 private repositories
Data type: Source code
Observed: May 19, 2026
Status: Pending verification
ESIX©: 7.96

Full details and impact assessment on HackRisk.io



Atlassian just reported $1.79B in quarterly revenue and serves 350,000+ customers, then fired the engineer who built their infrastructure.

He shares the whole thing, a breakdown of Atlassian’s playbook:
> Envoy over enterprise load balancers
> sidecars for auth + logging + rate limits
> DynamoDB + SQS
> automated VM deployments

New blackboard lecture w @ericjang11

He walks through how to build AlphaGo from scratch, but with modern AI tools.

Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

His explanation of how AlphaGo works gave us the context to discuss how RL works in LLMs and how it could work better. Naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer. AlphaGo’s MCTS, by contrast, suggests a strictly better action at every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.

Eric also kickstarted an Autoresearch loop on his project. It was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative for all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Timestamps:
0:00:00 – Basics of Go
0:08:06 – Monte Carlo Tree Search
0:31:53 – What the neural network does
1:00:22 – Self-play
1:25:27 – Alternative RL approaches
1:45:36 – Why doesn’t MCTS work for LLMs
2:00:58 – Off-policy training
2:11:51 – RL is even more information inefficient than you thought
2:22:05 – Automated AI researchers
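A toy sketch can make the lecture's credit-assignment contrast concrete, under two assumptions. Naive policy gradient is taken to mean REINFORCE with no baseline, so every token gets the same scalar return. The search target is modeled on AlphaGo Zero: the normalized MCTS visit counts at each move. The lecture's exact formulation may differ. All function names here are hypothetical.

```python
import numpy as np


def reinforce_token_weights(num_tokens, final_reward):
    """Naive policy-gradient credit (REINFORCE, no baseline): every token in
    the trajectory is weighted by the same scalar return, whether or not it
    actually mattered for the outcome."""
    return np.full(num_tokens, float(final_reward))


def search_policy_target(visit_counts, temperature=1.0):
    """AlphaGo Zero-style per-move target: the search's visit counts,
    sharpened by 1/temperature and normalized into a distribution."""
    counts = np.asarray(visit_counts, dtype=float) ** (1.0 / temperature)
    return counts / counts.sum()


def cross_entropy(target, logits):
    """Loss pulling the network's move distribution toward the search target."""
    logits = np.asarray(logits, dtype=float)
    log_probs = logits - logits.max()  # shift for numerical stability
    log_probs -= np.log(np.exp(log_probs).sum())  # log-softmax
    return float(-(np.asarray(target) * log_probs).sum())


# One trajectory of 5 tokens that ended in reward 1: every token gets weight 1.
print(reinforce_token_weights(5, 1.0))
# One Go move where search visited three candidate moves 10, 30 and 60 times.
target = search_policy_target([10, 30, 60])
print(target)
print(cross_entropy(target, logits=[0.0, 0.0, 0.0]))
```

The first function gives every token the same number, so the learner must work out for itself which tokens mattered. The second gives a full distribution over candidate moves at every single step, so each move has its own training target.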



If you're marrying a wealthy person and they ask you to sign a prenup, make sure it includes a fidelity clause: if either partner cheats, 50% of the cheater's assets go to the other partner after divorce.


if your agent doesn't write design specs like this you're ngmi