
Humblefarms 𝔽rAI
@Humblefarms1


LLaDA2.0 converts a standard LLM into a diffusion model that generates text faster by filling many blanks in parallel, at 100B scale. The 100B model reports 535 tokens per second, about 2.1x faster than comparable autoregressive baselines.

Autoregressive models predict the next token, a small chunk of text, from the previous ones, so generation is forced to proceed step by step. Diffusion language models instead train on corrupted text where many tokens are masked, and learn to recover the missing parts using both left and right context.

LLaDA2.0 starts from an already trained autoregressive model and gradually changes the masking pattern: first small blocks, then whole sequences, then small blocks again. During training it also blocks attention across document boundaries, which matters when many short texts are packed into one sequence.

For instruction tuning, meaning training the model to follow prompts, and for speed, it uses paired masks so every token gets a training signal, and it pushes the model to make confident guesses so many blanks can be filled at once.

Paper: "LLaDA2.0: Scaling Up Diffusion Language Models to 100B" – arxiv.org/abs/2512.15745
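The "fill many blanks at once" idea can be sketched in a few lines. This is a toy illustration, not the paper's algorithm: `parallel_unmask` and `toy_predict` are hypothetical names, and the "model" is a stand-in that simply knows the target sentence, with confidence that rises when a neighbouring token is already visible, mimicking the use of both left and right context.

```python
MASK = "<mask>"

def parallel_unmask(tokens, predict, max_steps=10):
    # Confidence-based parallel decoding: at each step the model proposes
    # a token and a confidence for every masked position, then the most
    # confident half of the proposals are committed at once, so several
    # blanks are filled per model call instead of one token per step.
    tokens = list(tokens)
    for _ in range(max_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = {i: predict(tokens, i) for i in masked}  # (token, confidence)
        keep = max(1, len(masked) // 2)
        for i in sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:keep]:
            tokens[i] = proposals[i][0]
    return tokens

# Hypothetical stand-in for a trained model's forward pass: it "knows"
# the target sentence and is more confident when a neighbour is unmasked.
TARGET = "the cat sat on the mat".split()

def toy_predict(tokens, i):
    left = i > 0 and tokens[i - 1] != MASK
    right = i < len(tokens) - 1 and tokens[i + 1] != MASK
    confidence = 0.5 + 0.25 * left + 0.25 * right
    return TARGET[i], confidence

prompt = ["the"] + [MASK] * 5
print(parallel_unmask(prompt, toy_predict))  # → ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

With a real model the payoff is that each step is one forward pass, so committing multiple confident positions per step is what yields the reported speedup over one-token-at-a-time decoding.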

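The document-boundary rule can also be made concrete. A minimal sketch, assuming the common packing setup where each token carries the id of the document it came from (`packed_attention_mask` is a hypothetical helper, not from the paper):

```python
def packed_attention_mask(doc_ids):
    # True means token i may attend to token j. Attention is bidirectional
    # (a diffusion model reads both sides), but it is blocked whenever the
    # two tokens come from different packed documents.
    n = len(doc_ids)
    return [[doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]

# Two short documents packed into one five-token sequence:
mask = packed_attention_mask([0, 0, 1, 1, 1])
```

Without such a mask, tokens from one short text could condition on an unrelated neighbouring text in the same packed batch, which is exactly what the training recipe is said to prevent.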
