Biqing Qi retweeted

Found an interesting piece of next-generation model architecture research from Shanghai AI Lab: SDAR, a new paradigm that converts trained autoregressive (AR) models into blockwise diffusion models for FAST parallel decoding!
✅ AR's training efficiency
✅ Diffusion's inference speed
The 30B MoE model even beats pure AR baselines on GPQA and ChemBench.
HF Papers: huggingface.co/papers/2510.06…
Models (1.7B/4B/8B/30B-A3B): huggingface.co/collections/Je…
