
Google presents VISTA: A Test-Time Self-Improving Video Generation Agent
Do Xuan Long

@dxlong2000
Student Researcher @Google & CS PhD @NUSingapore | Prev. @amazon, @NTUsg

Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch: up to 8B params, 480B tokens, 480 epochs.

Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens hits 56% HellaSwag & 33% MMLU, with no tricks and no cherry-picks.
> No saturation: more repeats = more gains.

🚨 “x.openreview.net” We also dissected the serious methodological flaws in our parallel work “Diffusion Beats Autoregressive in Data-Constrained Settings”. Let’s raise the bar for open review!

🔗 Blog & details: jinjieni.notion.site/Diffusion-Lang…

18 🧵s ahead: