Do Xuan Long

101 posts


@dxlong2000

Student Researcher @Google & CS PhD @NUSingapore | Prev. @amazon, @NTUsg

Joined December 2021

464 Following · 232 Followers

Pinned Tweet

Do Xuan Long@dxlong2000·22 Oct

Thanks @_akhaliq for sharing our work! 🎥 Really excited about how video prompts can be optimized for SOTA video generation models, potentially helping many many users save time and avoid laboring over video details. ⏱️✨ 📺 More videos here: g-vista.github.io

AK@_akhaliq

Google presents VISTA: A Test-Time Self-Improving Video Generation Agent


Do Xuan Long retweeted

𝐊𝐚𝐦𝐢𝐥 𝐏𝐚𝐰𝐥𝐢𝐤@plKamilPawlik·12 May

Researchers from Google Cloud AI Research and the National University of Singapore recently published a paper on A²RD, a system for generating relatively long videos with AI... and I have to admit it looks really decent. The demos show single scenes lasting up to around a minute without the visual breakdown typical of AI, and their quality is surprisingly good; in a sense it reminds me more of renders from graphics software than of typical AI slop. I recommend checking it out yourself.


Do Xuan Long retweeted

Tomas Pfister@tomaspfister·18 May

If you’ve tried making AI video >30s, you know the nightmare. Bouncing between tools, manual stitching, and fighting "identity drift" where faces morph every frame. We decided to automate the entire crew. Meet Co-Director. 🧵👇 (1/9)


Tengxiao Liu@TengxiaoLiu·25 Mar

Auto research is on 🔥 We give algorithmic problems (like circle packing) to general coding agents and let them run overnight. 🌙 Agents reach SoTA. But more importantly: we analyze 100+ hours of trajectories to understand how they get there 🧵


Do Xuan Long@dxlong2000·26 Mar

@TengxiaoLiu It looks fun 😆


Do Xuan Long retweeted

Jiefeng Chen@jiefengchen1·20 Mar

My team at Google Cloud AI Research is looking for a Student Researcher Intern to dive deep into coding agents. We’re looking for someone who doesn’t just read about agentic workflows but builds them.

What we’re looking for:
Academic Rigor: Currently pursuing a Ph.D. with a strong publication record.
Technical Chops: Excellent coding skills are a must.
Agent Experience: If you’ve built or experimented with coding agents (like Claude Code, Gemini CLI, or similar frameworks), we want to talk to you.

Come help us push the boundaries of LLM-based software engineering. 🚀 If this sounds like a fit, feel free to DM me or send your CV directly to jiefengc@google.com

#Google #AIResearch #CodingAgents #LLMs #MachineLearning


Do Xuan Long@dxlong2000·2 Mar

@NiJinjie @GoogleDeepMind @YiTayML @quocleix Huge congrats @NiJinjie ✨


Jinjie Ni@NiJinjie·2 Mar

Life update: I’ve joined @GoogleDeepMind as a research scientist to work on ✨gemini scaling and RL, under the leadership of Yi Tay (@YiTayML) and Quoc Le (@quocleix). I feel extremely fortunate to be on the critical path towards AGI and can't wait to help push the frontier of gemini capabilities! 🚀


Do Xuan Long retweeted

Rohan Paul@rohanpaul_ai·30 Oct

New Google paper builds a video generator that improves itself at test time by rewriting the prompt while it runs.

It first turns the user prompt into a simple timeline of scenes with duration, characters, actions, environment, camera, sounds, and mood. It then makes several videos and picks the best using head to head comparisons that swap the order to avoid bias. The picker also applies hard penalties for broken physics, random text on screen, extra scene cuts, or voice and music that were not requested.

After that, 3 separate judges score the winner on visuals, audio, and context, and a meta judge merges the notes into clear issues. A reasoning agent converts those issues into short prompt edits that keep the user’s intent and target the exact failures. The system repeats this loop, generates new candidates, and keeps the best until further edits stop helping.

On single scene and multi scene tests with Veo 3 and Veo 2, it raises visual quality, motion realism, prompt match, and audio quality. Across stronger baselines, it reaches up to 60% pairwise wins, and humans choose its results in 66.4% of trials.

----
Paper: arxiv.org/abs/2510.15831v1
Paper Title: "VISTA: A Test-Time Self-Improving Video Generation Agent"
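The selection step described in this tweet (head to head comparisons with the order swapped, plus hard penalties for violations) can be illustrated with a small sketch. This is not the paper's implementation: the `Candidate` class, `toy_judge`, the round-robin schedule, and the constants `PENALTY_PER_VIOLATION` and `POSITION_BIAS` are all hypothetical stand-ins chosen for illustration. A real system would call a multimodal judge model where `toy_judge` is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    name: str
    quality: float  # stand-in for what a real judge model would perceive
    violations: List[str] = field(default_factory=list)


Judge = Callable[[Candidate, Candidate], str]

PENALTY_PER_VIOLATION = 2.0  # assumed value, not taken from the paper
POSITION_BIAS = 0.05         # assumed bias of the toy judge toward the first slot


def toy_judge(first: Candidate, second: Candidate) -> str:
    """Hypothetical judge that slightly favors whichever video it sees first."""
    if first.quality + POSITION_BIAS >= second.quality:
        return first.name
    return second.name


def compare_both_orders(a: Candidate, b: Candidate, judge: Judge) -> Tuple[int, int]:
    """Ask the judge twice with the order swapped; return (wins for a, wins for b)."""
    votes = [judge(a, b), judge(b, a)]
    return votes.count(a.name), votes.count(b.name)


def pick_best(candidates: List[Candidate], judge: Judge) -> Candidate:
    """Round-robin over all pairs, starting each score at its violation penalty."""
    scores: Dict[str, float] = {
        c.name: -PENALTY_PER_VIOLATION * len(c.violations) for c in candidates
    }
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            wins_a, wins_b = compare_both_orders(a, b, judge)
            scores[a.name] += wins_a
            scores[b.name] += wins_b
    return max(candidates, key=lambda c: scores[c.name])


candidates = [
    Candidate("A", quality=0.80),
    Candidate("B", quality=0.90,
              violations=["random on-screen text", "extra scene cut"]),
    Candidate("C", quality=0.50),
]
best = pick_best(candidates, toy_judge)
print(best.name)
```

In this toy run, B wins every head to head comparison but its two violations cost more than those wins earn, so A is selected. Swapping the order matters for close pairs: with qualities 0.80 and 0.82, the biased toy judge picks whichever candidate is shown first, and the two swapped votes cancel out instead of rewarding slot position.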


Do Xuan Long retweeted

wing.nus@wing_nus·27 Oct

🤔 Why do Transformers and Mamba (SSMs) fail differently on long context? 🔎 How do they mix and reshape context across depth? 🚀 No one had a unified, token + layer-level view — until now! 🔗 Paper: arxiv.org/pdf/2510.06640 🧵 👇 More in thread #Transformers #Mamba #NLP


Do Xuan Long@dxlong2000·25 Oct

Joint work with @wanxingchen_, Hootan Nakhost, @chl260, @tomaspfister, @sercanarik!


Do Xuan Long@dxlong2000·22 Oct

AK@_akhaliq

Google presents VISTA: A Test-Time Self-Improving Video Generation Agent


Do Xuan Long@dxlong2000·25 Oct

Thanks for sharing our work!

Louis Gleeson@aigleeson

🚨 Google just dropped the most advanced self-improving video AI ever built. It’s called VISTA, and it literally rewrites its own prompts to make every new generation better than the last. No retraining. No fine-tuning. Just pure test-time self-reflection.

Here’s how it works:
→ Turns your idea into a full scene-by-scene storyboard
→ Generates multiple video candidates
→ Runs a tournament to find the best one
→ Then critiques itself visually, audibly, contextually before trying again

Each loop = sharper visuals, tighter storytelling, more aligned motion.

The results? 60% win rate vs Veo 3 and 66.4% human preference.

This isn’t “text-to-video.” This is video that learns from itself.


Do Xuan Long retweeted

wing.nus@wing_nus·24 Oct

Ever wondered *how* language models understand discourse relations 🧠⚡️🔍? We address this long-standing question in our #EMNLP2025 paper: “Discursive Circuits: How Do Language Models Understand Discourse Relations?” By @YisongMiao and @knmnyn #NLProc #Discourse 🧵1/n


Yisong Miao @ EMNLP Suzhou@YisongMiao·21 Oct

My lab mate @dxlong2000 went to intern at Google for only a short while, and look what he has cooked 😍🧑‍🍳

AK@_akhaliq

Google presents VISTA: A Test-Time Self-Improving Video Generation Agent


Do Xuan Long@dxlong2000·22 Oct

@godofprompt Thanks so much for featuring our work! More exciting videos are here 🙌: g-vista.github.io


Do Xuan Long retweeted

God of Prompt@godofprompt·22 Oct

Holy shit... Google just dropped a self-improving video generation agent 🤯 It’s called VISTA, and it literally rewrites its own prompts to make videos better every single generation. No retraining. No fine-tuning. Just pure test-time self-reflection.

Here’s how it works:
→ Breaks your idea into a full scene-by-scene plan
→ Generates multiple videos
→ Judges them in a tournament
→ Then critiques itself visually, audibly, and contextually before trying again

Each loop = smarter, sharper, more aligned video.

The results? A 60% win rate against SOTA models like Veo 3 and 66.4% human preference.

This isn’t just text-to-video. This is video that learns from itself.


Do Xuan Long@dxlong2000·21 Oct

@YisongMiao Thank you so much bro :::


Do Xuan Long@dxlong2000·13 Oct

@agihippo Industry roles prefer industry experience?


yi@agihippo·12 Oct

What's with the young undergraduates in Singapore these days fomo farming internships? I had zero internships and I still turned out pretty fine.


Do Xuan Long@dxlong2000·10 Aug

@agihippo oh yeah that’s true!


yi@agihippo·10 Aug

@dxlong2000 Google has a nice gym


yi@agihippo·10 Aug

a while back NUS offered me a fancy professor title (honorary) but i rejected it because there was no point at all with such a title. but now i realised i could have just taken it so i could book the badminton courts there. damnit.


Do Xuan Long@dxlong2000·10 Aug

so huge 😱, congrats @NiJinjie, @michaelqshieh and the team!

Jinjie Ni@NiJinjie

Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch, up to 8B params, 480B tokens, 480 epochs.

Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens hits 56% HellaSwag & 33% MMLU, with no tricks and no cherry-picks.
> No saturation: more repeats = more gains.

🚨 "x.openreview.net" We also dissected the serious methodological flaws in our parallel work “Diffusion Beats Autoregressive in Data-Constrained Settings”. Let’s raise the bar for open review!

🔗 Blog & details: jinjieni.notion.site/Diffusion-Lang…

18 🧵s ahead:

