Yunhe Gao (@Claude47243643) - Twitter Profile

Yunhe Gao retweeted

Physion Labs Official @EvelynZ5699647 · 26 Mar

🚨 “Which generated video looks better?” is the wrong question.

And yet that’s exactly what arena-style evaluation asks, and much of multimodal AI is still judged this way. The problem is that it captures visual preference but fails to measure whether the scene is actually coherent: whether objects behave consistently, interactions make sense, or events follow causal structure.

The real challenge isn’t visual quality. It’s whether a model can produce outputs that are correctly grounded across space, time, objects, and interactions. In other words, detailed multimodal grounding and reasoning.

At 🚀 Physion Labs 🚀, in collaboration with researchers from Stanford, MIT, and Harvard (including Peiyu Jing, Hong-Xing "Koven" Yu, Fangqiang Ding, Fan Nie, Weimin Wang, Yilun Du, James Zou, Jiajun Wu, and Bing Shuai), we analyzed state-of-the-art video generation models.

What we found is hard to ignore: across leading video generation models, 83.3% of exocentric videos and 93.5% of egocentric videos contain physical inconsistencies. These are not just visual artifacts, but failures in object interactions, temporal continuity, and causal structure. Many are subtle, but fundamentally wrong.

This reveals a critical gap. We’ve made massive progress in making videos look better, but far less progress in making them actually grounded and consistent. The uncomfortable truth is that “looks right” does not mean “is right,” and preference does not imply understanding.

We’re releasing 🎬 PHYSION-EVAL, the first human-centered benchmark for physical realism in AI-generated video. It includes over 10,000 expert reasoning traces, spans 22 fine-grained physical phenomena, provides temporally grounded annotations, and enables direct comparison between human and model reasoning.

📄 Paper: arxiv.org/pdf/2603.19607
🤗 Dataset: huggingface.co/datasets/Physi…
🖼️ Preview: youtube.com/watch?v=Vbn_W3…

If we keep optimizing for appearance, we’ll get more convincing illusions, not more reliable systems. And for world models, robotics, and real-world deployment, that’s a fundamental failure.

We’re open-sourcing the dataset and releasing the paper today. This is a step toward a new standard: not just generating what looks good, but generating what is actually grounded, consistent, and correct.

#AI #VideoGeneration #MultimodalAI #AIEvaluation #AIBenchmark #WorldModels #DeepLearning 🚀🐶
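For readers wondering how a headline figure such as "share of videos containing physical inconsistencies" might be computed from per-video annotations, here is a minimal Python sketch. The record layout, field names (`view`, `violations`), and the toy values are all hypothetical, made up for illustration; they are not the PHYSION-EVAL schema and they do not reproduce the numbers in the post.

```python
from collections import defaultdict

# Hypothetical annotation records. Field names and values are illustrative
# only; they are NOT the PHYSION-EVAL schema or data.
annotations = [
    {"video": "a", "view": "exocentric", "violations": ["object permanence"]},
    {"video": "b", "view": "exocentric", "violations": []},
    {"video": "c", "view": "egocentric", "violations": ["collision", "gravity"]},
    {"video": "d", "view": "egocentric", "violations": ["gravity"]},
]


def inconsistency_rate(records):
    """Return, per viewpoint, the share of videos with at least one violation."""
    totals = defaultdict(int)   # videos seen per viewpoint
    flagged = defaultdict(int)  # videos with one or more flagged violations
    for record in records:
        totals[record["view"]] += 1
        if record["violations"]:
            flagged[record["view"]] += 1
    return {view: flagged[view] / totals[view] for view in totals}


rates = inconsistency_rate(annotations)
for view, rate in rates.items():
    print(f"{view}: {rate:.1%} of videos flagged")
```

A video counts as flagged if an annotator marked any violation at all, however subtle. A pairwise "which looks better?" vote never records that information, which is the gap the post describes.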
