Physion Labs Official

@Physion_Labs

Physion Labs is building the trust layer for video generation and world models.

Bellevue, WA · Joined July 2023
31 Following · 161 Followers

Physion Labs Official @Physion_Labs
We've spoken with hundreds of ad creatives, marketing designers, filmmakers, and animation teams, and heard the same thing: the outputs look great… until they don't 😅. When they fail, it's incredibly hard to tell why. Is it the prompt, the model, or the world itself quietly breaking? That ambiguity is the real bottleneck.

Physion-Atlas 1.0 introduces a more objective, diagnostic way to evaluate video world models, moving beyond high-level comparisons to surface what actually matters. It disentangles prompt misalignment from physical and visual inconsistencies and grounds every judgment in explicit spatiotemporal evidence. Not just which output is better, but what breaks, when, where, and why.

From abstract comparisons → diagnosable reality 🔍

📄 Blog: physionlabs.ai/blog/physion-a…
📝 Evaluate your model: docs.google.com/forms/d/e/1FAI…
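To make the "what breaks, when, where, and why" idea concrete, here is a minimal Python sketch of how a single diagnosis could be recorded and how findings could be kept apart by failure kind. Everything in it is a hypothetical illustration: the names `FailureKind`, `Finding`, and `group_by_kind`, the seconds-based time interval, and the pixel bounding box are assumptions, and the three categories are simply the ones named in the post. It is not Physion-Atlas's actual output format or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    """Hypothetical failure categories, mirroring the three kinds named in the post."""
    PROMPT_MISALIGNMENT = "prompt_misalignment"
    PHYSICAL_INCONSISTENCY = "physical_inconsistency"
    VISUAL_INCONSISTENCY = "visual_inconsistency"


@dataclass(frozen=True)
class Finding:
    """One diagnosed failure: what broke, when, where, and why (hypothetical schema)."""
    kind: FailureKind               # what: the category of failure
    start_s: float                  # when: start of the failing interval, in seconds
    end_s: float                    # when: end of the failing interval, in seconds
    box: tuple[int, int, int, int]  # where: (x0, y0, x1, y1) pixel box in the frame
    reason: str                     # why: short natural-language explanation

    def __post_init__(self) -> None:
        # Reject intervals that end before they start.
        if self.end_s < self.start_s:
            raise ValueError("end_s must not precede start_s")


def group_by_kind(findings: list[Finding]) -> dict[FailureKind, list[Finding]]:
    """Bucket findings by kind (in time order) so prompt errors are not mixed with physics errors."""
    groups: dict[FailureKind, list[Finding]] = defaultdict(list)
    for item in sorted(findings, key=lambda item: item.start_s):
        groups[item.kind].append(item)
    return dict(groups)


# Toy usage with made-up findings for a single clip.
findings = [
    Finding(FailureKind.PHYSICAL_INCONSISTENCY, 2.0, 2.6, (120, 80, 260, 210),
            "cup passes through the table instead of resting on it"),
    Finding(FailureKind.PROMPT_MISALIGNMENT, 0.0, 5.0, (0, 0, 640, 360),
            "prompt asked for a red cup; the cup is blue"),
]
report = group_by_kind(findings)
print({k.value: len(v) for k, v in report.items()})
# {'prompt_misalignment': 1, 'physical_inconsistency': 1}
```

Keeping prompt misalignment in its own bucket is the point of this design choice: a clip that ignores the prompt and a clip that breaks physics call for different fixes.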

Physion Labs Official @Physion_Labs
We’ve been quietly blown away by the response to Galileo-0: physionlabs.ai/blog/galileo-0

Over the past few days, we’ve seen waiting-list signups from a wide range of teams, from video generation startups and frontier model developers to video platforms and studios. We’re grateful for your curiosity and openness to engage.

It’s still early, and we have a lot to learn. Over the coming week, we’ll start reaching out to teams on the waiting list to better understand your workflows, your challenges, and where a world critic like Galileo-0 can be most useful.

If you’ve signed up, thank you. Looking forward to the conversations ahead.

#Galileo0 #PhysionLabs #WorldCritic

Kangfu Mei @KangfuM
@EvelynZ5699647 This is great work. Any plans to apply this to video generative models as an RL reward?

Physion Labs Official @Physion_Labs
🚀🚀🚀 We're excited to introduce Galileo 0 (lnkd.in/gJttjEr5), our first research preview of a world critic for AI-generated video, which already outperforms Qwen 3.5-Plus, Gemini 3.1 Pro, Pegasus 1.2, and GPT 5.4 on physical consistency reasoning 🚀🚀🚀

Galileo doesn't just score outputs. It diagnoses them, identifying what failed, when it failed, where it happened, and why it broke the rules of the world.

This is a step toward a new paradigm: generate → critique → refine → repeat, where models don't just produce worlds but learn to keep them consistent over time.

𝐖𝐡𝐚𝐭 𝐦𝐚𝐤𝐞𝐬 𝐭𝐡𝐢𝐬 𝐦𝐢𝐥𝐞𝐬𝐭𝐨𝐧𝐞 𝐞𝐯𝐞𝐧 𝐦𝐨𝐫𝐞 𝐦𝐞𝐚𝐧𝐢𝐧𝐠𝐟𝐮𝐥: We built Galileo 0, along with our datasets (including our public Physion-Eval benchmark), our evaluation pipeline, and early pilots, for less than $200K in total spend over 3 months. No massive training clusters. No billion-dollar budgets. Just a small, relentless team, strong conviction, and yes, at one point, a five-day stretch of not showering to get this model out.

We did it because we believe reliability will become core infrastructure for world models. In a world where billions are being poured into generation, the missing piece isn't more pixels; it's better critics 😊

#PhysionLabs #Galileo0 #WorldModels
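The generate → critique → refine → repeat loop described in the Galileo 0 post can be sketched as a small control loop. This is a minimal Python illustration under stated assumptions: `generate`, `critique`, and `refine` are hypothetical callables supplied by the caller (plain strings stand in for videos), and the critic is assumed to return a list of issues that is empty when nothing is wrong. It is not Galileo 0's interface.

```python
from typing import Callable


def generate_critique_refine(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str], list[str]],
    refine: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, list[list[str]]]:
    """Run a bounded generate -> critique -> refine loop (hypothetical interface).

    Returns the final output and the issues the critic reported in each round.
    """
    output = generate(prompt)
    history: list[list[str]] = []
    for _ in range(max_rounds):
        issues = critique(output)         # the critic lists what broke, if anything
        history.append(issues)
        if not issues:                    # no issues reported: stop early
            break
        output = refine(output, issues)   # revise the output using the critique
    return output, history


# Toy stand-ins: strings play the role of generated videos.
def toy_generate(prompt: str) -> str:
    return "ball falls up"


def toy_critique(video: str) -> list[str]:
    return ["ball moves against gravity"] if "falls up" in video else []


def toy_refine(video: str, issues: list[str]) -> str:
    return video.replace("falls up", "falls down")


final, history = generate_critique_refine("drop a ball", toy_generate, toy_critique, toy_refine)
print(final)    # ball falls down
print(history)  # [['ball moves against gravity'], []]
```

Capping the number of rounds keeps the loop bounded even if the critic never returns a clean report; in that case the last refinement is returned without a final critique.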

vehas @amvehas
@EvelynZ5699647 Are you sure this isn't already part of the RL environment at every video/visual gen AI lab out there?

Physion Labs Official @Physion_Labs
🚨 “𝐖𝐡𝐢𝐜𝐡 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐞𝐝 𝐯𝐢𝐝𝐞𝐨 𝐥𝐨𝐨𝐤𝐬 𝐛𝐞𝐭𝐭𝐞𝐫?” is the wrong question. And yet, that’s exactly what arena-style evaluation asks, and much of multimodal AI is still judged this way. The problem is that it captures visual preference but fails to measure whether the scene is actually coherent: whether objects behave consistently, interactions make sense, or events follow causal structure.

The real challenge isn’t visual quality. It’s whether a model can produce outputs that are correctly grounded across space, time, objects, and interactions; in other words, 𝐝𝐞𝐭𝐚𝐢𝐥𝐞𝐝 𝐦𝐮𝐥𝐭𝐢𝐦𝐨𝐝𝐚𝐥 𝐠𝐫𝐨𝐮𝐧𝐝𝐢𝐧𝐠 𝐚𝐧𝐝 𝐫𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠.

At 🚀 𝐏𝐡𝐲𝐬𝐢𝐨𝐧 𝐋𝐚𝐛𝐬 🚀, in collaboration with researchers from 𝐒𝐭𝐚𝐧𝐟𝐨𝐫𝐝, 𝐌𝐈𝐓, and 𝐇𝐚𝐫𝐯𝐚𝐫𝐝 (including Peiyu Jing, Hong-Xing "Koven" Yu, Fangqiang Ding, Fan Nie, Weimin Wang, Yilun Du, James Zou, Jiajun Wu, and Bing Shuai), we analyzed state-of-the-art video generation models. What we found is hard to ignore: 𝐚𝐜𝐫𝐨𝐬𝐬 𝐥𝐞𝐚𝐝𝐢𝐧𝐠 𝐯𝐢𝐝𝐞𝐨 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐨𝐧 𝐦𝐨𝐝𝐞𝐥𝐬, 𝟴𝟯.𝟯% 𝐨𝐟 𝐞𝐱𝐨𝐜𝐞𝐧𝐭𝐫𝐢𝐜 𝐯𝐢𝐝𝐞𝐨𝐬 𝐚𝐧𝐝 𝟵𝟯.𝟱% 𝐨𝐟 𝐞𝐠𝐨𝐜𝐞𝐧𝐭𝐫𝐢𝐜 𝐯𝐢𝐝𝐞𝐨𝐬 𝐜𝐨𝐧𝐭𝐚𝐢𝐧 𝐩𝐡𝐲𝐬𝐢𝐜𝐚𝐥 𝐢𝐧𝐜𝐨𝐧𝐬𝐢𝐬𝐭𝐞𝐧𝐜𝐢𝐞𝐬. These are not just visual artifacts but failures in object interactions, temporal continuity, and causal structure. Many are subtle but fundamentally wrong.

This reveals a critical gap: we’ve made massive progress in making videos look better, but far less progress in making them grounded and consistent. The uncomfortable truth is that “looks right” does not mean “is right,” and preference does not imply understanding.

We’re releasing 🎬 𝐏𝐇𝐘𝐒𝐈𝐎𝐍-𝐄𝐕𝐀𝐋, the first human-centered benchmark for physical realism in AI-generated video. It includes over 10,000 expert reasoning traces, spans 22 fine-grained physical phenomena, provides temporally grounded annotations, and enables direct comparison between human and model reasoning.

📄 Paper: arxiv.org/pdf/2603.19607
🤗 Dataset: huggingface.co/datasets/Physi…
🖼️ Preview: youtube.com/watch?v=Vbn_W3…

𝐈𝐟 𝐰𝐞 𝐤𝐞𝐞𝐩 𝐨𝐩𝐭𝐢𝐦𝐢𝐳𝐢𝐧𝐠 𝐟𝐨𝐫 𝐚𝐩𝐩𝐞𝐚𝐫𝐚𝐧𝐜𝐞, 𝐰𝐞’𝐥𝐥 𝐠𝐞𝐭 𝐦𝐨𝐫𝐞 𝐜𝐨𝐧𝐯𝐢𝐧𝐜𝐢𝐧𝐠 𝐢𝐥𝐥𝐮𝐬𝐢𝐨𝐧𝐬, 𝐧𝐨𝐭 𝐦𝐨𝐫𝐞 𝐫𝐞𝐥𝐢𝐚𝐛𝐥𝐞 𝐬𝐲𝐬𝐭𝐞𝐦𝐬. And for world models, robotics, and real-world deployment, that’s a fundamental failure.

We’re open-sourcing the dataset and releasing the paper today. This is a step toward a new standard: not just generating what looks good, but generating what is actually 𝐠𝐫𝐨𝐮𝐧𝐝𝐞𝐝, 𝐜𝐨𝐧𝐬𝐢𝐬𝐭𝐞𝐧𝐭, 𝐚𝐧𝐝 𝐜𝐨𝐫𝐫𝐞𝐜𝐭.

#AI #VideoGeneration #MultimodalAI #AIEvaluation #AIBenchmark #WorldModels #DeepLearning 🚀🐶