See Right (Robustness): BiPS enforces perceptual consistency via a bi-directional KL divergence constraint.
This aligns predictive distributions between noisy and focused views, effectively mitigating hallucinations caused by visual noise. 🧠
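To make the idea concrete, here is a minimal sketch of a symmetric (bi-directional) KL penalty between the predictive distributions of a noisy view and a focused view. The function names and plain-list representation are illustrative only; BiPS's exact formulation, weighting, and where the loss is applied may differ.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bidirectional_kl(p_noisy, p_focused):
    """Symmetric consistency penalty between the model's predictions under
    a noisy view and a focused view. Illustrative sketch of the constraint
    described above, not the paper's exact loss."""
    return 0.5 * (kl(p_noisy, p_focused) + kl(p_focused, p_noisy))
```

Because the penalty is symmetric, neither view is treated as the fixed "teacher": both distributions are pulled toward each other, which is what suppresses predictions driven by noise present in only one view.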
🚀 New Research: Efficient & Robust MLLMs via Bi-directional Perceptual Shaping (BiPS)
High-res inputs in Multimodal LLMs cause compute bottlenecks & noise sensitivity. Our new framework solves the "redundancy vs. utility" trade-off.
See Less, See Right. 🧵👇
🏗️ Generative Design
Bridging the gap between parametric history and 3D geometry:
🔹 CADMorph: Geometry-Driven Parametric CAD Editing via a Plan-Generate-Verify Loop
#GenerativeAI #CAD
🧬 AI for Science & Healthcare
🔹 MIRA: Medical Time Series Foundation Model for Real-World Health Data
🔹 Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations
🔹 Functional Complexity-adaptive Temporal Tensor Decomposition
🚀 We are heading to #NeurIPS2025 in San Diego!
Excited to announce my group has 7 accepted papers this year, tackling the frontiers of Agentic AI, AI for Health & Science, and Generative Design.
A breakdown of our work 🧵👇
#AI #MachineLearning #MicrosoftResearch
Second fix: Flexible, Non-Linear Reasoning.
No more rigid, one-way chains! PixelCraft has a "Planner" and an "Image Memory" (a "cognitive whiteboard").
This lets the system adaptively revisit any prior visual step, backtrack from errors, and explore different reasoning branches.
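One way to picture the "cognitive whiteboard" is as an append-only store of visual states that the planner can index into. This is a hypothetical sketch: the class name, methods, and string-based "images" are ours, not PixelCraft's actual interfaces.

```python
class ImageMemory:
    """A 'cognitive whiteboard': keeps every intermediate visual state so a
    planner can revisit any step, branch from it, or backtrack past errors.
    Hypothetical sketch of the mechanism described above."""

    def __init__(self, initial_image):
        self.states = [initial_image]   # every visual step ever produced

    def add(self, image):
        self.states.append(image)
        return len(self.states) - 1     # index serves as a handle for revisiting

    def revisit(self, index):
        # Non-linear reasoning: any earlier state can seed a new branch.
        return self.states[index]

    def backtrack(self, to_index):
        # Discard steps after a detected error, keeping the valid prefix.
        self.states = self.states[: to_index + 1]
```

The point of the design is that reasoning steps are addressable rather than consumed: a rigid one-way chain only ever sees its latest state, while an indexed memory makes earlier states first-class.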
MLLMs are great, but they're surprisingly bad at reading charts and geometry. A tiny "perceptual slip" can wreck the whole reasoning process.
We're thrilled to introduce PixelCraft 👾, a new multi-agent system to solve this.
Key Insight: Perfect self-verification isn't required. We frame reasoning as a probabilistic process.
As long as the chance of improvement is > chance of degradation, the model can converge to the correct answer.
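The convergence claim can be illustrated with a toy biased random walk over "distance from the correct answer": whenever each revision step is more likely to improve than to degrade, the distance drifts toward zero. This simulation is our illustration of the argument, not the paper's analysis.

```python
import random

def simulate(p_improve, p_degrade, start_dist=10, steps=2000, seed=0):
    """Toy biased walk illustrating the claim above: if p_improve > p_degrade,
    the distance to the correct answer drifts to 0 and stays there."""
    rng = random.Random(seed)
    d = start_dist
    for _ in range(steps):
        if d == 0:                      # treat the correct answer as absorbing
            break
        r = rng.random()
        if r < p_improve:
            d -= 1                      # a revision improved the state
        elif r < p_improve + p_degrade:
            d += 1                      # a revision made things worse
    return d
```

Even a small edge (say 0.55 vs. 0.45) is enough for the walk to reach zero; no single step needs to be reliably verified.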
Result: An 8B model beat its 600B teacher on AIME.
💥 Thrilled to share our new work, Reinforce-Ada, which fixes signal collapse in GRPO.
🥳 No more blind oversampling or dead updates. Just sharper gradients, faster convergence, and stronger models.
⚙️ One-line drop-in. Real gains.
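For intuition: in GRPO, if every sampled response to a prompt earns the same reward, the group-normalized advantages are all zero and the update is dead. The sketch below shows the adaptive-sampling idea — keep sampling until the group contains mixed outcomes. All names here are illustrative; Reinforce-Ada's actual stopping rule and sample-allocation scheme may differ.

```python
def adaptive_sample(generate, reward, prompt, batch=4, max_rounds=8):
    """Sample responses for a prompt until rewards are mixed, so the
    group-normalized advantages are nonzero. Illustrative sketch of the
    adaptive-sampling idea, not the paper's exact algorithm."""
    responses, rewards = [], []
    for _ in range(max_rounds):
        for _ in range(batch):
            r = generate(prompt)
            responses.append(r)
            rewards.append(reward(prompt, r))
        if len(set(rewards)) > 1:       # mixed outcomes => usable signal
            break                       # identical rewards => zero advantages
    mean = sum(rewards) / len(rewards)
    advantages = [rw - mean for rw in rewards]
    return responses, advantages
```

When the first batch already contains both a success and a failure, this costs no more than naive sampling; extra samples are spent only on prompts whose signal would otherwise collapse.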
arxiv.org/html/2510.0499… github.com/RLHFlow/Reinfo…
🔑 Under the hood
• Grounded latent via a proprio Forward-Dynamics Model → deeper motion understanding
• Joint diffusion policy where latent & low-level actions co-evolve → long-horizon reasoning
• Superior performance on SIMPLER, LIBERO, and both gripper & dexterous-hand setups 🏆
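To illustrate what "grounding a latent via a forward-dynamics model" means, here is a deliberately tiny linear sketch: a latent is grounded to the extent that, pushed through the FDM, it explains the observed proprioceptive transition. The real system presumably uses a learned neural FDM; everything here (linear dynamics, list-based states) is our simplification.

```python
def fdm_predict(state, latent, weights):
    """Toy linear FDM: next_state = state + W @ latent (illustrative only)."""
    return [s + sum(w * z for w, z in zip(row, latent))
            for s, row in zip(state, weights)]

def grounding_error(state, latent, next_state, weights):
    """Grounding signal: squared error between the FDM's prediction from
    this latent and the actually observed proprio transition. A latent
    with low error encodes real motion, not arbitrary features."""
    pred = fdm_predict(state, latent, weights)
    return sum((p - t) ** 2 for p, t in zip(pred, next_state))
```

Minimizing this error ties the latent space to physically realized motion, which is what gives the policy's latent actions their "deeper motion understanding."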
The impact is clear: Geometry Forcing substantially improves visual quality and 3D consistency over baseline methods, slashing the FVD score from 364 to 243 on a long-term video generation task.
Read the full paper here: arxiv.org/pdf/2507.07982
Our solution, Geometry Forcing, aligns the video model’s internal representations with features from a pretrained geometric foundation model (VGGT).
We introduce two new objectives, Angular Alignment and Scale Alignment, to enforce geometric consistency during training.
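A plausible reading of the two objectives — direction matching and magnitude matching between the video model's internal feature and the geometric model's feature — can be sketched as follows. These closed forms are our assumption from the names alone; the paper's exact losses may differ.

```python
import math

def angular_alignment(h, g):
    """Direction term: one minus cosine similarity between the video model's
    internal feature h and the geometric foundation model's feature g.
    0 when the directions agree perfectly. Illustrative sketch."""
    dot = sum(a * b for a, b in zip(h, g))
    nh = math.sqrt(sum(a * a for a in h))
    ng = math.sqrt(sum(b * b for b in g))
    return 1.0 - dot / (nh * ng)

def scale_alignment(h, g):
    """Magnitude term: squared difference of feature norms, so scale
    information from the geometric model is preserved. Illustrative sketch."""
    nh = math.sqrt(sum(a * a for a in h))
    ng = math.sqrt(sum(b * b for b in g))
    return (nh - ng) ** 2
```

Splitting direction from magnitude means neither term can be trivially satisfied by collapsing the other: features must point the right way *and* keep a consistent scale.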
Video diffusion models are often blind to 3D geometry. We taught them to see.
Excited to share our new work, Geometry Forcing, a method for generating stunningly consistent and 3D-aware video.
Project Page 👇 geometryforcing.github.io #ComputerVision #AI #3DModeling