Jiahao Lu (@FFzzf08) - Twitter Profile

Jiahao Lu retweeted

Wang Zhao@WangZhao_0849·12 May

🚀🚀 Introducing Pixal3D (SIGGRAPH’26): a new pixel-aligned image-to-3D generation paradigm for high-fidelity 3D asset creation.
Today’s image-to-3D has become pretty good at producing plausible 3D assets. But plausibility is not enough. Fidelity is a hidden bottleneck.
❓A generated model may look “about right,” yet still fail to truly align with the input pixels. Can we make 3D generation as faithful as reconstruction, while still allowing it to complete the unseen? Pixal3D is our answer.
💡We believe the core bottleneck behind fidelity is 2D–3D correspondence. Most 3D-native generators synthesize shapes in canonical space and inject image cues through cross-attention, forcing the model to implicitly search for which pixels correspond to which 3D regions.
🍀Pixal3D takes a different route. Instead of generating in canonical space, Pixal3D generates directly in pixel-aligned camera space: what you see is what you get. The generated 3D asset is aligned with the input view from the start.
☕️Meanwhile, Pixal3D introduces a back-projection-based image conditioning scheme that explicitly back-projects multi-scale pixel features into 3D voxels, thus resolving the 2D-3D association problem. The input image is no longer just a prompt; it becomes a geometric anchor.
🚩Pixal3D shows that pixel-aligned 3D generation is not only feasible and scalable, but also significantly improves fidelity, pushing 3D-native generation closer to reconstruction-level faithfulness. It also naturally extends to multi-view and scene-level 3D generation.
✅Faithful to the input view. ✅Generative for the unseen. Closer to reconstruction-level fidelity, with the creativity of 3D generation. Pixal3D also represents an effort towards the unification of 3D generation and reconstruction.
📢Paper, code, and demo are fully released. Try it out and let us know your feedback!
🌐Project page: ldyang694.github.io/projects/pixal… 🤗Huggingface Demo: huggingface.co/spaces/Tencent… 💻Code: github.com/TencentARC/Pix… 📄Paper: arxiv.org/abs/2605.10922
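The back-projection idea in the post above can be illustrated with a minimal sketch. This is not Pixal3D's released code: the function name `back_project_features`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the single feature scale, and the nearest-neighbor sampling are all assumptions made for illustration. Each camera-space voxel center is projected onto the image plane and picks up the feature of the pixel it lands on.

```python
from typing import List, Sequence, Tuple


def back_project_features(
    feat: Sequence[Sequence[Sequence[float]]],  # (H, W, C) feature map
    fx: float, fy: float, cx: float, cy: float,  # assumed pinhole intrinsics
    voxel_centers: Sequence[Tuple[float, float, float]],  # camera-space points
) -> List[List[float]]:
    """Assign each voxel the feature of the pixel it projects to.

    Hypothetical helper: voxels behind the camera or outside the image
    receive an all-zero feature vector.
    """
    height, width, channels = len(feat), len(feat[0]), len(feat[0][0])
    out: List[List[float]] = []
    for x, y, z in voxel_centers:
        sampled = [0.0] * channels
        if z > 1e-6:  # only points in front of the camera can be seen
            # Pinhole projection, then nearest-neighbor pixel lookup.
            u = int(round(fx * x / z + cx))
            v = int(round(fy * y / z + cy))
            if 0 <= u < width and 0 <= v < height:
                sampled = list(feat[v][u])
        out.append(sampled)
    return out
```

All voxels along one camera ray receive the same pixel feature, which is what makes the correspondence explicit rather than learned through attention. A multi-scale variant could call this once per feature map (with intrinsics rescaled to each resolution) and concatenate the results per voxel.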

27

144

1.2K

180.5K

Jiahao Lu retweeted

Yuan Liu@YuanLiu41955461·14 Apr

🚀 Introducing CoMoVi! From a start image & text prompt, it simultaneously generates realistic human videos and corresponding 3D motion sequences. ✨ No reference videos needed to extract skeletons anymore! 🧠 By co-generating motion and video, CoMoVi directly inherits the massive generalization power of video gen models, making it adaptable to diverse text prompts! 🌍 This co-generation approach also makes CoMoVi look like a human-centric World Action Model (WAM), simulating not just the visual world, but the physical state of human actions within it. arxiv: arxiv.org/abs/2601.10632 HF page: huggingface.co/papers/2601.10… Project page: igl-hkust.github.io/CoMoVi/ Code: github.com/IGL-HKUST/CoMo…

1

7

46

3.3K

Jiahao Lu retweeted

Ruijie Zhu@zhruji101109·10 Apr

🎉🎉🎉 MotionCrafter was selected as a CVPR 2026 Highlight🔥 paper!

Ruijie Zhu@zhruji101109

🚀 Excited to share our latest work MotionCrafter! 🌟 The first Video Diffusion-based framework for joint geometry and motion estimation. 📄 Paper: arxiv.org/abs/2602.08961 🌐 Project page: ruijiezhu94.github.io/MotionCrafter_… #ComputerVision #3DVision #3DReconstruction #OpenSource #arXiv

2

3

23

1.8K

Jiahao Lu retweeted

Ruijie Zhu@zhruji101109·12 Mar

🎥 Demo Video for MotionCrafter (CVPR 2026) How much do video diffusion models know about the 4D world? Watch the demo to find the answer👇 youtu.be/oc0fRoZTyk8 #CVPR2026 #ComputerVision #3DVision

0

1

21

2.7K

Jiahao Lu@FFzzf08·12 Mar

Video diffusion models have strong priors about our physical world, which makes them a good way to reconstruct the 4D world.

Ruijie Zhu@zhruji101109

🎥 Demo Video for MotionCrafter (CVPR 2026) How much do video diffusion models know about the 4D world? Watch the demo to find the answer👇 youtu.be/oc0fRoZTyk8 #CVPR2026 #ComputerVision #3DVision

0

3

217

Jiahao Lu retweeted

⚡AI Search⚡@aisearchio·8 Mar

What a crazy week in AI! 🚀 LTX 2.3 GPT 5.4 FireRed Edit 1.1 Kiwi Edit HY WU Qwen 3.5 small Cuda Agent CubeComposer Helios Spatial T2I Spectrum Utonia & more! Watch the full recap: youtu.be/KRE8JqTAEQk

5

14

154

8K

Jiahao Lu retweeted

Alexandre Morgand@Almorgand·9 Mar

"Track4World: Feedforward World‑centric Dense 3D Tracking of All Pixels" TL;DR: feed‑forward model that predicts pixel‑level 2D and 3D dense flows for holistic world‑centric 3D tracking from monocular video, outperforming prior flow and tracking baselines.

2

11

87

4.6K

Jiahao Lu@FFzzf08·7 Mar

Why Track4World? 1️⃣ Dense world-centric tracking 2️⃣ Supports DA3/Pi3/MoGe 3️⃣ Efficient 3D correlation 4️⃣ 2D-to-3D supervision bypasses 3D GT scarcity! #ComputerVision #3DTracking #SceneFlow #OpticalFlow

Yuan Liu@YuanLiu41955461

Excited to share Track4World, feedforward 3D tracking of all pixels in the world-centric coordinate system. Code has been released, and you are welcome to try it! Homepage: jiah-cloud.github.io/Track4World.gi… Code: github.com/TencentARC/Tra… Paper: arxiv.org/abs/2603.02573

0

3

7

1K

Jiahao Lu retweeted

Yuan Liu@YuanLiu41955461·6 Mar

Excited to share Track4World, feedforward 3D tracking of all pixels in the world-centric coordinate system. Code has been released, and you are welcome to try it! Homepage: jiah-cloud.github.io/Track4World.gi… Code: github.com/TencentARC/Tra… Paper: arxiv.org/abs/2603.02573

4

44

266

18.3K

Jiahao Lu retweeted

Chuanxia Zheng@ChuanxiaZ·13 Feb

How much do video diffusion models know about the 4D world? By introducing a 4D VAE, we jointly estimate geometry and motion from videos using a large-scale pretrained VDM.
- paper: arxiv.org/pdf/2602.08961
- page: ruijiezhu94.github.io/MotionCrafter_…
- code: github.com/TencentARC/Mot…

Ruijie Zhu@zhruji101109

🚀 Excited to share our latest work MotionCrafter! 🌟 The first Video Diffusion-based framework for joint geometry and motion estimation. 📄 Paper: arxiv.org/abs/2602.08961 🌐 Project page: ruijiezhu94.github.io/MotionCrafter_… #ComputerVision #3DVision #3DReconstruction #OpenSource #arXiv

0

16

96

10.8K

Jiahao Lu retweeted

Ruijie Zhu@zhruji101109·22 Feb

#CVPR2026 Accepted by CVPR 2026🎉 Final rating: 6 5 5 Thanks to the reviewers and ACs for their recognition!

Ruijie Zhu@zhruji101109

🚀 Excited to share our latest work MotionCrafter! 🌟 The first Video Diffusion-based framework for joint geometry and motion estimation. 📄 Paper: arxiv.org/abs/2602.08961 🌐 Project page: ruijiezhu94.github.io/MotionCrafter_… #ComputerVision #3DVision #3DReconstruction #OpenSource #arXiv

1

4

54

7.3K

Jiahao Lu retweeted

Wildminder@wildmindai·5 Mar

Track4World. Feedforward world-centric dense 3D tracking; - tracks every pixel in 3D. - 16-frame sequences in 3.4s with 14GB VRAM; - Depth Anything v3 as backbone. jiah-cloud.github.io/Track4World.gi…

4

25

204

18.9K

Jiahao Lu retweeted

AIQUEST@AiquestAcademy·4 Mar

Track4World: what if you could track every single pixel's 3D movement in a video, accurately and instantly? this new model turns any regular video into a detailed 3D scene, figuring out the precise 3D path of everything moving in the frame, fast. it's like rebuilding the entire world from a single clip! 🤯 code and demo are available.

1

2

142

Jiahao Lu retweeted

AI Bites | YouTube Channel@ai_bites·16 Jan

CoMoVi, a co-generative framework that couples two video diffusion models (VDMs) to generate 3D human motions and videos synchronously within a single diffusion denoising loop. the generation of 3D human motions and 2D human videos is intrinsically coupled. 3D motions provide the structural prior for plausibility and consistency in videos, while pre-trained video models offer strong generalization capabilities for motions, which necessitate coupling their generation processes. CoMoVi is based on this. Paper Title: CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos Project: igl-hkust.github.io/CoMoVi/ Link: arxiv.org/abs/2601.10632

0

1

4

131

Jiahao Lu retweeted

Yuan Liu@YuanLiu41955461·13 Jan

Excited to share our recent work, UniSH, which unifies dynamic 3D scene reconstruction and SMPL estimation within a single framework. (Left-top is input video). Code has been released! github.com/murphylmf/UniSH Project page: murphylmf.github.io/UniSH/ Paper: arxiv.org/abs/2601.01222

4

43

322

19.2K

Jiahao Lu retweeted

Ying Shan@yshan2u·31 Dec

🚀🚀We’re building a new Applied Research Team in Tencent IEG for Game AI, with a research culture similar to ARC Lab. This newly formed team focuses on research-driven Game AI, operating at the intersection of fundamental research and large-scale game environments. Our goal is to develop principled models that can understand, simulate, and act within complex virtual worlds, while remaining grounded enough to eventually shape real games.
Our research directions include (but are not limited to):
🎮 Interactive & Dynamic World Modeling: learning, simulating, and reasoning about evolving game worlds
🤖 NPC World-to-Action Modeling: connecting world understanding to decision and action, with strong ties to Embodied AI and agent behavior
🌍 Game Scene Generation: generative modeling of diverse, controllable, and scalable game scenes
We are looking for researchers with the following minimum qualifications:
✨ A recent Ph.D. in related fields
✨ 5+ top conference or journal papers
✨ 1000+ GitHub stars
🌟 Evidence of a “make it work” mindset
We are also open to strong graduate students for intern positions. Feel free to DM me or contact: hanswen@tencent.com.

6

12

152

12.8K

Jiahao Lu retweeted

Yuan Liu@YuanLiu41955461·26 Dec

Happy to share our new work, MVInverse, a feedforward framework for multiview PBR material estimation at ~10 fps. Multiview ViT (like VGGT, Pi3) can also do material estimation! Paper: arxiv.org/abs/2512.21003 Homepage: maddog241.github.io/mvinverse-page/ Code: github.com/Maddog241/mvin…

2

41

270

11.9K

Jiahao Lu retweeted

Justin Ryan ᯅ@justinryanio·18 Dec

People are underestimating @Apple in AI. I just ran Apple’s new SHARP model locally and watched my photos turn into 3D Gaussian splats in seconds, then stepped inside them on Vision Pro. This feels like the beginning of something special. You really have to try it.

91

231

3K

255.2K

Jiahao Lu retweeted

AI at Meta@AIatMeta·16 Dec

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more: go.meta.me/568e5d

406

916

6.4K

1.2M
