Wei-Chiu Ma @weichiuma

472 posts

Assistant Professor @Cornell @CornellCIS Prev: Postdoc @allen_ai @uwcse; PhD @MIT_CSAIL; Sr. Research Scientist @UberATG @Waabi_ai

Joined August 2014
218 Following · 2.3K Followers

Pinned Tweet
Wei-Chiu Ma @weichiuma
I've been wanting to make 3D reconstructions not just realistic, but also **interactable** and **actionable** for years. Thanks to @XHongchi97338, we're now a step closer! Introducing DRAWER — a framework for the automatic construction of realistic, interactive digital twins.
Hongchi Xia @hongchix

Glad to introduce our #CVPR2025 paper "DRAWER", allowing one to create a realistic and interactable digital twin from a video of a static scene without any interactions with the environment. It unlocks many opportunities in gaming and robotics! Webpage: drawer-art.github.io

Wei-Chiu Ma retweeted
Jiawei Yang @JiaweiYang118
Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space. Now it is 0.75, and can be even lower. Many wonder how. I thought it might end as a small FID prank: simple and deliberate. It started with one question: can FID be optimized directly, and what does it reveal? Introducing FD-loss.
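The question in the tweet, whether FID can be optimized directly, rests on FID being a closed-form Fréchet distance between two Gaussians fitted to feature statistics, which is differentiable. A minimal sketch, assuming diagonal covariances for simplicity (the full metric uses a matrix square root; the function name is illustrative and not from the paper):

```python
import numpy as np

# FID is the Frechet distance between Gaussians fitted to real and generated
# feature statistics. For diagonal covariances it reduces to a simple closed
# form, which is differentiable and hence directly optimizable. This is a toy
# sketch; the actual FD-loss in the paper may differ.
def frechet_distance_diag(mu1, var1, mu2, var2):
    """||mu1 - mu2||^2 + sum((sqrt(var1) - sqrt(var2))^2) for diagonal Gaussians."""
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(mean_term + cov_term)

# Identical statistics give distance 0; mismatched means/variances are penalized.
print(frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 4.0]))  # -> 2.0
```

For diagonal covariances, the general trace term Tr(Σ1 + Σ2 − 2(Σ1Σ2)^{1/2}) collapses to the per-dimension form above.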
Wei-Chiu Ma retweeted
AK @_akhaliq
Seeing Fast and Slow: Learning the Flow of Time in Videos. Paper: huggingface.co/papers/2604.21…
Wei-Chiu Ma retweeted
Gene Chou @gene_ch0u
Introducing CityRAG! We wanted video generative models to be grounded in the real world — if I’m in London, I want to look around and actually see Big Ben. CityRAG generates videos of cities featuring real buildings and roads, with arbitrary weather, people, and cars. 1/N page: cityrag.github.io paper: arxiv.org/abs/2604.19741
Wei-Chiu Ma retweeted
Yinghao Xu @YinghaoXu1
🎉 After one year of teamwork, we are excited to release our 3D foundation model, LingBot-Map! Unlike DA3/VGGT, LingBot-Map is a purely autoregressive model for streaming 3D reconstruction ⚡ It achieves ~20 FPS at 518×378 resolution over sequences exceeding 10,000 frames, and beyond 🚀

Two key insights behind LingBot-Map:
🔑 Keep SLAM's structural wisdom: build Geometric Context Attention with long-context modeling while maintaining a compact streaming state
🔑 Make everything end-to-end learnable: no optimization, no post-processing

Check out our demos 👇
Wei-Chiu Ma retweeted
Jitendra MALIK @JitendraMalikCV
In robotics manipulation, we see many cherry picked demos but no standardized benchmarks. I suggest using STT (Success weighted by normalized inverse Task Time), analogous to SPL from navigation, replacing length by time to do task relative to a human e.g. for "Pick up anything" on random household objects. arxiv.org/pdf/1807.06757
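The proposed metric follows directly by analogy with SPL: replace normalized path length with task time relative to a human baseline. A minimal sketch, assuming per-episode success flags and completion times (names are illustrative; the tweet does not specify an implementation):

```python
import numpy as np

# Hypothetical sketch of STT (Success weighted by normalized inverse Task Time),
# by analogy with SPL from navigation. S_i is the binary success of episode i,
# t_agent_i the robot's task time, t_human_i the human reference time.
def stt(successes, agent_times, human_times):
    """STT = (1/N) * sum_i S_i * t_human_i / max(t_agent_i, t_human_i)."""
    s = np.asarray(successes, dtype=float)
    ta = np.asarray(agent_times, dtype=float)
    th = np.asarray(human_times, dtype=float)
    return float(np.mean(s * th / np.maximum(ta, th)))

# Succeeding at human speed scores 1.0; slower successes score less; failures 0.
print(stt([1, 1, 0], [10.0, 20.0, 5.0], [10.0, 10.0, 10.0]))  # -> 0.5
```

As with SPL, the max in the denominator caps per-episode credit at 1.0, so beating the human baseline is not rewarded beyond a clean success.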
Wei-Chiu Ma retweeted
Songyou Peng @songyoupeng
I gave an award talk @3DVconf that might be of interest to some people. I took a step back and shared a few personal stories from my 10-year journey, reflecting on the profound impact of people, luck (you need a lot!), grit, and the art of giving up. (1/2)
Wei-Chiu Ma retweeted
Hang Zhao @zhaohang0124
Our recent findings on World Action Models (WAMs): the core advantage of WAMs is not test-time “imagination” of futures, but the training-time supervision from future video prediction. We propose Fast-WAM, which makes inference simple, fast, and policy-centric.
Wei-Chiu Ma retweeted
Rundong Luo @LuoRundong0122
Working on synthetic data for computer vision? Submit to the 3rd Synthetic Data for Computer Vision Workshop at @CVPR 🚀 A reminder that submissions are open until March 12. If you have relevant work for ECCV, we’d also love to see it submitted here. 🏆 We will select a Best Paper Award 📝 Submissions will NOT be included in the CVPR proceedings, so there are no double-submission concerns Website: syndata4cv.github.io #CVPR2026 #ECCV #ComputerVision #SyntheticData
Wei-Chiu Ma retweeted
DENG Lab @ SJTU @SJTUDengLab
🚀 Introducing Think-Then-Generate (T2G): transforming Qwen-Image into an open-source NanoBanana!

Qwen-Image excels at text rendering, but in practice we find it struggles with implicit and non-descriptive prompts (see figure). We identify that a closed-source LLM is needed to rewrite prompts descriptively for the diffusion transformer (DiT) renderer. However, the disconnect between the LLM and the DiT can lead to imperfections.

🔧 Our solution, T2G: Think First, Generate Second! T2G overcomes this by empowering the text encoder in Qwen-Image itself (i.e., Qwen2.5-VL) to think first and then generate with the DiT, and by introducing a multimodal GRPO (Dual-GRPO) strategy to enhance seamless, self-driven reasoning in Qwen2.5-VL.

🎨 Check out some results first:
- Idiom Comics: Input "A multi-panel comic showing 'playing the lute to a cow'" -> not just images of cows and instruments, but a dynamic, narrative-driven comic that accurately develops the idiom's context.
- Math & Physics Teaching: Input "A math teacher explaining the equation 2x − 4 = 10 on the blackboard." -> not random elements, but a fully structured blackboard with clear steps and a teacher scene, accurately capturing the teaching process.

📖 Paper: arxiv.org/abs/2601.10332
💻 Github: github.com/SJTU-DENG-Lab/…
Wei-Chiu Ma retweeted
Rundong Luo @LuoRundong0122
📣 SynData4CV @ CVPR 2026 is recruiting reviewers! Join our workshop on how to generate, evaluate, and responsibly use synthetic data for CV (and beyond). ✔️ ~2–3 papers to review ✔️ Non-archival 👉 Sign up: docs.google.com/forms/d/e/1FAI…
Wei-Chiu Ma retweeted
Luchao Qi @QiLuchao
🎉Thrilled to share our new work: Over++: Generative Video Compositing for Layer Interaction Effects! 🎉 We introduce a method for generating realistic interaction effects between foreground and background layers, with both mask-based and prompt-based control.
Wei-Chiu Ma retweeted
Arhan Jain @prodarhan
Excited to introduce PolaRiS, a real-to-sim recipe for turning short real-world videos into high-fidelity simulation environments for scalable and reliable zero-shot generalist policy evaluation. polaris-evals.github.io (1/N 🧵)
Wei-Chiu Ma retweeted
Mengye Ren @mengyer
Should you use an LLM verifier to help sample solutions at test time? Our new study reveals some key insights. TL;DR: Choose a verifier from a **different** model family than the solver!
Jack Lu ✈️ ICLR 2026 @Jacklu_me

Wondering how to get the most out of LLM test-time verification? New study: "When Does Verification Pay Off? A Closer Look at LLMs as Solution Verifiers." 🔍 37 models, 9 datasets 🔥 Self vs intra-family vs cross-family verification Result: verify across families! 🧵👇
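The test-time setup being studied is essentially verifier-guided best-of-N sampling. A minimal sketch, with toy stand-ins for the two models (all names are illustrative assumptions; the study's finding is that `solver` and `verifier` should come from different model families):

```python
from itertools import cycle

# Hypothetical sketch of verifier-guided best-of-N sampling at test time.
# `solver` proposes candidate solutions; `verifier` scores each one; we keep
# the highest-scoring candidate. Both would be LLMs in practice.
def best_of_n(solver, verifier, prompt, n=8):
    candidates = [solver(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verifier(prompt, c))

# Toy stand-ins: the solver cycles through noisy answers to "2+2";
# the verifier prefers answers close to 4.
answers = cycle([3, 5, 4])
toy_solver = lambda prompt: next(answers)
toy_verifier = lambda prompt, candidate: -abs(candidate - 4)
print(best_of_n(toy_solver, toy_verifier, "2+2", n=3))  # -> 4
```

The cross-family recommendation plugs in here as a choice of which model backs `verifier`: a verifier from the solver's own family tends to share its blind spots, so its scores discriminate less between good and bad candidates.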

Wei-Chiu Ma retweeted
Rundong Luo @LuoRundong0122
(1/7) 🌑✨ Can we create super cool art using objects in our daily lives? We show that combining GenAI with physics modeling unlocks a whole new creative space, where physical shadows of real-world objects complement line drawings! 🌐 Project page: red-fairy.github.io/ShadowDraw/
Wei-Chiu Ma retweeted
Jingkang Wang @wangjksjtu
Can we reconstruct dynamic driving scenes without any labels? Check our #NeurIPS2025 paper Flux4D, a flow-based, generalizable, unsupervised 4D reconstruction model that scales to real-world driving data! Website: waabi.ai/flux4d Arxiv: arxiv.org/abs/2512.03210 [1/n]
Wei-Chiu Ma retweeted
David Bau @davidbau
Despite the complexity of the inference process, diffusion models are remarkably controllable by individual SAE vectors. This poster is being presented at the #Neurips2025 Mexico City Satellite Conference. Meet @ViaSurkov in Mexico, or coauthor @wendlerch in San Diego!
Chris Wendler @wendlerch

I am very excited to share that our paper, "One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models", will be presented at #NeurIPS2025! @ViaSurkov is presenting it at #MexIPS2025:

📍 If you are attending NeurIPS in Mexico City, please stop by!
Date: Thursday, Dec 4, 2025
Time: 11:00 AM – 2:00 PM PST
Location: Foyer (Mexico City Poster Session)

Come visit @ViaSurkov; it's his first conference and he will be happy to explain his amazing work.

Sadly, #NeurIPS2025 does not allow for parallel presentation in San Diego. However, I am in San Diego and happy to meet up / chat. Please don't hesitate to reach out here or via ch.wendler@northeastern.edu.

Once again, a big shout-out to our brilliant students Viacheslav Surkov and Antonio Mari, who did phenomenal work here and pushed this project (which started as a class project more than a year ago) all the way past the high bar of #NeurIPS2025. I also want to thank manifund.org (@andyarditi and @ryan_kidd44 in particular) for helping us finance Viacheslav Surkov's conference trip.

Please find more information about our work below. We have many amazing interactive materials (e.g., 3x Hugging Face demo spaces) for you to check out. Most of our implementations are open-sourced (RIEBench on FLUX, which we added to our appendix during the NeurIPS rebuttal, is currently missing, but we plan to add it ASAP). Me demoing the demo attached.

Wei-Chiu Ma retweeted
Georgia Gkioxari @georgiagkioxari
Some of the tech behind SAM 3D that I'm particularly excited about:

1⃣ Existing 3D datasets (Objaverse-XL, ProcTHOR, etc.) are great for teaching "3D priors" (basic shape & appearance). But they're not enough to fully bridge the gap to the real world, where scenes are cluttered and objects are occluded, tiny, and generally messy.

2⃣ Enter our model-in-the-loop 3D data engine: model ➜ predicts 3D from real images ➜ humans quickly vet good candidates (yes/no only) ➜ vetted 3D goes back into training ➜ improved model re-enters the loop. A virtuous cycle that boosts 3D annotation quality, labeling speed, and model performance, without requiring 3D tools or design expertise.

3⃣ 3D objectives are tricky: no closed-form differentiable loss fully captures "good 3Dness" (symmetries, smoothness, completeness). So we borrow from the LLM playbook and post-train with human preference data. This alignment hardly shows up in metrics (which inherit the same limitations as the losses), but it dramatically improves the perceived quality of the 3D outputs.

More details in the paper.
Georgia Gkioxari @georgiagkioxari

3Dfy anything from a single image! Very thrilled to announce SAM 3D. From an input image, select any object you want, 3Dfy it! Blog: ai.meta.com/blog/sam-3d/ Demo: aidemos.meta.com/segment-anythi…

Wei-Chiu Ma retweeted
Fei-Fei Li @drfeifei
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on building and using world models to unlock spatial intelligence in this essay below. 1/n