Wei-Chiu Ma @weichiuma

472 posts

Assistant Professor @Cornell @CornellCIS Prev: Postdoc @allen_ai @uwcse; PhD @MIT_CSAIL; Sr. Research Scientist @UberATG @Waabi_ai

Joined August 2014
218 Following · 2.3K Followers

Pinned Tweet
Wei-Chiu Ma @weichiuma
I've been wanting to make 3D reconstructions not just realistic, but also **interactable** and **actionable** for years. Thanks to @XHongchi97338, we're now a step closer! Introducing DRAWER — a framework for the automatic construction of realistic, interactive digital twins.
Hongchi Xia @hongchix

Glad to introduce our #CVPR2025 paper "DRAWER", allowing one to create a realistic and interactable digital twin from a video of a static scene without any interactions with the environment. It unlocks many opportunities in gaming and robotics! Webpage: drawer-art.github.io

Wei-Chiu Ma retweeted
Jiawei Yang @JiaweiYang118
Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space. Now it is 0.75, and can be even lower. Many wonder how. I thought it might end as a small FID prank: simple and deliberate. It started with one question: can FID be optimized directly, and what does it reveal? Introducing FD-loss.
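The question in the tweet, whether FID can be optimized directly, rests on FID being a closed-form Fréchet distance between two Gaussians fitted to feature statistics, which is differentiable. A minimal sketch, assuming diagonal covariances for simplicity (the full metric uses a matrix square root; the function name is illustrative and not from the paper):

```python
import numpy as np

# FID is the Frechet distance between Gaussians fitted to real and generated
# feature statistics. For diagonal covariances it reduces to a simple closed
# form, which is differentiable and hence directly optimizable. This is a toy
# sketch; the actual FD-loss in the paper may differ.
def frechet_distance_diag(mu1, var1, mu2, var2):
    """||mu1 - mu2||^2 + sum((sqrt(var1) - sqrt(var2))^2) for diagonal Gaussians."""
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(mean_term + cov_term)

# Identical statistics give distance 0; mismatched means/variances are penalized.
print(frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 4.0]))  # -> 2.0
```

For diagonal covariances, the general trace term Tr(Σ1 + Σ2 − 2(Σ1Σ2)^{1/2}) collapses to the per-dimension form above.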
Wei-Chiu Ma retweeted
AK @_akhaliq
Seeing Fast and Slow: Learning the Flow of Time in Videos. Paper: huggingface.co/papers/2604.21…
Wei-Chiu Ma retweeted
Gene Chou @gene_ch0u
Introducing CityRAG! We wanted video generative models to be grounded in the real world — if I’m in London, I want to look around and actually see Big Ben. CityRAG generates videos of cities featuring real buildings and roads, with arbitrary weather, people, and cars. 1/N page: cityrag.github.io paper: arxiv.org/abs/2604.19741
Wei-Chiu Ma retweeted
Yinghao Xu @YinghaoXu1
🎉 After one year of teamwork, we are excited to release our 3D foundation model, LingBot-Map! Unlike DA3/VGGT, LingBot-Map is a purely autoregressive model for streaming 3D reconstruction ⚡ It achieves ~20 FPS at 518×378 resolution over sequences exceeding 10,000 frames, and beyond 🚀

Two key insights behind LingBot-Map:
🔑 Keep SLAM's structural wisdom: build Geometric Context Attention with long-context modeling while maintaining a compact streaming state
🔑 Make everything end-to-end learnable: no optimization, no post-processing

Check out our demos 👇
Wei-Chiu Ma retweeted
Jitendra MALIK @JitendraMalikCV
In robotics manipulation, we see many cherry picked demos but no standardized benchmarks. I suggest using STT (Success weighted by normalized inverse Task Time), analogous to SPL from navigation, replacing length by time to do task relative to a human e.g. for "Pick up anything" on random household objects. arxiv.org/pdf/1807.06757
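The proposed metric follows directly by analogy with SPL: replace normalized path length with task time relative to a human baseline. A minimal sketch, assuming per-episode success flags and completion times (names are illustrative; the tweet does not specify an implementation):

```python
import numpy as np

# Hypothetical sketch of STT (Success weighted by normalized inverse Task Time),
# by analogy with SPL from navigation. S_i is the binary success of episode i,
# t_agent_i the robot's task time, t_human_i the human reference time.
def stt(successes, agent_times, human_times):
    """STT = (1/N) * sum_i S_i * t_human_i / max(t_agent_i, t_human_i)."""
    s = np.asarray(successes, dtype=float)
    ta = np.asarray(agent_times, dtype=float)
    th = np.asarray(human_times, dtype=float)
    return float(np.mean(s * th / np.maximum(ta, th)))

# Succeeding at human speed scores 1.0; slower successes score less; failures 0.
print(stt([1, 1, 0], [10.0, 20.0, 5.0], [10.0, 10.0, 10.0]))  # -> 0.5
```

As with SPL, the max in the denominator caps per-episode credit at 1.0, so beating the human baseline is not rewarded beyond a clean success.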
Wei-Chiu Ma retweeted
Songyou Peng @songyoupeng
I gave an award talk @3DVconf that might be of interest to some people. I took a step back and shared a few personal stories from my 10-year journey, reflecting on the profound impact of people, luck (you need a lot!), grit, and the art of giving up. (1/2)
Wei-Chiu Ma retweeted
Hang Zhao @zhaohang0124
Our recent findings on World Action Models (WAMs): the core advantage of WAMs is not test-time “imagination” of futures, but the training-time supervision from future video prediction. We propose Fast-WAM, which makes inference simple, fast, and policy-centric.
Wei-Chiu Ma retweeted
Rundong Luo @LuoRundong0122
Working on synthetic data for computer vision? Submit to the 3rd Synthetic Data for Computer Vision Workshop at @CVPR 🚀 A reminder that submissions are open until March 12. If you have relevant work for ECCV, we’d also love to see it submitted here. 🏆 We will select a Best Paper Award 📝 Submissions will NOT be included in the CVPR proceedings, so there are no double-submission concerns Website: syndata4cv.github.io #CVPR2026 #ECCV #ComputerVision #SyntheticData
Wei-Chiu Ma retweeted
DENG Lab @ SJTU @SJTUDengLab
🚀 Introducing Think-Then-Generate (T2G): transforming Qwen-Image into an open-source NanoBanana!

Qwen-Image excels at text rendering, but in practice we find it struggles with implicit and non-descriptive prompts (see figure). We identify that a closed-source LLM is needed to rewrite prompts descriptively for the diffusion transformer (DiT) renderer. However, the disconnect between the LLM and the DiT can lead to imperfections.

🔧 Our solution, T2G: Think First, Generate Second! T2G overcomes this by empowering the text encoder in Qwen-Image itself (i.e., Qwen2.5-VL) to think first and then generate with the DiT, and by introducing a multimodal GRPO (Dual-GRPO) strategy to enhance seamless, self-driven reasoning in Qwen2.5-VL.

🎨 Check out some results first:
- Idiom Comics: Input "A multi-panel comic showing 'playing the lute to a cow'" -> not just images of cows and instruments, but a dynamic, narrative-driven comic that accurately develops the idiom's context.
- Math & Physics Teaching: Input "A math teacher explaining the equation 2x − 4 = 10 on the blackboard." -> not random elements, but a fully structured blackboard with clear steps and a teacher scene, accurately capturing the teaching process.

📖 Paper: arxiv.org/abs/2601.10332
💻 Github: github.com/SJTU-DENG-Lab/…
Wei-Chiu Ma retweeted
Rundong Luo @LuoRundong0122
📣 SynData4CV @ CVPR 2026 is recruiting reviewers! Join our workshop on how to generate, evaluate, and responsibly use synthetic data for CV (and beyond). ✔️ ~2–3 papers to review ✔️ Non-archival 👉 Sign up: docs.google.com/forms/d/e/1FAI…
Wei-Chiu Ma retweeted
Luchao Qi @QiLuchao
🎉Thrilled to share our new work: Over++: Generative Video Compositing for Layer Interaction Effects! 🎉 We introduce a method for generating realistic interaction effects between foreground and background layers, with both mask-based and prompt-based control.
Wei-Chiu Ma retweeted
Arhan Jain @prodarhan
Excited to introduce PolaRiS, a real-to-sim recipe for turning short real-world videos into high-fidelity simulation environments for scalable and reliable zero-shot generalist policy evaluation. polaris-evals.github.io (1/N 🧵)
Wei-Chiu Ma retweeted
Mengye Ren @mengyer
Should you use an LLM verifier to help sample solutions at test time? Our new study reveals some key insights. TL;DR: Choose a verifier from a **different** model family than the solver!
Jack Lu ✈️ ICLR 2026 @Jacklu_me

Wondering how to get the most out of LLM test-time verification? New study: "When Does Verification Pay Off? A Closer Look at LLMs as Solution Verifiers." 🔍 37 models, 9 datasets 🔥 Self vs intra-family vs cross-family verification Result: verify across families! 🧵👇
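The test-time setup being studied is essentially verifier-guided best-of-N sampling. A minimal sketch, with toy stand-ins for the two models (all names are illustrative assumptions; the study's finding is that `solver` and `verifier` should come from different model families):

```python
from itertools import cycle

# Hypothetical sketch of verifier-guided best-of-N sampling at test time.
# `solver` proposes candidate solutions; `verifier` scores each one; we keep
# the highest-scoring candidate. Both would be LLMs in practice.
def best_of_n(solver, verifier, prompt, n=8):
    candidates = [solver(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verifier(prompt, c))

# Toy stand-ins: the solver cycles through noisy answers to "2+2";
# the verifier prefers answers close to 4.
answers = cycle([3, 5, 4])
toy_solver = lambda prompt: next(answers)
toy_verifier = lambda prompt, candidate: -abs(candidate - 4)
print(best_of_n(toy_solver, toy_verifier, "2+2", n=3))  # -> 4
```

The cross-family recommendation plugs in here as a choice of which model backs `verifier`: a verifier from the solver's own family tends to share its blind spots, so its scores discriminate less between good and bad candidates.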

Wei-Chiu Ma retweeted
Rundong Luo @LuoRundong0122
(1/7) 🌑✨ Can we create super cool art using objects in our daily lives? We show that combining GenAI with physics modeling unlocks a whole new creative space, where physical shadows of real-world objects complement line drawings! 🌐 Project page: red-fairy.github.io/ShadowDraw/
Wei-Chiu Ma retweeted
Jingkang Wang @wangjksjtu
Can we reconstruct dynamic driving scenes without any labels? Check our #NeurIPS2025 paper Flux4D, a flow-based, generalizable, unsupervised 4D reconstruction model that scales to real-world driving data! Website: waabi.ai/flux4d Arxiv: arxiv.org/abs/2512.03210 [1/n]
Wei-Chiu Ma retweeted
David Bau @davidbau
Despite the complexity of the inference process, diffusion models are remarkably controllable by individual SAE vectors. This poster is being presented at the #Neurips2025 Mexico City Satellite Conference. Meet @ViaSurkov in Mexico, or coauthor @wendlerch in San Diego!
Chris Wendler @wendlerch

I am very excited to share that our paper, "One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models", will be presented at #NeurIPS2025! @ViaSurkov is presenting it at #MexIPS2025:

📍 If you are attending NeurIPS in Mexico City, please stop by!
Date: Thursday, Dec 4, 2025
Time: 11:00 AM – 2:00 PM PST
Location: Foyer (Mexico City Poster Session)

Come visit @ViaSurkov; it's his first conference and he will be happy to explain his amazing work.

Sadly, #NeurIPS2025 does not allow for parallel presentation in San Diego. However, I am in San Diego and happy to meet up / chat. Please don't hesitate to reach out here or via ch.wendler@northeastern.edu.

Once again, a big shout-out to our brilliant students Viacheslav Surkov and Antonio Mari, who did phenomenal work here and pushed this project (which started as a class project more than a year ago) all the way past the high bar of #NeurIPS2025. I also want to thank manifund.org (@andyarditi and @ryan_kidd44 in particular) for helping us finance Viacheslav Surkov's conference trip.

Please find more information about our work below. We have many amazing interactive materials (e.g., 3x Hugging Face demo spaces) for you to check out. Most of our implementations are open-sourced (RIEBench on FLUX, which we added to our appendix during the NeurIPS rebuttal, is currently missing, but we plan to add it ASAP). Me demoing the demo attached.

Wei-Chiu Ma retweeted
Georgia Gkioxari @georgiagkioxari
Some of the tech behind SAM 3D that I'm particularly excited about:

1⃣ Existing 3D datasets (Objaverse-XL, ProcTHOR, etc.) are great for teaching "3D priors" (basic shape & appearance). But they're not enough to fully bridge the gap to the real world, where scenes are cluttered and objects are occluded, tiny, and generally messy.

2⃣ Enter our model-in-the-loop 3D data engine: model ➜ predicts 3D from real images ➜ humans quickly vet good candidates (yes/no only) ➜ vetted 3D goes back into training ➜ improved model re-enters the loop. A virtuous cycle that boosts 3D annotation quality, labeling speed, and model performance, without requiring 3D tools or design expertise.

3⃣ 3D objectives are tricky: no closed-form differentiable loss fully captures "good 3Dness" (symmetries, smoothness, completeness). So we borrow from the LLM playbook and post-train with human preference data. This alignment hardly shows up in metrics (which inherit the same limitations as the losses), but it dramatically improves the perceived quality of the 3D outputs.

More details in the paper.
Georgia Gkioxari @georgiagkioxari

3Dfy anything from a single image! Very thrilled to announce SAM 3D. From an input image, select any object you want, 3Dfy it! Blog: ai.meta.com/blog/sam-3d/ Demo: aidemos.meta.com/segment-anythi…

Wei-Chiu Ma retweeted
Fei-Fei Li @drfeifei
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it? Today, I want to share with you my thoughts on building and using world models to unlock spatial intelligence in this essay below. 1/n