Wufei Ma

102 posts


@wufeima

PhD student at @CCVLatJHU @JHU | Prev intern: Amazon FAR, Google Research, Meta, MSRA, Megvii

Baltimore, MD · Joined August 2019

431 Following · 162 Followers

Pinned Tweet
Wufei Ma @wufeima
I will be at #NeurIPS2025 next week to present our SpatialReasoner. Looking forward to catching up with friends old and new! 🤠
[image]
Replies: 1 · Reposts: 2 · Likes: 9 · Views: 895
Wufei Ma @wufeima
@iScienceLuvr Depends on whether you're thinking about image-/video-only tasks or multi-modal tasks
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 369
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Honestly I feel like image/video SSL doesn't receive enough attention, especially compared to LLMs, diffusion, etc.
Replies: 7 · Reposts: 1 · Likes: 106 · Views: 12.2K
Wufei Ma @wufeima
@giffmana The latency to cost centers was even longer — enough to charge all medium responses at costs comparable to high ones.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 1.5K
Wufei Ma retweeted
Tommie Kerssies @tommiekerssies
World models are heavy. They don't need to be. Each frame is encoded as 1024 spatial tokens. What if it were just 1? In our #CVPR2026 Highlight from Amazon FAR, we compress frames into "delta" tokens for efficient generative world modeling. Paper, code & models below ↓ (1/7)
[image]
Replies: 12 · Reposts: 72 · Likes: 583 · Views: 50.7K
Wufei Ma @wufeima
@giffmana I only see five fingers. Finally the four-finger and six-finger problems are solved now. 🥳🙃
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 29
Lucas Beyer (bl16) @giffmana
Ok I take back what I was saying about fingers the other day lol
[image]
Replies: 13 · Reposts: 4 · Likes: 164 · Views: 29K
Shane Gu @shaneguML
In a recent chat with a Gemini VP regarding hiring philosophy, one trait he emphasized: the combination of low ego and high competence. We are no longer in an era defined by individual papers or claims of ownership. Success today requires a 'last mile' mindset—a relentless focus on doing whatever work is necessary to deliver world-class models. A team member who pairs high contribution with low ego simplifies and energizes the entire organization. In this hyper-competitive frontier, the delta between contribution and ego has become a key metric for identifying the talent that actually moves the needle.
Replies: 38 · Reposts: 73 · Likes: 975 · Views: 148.9K
Wufei Ma retweeted
Jianwen Xie @jianwen_xie
🔥 We introduce SpatialReasoner, a novel large vision-language model (LVLM) that addresses 3D spatial reasoning with explicit 3D representations shared between stages — 3D perception, computation, and reasoning. @NeurIPSConf
📄 Paper: arxiv.org/pdf/2504.20024
🔗 Project: spatial-reasoner.github.io
💻 Code: github.com/johnson111788/…
✨ Highlights:
✅ SpatialReasoner uses a two-stage training pipeline: supervised fine-tuning (to learn 3D perception and computation) + reinforcement learning (to build generalizable 3D reasoning).
✅ ~9.2% higher than Gemini 2.0 on 3DSRBench, and much better generalization to novel spatial question types.
#NeurIPS2025 #GenerativeAI #VLM #Reasoning #SpatialIntelligence #AIResearch @NeurIPSConf @LambdaAPI @JHUCompSci @wufeima @CCVLatJHU
[3 images]
Replies: 0 · Reposts: 5 · Likes: 9 · Views: 794
Wufei Ma @wufeima
@hot_tamales32 @diyerxx So they intentionally ‘cheated’ in their data and still decided to publicly release it?
Replies: 2 · Reposts: 0 · Likes: 1 · Views: 429
Lei Yang @diyerxx
Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment. So here's what happened.

Earlier this month, a colleague shared an Apple paper on arXiv with me — it was also under review for ICLR 2026. The benchmark they proposed was perfectly aligned with a project we're working on. I got excited after reading it. I immediately stopped my current tasks and started adapting our model to their benchmark. Pulled a whole weekend crunch session to finish the integration… only to find our model scoring absurdly low.

I was really frustrated. I spent days debugging, checking everything — maybe I used it wrong, maybe there was a hidden bug. During this process, I actually found a critical bug in their official code:
* When querying the VLM, it only passed in the image path string, not the image content itself.

The most ridiculous part? After I fixed their bug, the model's scores got even lower! The results were so counterintuitive that I felt forced to do deeper validation. After multiple checks, the conclusion held: fixing the bug actually made the scores worse.

At this point I decided to manually inspect the data. I sampled the first 20 questions our model got wrong, and I was shocked:
* 6 out of 20 had clear GT errors.
* The pattern suggested the "ground truth" was model-generated with extremely poor quality control, leading to tons of hallucinations.
* Based on this quick sample, the GT error rate could be as high as 30%.

I reported the data quality issue in a GitHub issue. After 6 days, the authors replied briefly and then immediately closed the issue. That annoyed me — I'd already wasted a ton of time, and I didn't want others in the community to fall into the same trap — so I pushed back. Only then did they reopen the GitHub issue.

Then I went back and checked the examples displayed in the paper itself. Even there, I found at least three clear GT errors. It's hard to believe the authors were unaware of how bad the dataset quality was, especially when the paper claims all samples were reviewed by annotators. Yet even the examples printed in the paper contain blatant hallucinations and mistakes.

When the ICLR reviews came out, I checked the five reviews for this paper. Not a single reviewer noticed the GT quality issues or the hallucinations in the paper's examples. So I started preparing a more detailed GT error analysis and wrote a Public Comment on OpenReview to inform the reviewers and the community about the data quality problems.

The next day — the authors withdrew the paper and took down the GitHub repo.

Fortunately, ICLR is an open conference with Public Comment. If this had been a closed-review venue, this kind of shoddy work would have been much harder to expose. So here's a small call to the community: for any paper involving model-assisted dataset construction, reviewers should spend a few minutes checking a few samples manually. We need to prevent irresponsible work from slipping through and misleading everyone.

Looking back, I should have suspected the dataset earlier based on two red flags:
* The paper's experiments claimed that GPT-5 has been surpassed by a bunch of small open-source models.
* The original code, with a ridiculous bug, produced higher scores than the bug-fixed version.

But because it was a paper from Big Tech, I subconsciously trusted the integrity and quality, which prevented me from spotting the problem sooner.

This whole experience drained a lot of my time, energy, and emotion — especially because accusing others of bad data requires extra caution. I'm sharing this in hopes that the ML community remains vigilant and pushes back against this kind of sloppy, low-quality, and irresponsible behavior before it misleads people and wastes collective effort.

#ICLR #ICLR2026 #NeurIPS #CVPR #openreview #MachineLearning #LLM #VLM
[image]
Replies: 53 · Reposts: 212 · Likes: 2.5K · Views: 397K
Wufei Ma @wufeima
@bremen79 Doesn't this violate the double-blind policy, since there's likely only one paper with a 10/2/2/0 score?
Replies: 2 · Reposts: 0 · Likes: 4 · Views: 2.9K
Wufei Ma @wufeima
Join us at #ICCV2025 for the 1st Embodied Spatial Reasoning Workshop! We're thrilled to host amazing speakers from industry and academia, featuring Sifei Liu, @xiaolonw, @xf1280, and @kate_saenko_, to discuss frontiers of spatial reasoning, embodied agents, and robotics! 🔗 tinyurl.com/yn7b6mu6
[image]
Replies: 2 · Reposts: 19 · Likes: 95 · Views: 10K
Wufei Ma @wufeima
@xwang_lk Having the same limit for junior and senior researchers doesn't make much sense. 3 first-author submissions from a junior researcher seem far more extreme than 15 submissions from a senior PhD advisor.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 58
Wufei Ma @wufeima
@WenhuChen Meanwhile, some academic video understanding papers evaluate 16-frame models on videos of several minutes. 🤔
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 127
Wufei Ma @wufeima
@taiyasaki Comforting young grad students after a bad review is part of the job. Comforting a whole associate professor having a meltdown on main? That’s new. Real tenure-track excellence for casually sprinkling in some racial undertones. 😹
Replies: 0 · Reposts: 0 · Likes: 26 · Views: 3.8K