Wufei Ma

102 posts


@wufeima

PhD student at @CCVLatJHU @JHU | Prev intern: Amazon FAR, Google Research, Meta, MSRA, Megvii

Baltimore, MD · Joined August 2019

431 Following · 162 Followers

Pinned Tweet
Wufei Ma @wufeima
I will be at #NeurIPS2025 next week to present our SpatialReasoner. Looking forward to catching up with friends old and new! 🤠
[image]
Replies: 1 · Reposts: 2 · Likes: 9 · Views: 895
Wufei Ma @wufeima
@iScienceLuvr Depends on whether you're thinking about image-/video-only tasks or multi-modal tasks
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 369
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Honestly I feel like image/video SSL doesn't receive enough attention, especially compared to LLMs, diffusion, etc.
Replies: 7 · Reposts: 1 · Likes: 106 · Views: 12.2K
Wufei Ma @wufeima
@giffmana The latency to cost centers was even longer — enough to charge all medium responses at costs comparable to high ones.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 1.5K
Wufei Ma retweeted
Tommie Kerssies @tommiekerssies
World models are heavy. They don't need to be. Each frame is encoded as 1024 spatial tokens. What if it were just 1? In our #CVPR2026 Highlight from Amazon FAR, we compress frames into "delta" tokens for efficient generative world modeling. Paper, code & models below ↓ (1/7)
[image]
Replies: 12 · Reposts: 72 · Likes: 583 · Views: 50.7K
Wufei Ma @wufeima
@giffmana I only see five fingers. Finally the four-finger and six-finger problems are solved now. 🥳🙃
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 29
Lucas Beyer (bl16) @giffmana
Ok I take back what I was saying about fingers the other day lol
[image]
Replies: 13 · Reposts: 4 · Likes: 164 · Views: 29K
Shane Gu @shaneguML
In a recent chat with a Gemini VP regarding hiring philosophy, one trait he emphasized: the combination of low ego and high competence. We are no longer in an era defined by individual papers or claims of ownership. Success today requires a 'last mile' mindset—a relentless focus on doing whatever work is necessary to deliver world-class models. A team member who pairs high contribution with low ego simplifies and energizes the entire organization. In this hyper-competitive frontier, the delta between contribution and ego has become a key metric for identifying the talent that actually moves the needle.
Replies: 38 · Reposts: 73 · Likes: 975 · Views: 148.9K
Wufei Ma retweeted
Jianwen Xie @jianwen_xie
🔥 We introduce SpatialReasoner, a novel large vision-language model (LVLM) that addresses 3D spatial reasoning with explicit 3D representations shared between stages — 3D perception, computation, and reasoning. @NeurIPSConf
📄 Paper: arxiv.org/pdf/2504.20024
🔗 Project: spatial-reasoner.github.io
💻 Code: github.com/johnson111788/…
✨ Highlights:
✅ SpatialReasoner uses a two-stage training pipeline: supervised fine-tuning (to learn 3D perception and computation) + reinforcement learning (to build generalizable 3D reasoning).
✅ ~9.2% higher than Gemini 2.0 on 3DSRBench, and much better generalization to novel spatial question types.
#NeurIPS2025 #GenerativeAI #VLM #Reasoning #SpatialIntelligence #AIResearch @NeurIPSConf @LambdaAPI @JHUCompSci @wufeima @CCVLatJHU
[3 images]
Replies: 0 · Reposts: 5 · Likes: 9 · Views: 794
Wufei Ma @wufeima
@hot_tamales32 @diyerxx So they intentionally ‘cheated’ in their data and still decided to publicly release it?
Replies: 2 · Reposts: 0 · Likes: 1 · Views: 429
Lei Yang @diyerxx
Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment. So here's what happened.

Earlier this month, a colleague shared an Apple paper on arXiv with me — it was also under review for ICLR 2026. The benchmark they proposed was perfectly aligned with a project we're working on. I got excited after reading it. I immediately stopped my current tasks and started adapting our model to their benchmark. Pulled a whole weekend crunch session to finish the integration… only to find our model scoring absurdly low.

I was really frustrated. I spent days debugging, checking everything — maybe I used it wrong, maybe there was a hidden bug. During this process, I actually found a critical bug in their official code:
* When querying the VLM, it only passed in the image path string, not the image content itself.

The most ridiculous part? After I fixed their bug, the model's scores got even lower! The results were so counterintuitive that I felt forced to do deeper validation. After multiple checks, the conclusion held: fixing the bug actually made the scores worse.

At this point I decided to manually inspect the data. I sampled the first 20 questions our model got wrong, and I was shocked:
* 6 out of 20 had clear GT errors.
* The pattern suggested the "ground truth" was model-generated with extremely poor quality control, leading to tons of hallucinations.
* Based on this quick sample, the GT error rate could be as high as 30%.

I reported the data quality issue in a GitHub issue. After 6 days, the authors replied briefly and then immediately closed the issue. That annoyed me — I'd already wasted a ton of time, and I didn't want others in the community to fall into the same trap — so I pushed back. Only then did they reopen the GitHub issue.

Then I went back and checked the examples displayed in the paper itself. Even there, I found at least three clear GT errors. It's hard to believe the authors were unaware of how bad the dataset quality was, especially when the paper claims all samples were reviewed by annotators. Yet even the examples printed in the paper contain blatant hallucinations and mistakes.

When the ICLR reviews came out, I checked the five reviews for this paper. Not a single reviewer noticed the GT quality issues or the hallucinations in the paper's examples. So I started preparing a more detailed GT error analysis and wrote a Public Comment on OpenReview to inform the reviewers and the community about the data quality problems.

The next day — the authors withdrew the paper and took down the GitHub repo.

Fortunately, ICLR is an open conference with Public Comment. If this had been a closed-review venue, this kind of shoddy work would have been much harder to expose. So here's a small call to the community: for any paper involving model-assisted dataset construction, reviewers should spend a few minutes checking a few samples manually. We need to prevent irresponsible work from slipping through and misleading everyone.

Looking back, I should have suspected the dataset earlier based on two red flags:
* The paper's experiments claimed that GPT-5 has been surpassed by a bunch of small open-source models.
* The original code, with a ridiculous bug, produced higher scores than the bug-fixed version.

But because it was a paper from Big Tech, I subconsciously trusted the integrity and quality, which prevented me from spotting the problem sooner.

This whole experience drained a lot of my time, energy, and emotion — especially because accusing others of bad data requires extra caution. I'm sharing this in hopes that the ML community remains vigilant and pushes back against this kind of sloppy, low-quality, and irresponsible behavior before it misleads people and wastes collective effort.

#ICLR #ICLR2026 #NeurIPS #CVPR #openreview #MachineLearning #LLM #VLM
[image]
Replies: 53 · Reposts: 212 · Likes: 2.5K · Views: 397K
Wufei Ma @wufeima
@bremen79 Doesn't this violate the double-blind policy, since there's likely only one paper with a 10/2/2/0 score?
Replies: 2 · Reposts: 0 · Likes: 4 · Views: 2.9K
Wufei Ma @wufeima
Join us at #ICCV2025 for the 1st Embodied Spatial Reasoning Workshop! We're thrilled to host amazing speakers from industry and academia, featuring Sifei Liu, @xiaolonw, @xf1280, and @kate_saenko_, to discuss frontiers of spatial reasoning, embodied agents, and robotics! 🔗 tinyurl.com/yn7b6mu6
[image]
Replies: 2 · Reposts: 19 · Likes: 95 · Views: 10K
Wufei Ma @wufeima
@xwang_lk Having the same limit for junior and senior researchers doesn't make much sense. 3 first-author submissions from a junior researcher seem far more extreme than 15 submissions from a senior PhD advisor.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 58
Wufei Ma @wufeima
@WenhuChen Meanwhile, some academic video understanding papers evaluate 16-frame models on videos of several minutes. 🤔
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 127
Wufei Ma @wufeima
@taiyasaki Comforting young grad students after a bad review is part of the job. Comforting a whole associate professor having a meltdown on main? That’s new. Real tenure-track excellence for casually sprinkling in some racial undertones. 😹
Replies: 0 · Reposts: 0 · Likes: 26 · Views: 3.8K