Mu Cai

359 posts

@MuCai7

Research @thinkymachines | Previous: multimodal, agents @GoogleDeepMind

Mountain View · Joined May 2019

1.5K Following · 3.4K Followers

Mu Cai retweeted

Harris Zhang@HyperStorm9682·1d

🚨 Your Embedding Model is SMARTer Than You Think! Single-vector models actually hide powerful multi-vector capabilities in their frozen hidden states. We introduce SMART, a framework that unlocks this ability for SoTA multimodal retrieval. 🧵👇 🔗 huggingface.co/papers/2605.24…

15.5K

Mu Cai retweeted

Mira Murati@miramurati·19 May

Collaborative AI runs on interactivity: machines and people, working in real time, across every modality. Solving it takes a community, join us.

Thinking Machines@thinkymachines

We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th! thinkingmachines.ai/news/interacti…

114

1.5K

244.3K

Mu Cai@MuCai7·20 May

Wow, always high-quality papers from Xueyan and Yuheng. This could be a good measure for video generation!

Xueyan Zou@xyz2maureen

🔥Excited to share the first released work from our IEI lab! Congrats to @AnteaWu 🎉 This work is motivated by the lack of quantitative evaluation for physics alignment in video world models. With tools like MegaSam and CoTracker, we can directly reconstruct dynamic 3D scenes, enabling quantitative evaluation of physical alignment. Both code and data are released — feel free to try it out! It should work, but if it doesn’t, contact @AnteaWu directly : )

3.9K

Mu Cai@MuCai7·19 May

Call for high-quality real-time video/audio full-duplex evals! The whole field needs them! Come submit here!

Thinking Machines@thinkymachines

We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th! thinkingmachines.ai/news/interacti…

1.9K

Mu Cai retweeted

Thinking Machines@thinkymachines·19 May

We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th! thinkingmachines.ai/news/interacti…

194

1.6K

569.5K

Mu Cai@MuCai7·19 May

@atasteoff Big congratulations!

179

Shilong Liu@atasteoff·18 May

Career Update: I will join the Department of Electrical Engineering at Columbia University as a tenure-track Assistant Professor, starting in Fall 2027. My research will focus primarily on computer vision, self-evolving agents, and world models for embodied AI. I will be recruiting PhD students for Fall 2027. Motivated research interns, visiting students, and collaborators are also very welcome to reach out. More information: lsl.zone

586

55.1K

Mu Cai retweeted

Rowan Zellers@rown·11 May

We are so back!

Thinking Machines@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

547

52.7K

Mu Cai@MuCai7·12 May

@yong_jae_lee @yu_zhuoran32720 Congrats, Zhuoran!

258

Yong Jae Lee@yong_jae_lee·12 May

Great to be back in Madison last weekend for @yu_zhuoran32720’s PhD graduation! Zhuoran is my 9th PhD student and did really cool work on how data is processed in multimodal models + better ways to use synthetic & unlabeled data. Congrats again Zhuoran - see you in SF soon :)

5.6K

Mu Cai@MuCai7·12 May

My first share since joining @thinkymachines. Fun working with this team on real-time multimodal interaction. Vision in turn-based models felt like flipping through photos — continuous video is a different problem. Visual proactivity is essential — grateful to have worked on this alongside @liliyu_lili, @rown , and the rest of the team!

Thinking Machines@thinkymachines

159

10.5K

Mu Cai retweeted

Thinking Machines@thinkymachines·11 May

460

1.9K

15.7K

7.6M

Mu Cai@MuCai7·7 May

@Mononofu @elonmusk @SpaceX Congrats Julian!

121

Julian Schrittwieser@Mononofu·7 May

Very excited to be partnering with @elonmusk @SpaceX! Visionary engineering + Claude is going to be awesome, scaling is continuing for a long time!

Claude@claudeai

We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.

1.6K

146.9K

Mu Cai@MuCai7·30 Apr

@Yihe__Deng All the best!

270

Yihe Deng@Yihe__Deng·30 Apr

Last day at xAI. For a new grad, the past six months have been an irreplaceable experience. I feel fortunate to have made the decision to join this journey with xAI, and grateful for how much I was able to learn here in such a short, dense period of time. I'm proud of what we built, and what the multimodal team continues to build. I have deep faith in this team. No matter where I go next, I'll always look forward to seeing what my friends here pull off and bring into the world. I'm especially grateful to my captains along the way -- people I look up to, trust deeply, and who placed trust in my potential. I truly appreciate all the friends I met here, and the time we spent building together. And thanks xAI for the opportunity, and for giving me the space to learn, contribute, and grow. In the end, the greatest treasure is indeed the journey itself: the problems worth solving, and the people worth building with. Now, it's time to step into the uncertainty of what comes next.

549

36K

Mu Cai retweeted

Logan Kilpatrick@OfficialLoganK·2 Apr

Introducing Gemma 4, our series of open weight (Apache 2.0 licensed) models, which are byte for byte the most capable open models in the world! Gemma 4 is built to run on your hardware: phones, laptops, and desktops. Frontier intelligence with a 26B MoE and a 31B Dense model!

287

593

6.2K

524.8K

Mu Cai@MuCai7·29 Mar

@CatGodSandHive Exactly! And this is why we think the computer vision community has overlooked this important direction: multiscale in pixel space!

CatGod@CatGodSandHive·28 Mar

@MuCai7 So you're saying multiscale on pixels works better than on features? That's a plot twist, catnip for my curiosity, am I dreaming?

138

Mu Cai@MuCai7·28 Mar

🤯 Upgrade your pretrained visual encoder with <10 lines of code. Here is what vision researchers have overlooked: multiscale in pixel space works surprisingly well! Remember, we are not doing multiscale in feature space! 🏠Project Page: MuRF-VFM.github.io 📷 Paper: arxiv.org/abs/2603.25744 Get uniform improvements on MLLM, Seg, and Depth at similar computation cost.

Bocheng Zou@bochengzou

🔥 Upgrade your frozen vision encoders with <10 lines of code! Single-scale inference throws away vital details. Enter MuRF 🚀: a simple, training-free plug-in for instant, massive gains in MLLMs, Seg & Depth. 🤯 1/6

158

19.1K

Mu Cai@MuCai7·29 Mar

Good question, we have an efficiency analysis in the paper! And it is straightforward. For MLLMs: by design, MuRF keeps the same number of tokens as single-scale inference, so the computation cost in the LLM part is the same. Empirically, we observed that MuRF achieves similar VRAM usage, training time, and inference time compared to single-resolution inference for MLLMs. This works out because the visual encoder is much smaller than the LLM!

JJJYmmm@JJJYmmm2002·28 Mar

@MuCai7 any flops analysis? 🧐

160

Mu Cai@MuCai7·29 Mar

Hi Thomas, thanks for the comment! Huge fan of S² and learned upsamplers like AnyUp! 🤝 While we share the goal of multi-scale representation, MuRF takes a fundamentally different path.

TL;DR: We show that simply resizing the whole image (no tiling!) and fusing features creates a universally stronger representation without any learned upsampling heuristics. Here is the deeper dive into why we are different:

1️⃣ Motivation & Token Budget: We asked: Does higher resolution always mean better features? Surprisingly, no! Low-res provides crucial global context that actually improves high-res performance. For MLLMs, we lift the performance ceiling by a large margin while keeping the exact same number of visual tokens!

2️⃣ Approach (No Tiling, No Bells & Whistles): Unlike S², which cuts images into independent patches (breaking spatial layout and object continuity), we process the entire image at different scales. No complex layout engineering. As for AnyUp, learned upsamplers are great, but our parameter-free bilinear upsampling requires zero training. This guarantees extreme simplicity, maximum flexibility, and prevents generalizability issues.

3️⃣ Universal Application: We aren't just optimizing MLLM token budgets. MuRF is a fundamental, training-free enhancement for visual representations, generalizing flawlessly out-of-the-box across high-level reasoning (MLLMs), dense geometry (Seg/Depth), and even unsupervised anomaly detection.

We believe this simple, holistic multi-scale synergy is a highly promising direction. Let's push toward better visual representations together! 🚀

Thomas Wimmer@wimmer_th·28 Mar

@MuCai7 github.com/bfshi/scaling_… Isn't that pretty much what Shi et al. did in ECCV 2024? You're upsampling bilinearly (why not use a feature-agnostic learned upsampler like AnyUp?) instead of downsampling before aggregation but that's about it, on first glance?

432
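
For readers who want a concrete picture of the idea in the reply above (resize the whole image to several scales, encode each, bilinearly upsample the feature maps to one grid, then fuse), here is a minimal pure-Python sketch. It is not the MuRF code. `toy_encoder`, `bilinear_resize`, and `multiscale_features` are hypothetical names, the encoder is a mean-pooling stand-in for a frozen vision model, and fusing by channel concatenation is an assumption, since the thread only says the features are fused.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[List[float]]]  # H x W x C image or feature map


def bilinear_resize(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Parameter-free bilinear resize of an H x W x C grid (corner-aligned)."""
    in_h, in_w, ch = len(grid), len(grid[0]), len(grid[0][0])
    out: Grid = []
    for i in range(out_h):
        # Map the output row to a fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][c] + wx * grid[y0][x1][c])
                + wy * ((1 - wx) * grid[y1][x0][c] + wx * grid[y1][x1][c])
                for c in range(ch)
            ])
        out.append(row)
    return out


def toy_encoder(image: Grid, patch: int = 2) -> Grid:
    """Hypothetical stand-in for a frozen encoder: mean-pool each patch."""
    h, w, ch = len(image) // patch, len(image[0]) // patch, len(image[0][0])
    return [
        [
            [
                sum(image[i * patch + di][j * patch + dj][c]
                    for di in range(patch) for dj in range(patch)) / patch ** 2
                for c in range(ch)
            ]
            for j in range(w)
        ]
        for i in range(h)
    ]


def multiscale_features(
    image: Grid,
    encoder: Callable[[Grid], Grid],
    sizes: Sequence[int],
    out_hw: Tuple[int, int],
) -> Grid:
    """Encode the whole image at each scale and fuse on one token grid."""
    fused: Grid = []
    for size in sizes:
        resized = bilinear_resize(image, size, size)        # whole image, no tiling
        feats = bilinear_resize(encoder(resized), *out_hw)  # common token grid
        if not fused:
            fused = feats
        else:
            # Assumed fusion: concatenate channels cell by cell.
            fused = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(fused, feats)]
    return fused


image = [[[1.0] for _ in range(8)] for _ in range(8)]
fused = multiscale_features(image, toy_encoder, sizes=(4, 8), out_hw=(4, 4))
print(len(fused), len(fused[0]), len(fused[0][0]))  # prints: 4 4 2
```

In this sketch the token grid stays 4 x 4 no matter how many scales are added, and only the channel width grows, which is one way to read the "same number of visual tokens" claim.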

Mu Cai@MuCai7·28 Mar

Huge congrats to @bochengzou, who began working on this two years ago and made this magical technique happen!

617

Mu Cai retweeted

Bocheng Zou@bochengzou·28 Mar

147

28.3K

Mu Cai@MuCai7·28 Mar

@shenbokui Congratulations, William!

William Shen@shenbokui·23 Mar

UNI-1 is intelligent, directable, cultured. Incredible range it can do. Incredibly proud of the world-class team building a world-class model. It’s a daunting task to go up against industry giants like Deepmind/OpenAI/Bytedance. More to come! API, technical report, model card… Come join us!

Luma@LumaLabsAI

Uni-1 is here! A new kind of model that thinks and generates pixels simultaneously. Less artificial. More intelligent.

368

90.7K
