

Howard Zhou

@howardzzh
I'm a Senior Engineering Director at Google DeepMind, interested in computer vision, machine learning, and computer graphics.

🏡Building realistic 3D scenes just got smarter! Introducing our #CVPR2025 work, 🔥FirePlace, a framework that enables Multimodal LLMs to automatically generate realistic and geometrically valid placements for objects into complex 3D scenes. How does it work?🧵👇

🚀 World Record Alert! 🚀 Join the GenAI Intensive with Google and help us BREAK the @GWR title for Largest Virtual AI Conference! Registration closes on March 28th 11:59PM PT. Last chance to register: rsvp.withgoogle.com/events/google-…


Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

CoCa: a new image-text foundation model subsuming single-encoder, dual-encoder and encoder-decoder. SOTA results on 19 unimodal/multimodal/alignment tasks including 86.3% zero-shot top-1 ImageNet, 90.6% with a frozen encoder, 91.0% when finetuned. Link: arxiv.org/abs/2205.01917

