Howard Zhou

17 posts

@howardzzh

I'm a Senior Engineering Director at Google DeepMind, interested in Computer Vision, Machine Learning problems, and Computer Graphics.

Joined June 2011
80 Following, 74 Followers
Howard Zhou retweeted
André Araujo (@andrefaraujo)
True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N
9
89
716
77.4K
Howard Zhou retweeted
Google AI Developers (@googleaidevs)
Gemini 3 Pro is the frontier of multimodal AI, delivering SOTA performance across document, screen, spatial, and video understanding. Read our deep dive on how we’ve pushed our core capabilities to power hero use cases across:
+ Docs: "derender" complex docs into structured code (HTML/LaTeX)
+ Screen: build robust computer agents that automate complex tasks
+ Spatial: generate collision-free trajectories for robotics & XR
+ Video: analyze sports footage using high-FPS processing with "thinking" mode
See how these capabilities are transforming workflows in education, biomedical, and law/finance → goo.gle/3Mt3UlT
45
134
1.1K
328.8K
Howard Zhou (@howardzzh)
Last Call: Learn GenAI and help us break the GUINNESS WORLD RECORDS™ for Largest Virtual AI Conference! Join Google & Kaggle's GenAI Intensive: No cost, live sessions, hands-on labs. Registration closes this Friday! #GenAI #GuinnessWorldRecords #Kaggle #GoogleAI
Kaggle (@kaggle)

🚀 World Record Alert! 🚀 Join the GenAI Intensive with Google and help us BREAK the @GWR title for Largest Virtual AI Conference! Registration closes on March 28th 11:59PM PT. Last chance to register: rsvp.withgoogle.com/events/google-…

0
0
0
100
Howard Zhou retweeted
Arena.ai (@arena)
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn! Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread👇
Google DeepMind (@GoogleDeepMind)

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

72
399
2.3K
467.3K
Howard Zhou retweeted
André Araujo (@andrefaraujo)
Multimodal AI encoders often lack spatial understanding… but not anymore! Our #ICLR2025 TIPS model (Text-Image Pretraining with Spatial awareness) from @GoogleDeepMind can help 💡🚀 Check out our strong & versatile image-text encoder 💪 Paper & code: arxiv.org/abs/2410.16512
6
64
322
35.4K
风起梧桐轩 💔 (@nagaviperszy)
My wife says this does more to strengthen safety awareness.
6
1
23
2.2K
Howard Zhou retweeted
André Araujo (@andrefaraujo)
Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder, for off-the-shelf dense and image-level prediction. Finally image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512
4
11
49
6.2K
Howard Zhou retweeted
AK (@_akhaliq)
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing.
3
16
42
16.6K
Howard Zhou retweeted
Jason Baldridge (@jasonbaldridge)
We are excited to share our work on our Pathways Autoregressive Text-to-Image model, Parti! #Parti achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. parti.research.google
10
77
340
0
Howard Zhou retweeted
Frank Dellaert (@fdellaert)
In anticipation of the Intl. Conf. on Computer Vision (#ICCV2021) this week, I rounded up all papers that use Neural Radiance Fields (NeRFs) represented in the main #ICCV2021 conference here (1/N): dellaert.github.io/NeRF21
2
71
317
0
Peyman Milanfar (@docmilanfar)
Whatever your allegiance it’s hard not to be happy for Messi 🇦🇷
1
1
40
0
Howard Zhou retweeted
Jon Barron (@jon_barron)
Training NeRFs per-scene is so 2020. Inspired by image based rendering, IBRNet does amortized inference for view synthesis by learning how to look at input images at render time. 15% drop in error, 80% fewer FLOPs than NeRF. Great work @QianqianWang5! ibrnet.github.io
2
79
442
0