Guangxing Han

106 posts

@GuangxingHan

Research Scientist at Google DeepMind

New York, USA · Joined August 2014
586 Following · 310 Followers
Guangxing Han retweeted
André Araujo @andrefaraujo
True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N
[image attached]
9 replies · 89 reposts · 715 likes · 77.3K views
Guangxing Han retweeted
Google DeepMind @GoogleDeepMind
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
213 replies · 1.1K reposts · 6.5K likes · 1.7M views
Guangxing Han @GuangxingHan
Shraman is one of the best young researchers I have been working with. He has demonstrated profound skill in multimodal LLMs for visual grounding, segmentation and reasoning. Reach out to him if you need a top-tier vision-language multimodal expert!
Shraman Pramanick @Shramanpramani2

My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on the job market starting immediately. #metalayoffs #FAIR #MSL #SAM

0 replies · 0 reposts · 1 like · 435 views
Guangxing Han retweeted
Sundar Pichai @sundarpichai
Our new Gemini 2.5 Computer Use model is now available in the Gemini API, setting a new standard on multiple benchmarks with lower latency. These are early days, but the model’s ability to interact with the web – like scrolling, filling forms + navigating dropdowns – is an important next step in building general-purpose agents. Developers can try these capabilities via API in @googleaistudio + Vertex AI.
[image attached]
114 replies · 300 reposts · 3.1K likes · 310K views
Yangsibo Huang @YangsiboHuang
@jasondeanlee Correct, though personally I like Gemini’s style better (could be biased)
2 replies · 0 reposts · 13 likes · 1.4K views
Jason Lee @jasondeanlee
So both companies solved the same 5 problems?
7 replies · 0 reposts · 41 likes · 12K views
Guangxing Han retweeted
Demis Hassabis @demishassabis
Thrilled to welcome @windsurf_ai founders @_mohansolo & Douglas Chen and some of the brilliant Windsurf eng team to @GoogleDeepMind. Excited to be working with them to turbocharge our Gemini efforts on coding agents, tool use and much more. Great to have you on board!
koray kavukcuoglu @koraykv

Very excited to share that @windsurf_ai co-founders @_mohansolo & Douglas Chen, and some of their talented team have joined @GoogleDeepMind to help advance our work in agentic coding in Gemini. Welcome to our new team mates from Windsurf! theverge.com/openai/705999/…

90 replies · 192 reposts · 2.6K likes · 382.4K views
Guangxing Han retweeted
Sundar Pichai @sundarpichai
Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet. 🔦 Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the Pareto frontier of cost and speed. 🚀
[image attached]
253 replies · 448 reposts · 4.1K likes · 1M views
Guangxing Han retweeted
Google DeepMind @GoogleDeepMind
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf
90 replies · 507 reposts · 2.5K likes · 1.1M views
Guangxing Han retweeted
André Araujo @andrefaraujo
Multimodal AI encoders often lack spatial understanding… but not anymore! Our #ICLR2025 TIPS model (Text-Image Pretraining with Spatial awareness) from @GoogleDeepMind can help 💡🚀 Check out our strong & versatile image-text encoder 💪 Paper & code: arxiv.org/abs/2410.16512
[4 images attached]
6 replies · 64 reposts · 322 likes · 35.4K views
Guangxing Han retweeted
André Araujo @andrefaraujo
Excited to release a super capable family of image-text models from our TIPS #ICLR2025 paper! github.com/google-deepmin… We have models from ViT-S to -g, with spatial awareness, suitable for many multimodal AI applications. Can't wait to see what the community will build with them!
André Araujo @andrefaraujo

Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder, for off-the-shelf dense and image-level prediction. Finally image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512

1 reply · 6 reposts · 17 likes · 3.6K views
Guangxing Han retweeted
André Araujo @andrefaraujo
Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder, for off-the-shelf dense and image-level prediction. Finally image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512
[4 images attached]
4 replies · 11 reposts · 49 likes · 6.2K views
Zhuang Liu @liuzhuang1234
Excited to share that I will be joining Princeton Computer Science @PrincetonCS as an Assistant Professor in September 2025! I'm looking for students to join me. If you are interested in working with me on VLMs, LLMs, deep learning (vision/LLM) architectures, data, training, efficiency, or understanding, please apply!
131 replies · 109 reposts · 1.5K likes · 173.1K views
Tim Brooks @_tim_brooks
I will be joining @GoogleDeepMind to work on video generation and world simulators! Can't wait to collaborate with such a talented team. I had an amazing two years at OpenAI making Sora. Thank you to all the passionate and kind people I worked with. Excited for the next chapter!
186 replies · 160 reposts · 4K likes · 774.8K views
Tom Yeh @ProfTomYeh
[Sparse Autoencoder] by Hand ✍️
@AnthropicAI uses Sparse Autoencoders (SAE) to produce interpretable features for large models. How does an SAE achieve interpretability?
[1] Given
↳ Model activations for five tokens (X)
↳ They work, but are not interpretable.
↳ Can we map each activation (3D) to a higher-dimensional space (6D) that we can interpret?
[2] Encode: Linear Layer
↳ Multiply X with encoder weights and add biases.
[3] Encoder: ReLU
↳ Apply ReLU to add non-linearity.
↳ ReLU suppresses negative activations (sets them to 0).
↳ Output: sparse and interpretable features 𝘧.
↳ "Sparsity" means we want many zeros (21/30 here). I hand-picked weight and bias values to purposely let ReLU zero out many features.
↳ "Interpretability" is achieved when only one or two features are positive. Here, 𝘟4 and 𝘟5 both have ones only at 𝘧5. By examining the input data, we can guess what 𝘧5 may mean by checking what 𝘟4 and 𝘟5 have in common, for example, both showing a "park."
[4] Decoder: Reconstruction
↳ Multiply 𝘧 with decoder weights and add biases.
↳ Output: X', the reconstruction of X from the interpretable features.
↳ Reconstruction means we want X' to be as close to X as possible. Here, X' is still quite different from X; more training is needed to update the weights.
[5] Decoder: Weights
↳ Compute the L2 norm of each decoder weight column vector. We will use it later.
Training 🏋️
[6] Sparsity: L1 Loss
↳ Sparsity means we want as many values in 𝘧 as possible to be zero. We use L1, the sum of the absolute values of all the entries, and we want that sum to be as small as possible.
[7] Sparsity: Gradient
↳ The descent direction for L1 is -1 for positive values, which makes intuitive sense because we want each value to go down toward zero.
[8] Sparsity: Zero
↳ For values that are already zero, set the gradient to zero, since we don't need to change them.
[9] Sparsity: Weight
↳ Multiply each gradient (row) by the corresponding decoder weight's L2 norm.
↳ Goal: prevent the model from cheating by shrinking the features while learning large decoder weights to reconstruct X.
[10] Reconstruction: MSE Loss
↳ Reconstruction means we want the difference between X and X' to be as small as possible. Here we use L2 (squared error).
[11] Reconstruction: Gradient
↳ The MSE gradient with respect to X' is simply 2(X' - X).
↳ With the gradients computed, run backpropagation to update the weights of both the Encoder and the Decoder, until we find a good balance between sparsity and reconstruction.
4 replies · 127 reposts · 635 likes · 31.2K views
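The walkthrough above maps almost line-for-line onto code. Below is a minimal numpy sketch of the encode/decode pass and the two training signals the thread describes (decoder-norm-weighted L1 sparsity plus MSE reconstruction), with the gradients written out by hand rather than via an autodiff framework. The toy sizes (5 tokens, 3-d activations, 6 features) follow the thread; the learning rate, L1 coefficient, and random initialization are hypothetical illustration choices, not Anthropic's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes from the walkthrough: 5 tokens, 3-d activations, 6 features.
n_tokens, d_model, d_feat = 5, 3, 6
X = rng.normal(size=(n_tokens, d_model))          # [1] model activations

W_enc = 0.1 * rng.normal(size=(d_model, d_feat))  # encoder weights
b_enc = np.zeros(d_feat)                          # encoder biases
W_dec = 0.1 * rng.normal(size=(d_feat, d_model))  # decoder weights
b_dec = np.zeros(d_model)                         # decoder biases

lr, l1_coef, eps = 0.05, 0.05, 1e-8               # hypothetical hyperparameters

for _ in range(2000):
    # [2][3] Encode: linear layer, then ReLU -> sparse features f
    pre = X @ W_enc + b_enc
    f = np.maximum(pre, 0.0)

    # [4] Decode: reconstruct X' from the sparse features
    X_hat = f @ W_dec + b_dec

    # [5] L2 norm of each feature's decoder weight vector
    dec_norms = np.linalg.norm(W_dec, axis=1) + eps

    # [6][10] Losses: norm-weighted L1 sparsity + MSE reconstruction
    sparsity_loss = l1_coef * np.sum(f * dec_norms)
    recon_loss = np.mean((X_hat - X) ** 2)

    # [11] MSE gradient w.r.t. X': 2 * (X' - X), averaged over all entries
    dX_hat = 2.0 * (X_hat - X) / X.size

    # [7][8][9] Sparsity gradient w.r.t. f: the decoder norm where f > 0;
    # the ReLU mask below zeroes it (and the recon path) for inactive units
    df = dX_hat @ W_dec.T + l1_coef * dec_norms
    dpre = df * (pre > 0.0)

    # Parameter gradients (the sparsity term also reaches W_dec via its norms)
    dW_enc = X.T @ dpre
    db_enc = dpre.sum(axis=0)
    dW_dec = f.T @ dX_hat + l1_coef * f.sum(axis=0)[:, None] * W_dec / dec_norms[:, None]
    db_dec = dX_hat.sum(axis=0)

    # Plain gradient descent on encoder and decoder
    W_enc -= lr * dW_enc
    b_enc -= lr * db_enc
    W_dec -= lr * dW_dec
    b_dec -= lr * db_dec

print(f"sparsity: {int(np.sum(f == 0))} of {f.size} feature values are zero")
print(f"reconstruction MSE: {recon_loss:.4f}")
```

The decoder-norm weighting in the sparsity term is what step [9] is about: penalizing plain L1 on 𝘧 alone lets the model shrink the features and scale up the decoder weights to compensate, so each feature's penalty is scaled by its decoder vector's L2 norm instead.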
Guangxing Han retweeted
André Araujo @andrefaraujo
Our call for papers for the ILR workshop at #ECCV2024 is open! Deadline on July 25th, with options for both long and short papers. Don't miss this opportunity to showcase your work in the broad area of instance-level recognition! Submit at: openreview.net/group?id=thecv…
André Araujo @andrefaraujo

Announcing the #ECCV2024 workshop on Instance-Level Recognition (ILR)! This is the 6th edition in our workshop series, with amazing keynote speakers: @CordeliaSchmid, @jampani_varun and @g_kordo. Call for papers now open! All information on our website: ilr-workshop.github.io/ECCVW2024/

0 replies · 2 reposts · 5 likes · 1K views
Zhiyuan Liu @zibuyu9
By the way, do posts written in Chinese on Twitter get many readers? 🧐 (posted in Chinese)
34 replies · 0 reposts · 99 likes · 20K views