Guangxing Han

106 posts

@GuangxingHan

Research Scientist at Google DeepMind

New York, USA · Joined August 2014
586 Following · 310 Followers
Guangxing Han retweeted
André Araujo @andrefaraujo
True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N
[image attached]
9 replies · 89 reposts · 715 likes · 77.3K views
Guangxing Han retweeted
Google DeepMind @GoogleDeepMind
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
213 replies · 1.1K reposts · 6.5K likes · 1.7M views
Guangxing Han @GuangxingHan
Shraman is one of the best young researchers I have been working with. He has demonstrated profound skill in multimodal LLMs for visual grounding, segmentation and reasoning. Reach out to him if you need a top-tier vision-language multimodal expert!
Shraman Pramanick @Shramanpramani2

My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on the job market starting immediately. #metalayoffs #FAIR #MSL #SAM

0 replies · 0 reposts · 1 like · 435 views
Guangxing Han retweeted
Sundar Pichai @sundarpichai
Our new Gemini 2.5 Computer Use model is now available in the Gemini API, setting a new standard on multiple benchmarks with lower latency. These are early days, but the model’s ability to interact with the web – like scrolling, filling forms + navigating dropdowns – is an important next step in building general-purpose agents. Developers can try these capabilities via API in @googleaistudio + Vertex AI.
[image attached]
114 replies · 300 reposts · 3.1K likes · 310K views
Yangsibo Huang @YangsiboHuang
@jasondeanlee Correct, though personally I like Gemini’s style better (could be biased)
2 replies · 0 reposts · 13 likes · 1.4K views
Jason Lee @jasondeanlee
So both companies solved the same 5 problems?
7 replies · 0 reposts · 41 likes · 12K views
Guangxing Han retweeted
Demis Hassabis @demishassabis
Thrilled to welcome @windsurf_ai founders @_mohansolo & Douglas Chen and some of the brilliant Windsurf eng team to @GoogleDeepMind. Excited to be working with them to turbocharge our Gemini efforts on coding agents, tool use and much more. Great to have you on board!
koray kavukcuoglu @koraykv

Very excited to share that @windsurf_ai co-founders @_mohansolo & Douglas Chen, and some of their talented team have joined @GoogleDeepMind to help advance our work in agentic coding in Gemini. Welcome to our new team mates from Windsurf! theverge.com/openai/705999/…

90 replies · 192 reposts · 2.6K likes · 382.4K views
Guangxing Han retweeted
Sundar Pichai @sundarpichai
Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet. 🔦 Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the Pareto frontier of cost and speed. 🚀
[image attached]
253 replies · 448 reposts · 4.1K likes · 1M views
Guangxing Han retweeted
Google DeepMind @GoogleDeepMind
Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf
90 replies · 507 reposts · 2.5K likes · 1.1M views
Guangxing Han retweeted
André Araujo @andrefaraujo
Multimodal AI encoders often lack spatial understanding… but not anymore! Our #ICLR2025 TIPS model (Text-Image Pretraining with Spatial awareness) from @GoogleDeepMind can help 💡🚀 Check out our strong & versatile image-text encoder 💪 Paper & code: arxiv.org/abs/2410.16512
[4 images attached]
6 replies · 64 reposts · 322 likes · 35.4K views
Guangxing Han retweeted
André Araujo @andrefaraujo
Excited to release a super capable family of image-text models from our TIPS #ICLR2025 paper! github.com/google-deepmin… We have models from ViT-S to -g, with spatial awareness, suitable for many multimodal AI applications. Can't wait to see what the community will build with them!
André Araujo @andrefaraujo

Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder, for off-the-shelf dense and image-level prediction. Finally image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512

1 reply · 6 reposts · 17 likes · 3.6K views
Guangxing Han retweeted
André Araujo @andrefaraujo
Want some TIPS? Well, then check out “Text-Image Pretraining with Spatial awareness” :) TIPS is a general-purpose image-text encoder, for off-the-shelf dense and image-level prediction. Finally image-text pretraining with spatially-aware representations! arxiv.org/abs/2410.16512
[4 images attached]
4 replies · 11 reposts · 49 likes · 6.2K views
Zhuang Liu @liuzhuang1234
Excited to share that I will be joining Princeton Computer Science @PrincetonCS as an Assistant Professor in September 2025! I'm looking for students to join me. If you are interested in working with me on VLMs, LLMs, deep learning (vision/LLM) architectures, data, training, efficiency, or understanding, please apply!
131 replies · 109 reposts · 1.5K likes · 173.1K views
Tim Brooks @_tim_brooks
I will be joining @GoogleDeepMind to work on video generation and world simulators! Can't wait to collaborate with such a talented team. I had an amazing two years at OpenAI making Sora. Thank you to all the passionate and kind people I worked with. Excited for the next chapter!
186 replies · 160 reposts · 4K likes · 774.8K views
Tom Yeh @ProfTomYeh
[Sparse Autoencoder] by Hand ✍️
@AnthropicAI uses Sparse Autoencoders (SAE) to produce interpretable features for large models. How does an SAE achieve interpretability?
[1] Given
↳ Model activations for five tokens (X)
↳ They work, but are not interpretable.
↳ Can we map each activation (3D) to a higher-dimensional space (6D) that we can interpret?
[2] Encode: Linear Layer
↳ Multiply X with encoder weights and add biases.
[3] Encoder: ReLU
↳ Apply ReLU to add non-linearity.
↳ ReLU suppresses negative activations (sets them to 0).
↳ Output: sparse and interpretable features 𝘧.
↳ "Sparsity" means we want many zeros (21/30 here). I hand-picked weight and bias values to purposely let ReLU zero out many features.
↳ "Interpretability" is achieved when only one or two features are positive. Here, 𝘟4 and 𝘟5 both have ones only at 𝘧5. By examining the input data, we can guess what 𝘧5 may mean by checking what 𝘟4 and 𝘟5 have in common, for example, both showing a "park."
[4] Decoder: Reconstruction
↳ Multiply 𝘧 with decoder weights and add biases.
↳ Output: X', the reconstruction of X from the interpretable features.
↳ Reconstruction means we want X' to be as close to X as possible. Here, X' is still quite different from X; more training is needed to update the weights.
[5] Decoder: Weights
↳ Compute the L2 norm of each decoder weight column vector. We will use it later.
Training 🏋️
[6] Sparsity: L1 Loss
↳ Sparsity means we want as many values in 𝘧 as possible to be zero. We use L1, the sum of the absolute values of all the entries, and we want that sum to be as small as possible.
[7] Sparsity: Gradient
↳ The descent direction for L1 is -1 for positive values, which makes intuitive sense because we want each value to go down toward zero.
[8] Sparsity: Zero
↳ For values that are already zero, set the gradient to zero, since we don't need to change them.
[9] Sparsity: Weight
↳ Multiply each gradient (row) by the corresponding decoder weight's L2 norm.
↳ Goal: prevent the model from cheating by shrinking the features while learning large decoder weights to reconstruct X.
[10] Reconstruction: MSE Loss
↳ Reconstruction means we want the difference between X and X' to be as small as possible. Here we use L2 (squared error).
[11] Reconstruction: Gradient
↳ The MSE gradient with respect to X' is simply 2(X' - X).
↳ With the gradients computed, run backpropagation to update the weights of both the Encoder and the Decoder, until we find a good balance between sparsity and reconstruction.
4 replies · 127 reposts · 635 likes · 31.2K views
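The walkthrough above maps almost line-for-line onto code. Below is a minimal numpy sketch of the encode/decode pass and the two training signals the thread describes (decoder-norm-weighted L1 sparsity plus MSE reconstruction), with the gradients written out by hand rather than via an autodiff framework. The toy sizes (5 tokens, 3-d activations, 6 features) follow the thread; the learning rate, L1 coefficient, and random initialization are hypothetical illustration choices, not Anthropic's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes from the walkthrough: 5 tokens, 3-d activations, 6 features.
n_tokens, d_model, d_feat = 5, 3, 6
X = rng.normal(size=(n_tokens, d_model))          # [1] model activations

W_enc = 0.1 * rng.normal(size=(d_model, d_feat))  # encoder weights
b_enc = np.zeros(d_feat)                          # encoder biases
W_dec = 0.1 * rng.normal(size=(d_feat, d_model))  # decoder weights
b_dec = np.zeros(d_model)                         # decoder biases

lr, l1_coef, eps = 0.05, 0.05, 1e-8               # hypothetical hyperparameters

for _ in range(2000):
    # [2][3] Encode: linear layer, then ReLU -> sparse features f
    pre = X @ W_enc + b_enc
    f = np.maximum(pre, 0.0)

    # [4] Decode: reconstruct X' from the sparse features
    X_hat = f @ W_dec + b_dec

    # [5] L2 norm of each feature's decoder weight vector
    dec_norms = np.linalg.norm(W_dec, axis=1) + eps

    # [6][10] Losses: norm-weighted L1 sparsity + MSE reconstruction
    sparsity_loss = l1_coef * np.sum(f * dec_norms)
    recon_loss = np.mean((X_hat - X) ** 2)

    # [11] MSE gradient w.r.t. X': 2 * (X' - X), averaged over all entries
    dX_hat = 2.0 * (X_hat - X) / X.size

    # [7][8][9] Sparsity gradient w.r.t. f: the decoder norm where f > 0;
    # the ReLU mask below zeroes it (and the recon path) for inactive units
    df = dX_hat @ W_dec.T + l1_coef * dec_norms
    dpre = df * (pre > 0.0)

    # Parameter gradients (the sparsity term also reaches W_dec via its norms)
    dW_enc = X.T @ dpre
    db_enc = dpre.sum(axis=0)
    dW_dec = f.T @ dX_hat + l1_coef * f.sum(axis=0)[:, None] * W_dec / dec_norms[:, None]
    db_dec = dX_hat.sum(axis=0)

    # Plain gradient descent on encoder and decoder
    W_enc -= lr * dW_enc
    b_enc -= lr * db_enc
    W_dec -= lr * dW_dec
    b_dec -= lr * db_dec

print(f"sparsity: {int(np.sum(f == 0))} of {f.size} feature values are zero")
print(f"reconstruction MSE: {recon_loss:.4f}")
```

The decoder-norm weighting in the sparsity term is what step [9] is about: penalizing plain L1 on 𝘧 alone lets the model shrink the features and scale up the decoder weights to compensate, so each feature's penalty is scaled by its decoder vector's L2 norm instead.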
Guangxing Han retweeted
André Araujo @andrefaraujo
Our call for papers for the ILR workshop at #ECCV2024 is open! Deadline on July 25th, with options for both long and short papers. Don't miss this opportunity to showcase your work in the broad area of instance-level recognition! Submit at: openreview.net/group?id=thecv…
André Araujo @andrefaraujo

Announcing the #ECCV2024 workshop on Instance-Level Recognition (ILR)! This is the 6th edition in our workshop series, with amazing keynote speakers: @CordeliaSchmid, @jampani_varun and @g_kordo. Call for papers now open! All information on our website: ilr-workshop.github.io/ECCVW2024/

0 replies · 2 reposts · 5 likes · 1K views
Zhiyuan Liu @zibuyu9
By the way, do posts written in Chinese on Twitter get many readers? 🧐 (posted in Chinese)
34 replies · 0 reposts · 99 likes · 20K views