André Araujo

59 posts

@andrefaraujo

I'm a Research Scientist at Google DeepMind, working on computer vision and machine learning.

São Paulo, Brazil · Joined July 2009
159 Following · 482 Followers

André Araujo (@andrefaraujo):
@toilaluan_ @GoogleDeepMind yep, our model is working well without it, so it wasn't something we particularly aimed to improve. But it's definitely worth exploring to potentially boost performance even more.

kenji (@toilaluan_):
@andrefaraujo @GoogleDeepMind Curious why the model doesn't use RoPE? I see your model still behaves well at multiple resolutions without RoPE, but a lot of work uses it.

André Araujo (@andrefaraujo):
True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N

André Araujo (@andrefaraujo):
@tommiekerssies @GoogleDeepMind worth a try for sure! In this paper, our focus was on vision-language pretraining, and it clearly works great there, for both TIPS and CLIP/contrastive recipes.

Tommie Kerssies (@tommiekerssies):
@andrefaraujo @GoogleDeepMind Great work! Curious if supervising unmasked tokens could be used to fix the dense feature degradation seen in DINOv3 without the need for Gram anchoring.

André Araujo (@andrefaraujo):
@gbiziel @GoogleDeepMind oh, we just didn't release ViT-S since in many cases ViT-B seemed small enough. But we could possibly release ViT-S as well if there is excitement in the community to use it.

André Araujo (@andrefaraujo):
@Hershal0_0 @GoogleDeepMind oh yeah, IMHO these are the best research findings! A simple change that makes a huge difference in practice and that anyone can quickly adopt.

André Araujo (@andrefaraujo):
Qualitatively, TIPSv2 PCA maps capture more semantically focused details than DINOv3: notice the distinct ceiling lamps in row 2, or how easily it separates the people from their backpacks in row 3! 13/N
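PCA maps like the ones compared in the post above are commonly produced by projecting every patch embedding of an image onto the first three principal components of that image's patch embeddings, then showing those three components as RGB channels. The following is a minimal NumPy sketch of that general technique. It assumes the encoder's patch features are already available as an (N, D) array laid out on an H x W patch grid. `pca_rgb_map` is a hypothetical helper, not part of any TIPSv2 or DINOv3 release, and the exact normalization used for the figures in the thread is not stated.

```python
import numpy as np


def pca_rgb_map(patch_features: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Hypothetical helper: map (N, D) patch features to an (H, W, 3) RGB image.

    Each patch is projected onto the top-3 principal components of all patches
    in the image; each component is then min-max rescaled to [0, 1].
    """
    n, _ = patch_features.shape
    if n != grid_h * grid_w:
        raise ValueError("number of patches must equal grid_h * grid_w")

    # Center the features so the SVD yields principal axes.
    centered = patch_features - patch_features.mean(axis=0, keepdims=True)

    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T  # shape (N, 3)

    # Min-max rescale each component independently to [0, 1] for display.
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    rgb = (projected - lo) / np.maximum(hi - lo, 1e-8)

    return rgb.reshape(grid_h, grid_w, 3)
```

For example, with a ViT at 224 px input and 16 px patches, the grid would be 14 x 14, and the result can be upsampled to image resolution for side-by-side comparison. Some visualizations first mask out background patches or fit the PCA jointly over several images; this sketch does neither.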