André Araujo

59 posts

@andrefaraujo

I'm a Research Scientist at Google DeepMind, working on computer vision and machine learning.

São Paulo, Brazil · Joined July 2009

159 Following · 482 Followers

André Araujo@andrefaraujo·20h

@toilaluan_ @GoogleDeepMind yep, our model is working well without it, so it wasn't something we particularly aimed to improve. But it's definitely worth exploring to potentially boost performance even more.

kenji@toilaluan_·1d

@andrefaraujo @GoogleDeepMind Curious why the model doesn't use RoPE? I see your model still behaves well at multiple resolutions without RoPE, but lots of work uses it.

André Araujo@andrefaraujo·1d

True multimodal AI needs to understand the world spatially 🎯 🚀 Excited to release #CVPR2026 TIPSv2 from @GoogleDeepMind, a foundational image-text encoder with spatial awareness, leading to strong overall results and massive gains on patch-text alignment. 🔥 1/N

André Araujo@andrefaraujo·20h

@tommiekerssies @GoogleDeepMind worth a try for sure! In this paper, our focus was on vision-language pretraining, and it very clearly works great there, for both TIPS and CLIP/contrastive recipes.

Tommie Kerssies@tommiekerssies·1d

@andrefaraujo @GoogleDeepMind Great work! Curious if supervising unmasked tokens could be used to fix the dense feature degradation seen in DINOv3 without the need for Gram anchoring.

André Araujo@andrefaraujo·1d

@cataluna84 @BingyiCao @kmaninis @kfrancischen @arjunkarpur @ye_xia_ai @sahildua2305 @tanmayadabral @GuangxingHan @AlexBewleyAI @washingtonsk8 @kchorolab @mseyed @howardzzh Thanks for the invite, sounds interesting! Please reach out to us by email, we'd be happy to chat about it.

Mayank Bhaskar@cataluna84·1d

Congratulations on the awesome release! Would you or any of your co-authors like to present your papers in the Cohere Labs Computer Vision community? Cohere Labs Community Page: sites.google.com/cohere.com/coh… Here's the playlist to previous talks, if you are interested: youtube.com/playlist?list=…

André Araujo@andrefaraujo·1d

@lasitolas @GoogleDeepMind we didn't focus on this in this work, but it's a great idea for an extension!

laso@lasitolas·1d

@andrefaraujo @GoogleDeepMind How about image and audio? For instance, detecting an insect using audio and image.

André Araujo@andrefaraujo·1d

@gbiziel @GoogleDeepMind oh, we just didn't release ViT-S since for many cases ViT-B seemed small enough. But we could possibly release the ViT-S as well if there is excitement to use it in the community.

Grzegorz Biziel@gbiziel·1d

@andrefaraujo @GoogleDeepMind Why have you skipped distilling a small version? Were the results much worse? I am curious from a mobile use case perspective.

André Araujo@andrefaraujo·1d

@Hershal0_0 @GoogleDeepMind oh yeah, IMHO these are the greatest research findings! A simple change that makes a huge diff in practice and that anyone can quickly adopt.

Hershal Rao@Hershal0_0·1d

@andrefaraujo @GoogleDeepMind the "few lines of code" fix that saves the whole architecture. peak engineering.

André Araujo@andrefaraujo·1d

Huge thanks to amazing co-authors! @BingyiCao, Koert Chen, @kmaninis, @kfrancischen, @arjunkarpur, @ye_xia_ai, @sahildua2305, @tanmayadabral, @GuangxingHan, Bohyung Han, Joshua Ainslie, @AlexBewleyAI, Mithun Jacob, Rene Wagner, @washingtonsk8, @kchorolab, @mseyed, @howardzzh

André Araujo@andrefaraujo·1d

Qualitatively, TIPSv2 PCA maps capture highly semantically focused details compared to DINOv3 – notice the distinct ceiling lamps in row 2, or how it easily contrasts the people from their backpacks in row 3! 13/N
