Alexis Marouani

@Alexis1097657

PhD Student in DINO Team | Meta FAIR

Paris · Joined July 2023

12 Following · 18 Followers

Alexis Marouani @Alexis1097657 · Apr 23

7/ Summary: Stop treating your tokens like they're identical. A little specialization goes a long way in making Vision Transformers more efficient and powerful. Read the full paper here: arxiv.org/abs/2602.08626 #MachineLearning #ComputerVision #AI #ViT #ICLR2026

Alexis Marouani @Alexis1097657 · Apr 23

6/ Why does this matter? Standard ViTs often trade off classification for segmentation quality. By specializing the layers, we get the best of both worlds. It makes ViTs much more effective for "dense" tasks like object detection and depth estimation.

Alexis Marouani @Alexis1097657 · Apr 23

#ICLR2026 Frictions in Vision Transformers 1/ ViTs use a [CLS] token for global understanding and patch tokens for local details. Despite their different roles, we've been processing them with the exact same math. Looking forward to the discussions! Sat 25, 10:30 AM – 1 PM, P4 -#3303

Alexis Marouani reposted

Dmytro Mishkin 🇺🇦 @ducha_aiki · Feb 10

Revisiting [CLS] and Patch Token Interaction in Vision Transformers. Alexis Marouani, @oriane_simeoni, Hervé Jégou, @p_bojanowski, @huyvvo. tl;dr: ViT spends some capacity making sure the CLS token is distinct. Using an un-shared FFN for it may improve things. arxiv.org/abs/2602.08626
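
To make the "un-shared FFN" idea concrete, here is a rough, dependency-free sketch: the [CLS] token goes through one feed-forward network and the patch tokens through another, instead of all tokens sharing one set of weights. This is only an illustration under stated assumptions, not the paper's implementation. The helper names (`make_ffn`, `ffn`, `unshared_ffn`), the ReLU activation, the tiny dimensions, and the omission of attention, normalization, and residual connections are all choices made here for brevity.

```python
import random


def linear(x, w, b):
    """Compute y = x @ w + b for a single token vector x (list of floats)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def make_ffn(dim, hidden, rng):
    """Return randomly initialised parameters for a two-layer FFN (hypothetical helper)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"w1": mat(dim, hidden), "b1": [0.0] * hidden,
            "w2": mat(hidden, dim), "b2": [0.0] * dim}


def ffn(x, p):
    """Two-layer FFN applied to one token. ReLU is an assumption made for simplicity."""
    h = [max(0.0, v) for v in linear(x, p["w1"], p["b1"])]
    return linear(h, p["w2"], p["b2"])


def unshared_ffn(tokens, cls_params, patch_params):
    """tokens[0] is [CLS], the rest are patch tokens; each group has its own weights."""
    out = [ffn(tokens[0], cls_params)]
    out += [ffn(t, patch_params) for t in tokens[1:]]
    return out


rng = random.Random(0)
dim, hidden, num_patches = 4, 8, 3
cls_params = make_ffn(dim, hidden, rng)      # weights used only by [CLS]
patch_params = make_ffn(dim, hidden, rng)    # weights shared by all patch tokens
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(1 + num_patches)]
out = unshared_ffn(tokens, cls_params, patch_params)
```

A standard ViT block would correspond to passing the same parameter dictionary for both `cls_params` and `patch_params`; the un-shared variant only changes which weights each token type sees, not the shape of the output.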
