TimDarcet
@TimDarcet

codegen @ FAIR, prev. DINO stuff @ INRIA & FAIR

1.4K posts · Joined March 2021 · 803 Following · 4.5K Followers

Pinned Tweet
TimDarcet
TimDarcet@TimDarcet·
1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.
TimDarcet retweeted
Thomas Kipf
Thomas Kipf@tkipf·
Confession: I never had a single work-related sleepless night or ever pulled an all-nighter during my career incl. PhD. Don’t sacrifice your health. Sleep is a superpower — your brain on 8hrs of sleep is a lot smarter than your brain on sleep deprivation. Don’t listen to people who tell you to chronically sacrifice sleep for work. Sacrificing sleep for your kids/family is a different story.
Sarvesh Gharat@SarveshGharat12

@npparikh I doubt all those things are really possible. In fact, I believe you are not doing a good PhD unless you have sleepless nights. Just working on your thesis is definitely possible on a 9-6 schedule, but a good PhD, which involves exploring, collaborations, etc., needs extra hours

TimDarcet retweeted
Thinking Machines
Thinking Machines@thinkymachines·
With the model's simultaneous speech capability, Horace has gotten a lot easier to work with recently.
TimDarcet retweeted
Quentin Berthet
Quentin Berthet@qberthet·
🚨 New paper: Introducing MIND (Monge Inception Distance) Everyone agrees that FID is broken, requires too many samples, slowing down evals. MIND requires 10x fewer samples, is more robust, faster to compute. Our new drop-in replacement for evaluating generative models. 🧵👇
Quentin Berthet tweet media
Rémi
Rémi@remilouf·
Starting a company in a garage is boring so we started @dottxtai in a French castle instead
Rémi tweet media
TimDarcet retweeted
Alexis Marouani
Alexis Marouani@Alexis1097657·
#ICLR2026 Frictions in Vision Transformers 1/ ViTs use a [CLS] token for global understanding and patch tokens for local details. Despite their different roles, we've been processing them with the exact same math. Looking forward to discussions! Sat 25, 10:30 AM – 1 PM, P4 #3303
TimDarcet
TimDarcet@TimDarcet·
@DilijanTrails I have not used DDP once since 2022 haha. In FSDP you have the option to not release shards, in which case it's basically equivalent to DDP, with the all-gather moved to before the forward instead of after the backward. I think both are fine if both fit
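The "option to not release shards" maps to a single flag in PyTorch's FSDP2 per-module API. A minimal configuration sketch, assuming a recent PyTorch (>= 2.6, where `torch.distributed.fsdp.fully_shard` is public) and a `torchrun` multi-GPU launch; the model choice here is just a stand-in:

```python
# Sketch only: must be run under torchrun with an initialized process group.
import torch.nn as nn
from torch.distributed.fsdp import fully_shard

model = nn.TransformerEncoderLayer(d_model=768, nhead=12)

# reshard_after_forward=False keeps each module's parameters gathered
# (unsharded) between forward and backward, so there is no re-gather in
# the backward pass. Communication then looks much like DDP -- one
# gather up front, gradients reduce-scattered after backward -- at the
# cost of holding the full parameters in memory, which is Tim's point
# that both are fine if both fit.
fully_shard(model, reshard_after_forward=False)
```

With `reshard_after_forward=True` (the default) you trade that extra memory for a second all-gather per module in backward, which is what makes default FSDP cheaper on RAM but chattier than DDP.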
Aram🇦🇲
Aram🇦🇲@DilijanTrails·
but say, for example, for a 100M-param DINOv3 (ViT-B size, default SSL config + custom dataset) on a single DGX node with 8x H100 (80GB), would you still go plain DDP, or does FSDP2 become worth it because of NVLink + the potential extra batch size from lower peak RAM (as well as `torch.compile` gains on FSDP2 more so than DDP)? In general, for the same setup: any reason to prefer FSDP2 over DDP even when the model + optimizer + activations easily fit without sharding? i feel like the difference in runtime/MFU on dinov3 is minimal in my experiments...
TimDarcet
TimDarcet@TimDarcet·
- so we have this beautiful idea that's learning invariance to some automatic novel-view generation rules while learning an implicit clustering of the data
- oh, you have to freeze the last layer for one epoch btw, don't ask about it
Yacine Mahdid@yacinelearning

every great research paper I've read has this shape: - absolutely stellar philosophical reasoning about why their structure is the purest and most logical thing - the dankest duct tape you've ever seen in your life to make this thing even start

TimDarcet
TimDarcet@TimDarcet·
@jchencxh I had some DINOv2 runs working with momentum=0 a while ago. The accuracy at the optimal LR was the same, but the accuracy-vs-LR curve was much more spiky (i.e. unstable)
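For context, the EMA teacher in DINO-style SSL is updated as theta_t <- m*theta_t + (1-m)*theta_s, with m normally close to 1; momentum m=0 makes the teacher an exact copy of the student every step, i.e. effectively removing the EMA. A minimal plain-Python sketch (toy parameter lists, not the actual DINOv2 code):

```python
def ema_update(teacher, student, m):
    """DINO-style teacher update: theta_t <- m*theta_t + (1-m)*theta_s."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

teacher = [0.0, 1.0]
student = [1.0, 3.0]

# m = 0: the teacher becomes an exact copy of the student (no smoothing),
# the regime Tim describes as workable but more lr-sensitive.
assert ema_update(teacher, student, 0.0) == student

# m = 0.5 for illustration (DINO uses m ~ 0.996): teacher moves halfway.
print(ema_update(teacher, student, 0.5))  # [0.5, 2.0]
```

The smoothing is what flattens the accuracy-vs-LR curve: with m near 1 the teacher averages out noisy student updates, so a slightly-too-large LR degrades things gracefully instead of spiking.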
James Chen
James Chen@jchencxh·
i delude myself into thinking i can remove the EMA encoder from SSL training (without regularization) every 6 months, and it gives me ~3 weeks of mental illness every time.
TimDarcet retweeted
Florian Brand
Florian Brand@xeophon·
maybe @rasdani_ was right and we do need a Paris office…
Florian Brand tweet media
TimDarcet retweeted
Alexandr Wang
Alexandr Wang@alexandr_wang·
We are always open to feedback and welcome any perspective on weaknesses you've noticed in the model from using it. We are quite upfront that our model does not perform well on ARC AGI 2 for example, and published those results for the community to understand. That might reflect some areas of improvement of the model that we could focus on in the future. In general, we have been pleasantly surprised by users' feedback on the models in areas like visual coding, writing style, and reasoning queries.
TimDarcet
TimDarcet@TimDarcet·
@meekaale @IsaacKing314 @ApriiSR Also note that if it's freely available for offense, it's freely available for defense. In some cases the defense will adopt it more slowly, in other cases faster, but it's not an unbalanced situation in the long term
Mikael Brockman
Mikael Brockman@meekaale·
@IsaacKing314 @ApriiSR It seems hard to compare with tariffs, which cause relatively straightforwardly calculable and differentiable economic damage, while Mythos capabilities, for example, would plausibly let North Korea set up a dark factory producing Stuxnet-class exploits
Isaac King 🔎
Isaac King 🔎@IsaacKing314·
Happy to predict that Mythos-level hacking capabilities will not, in fact, cause the collapse of society when released to the public.