TimDarcet
@TimDarcet

codegen @ FAIR, prev. DINO stuff @ INRIA & FAIR

1.4K posts · Joined March 2021 · 803 Following · 4.5K Followers

Pinned Tweet
TimDarcet
TimDarcet@TimDarcet·
1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.
TimDarcet retweeted
Thomas Kipf
Thomas Kipf@tkipf·
Confession: I never had a single work-related sleepless night or ever pulled an all-nighter during my career incl. PhD. Don’t sacrifice your health. Sleep is a superpower — your brain on 8hrs of sleep is a lot smarter than your brain on sleep deprivation. Don’t listen to people who tell you to chronically sacrifice sleep for work. Sacrificing sleep for your kids/family is a different story.
Sarvesh Gharat@SarveshGharat12

@npparikh I doubt all those things are really possible. In fact, I believe you are not doing a good PhD unless you have sleepless nights. Just working on your thesis is definitely possible on a 9-6 schedule, but a good PhD, which involves exploring, collaborations, etc., needs extra hours

TimDarcet retweeted
Thinking Machines
Thinking Machines@thinkymachines·
With the model's simultaneous speech capability, Horace has gotten a lot easier to work with recently.
TimDarcet retweeted
Quentin Berthet
Quentin Berthet@qberthet·
🚨 New paper: Introducing MIND (Monge Inception Distance) Everyone agrees that FID is broken, requires too many samples, slowing down evals. MIND requires 10x fewer samples, is more robust, faster to compute. Our new drop-in replacement for evaluating generative models. 🧵👇
Quentin Berthet tweet media
Rémi
Rémi@remilouf·
Starting a company in a garage is boring so we started @dottxtai in a French castle instead
Rémi tweet media
TimDarcet retweeted
Alexis Marouani
Alexis Marouani@Alexis1097657·
#ICLR2026 Frictions in Vision Transformers 1/ ViTs use a [CLS] token for global understanding and patch tokens for local details. Despite their different roles, we've been processing them with the exact same math. Looking forward to discussions! Sat 25, 10:30 AM – 1 PM, P4 #3303
TimDarcet
TimDarcet@TimDarcet·
@DilijanTrails I have not used DDP once since 2022 haha. In FSDP you have the option to not release shards, in which case it's basically equivalent to DDP, with the all-gather moved to before the forward instead of after the backward. I think both are fine if both fit
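The "option to not release shards" maps to a single flag in PyTorch's FSDP2 per-module API. A minimal configuration sketch, assuming a recent PyTorch (>= 2.6, where `torch.distributed.fsdp.fully_shard` is public) and a `torchrun` multi-GPU launch; the model choice here is just a stand-in:

```python
# Sketch only: must be run under torchrun with an initialized process group.
import torch.nn as nn
from torch.distributed.fsdp import fully_shard

model = nn.TransformerEncoderLayer(d_model=768, nhead=12)

# reshard_after_forward=False keeps each module's parameters gathered
# (unsharded) between forward and backward, so there is no re-gather in
# the backward pass. Communication then looks much like DDP -- one
# gather up front, gradients reduce-scattered after backward -- at the
# cost of holding the full parameters in memory, which is Tim's point
# that both are fine if both fit.
fully_shard(model, reshard_after_forward=False)
```

With `reshard_after_forward=True` (the default) you trade that extra memory for a second all-gather per module in backward, which is what makes default FSDP cheaper on RAM but chattier than DDP.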
Aram🇦🇲
Aram🇦🇲@DilijanTrails·
but say, for example, for a 100M-param DINOv3 (ViT-B size, default SSL config + custom dataset) on a single DGX node with 8x H100 (80GB), would you still go plain DDP, or does FSDP2 become worth it because of NVLink + the potential extra batch size from lower peak RAM (as well as `torch.compile` gains on FSDP2 more so than DDP)? In general, for the same setup: any reason to prefer FSDP2 over DDP even when the model + optimizer + activations easily fit without sharding? i feel like the difference in runtime/MFU on dinov3 is minimal in my experiments...
TimDarcet
TimDarcet@TimDarcet·
- so we have this beautiful idea that's learning invariance to some automatic novel-view generation rules while learning an implicit clustering of the data
- oh, you have to freeze the last layer for one epoch btw, don't ask about it
Yacine Mahdid@yacinelearning

every great research paper I've read has this shape: - absolutely stellar philosophical reasoning about why their structure is the purest and most logical thing - the dankest duct tape you've ever seen in your life to make this thing even start

TimDarcet
TimDarcet@TimDarcet·
@jchencxh I had some DINOv2 runs working with momentum=0 a while ago. The accuracy at the optimal LR was the same, but the accuracy-vs-LR curve was much more spiky (i.e. unstable)
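For context, the EMA teacher in DINO-style SSL is updated as theta_t <- m*theta_t + (1-m)*theta_s, with m normally close to 1; momentum m=0 makes the teacher an exact copy of the student every step, i.e. effectively removing the EMA. A minimal plain-Python sketch (toy parameter lists, not the actual DINOv2 code):

```python
def ema_update(teacher, student, m):
    """DINO-style teacher update: theta_t <- m*theta_t + (1-m)*theta_s."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

teacher = [0.0, 1.0]
student = [1.0, 3.0]

# m = 0: the teacher becomes an exact copy of the student (no smoothing),
# the regime Tim describes as workable but more lr-sensitive.
assert ema_update(teacher, student, 0.0) == student

# m = 0.5 for illustration (DINO uses m ~ 0.996): teacher moves halfway.
print(ema_update(teacher, student, 0.5))  # [0.5, 2.0]
```

The smoothing is what flattens the accuracy-vs-LR curve: with m near 1 the teacher averages out noisy student updates, so a slightly-too-large LR degrades things gracefully instead of spiking.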
James Chen
James Chen@jchencxh·
i delude myself into thinking i can remove the EMA encoder from SSL training (without regularization) every 6 months, and it gives me ~3 weeks of mental illness every time.
TimDarcet retweeted
Florian Brand
Florian Brand@xeophon·
maybe @rasdani_ was right and we do need a Paris office…
Florian Brand tweet media
TimDarcet retweeted
Alexandr Wang
Alexandr Wang@alexandr_wang·
We are always open to feedback and welcome any perspective on weaknesses you've noticed in the model from using it. We are quite upfront that our model does not perform well on ARC AGI 2 for example, and published those results for the community to understand. That might reflect some areas of improvement of the model that we could focus on in the future. In general, we have been pleasantly surprised by users' feedback on the models in areas like visual coding, writing style, and reasoning queries.
TimDarcet
TimDarcet@TimDarcet·
@meekaale @IsaacKing314 @ApriiSR Also note that if it's freely available for offense, it's freely available for defense. In some cases the defense will adopt it more slowly, in other cases faster, but it's not an unbalanced situation in the long term
Mikael Brockman
Mikael Brockman@meekaale·
@IsaacKing314 @ApriiSR It seems hard to compare with tariffs, which cause relatively straightforwardly calculable and differentiable economic damage, while Mythos capabilities, for example, would plausibly let North Korea set up a dark factory producing Stuxnet-class exploits
Isaac King 🔎
Isaac King 🔎@IsaacKing314·
Happy to predict that Mythos-level hacking capabilities will not, in fact, cause the collapse of society when released to the public.