
Ross Girshick
@inkynumbers
Giver of vision to machines.
Joined April 2009
42 Following · 3.8K Followers

Do you want to work with me and our amazing team?
Apply at vercept.com/careers
Stack: PyTorch, React, Next.js, TypeScript
#PyTorch #NextJS #AI #React #AIStartup

@giffmana FAIR was and continues to be an amazing place! However, being in one place for a long time (8 yrs for me) can eventually become its own good reason to move. Reinitialization and randomization are important in research life. (Any talk of publication quotas is pure nonsense.)

@gruntleme It's a wonderful illustration of a giant house spider! These friends visit me from time to time in the basement (where I work from home). They have a curious range: most of Europe, a small bit of the PNW around Seattle and Vancouver, BC, and another small bit of the Mid-Atlantic.

@karpathy @giffmana @PaulKRubenstein @endernewton @sainingxie The mismatch may be an issue (I don't know), but apparently it's not a catastrophe. End-to-end or partial fine-tuning may help compensate, if it is a problem. I also find it somewhat concerning and think it could be worth investigating.

@inkynumbers @giffmana @PaulKRubenstein @endernewton @sainingxie Oh hey, following the Twitter rabbit hole bears fruit :) Great, I was wondering the same. Slightly unnerved about the train/test mismatch and surprised it is not an issue. Good ref to the earlier/related high-res result.

1/N The return of patch-based self-supervision! It never worked well and you had to bend over backwards with ResNets (I tried).
Now with ViT, very simple patch-based self-supervised pre-training rocks! First BEiT, now Masked Autoencoders: IN1k = 87.8% arxiv.org/pdf/2111.06377…
🧶
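Since this thread points at the MAE paper, here is a minimal, hypothetical PyTorch sketch of the core idea it describes (randomly mask patches, encode only the visible ones, reconstruct the rest). All names and sizes are illustrative, and it collapses details from the paper such as the narrower decoder and normalized pixel targets:

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy MAE-style model: encode visible patches, reconstruct masked ones."""
    def __init__(self, num_patches=196, dim=768):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = lambda: nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), num_layers=2)
        self.decoder = nn.TransformerEncoder(layer(), num_layers=1)
        self.head = nn.Linear(dim, dim)  # predict the raw patch values

    def forward(self, patches, mask_ratio=0.75):
        B, N, D = patches.shape
        x = patches + self.pos
        # Random masking: keep a random (1 - mask_ratio) subset of patches.
        n_keep = int(N * (1 - mask_ratio))
        keep = torch.rand(B, N).argsort(dim=1)[:, :n_keep]
        idx = keep.unsqueeze(-1).expand(-1, -1, D)
        # The encoder sees only the visible patches.
        latent = self.encoder(torch.gather(x, 1, idx))
        # Scatter encoded tokens back; masked slots get a learned token.
        full = self.mask_token.expand(B, N, D).clone().scatter(1, idx, latent)
        recon = self.head(self.decoder(full + self.pos))
        # Reconstruction loss is computed only on the masked positions.
        hidden = torch.ones(B, N, dtype=torch.bool).scatter(1, keep, False)
        return ((recon - patches) ** 2)[hidden].mean()

patches = torch.randn(2, 196, 768)  # e.g. a 14x14 grid of flattened patches
loss = TinyMAE()(patches)
loss.backward()
```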

@PaulKRubenstein @endernewton @sainingxie @inkynumbers Hi Paul! They don't specify, but I expect that all patches are provided.
ViT is flexible wrt the patches it sees; we already showed this via the high-res trick in the original ViT paper (which also works without fine-tuning).
But it would be good if one of the authors could confirm.
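For reference, the "high-res trick" mentioned above amounts to resizing the learned position-embedding grid so the model can consume a different number of patches. A hypothetical sketch (shapes and names are illustrative, not the ViT repo's API):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_grid, new_grid):
    # pos_embed: (1, old_grid*old_grid, dim); class token omitted for simplicity.
    dim = pos_embed.shape[-1]
    p = pos_embed.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    p = F.interpolate(p, size=(new_grid, new_grid), mode="bilinear",
                      align_corners=False)
    return p.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)

pos = torch.randn(1, 14 * 14, 768)      # trained at 224px with 16px patches
pos_hi = resize_pos_embed(pos, 14, 24)  # reuse at 384px input
print(pos_hi.shape)                     # torch.Size([1, 576, 768])
```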

@giffmana I agree about the use of nparams being slightly wrong here. Within the scope of the fig each "column" of points is comparable, but using flops on the x-axis would be more meaningful wrt scaling. (I actually complain about nparams as a complexity measure all the time...)

@inkynumbers re /14: ah, thanks. This makes using nparams a little wrong: H/16 would have the same params but less "capacity".
re style: the style is actually fine, I like the blue-gray, but this figure is clearly rushed/unpolished. One more day of love would not have hurt :)
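A quick back-of-envelope on that point (illustrative numbers, assuming 224px inputs): patch size changes the token count and hence compute, while the transformer body's parameter count stays essentially the same.

```python
# Smaller patches -> more tokens -> more attention compute, same body params.
def tokens(img, patch):
    return (img // patch) ** 2

for p in (16, 14):
    n = tokens(224, p)
    print(f"/{p}: {n} tokens, attention cost ~ n^2 -> {n**2:,}")

# /16: 196 tokens, attention cost ~ n^2 -> 38,416
# /14: 256 tokens, attention cost ~ n^2 -> 65,536
```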

@giffmana @y_m_asano @endernewton @sainingxie Our bias is for det and seg transfer, rather than more cls results, so that's what we went with (lacking bandwidth for both). I do want to note the det and seg tables show IN1k sup baselines (not just self-sup). MAE (and BEiT) surpass IN1k sup convincingly, exciting to me!

@y_m_asano @endernewton @sainingxie @inkynumbers That's too old-school ;-)
Agree the transfer results are limited, but they do have seg and det results; they only compare to other self-sup methods in their tables, though.

@giffmana It's a bit buried in the caption of Table 3, but therein it says ViT-H is /14.
We debated the line-plot style and have diverging opinions about what looks nice ;).

@BruneElections Reading his statement will explain a lot: voter.votewa.gov/genericvotergu…

@gruntleme If I had a plastic bag, I could pick it up and put it in the SDH7 kitchen freezer. I'm sure that wouldn't bother anyone here ;-).

@gruntleme -- ah, but I do not! It seems like you're always 99% of the way to the answer whenever you ask me a question.

@gruntleme Good question. Who knows who this account actually belongs to or who might have hijacked it? Or is it an AI?

@gruntleme Yes, it takes both. Each on its own is insufficient.




