Dídac Surís

213 posts

@Surisdi

Research Scientist @AIatMeta. Previously a Computer Vision PhD student at @Columbia. Amateur guitarist. Tweets in Catalan, Spanish or English

Joined August 2010
514 Following, 408 Followers
Dídac Surís retweeted
Nicolas Carion @alcinos26
Happy and proud to release SAM3, our new segmentation model. What's new? It's now a fully fledged open vocabulary detector, capable of finding any object given a simple text prompt or an example. And we cooked hard to bring you SAM's signature "it just works" feel. A 🧵 1/x
Dídac Surís retweeted
Mia Chiquier @mia_chiquier
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance, but how would you classify images that don’t have obvious names using CLIP?
Dídac Surís retweeted
Ruoshi Liu @ruoshi_liu
Thanks @_akhaliq for tweeting our work! Our prior work Zero123 showed that Stable Diffusion has learned powerful visual priors that can serve as the foundation for zero-shot generalization in many vision tasks. In our recent work pix2gestalt, we show that Stable Diffusion, when finetuned on the task of amodal segmentation, performs incredibly well on data far outside the training distribution. We've demonstrated that this model can serve as a unified solution for occlusion reasoning, benefiting many other tasks whose performance is greatly hindered by occlusion in images, such as recognition, novel view synthesis, and 3D reconstruction. Work led by Ege Ozguroglu, who is currently applying for PhD programs!
AK @_akhaliq

pix2gestalt: Amodal Segmentation by Synthesizing Wholes
paper page: huggingface.co/papers/2401.14…
pix2gestalt synthesizes whole objects from only partially visible ones, enabling amodal segmentation, recognition, and 3D reconstruction of occluded objects

Dídac Surís retweeted
Justin Salamon @justin_salamon
Excited to finally share this project! We train a model to match music to video based on its contents and style 🎞️➡️🎵 Here are examples of matching music to video shot on mobile phones 📱 Led by @Surisdi w/ @cvondrick & Bryan Russell #CVPR2022 Let's see more results 🧵(1/n)
Dídac Surís @Surisdi

Do you have some home videos you’d like to add music to? Tomorrow at #CVPR2022 we present “It’s Time for Artistic Correspondence in Music and Video”!
video: youtu.be/A4g30USxI0Q
website and paper: musicforvideo.cs.columbia.edu
w/ @cvondrick, Bryan Russell, @justin_salamon

Michael Black @Michael_J_Black
One more @CVPR question: We are asked to include captions in our videos. Is this also true for oral presentations? Oral videos are not supposed to include the speaker thumbnail so I wondered if they are also not supposed to be captioned.
Dídac Surís @Surisdi
@ctocevents @CVPR @Michael_J_Black Is this also expected for oral presentations? Is there a way of adding captions as a separate file, so that they can be turned on/off for virtual/oral? (e.g. *.vtt files). Thanks!
Dídac Surís retweeted
Pascal Mettes @PascalMettes
Want to know what the future of video research will be? Join us at the #ICCV2021 workshop on Structured Representations for Video Understanding. We end with a bang: a panel with Josef Sivic and Deva Ramanan. A must watch! We start at 15:00 CEST (9:00 local), panel at 21:30 CEST
Dídac Surís retweeted
Hazel Doughty @doughty_hazel
Working on how to best represent video? Still time to submit to the #ICCV2021 workshop on Structured Representations for Video Understanding. We're inviting submissions of either recently published or unpublished works. Deadline: Aug 27th Details: sites.google.com/view/srvu-iccv…
Pascal Mettes @PascalMettes

It’s time to discuss: what is the best structure for representing videos and what is the way forward in video understanding? We are eager to hear your views at our #ICCV2021 workshop on Structured Representations for Video Understanding Submission: Aug 27 sites.google.com/view/srvu-iccv…

Dídac Surís retweeted
Pascal Mettes @PascalMettes
It’s time to discuss: what is the best structure for representing videos and what is the way forward in video understanding? We are eager to hear your views at our #ICCV2021 workshop on Structured Representations for Video Understanding Submission: Aug 27 sites.google.com/view/srvu-iccv…
Dídac Surís retweeted
Pascal Mettes @PascalMettes
We will have a full day with keynotes and accepted oral/poster presentations. We accept submissions for unpublished work and work published at a recent conference/journal (incl. ICCV’21) Organized with: @cvondrick @Surisdi @doughty_hazel @MikeShou1 Shih-Fu Chang @CordeliaSchmid
Dídac Surís @Surisdi
@mayfer @cvondrick Thanks for your interest! No, we don't manually tag abstract predictions as closer to the center. The model learns this on its own; it is a natural result of using hyperbolic geometry.
murat 🍥 @mayfer
@Surisdi @cvondrick Thanks for the presentation & can't wait to see more about your work. I do have one question: in your video prediction use case, was the hierarchical classification supervised? (i.e., manually tagging more abstract predictions as closer to the center?)
murat 🍥 @mayfer
Seems like neural network classifiers will do so much better in hyperbolic space instead of linear space. A linear average ends up somewhere between two points: boring. A hyperbolic average ends up closer to the center: more generalized, more abstract.
youtube.com/watch?v=-Uy92j…
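The "hyperbolic average ends up closer to the center" intuition can be checked numerically. Below is a minimal sketch, not taken from the talk or the hyperfuture paper, that assumes the Poincaré ball model with curvature -1 and uses the standard Möbius addition and Möbius scalar multiplication. Treating the geodesic midpoint as the "average" of two points is an assumption, and all function names are hypothetical.

```python
import math

# Minimal sketch (assumption): Poincaré ball model, curvature -1.
# Points are plain tuples of floats with Euclidean norm < 1.
# Function names are hypothetical, chosen for illustration only.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scale(c, v):
    return tuple(c * a for a in v)


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def mobius_add(x, y):
    """Möbius addition x ⊕ y in the Poincaré ball (curvature -1)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    num = add(scale(1 + 2 * xy + y2, x), scale(1 - x2, y))
    den = 1 + 2 * xy + x2 * y2
    return scale(1 / den, num)


def mobius_scale(t, v):
    """Möbius scalar multiplication t ⊗ v = tanh(t * artanh|v|) * v / |v|."""
    n = norm(v)
    if n == 0:
        return v
    return scale(math.tanh(t * math.atanh(n)) / n, v)


def poincare_distance(x, y):
    """Geodesic distance d(x, y) = 2 * artanh(|(-x) ⊕ y|)."""
    return 2 * math.atanh(norm(mobius_add(scale(-1, x), y)))


def hyperbolic_midpoint(x, y):
    """Point halfway along the geodesic from x to y: x ⊕ (0.5 ⊗ ((-x) ⊕ y))."""
    return mobius_add(x, mobius_scale(0.5, mobius_add(scale(-1, x), y)))


def euclidean_midpoint(x, y):
    """Ordinary linear average of two points."""
    return scale(0.5, add(x, y))


if __name__ == "__main__":
    x, y = (0.8, 0.0), (0.0, 0.8)
    e = euclidean_midpoint(x, y)
    h = hyperbolic_midpoint(x, y)
    print(f"Euclidean midpoint {e}, norm {norm(e):.3f}")
    print(f"Hyperbolic midpoint ({h[0]:.3f}, {h[1]:.3f}), norm {norm(h):.3f}")
```

With the two sample points at radius 0.8, the geodesic midpoint lands at a norm of roughly 0.40, versus roughly 0.57 for the linear average. This happens because, for two points at the same distance from the origin, the geodesic bows toward the center. The pull is not universal: for two points on the same ray through the origin, the geodesic midpoint stays on that ray and can sit farther out than the linear one.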
Dídac Surís retweeted
Carl Vondrick @cvondrick
The future is hard to anticipate! In our latest #CVPR2021 paper, we introduce a framework for learning *what* is predictable in the future. Rather than committing up front to categories to predict, our approach learns how to hedge the bet. hyperfuture.cs.columbia.edu