Danny Bickson

6 posts

@BicksonDanny

Joined January 2023

0 Following · 11 Followers

Danny Bickson retweeted

Visual Layer@visual_layer·Aug 11

Here are some of the quality issues that you may find:
‣ Duplicates
‣ Outliers
‣ Mislabels
‣ Corrupted images
‣ Train/test leakage
‣ Overly bright/dark/blurry images
Notebooks:
‣ Kaggle Notebook - lnkd.in/gs8mc6Kx
‣ Colab Notebook - lnkd.in/gTHYJqdE

Danny Bickson@BicksonDanny·Apr 21

@nietras1 we have just packaged Meta's #dinov2 model using #fastdup. It should be super easy to run:
import fastdup
fd = fastdup.create(input_dir=<your images>, work_dir=<output folder>)
fd.run(model_path='dinov2s')
fd.vis.component_gallery()
LAION results:
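The snippet in the tweet above can be wrapped into a small helper. This is only a sketch: it assumes fastdup is installed (`pip install fastdup`) and uses the three calls exactly as they appear in the tweet. The function name `run_dinov2_gallery` and the up-front directory checks are illustrative additions, not part of fastdup.

```python
from pathlib import Path


def run_dinov2_gallery(input_dir: str, work_dir: str, model: str = "dinov2s"):
    """Hypothetical helper: run fastdup with a DINOv2 model, then build the gallery."""
    images = Path(input_dir)
    # Illustrative check (not part of fastdup): fail early with a clear message.
    if not images.is_dir():
        raise FileNotFoundError(f"input_dir is not a directory: {input_dir}")
    # Make sure the output folder exists before handing it to fastdup.
    Path(work_dir).mkdir(parents=True, exist_ok=True)

    # Imported lazily so the path checks above work even without fastdup installed.
    import fastdup

    # The three calls below are the ones quoted in the tweet.
    fd = fastdup.create(input_dir=str(images), work_dir=work_dir)
    fd.run(model_path=model)        # 'dinov2s' is the model name given in the tweet
    fd.vis.component_gallery()      # gallery of connected components of similar images
    return fd
```

Calling `run_dinov2_gallery("my_images/", "fastdup_out/")` on a folder of images should then reproduce the tweet's workflow; both paths here are placeholders.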

Danny Bickson@BicksonDanny·Feb 1

@Eric_Wallace_ Thanks for featuring the title image from our fastdup GitHub repo! github.com/visualdatabase… Everyone should try us out for deduplicating large-scale image repositories. It is free!

Danny Bickson retweeted

Eric Wallace@Eric_Wallace_·Jan 31

See our paper for a lot more technical details and results. Speaking personally, I have many thoughts on this paper. First, everyone should de-duplicate their data as it reduces memorization. However, we can still extract non-duplicated images in rare cases! [6/9]

475

102K

Eric Wallace@Eric_Wallace_·Jan 31

Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images. Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time. Paper: arxiv.org/abs/2301.13188 👇[1/9]

150

1.7K

8.7K

Danny Bickson@BicksonDanny·Jan 19

@Suhail Hi @Suhail, we are building exactly that, and coincidentally you have already used us! github.com/visual-layer/f… We would love to chat, if you are open to it, and explore a collaboration.

Suhail@Suhail·Jan 18

The ideal experience is some kind of Mixpanel for large AI training datasets + a built-in internal team data labeling system just so you can grasp how good your own accuracy might be. Upload data and just plot/graph/visualize all kinds of things.

13.4K

Suhail@Suhail·Jan 18

There's a pretty big gap in a product that helps you understand your datasets better before AI training. Things like rapidly seeing similar or outlier samples across huge datasets, deduping, letting you label some of your own data to build intuition, etc. Someone should build it.

370

136.6K

Danny Bickson@BicksonDanny·Jan 17

@Suhail Hi @Suhail, I am the co-creator of fastdup. It is great to learn that you managed to clean 500,000 images in one hour, including learning our tool. We would love to hear what you are doing with images and see if we can help in any way!

Suhail@Suhail·Jan 13

fastdup worked great. Already on my way.

15.9K

Suhail@Suhail·Jan 13

What is the fastest way to de-dupe near duplicate images across 500K+ images?

185

180K
