Danny Bickson

6 posts

@BicksonDanny

Joined January 2023
0 Following, 11 Followers
Danny Bickson retweeted
Visual Layer @visual_layer
Here are some of the quality issues that you may find:
‣ Duplicates
‣ Outliers
‣ Mislabels
‣ Corrupted images
‣ Train/test leakage
‣ Overly bright/dark/blurry images
Notebooks:
‣ Kaggle Notebook - lnkd.in/gs8mc6Kx
‣ Colab Notebook - lnkd.in/gTHYJqdE
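Two of the issues listed above, overly dark or bright images and blurry images, can be approximated with simple pixel statistics. What follows is a minimal sketch, not how fastdup or Visual Layer actually compute these checks, and it rests on a few assumptions. Images are grayscale 2-D lists of 0-255 values. Exposure is judged by mean intensity. Sharpness is judged by the variance of a 4-neighbour Laplacian, a common blur heuristic. The thresholds `dark`, `bright`, and `blur` are hypothetical values chosen for illustration.

```python
"""Flag overly dark, overly bright, or blurry grayscale images.

Assumption: images are 2-D lists of 0-255 values; thresholds are hypothetical.
"""


def mean_brightness(img):
    """Average pixel intensity over the whole image."""
    total = sum(sum(row) for row in img)
    return total / (len(img) * len(img[0]))


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, a common heuristic for blur.
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def quality_flags(img, dark=40, bright=215, blur=50.0):
    """Return a list of issue labels; the thresholds are illustrative only."""
    flags = []
    b = mean_brightness(img)
    if b < dark:
        flags.append("dark")
    elif b > bright:
        flags.append("bright")
    if laplacian_variance(img) < blur:
        flags.append("blurry")
    return flags
```

A flat, uniformly dark image is flagged as both dark and blurry because it has no edges at all. A high-contrast checkerboard passes both checks. In practice the thresholds would be tuned per dataset, and an imaging library would first decode each file and convert it to grayscale.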
Danny Bickson @BicksonDanny
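@nietras1 We have just packaged Meta's #dinov2 model using #fastdup. It should be super easy to run:

    import fastdup
    # Point fastdup at your image folder and an output folder
    fd = fastdup.create(input_dir='<your images>', work_dir='<output folder>')
    # Run the analysis with Meta's DINOv2 model ('dinov2s')
    fd.run(model_path='dinov2s')
    # Show a gallery of components (clusters) of similar images
    fd.vis.component_gallery()

LAION results: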
[Image]
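`fd.vis.component_gallery()` shows groups ("components") of mutually similar images. The sketch below illustrates the general idea only and is not fastdup's internal implementation. It assumes every image already has an embedding vector (for example from a DINOv2 model). It links any two images whose cosine similarity reaches a hypothetical `threshold`, then returns the connected components using union-find. The helper names `cosine` and `components` are hypothetical.

```python
"""Group items into connected components of a similarity graph.

Assumption: embeddings are precomputed; the threshold is hypothetical.
"""
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def components(embeddings, threshold=0.9):
    """Union-find over all pairs whose cosine similarity is >= threshold."""
    parent = list(range(len(embeddings)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(embeddings)):
        groups.setdefault(find(i), []).append(i)
    # Sort so that the output order is deterministic.
    return sorted(sorted(g) for g in groups.values())
```

The all-pairs loop is quadratic in the number of images. For datasets the size of LAION, an approximate nearest-neighbour index would replace it, and only the grouping step would stay the same.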
Danny Bickson retweeted
Eric Wallace @Eric_Wallace_
See our paper for a lot more technical details and results. Speaking personally, I have many thoughts on this paper. First, everyone should de-duplicate their data as it reduces memorization. However, we can still extract non-duplicated images in rare cases! [6/9]
[Image]
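The simplest form of de-duplication removes byte-identical copies. The following is a minimal sketch, assuming each training image is available as raw bytes (the function name `dedupe_exact` is hypothetical). It hashes every file with SHA-256 and keeps only the first file seen for each digest. This catches exact copies only. Resized or re-encoded copies need near-duplicate detection, and this is not necessarily the de-duplication procedure used in the paper.

```python
"""Drop byte-identical duplicates by hashing file contents."""
import hashlib


def dedupe_exact(blobs):
    """Keep the first occurrence of each distinct byte string.

    Returns (kept_indices, duplicate_map), where duplicate_map maps each
    dropped index to the index of the copy that was kept.
    """
    seen = {}
    kept, dupes = [], {}
    for i, data in enumerate(blobs):
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            dupes[i] = seen[digest]
        else:
            seen[digest] = i
            kept.append(i)
    return kept, dupes
```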
Eric Wallace @Eric_Wallace_
Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images. Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time. Paper: arxiv.org/abs/2301.13188 👇[1/9]
[Image]
Suhail @Suhail
The ideal experience is some kind of Mixpanel for large AI training datasets + a built-in internal team data labeling system just so you can grasp how good your own accuracy might be. Upload data and just plot/graph/visualize all kinds of things.
Suhail @Suhail
There's a pretty big gap in the market for a product that helps you understand your datasets better before AI training. Things like rapidly seeing similar or outlier samples across huge datasets, deduping, letting you label some of your own data to build intuition, etc. Someone should build it.
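Spotting outlier samples across a dataset, one of the capabilities described above, is often done on embeddings rather than raw pixels. The following is a minimal sketch under that assumption. It scores each sample by its mean Euclidean distance to its `k` nearest neighbours, a standard k-NN outlier heuristic, so isolated samples rank first. The function names are hypothetical. The brute-force search is quadratic, so large datasets would need an approximate nearest-neighbour index instead.

```python
"""Score outliers by mean distance to the k nearest neighbours."""
import math


def knn_outlier_scores(points, k=2):
    """Higher score = farther from its k nearest neighbours = more unusual."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        # A lone point has no neighbours; give it a neutral score of 0.
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def top_outliers(points, k=2, n=1):
    """Indices of the n highest-scoring points, most unusual first."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]
```

For example, with four points clustered in a unit square and one point far away at (10, 10), the far point is returned first.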
Danny Bickson @BicksonDanny
@Suhail Hi @Suhail, I am the co-creator of fastdup. It is great to learn that you managed to clean 500,000 images in one hour, including the time spent learning our tool. We would love to hear what you are doing with images and see if we can help in any way!
Suhail @Suhail
fastdup worked great. Already on my way.
Suhail @Suhail
What is the fastest way to de-dupe near-duplicate images across 500K+ images?
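Perceptual hashing is a generic way to find near-duplicate images quickly. It is shown here as an illustration, not as what fastdup does internally. The minimal sketch below assumes each image has already been decoded and shrunk to an 8x8 grayscale grid. It computes an average hash, with one bit per pixel set when that pixel is brighter than the grid mean. It then reports pairs whose hashes differ in at most a hypothetical `max_distance` bits. The function names are hypothetical.

```python
"""Near-duplicate detection with an average hash (aHash) and Hamming distance.

Assumption: each image is already downscaled to an 8x8 grayscale grid
(2-D list of 0-255 values); the distance cutoff is hypothetical.
"""


def average_hash(grid):
    """One bit per pixel (64 bits for 8x8): 1 where brighter than the mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicate_pairs(grids, max_distance=5):
    """All index pairs whose hashes differ in at most max_distance bits."""
    hashes = [average_hash(g) for g in grids]
    pairs = []
    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if hamming(hashes[i], hashes[j]) <= max_distance:
                pairs.append((i, j))
    return pairs
```

A uniformly brightened copy of an image keeps the same hash, because every pixel and the mean shift together. An inverted copy flips every bit. The pairwise comparison is quadratic, so at 500K+ images the hashes would normally be bucketed or indexed (for example with a BK-tree) instead of compared all-against-all.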