Guillaume Leclerc @gpoleclerc
PhD Student @ MIT
Cambridge, MA · Joined April 2018
13 Following · 222 Followers
25 posts

Preetum Nakkiran @PreetumNakkiran:
@dustinvtran This was exactly my complaint -- I just want the pytorch imagenet example but faster. And bare minimal hacks -- no adversarial-mix-up-erasing-dropsmooth with 10 hyperparameters...

Preetum Nakkiran @PreetumNakkiran:
misc question but are there good benchmarks for training "hackable" models quickly? for research, "ImageNet in 1 hr" is useless to me if changing 2 lines makes it "NaN in 5 hrs"

Guillaume Leclerc @gpoleclerc:
@jefrankle @PreetumNakkiran (3) because FFCV does GPU augmentation and data movement in parallel it might still give you a little boost in other cases. It's always worth a try. Feel free to ask technical questions on our slack!

Guillaume Leclerc @gpoleclerc:
@jefrankle @PreetumNakkiran From our experience, yes. (1) The first case is when you are IO-bottlenecked: with the appropriate parameters, FFCV will dramatically improve the throughput you get from your storage. (2) FFCV makes it easy to move augmentation from/to the CPU to maximize speed. 1/2

Guillaume Leclerc @gpoleclerc:
@ArashVahdat
- Allows declaring arguments where they need to be
- Allows capturing the arguments where they are needed
- Supports defining arguments through a combination of config files (easy to check into git) and the CLI for environment-dependent args
github.com/GuillaumeLecle…
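The pattern described in the tweet above — defaults kept in a config file that can live in git, with CLI flags for environment-dependent overrides — can be sketched with plain `argparse` and JSON. This is a generic illustration, not the library linked in the tweet; the flag names (`--lr`, `--epochs`, `--data-dir`) are made up for the example:

```python
import argparse
import json

def parse_args(argv=None):
    """Merge a JSON config file (versioned in git) with CLI overrides."""
    # First pass: only look for the --config flag.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", default=None, help="path to a JSON config file")
    known, _ = pre.parse_known_args(argv)

    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--lr", type=float, default=0.1)
    parser.add_argument("--epochs", type=int, default=90)
    parser.add_argument("--data-dir", default="/tmp/data")

    # Config-file values become the new defaults; explicit CLI flags still win.
    if known.config:
        with open(known.config) as f:
            parser.set_defaults(**json.load(f))
    return parser.parse_args(argv)

args = parse_args(["--lr", "0.01"])
```

The precedence this gives — hard-coded default < config file < CLI flag — matches the split the tweet describes: checked-in config for reproducibility, CLI for per-machine details.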

Arash Vahdat @ArashVahdat:
Machine learning Twitter: how do you pass a large list of arguments to your python training scripts? If you are happy with any other library please comment below.

odbol @odbol:
@aleks_madry Neat! Could this work with tensorflow as well?

Guillaume Leclerc retweeted
Aleksander Madry @aleks_madry:
ImageNet is the new CIFAR! My students made FFCV (ffcv.io), a drop-in data loading library for training models *fast* (e.g., ImageNet in half an hour on 1 GPU, CIFAR in half a minute). FFCV speeds up ~any existing training code (no training tricks needed) (1/3)

Guillaume Leclerc @gpoleclerc:
@crude2refined @aleks_madry Colab only runs Python 3.7, which doesn't include `multiprocessing.shared_memory`, so the earliest compatible version is 3.8 :/ As soon as Colab updates Python we will have an example notebook!
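For context, here is a minimal stdlib-only sketch of what `multiprocessing.shared_memory` provides and why Python ≥ 3.8 is required — an illustration of the standard-library module itself, not of FFCV's internals:

```python
import sys
from multiprocessing import shared_memory

# shared_memory was added in Python 3.8 -- the reason Colab's 3.7 can't run it.
assert sys.version_info >= (3, 8)

# Create a named block and write into it.
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"

# Attach by name, the way a worker process would, and read the bytes back
# without any copy through a pipe or queue.
attached = shared_memory.SharedMemory(name=shm.name)
payload = bytes(attached.buf[:5])

attached.close()
shm.close()
shm.unlink()  # free the block once every handle is closed
```

Workers attaching by name instead of pickling arrays through queues is the kind of zero-copy handoff a fast data loader depends on, which is why the module's absence on 3.7 is a hard blocker rather than an inconvenience.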

Guillaume Leclerc @gpoleclerc:
@yanndubs @aleks_madry @williamfalcon @PyTorchLightnin We have been using PTL with FFCV in our lab with success for quite a while now. They are definitely complementary. The only caveat is that one has to override a few things in PTL. We will release a demo soon, but feel free to join our slack; it has been discussed there.

Guillaume Leclerc @gpoleclerc:
@code_star @giffmana We have been using FFCV internally on shared clusters with many different GPUs, including V100s, 2080 Tis, and 1080 Tis, and it really helped a lot — especially since most of these clusters use network-attached storage, lack fast local storage, and share CPU among users.

Cody Blakeney @code_star:
@giffmana I’m not sure how much it would speed up training on most department servers anyway (without A100s). Assuming you have 2080 Tis, 3090s, or even V100s, I don’t know that you get the specific speed-up benefits they demonstrate.

Guillaume Leclerc @gpoleclerc:
@Anshumali_ @aleks_madry We did try it (it was definitely better but still much slower than FFCV), but due to lack of good interop with PyTorch and the fact that webdataset is meant to fulfill the same function, we decided to stick with the latter for our thorough benchmarking.

Guillaume Leclerc @gpoleclerc:
@jacobgorm @aleks_madry @schrep JPEG and RAW are just two example data types that FFCV can work with. It's really easy to add other Field Types. You can either keep it for yourself or submit a pull request! We would love to have WEBP support.

Guillaume Leclerc @gpoleclerc:
@RafailFridman @aleks_madry @ml_norms If it is sampled only once (i.e., getitem returns the same thing for the same index), FFCV can be used out of the box! Otherwise, you can (1) have getitem return the parameters of the distribution/do any needed pre-processing (2) use FFCV's fast data pipeline to do the sampling.
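Option (1) in the tweet above — have `__getitem__` return the distribution's parameters and defer the random sampling to a later pipeline stage — can be sketched in plain Python. This is a hypothetical illustration, not FFCV's API; `ParamDataset` and `sample_stage` are made-up names:

```python
import random

class ParamDataset:
    """Deterministic dataset: __getitem__ returns (mu, sigma) parameters,
    never a fresh random draw, so repeated reads of an index agree."""
    def __init__(self, params):
        self.params = params  # list of (mu, sigma) tuples

    def __len__(self):
        return len(self.params)

    def __getitem__(self, idx):
        return self.params[idx]  # same output for the same index

def sample_stage(batch, rng):
    # The randomness lives in the pipeline stage, not in the dataset,
    # so the dataset itself stays cacheable.
    return [rng.gauss(mu, sigma) for mu, sigma in batch]

ds = ParamDataset([(0.0, 1.0), (5.0, 0.1)])
samples = sample_stage([ds[0], ds[1]], random.Random(0))
```

Keeping `__getitem__` deterministic is what makes the "out of the box" case work: a loader that caches or precompiles samples can assume index `i` always yields the same record.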

Guillaume Leclerc @gpoleclerc:
@PhongStormVN @aleks_madry While we haven't personally experimented with this one, FFCV was designed to accommodate virtually any dataset. In the case of COCO (segmentation maps), one can easily store the segmentation map in an additional field. Feel free to join our slack if you need help getting started!

Phong Nguyen-Ha @PhongStormVN:
@aleks_madry Hi, does this library work on different datasets for different tasks — for example, COCO for object detection?

Michal Wolski @michalwols:
@kevin_zakka @chriswolfvision @aleks_madry @soumithchintala They preprocess the dataset to a smaller size, cache all of it in RAM, and use progressive resizing, test-time augmentation, and a tuned cyclical learning rate. I'm pretty sure the baselines they compare against are not optimized to saturate 8 A100s.