David Page
@dcpage3

180 posts

Machine learning researcher

Joined April 2018
1K Following · 2.4K Followers
Mario Zechner@badlogicgames·
Anytime someone posts "skill issues"/"ngmi", ask them to show you what they build, preferably with a link to a git repo.
David Page@dcpage3·
The paper that introduced Batch Norm arxiv.org/abs/1502.03167 combines clear intuition with compelling experiments (14x speedup on ImageNet!!) So why has 'internal covariate shift' remained controversial to this day? Thread 👇
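The transform the paper introduces is compact enough to sketch in numpy. This is an illustrative sketch of the training-time computation only (no running statistics for inference); the function and variable names are mine, not the paper's:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension, then scale and
    # shift with the learnable parameters gamma and beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))   # batch of 64, 8 features
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
# Each feature of y now has roughly zero mean and unit variance.
```

Whatever the right explanation for *why* it works, the mechanics are just this per-feature standardization plus a learned affine map.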
David Page retweeted
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·
Excited to announce our new work, a unified theory towards explaining 3 black magics in deep learning: (1) ensemble, (2) knowledge distillation, and (3) self-distillation. An accessible blog post is below.
Microsoft Research@MSFTResearch

Microsoft and CMU researchers begin to unravel 3 mysteries in deep learning related to ensemble, knowledge distillation & self-distillation. Discover how their work leads to the first theoretical proof with empirical evidence for ensemble in deep learning: aka.ms/AAavp1k

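For context on the second of those "black magics": knowledge distillation trains a student to match a teacher's temperature-softened output distribution. A minimal numpy sketch (the function names and temperature value are my own choices, not from the paper; the T² scaling follows the standard Hinton-style formulation):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax; higher T flattens the distribution.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) between temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
```

The loss is zero when the student's logits match the teacher's and positive otherwise, so minimizing it pulls the student toward the teacher's "dark knowledge" in the soft labels.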
David Page@dcpage3·
@bozavlado @iiSeymour CTC_CRF extends flipflop to output scores for multiple (six) consecutive bases, not just two. The output layer is mostly orthogonal to the choice of RNN/CNN encoder, so CNN improvements are very welcome! More details coming soon…
Vlado Boza@bozavlado·
@iiSeymour Oh, the pair decoding! BTW, is there any description of what CTC_CRF does? Is it flipflop? Or something completely different?
David Page retweeted
Chris Seymour@iiSeymour·
Big accuracy update coming in the next version of Bonito 🚀 v0.3.0 combines everything we have learned with structured and unstructured approaches - @dcpage3, Tim and I are working hard on the finishing touches this week - watch this space 👀
Clive G. Brown@The__Taybor

Some base-caller updates coming within 5-10 days. 98%+modal and many reads above Q20. Note the X-axis. Generally, sig +ve uplift in consensus and mutation detection. Slightly slower speed in research version.

Jonny@TMVector·
@dcpage3 @nanopore Congratulations and all the best in your new role! I hope they know how fantastic a hire they've made 😎
David Page@dcpage3·
First day of new job @nanopore where I get to apply ML to a bunch of fun science and engineering problems. Pretty excited!
David Page retweeted
Alex Thiery@alexxthiery·
Preparing a short course on neural nets can be fun. Below is one of the fast Resnets by @dcpage3 on CIFAR10. Would have been nice to track a UMAP-like representation of some internal layer, but have not found a reasonably fast/stable way to do so. Any idea? @NikolayOskolkov
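One cheap and stable alternative to a UMAP-like embedding for tracking an internal layer over training is a fixed linear projection, e.g. the top two principal components of the layer's activations. This is a hedged sketch, not anything from the thread; `acts` stands in for whatever activations you record:

```python
import numpy as np

def pca_2d(acts):
    # Project activations onto their top two principal components.
    centered = acts - acts.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))   # stand-in for penultimate-layer activations
coords = pca_2d(acts)               # (500, 2) points to scatter-plot per epoch
```

Unlike UMAP, the projection is deterministic and runs in milliseconds, at the cost of only capturing linear structure; fitting the directions once and reusing them across epochs also keeps successive frames comparable.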
David Page@dcpage3·
Undertraining a large model is a good way to speed things up on toy problems (myrtle.ai/how-to-train-y…), but it was far from clear this would extend to large scale.
David Page@dcpage3·
Simple setup + attention to detail -> SOTA self-supervised reps!
LARS optimizer -> large batches -> no need for a memory bank of negative examples.
Random crops + color augmentation (to prevent color-histogram cheating) -> no need for a special architecture.
Projection head for the contrastive loss -> hidden reps preserve info.
Ting Chen@tingchenai

Introducing SimCLR: a Simple framework for Contrastive Learning of Representations. SimCLR advances previous SOTA in self-supervised and semi-supervised learning on ImageNet by 7-10% (see next). arxiv.org/abs/2002.05709 Joint work with @skornblith @mo_norouzi @geoffreyhinton.

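The contrastive loss behind SimCLR (NT-Xent) is simple enough to sketch in numpy. This is an illustrative re-implementation under my own naming, not the authors' code:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    # NT-Xent: normalized temperature-scaled cross entropy over 2N views.
    # z1[i] and z2[i] are embeddings of two augmentations of the same image.
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm -> dot = cosine sim
    n = len(z1)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity
    # Index of each view's positive partner: i pairs with i+n and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float((logsumexp - sim[np.arange(2 * n), pos]).mean())

rng = np.random.default_rng(1)
a = rng.normal(size=(8, 16))
l_same = nt_xent(a, a + 1e-3 * rng.normal(size=(8, 16)))  # nearly aligned positives
l_rand = nt_xent(a, rng.normal(size=(8, 16)))             # unrelated "positives"
```

With matched pairs the positives dominate each row's softmax, so the loss comes out lower than for unrelated pairs; in SimCLR this loss is applied to the outputs of the projection head, which is why the hidden representations underneath it can retain more information.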
David Page retweeted
Jeremy Howard@jeremyphoward·
@ylecun @viglovikov @timetravellertt @kaggle The problem with "you can always add those tricks to get the numbers up" is that *very* often I see papers that don't do data augmentation, or don't tune hyper-parameters, etc., then claim their new idea helps. But then I find the idea is actually just a poor proxy for the things they skipped.