Pierre Foret

84 posts

@Foret_p

QR at Citadel Securities. Ex Google AI resident.

New York, USA · Joined October 2015
233 Following · 407 Followers

Pinned Tweet
Pierre Foret@Foret_p·
Introducing SAM: An easy-to-use algorithm derived by connecting PAC-Bayesian bounds and the geometry of the loss landscape. Achieves SOTA on benchmark image tasks (0.3% error on CIFAR-10, 3.9% on CIFAR-100) and drastically improves label noise robustness. arxiv.org/abs/2010.01412
6 replies · 38 retweets · 152 likes
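The two-step update behind SAM can be sketched in a few lines of NumPy. This is an illustrative sketch of the idea, not the paper's reference implementation; `rho` and `lr` are arbitrary toy values:

```python
import numpy as np

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    """One SAM update: first ascend to an approximate worst-case point
    within a rho-ball around w, then descend using the gradient there."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # adversarial perturbation
    return w - lr * grad_fn(w + eps)             # gradient at the perturbed weights

# toy quadratic loss L(w) = 0.5 * ||w||^2, so the gradient is w itself
w = np.array([1.0, -2.0])
w_next = sam_step(w, lambda x: x)
```

On this toy loss the update still shrinks the weights toward the (flat) minimum at the origin, just using the gradient evaluated slightly uphill.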
Pierre Foret retweeted
Maksym Andriushchenko@maksym_andr·
Excited to share our #ICML2022 paper "Towards Understanding Sharpness-Aware Minimization"! Why does m-sharpness matter in m-SAM? Can we explain the benefits of m-SAM on simple models? Which other interesting properties does m-SAM show? Paper: arxiv.org/abs/2206.06232 🧵1/n
4 replies · 32 retweets · 196 likes
Pierre Foret retweeted
Hossein Mobahi@TheGradient·
Are you a strong PhD student interested in doing cutting-edge research at @GoogleAI? I have an opening for a student researcher position to explore open problems and extensions of Sharpness-Aware Minimization (SAM) w/ @bneyshabur. Please refer to tinyurl.com/4nfarsvt.
4 replies · 22 retweets · 118 likes
Pierre Foret@Foret_p·
@_arohan_ @TheGradient Indeed, not syncing the perturbations is pretty critical to SAM's success (see the section about M-sharpness in the paper)
0 replies · 0 retweets · 1 like
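The "not syncing the perturbations" point can be sketched as follows: in m-SAM, each sub-batch (one per replica) computes its own epsilon from its local gradient instead of all replicas sharing one synced perturbation. A toy NumPy sketch, assuming `grad_fns` is a list of per-sub-batch gradient functions (hypothetical names, not the paper's code):

```python
import numpy as np

def msam_grad(w, grad_fns, rho=0.05):
    """m-SAM sketch: each sub-batch computes its own, unsynced perturbation
    epsilon from its local gradient; the resulting sharpness-aware gradients
    are then averaged, as in ordinary data parallelism."""
    total = np.zeros_like(w)
    for grad_fn in grad_fns:                          # one per replica / sub-batch
        g = grad_fn(w)
        eps = rho * g / (np.linalg.norm(g) + 1e-12)   # local perturbation
        total += grad_fn(w + eps)                     # gradient at this replica's perturbed point
    return total / len(grad_fns)

# two toy sub-batches with different targets: L_i(w) = 0.5 * ||w - t_i||^2
t1, t2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
g = msam_grad(np.zeros(2), [lambda w: w - t1, lambda w: w - t2])
```

A single synced epsilon would instead be computed from the averaged gradient; the per-sub-batch version penalizes sharpness at m-sized granularity, which is the m-sharpness the tweets above refer to.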
Pierre Foret retweeted
Aran Komatsuzaki@arankomatsuzaki·
Sharpness-Aware Minimization Improves Language Model Generalization SAM substantially improves performance on SuperGLUE, GLUE, Web Questions, Natural Questions, Trivia QA, and TyDiQA by encouraging convergence to flatter minima w/ minimal overhead. arxiv.org/abs/2110.08529
1 reply · 16 retweets · 89 likes
Pierre Foret@Foret_p·
@thanhnguyentang @matthen2 If each particle is independent, each particle probably only needs to keep the random seed used to generate the path increments
1 reply · 0 retweets · 3 likes
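The seed trick can be sketched concretely: store only each particle's RNG seed, and deterministically regenerate its full path on demand once a winner needs to be traced back. A toy 2-D random walk (`replay_path` is a hypothetical helper, not code from the thread):

```python
import numpy as np

MOVES = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])  # the four cardinal steps

def replay_path(seed, n_steps):
    """Deterministically regenerate a particle's random walk from its seed,
    so per-particle path storage is just one integer."""
    rng = np.random.default_rng(seed)
    steps = rng.integers(0, 4, size=n_steps)
    return np.cumsum(MOVES[steps], axis=0)

# store only the winning particle's seed; replaying yields the identical path
path = replay_path(seed=42, n_steps=1000)
same = replay_path(seed=42, n_steps=1000)
```

This trades memory for recomputation: O(1) storage per particle, at the cost of re-running the walk for the one particle that wins.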
Thanh Nguyen-Tang@thanhnguyentang·
@matthen2 Each particle must carry path information so that a winning one can be traced back. It seems implausible (especially since the particles do not share information) to keep path info for a huge number of particles until a winning one is determined.
1 reply · 0 retweets · 0 likes
Matt Henderson@matthen2·
the dumbest way to solve a maze? simulate a gas of thousands of particles diffusing from the start point, until one particle reaches the exit. trace back the winning particle
404 replies · 4K retweets · 30.1K likes
Pierre Foret retweeted
AK@_akhaliq·
When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations pdf: arxiv.org/pdf/2106.01548… abs: arxiv.org/abs/2106.01548 +5.3% and +11.0% top-1 accuracy on ImageNet for ViT-B/16 and Mixer-B/16, with the simple Inception-style preprocessing
7 replies · 124 retweets · 474 likes
Pierre Foret retweeted
Olivier Grisel@ogrisel·
Interesting empirical study of the geometry of the loss landscape of Vision Transformers and MLP-Mixers, and of the critical impact of Sharpness-Aware Minimization (SAM) for those architectures.
AK@_akhaliq

When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations pdf: arxiv.org/pdf/2106.01548… abs: arxiv.org/abs/2106.01548 +5.3% and +11.0% top-1 accuracy on ImageNet for ViT-B/16 and Mixer-B/16, with the simple Inception-style preprocessing

0 replies · 6 retweets · 28 likes
Pierre Foret retweeted
Hossein Mobahi@TheGradient·
Excited to see that Sharpness-Aware Minimization (the SAM optimizer) we proposed recently (w/ @Foret_p @bneyshabur and Kleiner) is becoming a persistent component of recent state-of-the-art records 😇
AK@_akhaliq

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error pdf: arxiv.org/pdf/2105.13343… abs: arxiv.org/abs/2105.13343 ImageNet SOTA of 86.8% top-1 accuracy after just 34 epochs of training with an NFNet-F5 using the SAM optimizer

0 replies · 7 retweets · 39 likes
Pierre Foret retweeted
KDnuggets@kdnuggets·
We don’t need to worry about #Overfitting anymore? Sharpness-Aware Minimization seeks parameters that lie in neighborhoods having uniformly low loss; this results in a min-max optimization problem solved with efficient gradient descent #MachineLearning buff.ly/38VJTOf
0 replies · 14 retweets · 21 likes
Pierre Foret@Foret_p·
@RisingSayak Great stuff! Is this syncing epsilon across replicas? On a TPU (8 chips for this one, I think?) I would expect the benefits of SAM to be amplified by not syncing epsilon across the devices (one perturbation per sub-batch). Could be a cool improvement if that's not already the case
Pierre Foret@Foret_p·
@imos You can of course emulate this on a single device with gradient accumulation, but it becomes tedious and the wall-clock time might suffer (although NFNet's using a subset of the batch to compute the SAM epsilon is a great trick)
1 reply · 0 retweets · 0 likes
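The NFNet trick mentioned in the tweet above (computing the SAM epsilon from only a subset of the batch) might look like this in NumPy. `grad_fn(w, batch)` is a hypothetical per-batch gradient function and the subset fraction is an arbitrary choice, not values from the NFNet paper:

```python
import numpy as np

def sam_grad_subset(w, grad_fn, batch, subset_frac=0.25, rho=0.05):
    """Compute the SAM perturbation from a sub-batch only (cheap),
    then take the sharpness-aware gradient on the full batch."""
    k = max(1, int(len(batch) * subset_frac))
    g_sub = grad_fn(w, batch[:k])                        # cheap gradient just for epsilon
    eps = rho * g_sub / (np.linalg.norm(g_sub) + 1e-12)
    return grad_fn(w + eps, batch)                       # full-batch gradient at perturbed weights

# toy loss L(w) = 0.5 * ||w - mean(batch)||^2  =>  grad = w - mean(batch)
batch = np.arange(8.0)
g = sam_grad_subset(np.zeros(1), lambda w, b: w - b.mean(), batch)
```

Compared with vanilla SAM this saves roughly one full forward/backward pass per step, at the cost of a noisier perturbation direction.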
いもす@imos·
@Foret_p SAM is very impressive to me! Can I ask why SAM is used only in the largest model of pre-trained NFNet models? I guess that SAM behaves like finding an ensemble solution efficiently and needs more parameters to represent it. Do you have any observations?
2 replies · 0 retweets · 1 like
Pierre Foret@Foret_p·
@imos So SAM on TPU minimizes m-sharpness for a small m, which leads to the biggest boosts. That's why I assume we will mostly see SAM applied to larger nets that require TPUs or multiple GPUs, where it really shines. 3/3
Pierre Foret@Foret_p·
@imos SAM usually works well for smaller models, but the best results are obtained when using a lot of data parallelism (see the section about m-sharpness in the SAM paper). Because the largest nets are trained on a lot of TPU chips, each chip computes epsilon for only a few samples... 2/3
1 reply · 0 retweets · 1 like
Pierre Foret retweeted