Guillaume Sautière

975 posts


@gsautiere

machine learning researcher in video and audio compression at qualcomm ai research amsterdam. electronic music is my second life. he/him

Amsterdam, The Netherlands · Joined April 2013
663 Following · 331 Followers
Pinned Tweet
Guillaume Sautière@gsautiere·
The code for Clockwork is now open-source! It allows up to 40% time savings with close to no loss in perceptual quality, without any finetuning required. Add a single line to your pipeline's synthesis code and see for yourself! github.com/Qualcomm-AI-re…
Amir Habibian@amir_habibian

Do we need to run the whole UNet for all the diffusion steps? No! Accelerating your diffusion model with a simple trick, even without retraining! w/ Amir Ghodrati, @noor_fathima_, @gsautiere, Fatih Porikli and @peterjensen_ arxiv.org/abs/2312.08128 Code: Stay tuned
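The scheduling idea behind skipping full UNet evaluations can be caricatured in a few lines. This is a toy sketch under my own assumptions, not the actual Clockwork code or API: run the expensive computation only every k-th step and reuse its cached output in between.

```python
# Toy sketch of clockwork-style step skipping (hypothetical, not the real API):
# run the expensive part of the denoiser only on every k-th "clock tick" and
# reuse the cached result on the steps in between.

def expensive_features(x):
    return 0.5 * x                  # stand-in for the costly UNet computation

def sample_with_clockwork(x, n_steps, k=4):
    cached = None
    full_evals = 0
    for step in range(n_steps):
        if step % k == 0:           # clock tick: refresh the cache
            cached = expensive_features(x)
            full_evals += 1
        x = x - 0.1 * cached        # cheap update reusing cached features
    return x, full_evals

x, full_evals = sample_with_clockwork(10.0, 16, k=4)
# only 4 full evaluations instead of 16 in this toy setting
```

The trade-off is a small approximation error on the skipped steps in exchange for far fewer full evaluations.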

0 replies · 15 reposts · 43 likes · 7K views
Ilaria Manco@Ilaria__Manco·
I recently submitted my PhD thesis on “Learning Music Representations from Audio and Language”! A bit nervous to wrap up such an incredible period of my life, but super excited for what comes next. After the defence, I'll be joining @GoogleDeepMind, this time no longer a student!
24 replies · 8 reposts · 344 likes · 26.5K views
Guillaume Sautière@gsautiere·
@viggiebirodkar @pa9501460 Hello Vignesh, looking forward to reading your work. It seems quite similar to our work on image compression using diffusion. See "A Residual Diffusion Model for High Perceptual Quality Codec Augmentation" arxiv.org/abs/2301.05489. A difference is that we model the residual with diffusion.
0 replies · 0 reposts · 5 likes · 455 views
Guillaume Sautière retweeted
𝘼𝙈𝘼𝙂𝙄@amagitakayosi·
I wrote a datamosh camera using WebCodecs #Codepen
22 replies · 270 reposts · 3.3K likes · 342.1K views
Guillaume Sautière retweeted
Soumith Chintala@soumithchintala·
There are three parts:
1. Fitting as large a network and as large a batch size as possible onto the 10k/100k/1M H100s -- parallelizing and using memory-saving tricks.
2. Communicating state between these GPUs as quickly as possible.
3. Recovering from failures (hardware, software, etc.) as quickly as possible.

1. Fitting as large a network and batch size as possible onto the 10k H100s.

Parallelizing:
1. Parallelize over batches.
2. Parallelize within layers (i.e. split a single layer across GPUs).
3. Parallelize across layers (i.e. layers 1 to N are on GPU1, layers N+1 to N+10 are on GPU2).
Keep parallelizing until you are able to use all GPUs well, with maximum utilization.

Checkpointing / compute-vs-memory tradeoffs:
* You need to save certain terms from the forward pass to compute the backprop (save_for_backward). However, if the network is sufficiently large, it is more profitable to free these terms in order to fit a larger batch size, and recompute them when the backprop needs them.
* Tricks like FSDP discard the parts of the weights held on one GPU (to save memory), and request those weight shards from the other GPUs right before they are needed.

2. Communicating state between these GPUs as quickly as possible.

Communication overlap: when you need to communicate among GPUs, start the communication as soon as you can.
* Example: as soon as the Nth layer is done with its backward pass, all GPUs holding the Nth layer can all-reduce their gradients while the (N-1)th layer is still computing its backward pass.

Discover and leverage the underlying networking topology: communicating large amounts of state (gradients, optimizer state) across multiple nodes is complicated. With sync SGD, you have to communicate this state in a burst, as quickly as you can.
We might have multiple layers of switches, RDMA (the ability to copy GPU memory directly to the NIC, bypassing CPU RAM entirely), and separate frontend and backend NICs (the frontend connects to storage like NFS; the backend connects GPUs to other GPUs in the cluster). It's important to leverage all of this information when running communication collectives like all-reduce or scatter/gather. All-reduce, for example, can be done in log(n) steps algorithmically with a tree reduce, and the constant factors, which depend on the type of fiber connecting the nodes in the network tree, matter for the overall time and latency. Libraries like NCCL do sophisticated discovery of the underlying networking topology and leverage it when running all-reduce and other collectives.

3. Recovering from failures (hardware, software, etc.) as quickly as possible.

At 10k-GPU scale, things fail all the time -- GPUs, NICs, cables, etc. Some of these failures are easy to detect quickly; others you can only detect because one node isn't replying in time (say, an NCCL all-reduce is stuck). We build various tools to monitor fleet health and remove failed nodes from the fleet as quickly as possible. This is quite hard.

Separately, at this scale you can get silent data corruption from memory bits flipping randomly (basic physics, with the probability amplified at this scale), and you suddenly see loss explosions for no reason other than this random phenomenon. These happen at small scale too, but so infrequently that you barely notice. This is very hard to detect in software beforehand. Some hardware has built-in circuitry that computes checksums after each computation, so the hardware can throw an interrupt when a bit-flip occurs; H100s and previous NVIDIA GPUs don't have this feature.
To counter all these failures, you want to save your model state as frequently and as quickly as you can, and when a failure occurs, recover and continue as quickly as you can. Usually, we save model state very quickly to CPU memory in a separate thread, and in the background save it from CPU memory to disk or remote storage. We also save model state in shards (this is torch.distributed's checkpointing feature): not every GPU needs to save all of the model weights; each GPU saves only a portion, and the remaining shards can be recovered from the other GPUs' checkpoints.
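The two-stage save described above (fast snapshot into CPU memory, slow persist in the background) can be sketched without any framework. `checkpoint_async` and its JSON-on-disk format below are my own hypothetical stand-ins, not the actual torch.distributed checkpointing API:

```python
import copy
import json
import os
import tempfile
import threading

# Minimal framework-free sketch: stage 1 snapshots training state into host
# memory quickly; stage 2 persists the snapshot to disk in a background
# thread so the training loop can continue immediately.

def checkpoint_async(state, path):
    snapshot = copy.deepcopy(state)        # fast in-memory copy
    def persist():
        with open(path, "w") as f:         # slow write, off the critical path
            json.dump(snapshot, f)
    t = threading.Thread(target=persist)
    t.start()
    return t                               # join before starting the next save

state = {"step": 100, "weights": [0.1, 0.2]}
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
t = checkpoint_async(state, path)
state["step"] = 101                        # training continues during the save
t.join()
with open(path) as f:
    saved = json.load(f)
os.remove(path)
# saved["step"] is 100: the snapshot predates the later update
```

The deep copy is the analogue of the fast GPU-to-CPU transfer; the real systems additionally shard the state so no single worker persists the full model.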
24 replies · 167 reposts · 1.3K likes · 241.5K views
Guillaume Sautière retweeted
Maxime Labonne@maximelabonne·
This is the proudest release of my career :) At @liquidai, we're launching three LLMs (1B, 3B, 40B MoE) with SOTA performance, based on a custom architecture. Minimal memory footprint & efficient inference bring long context tasks to edge devices for the first time!
65 replies · 201 reposts · 1.7K likes · 166.8K views
Guillaume Sautière retweeted
Dr Thomas Guénolé@thomas_guenole·
"I think the Rassemblement National should be banned. This is provided for by Article L212-1 of the Code de sécurité intérieure. Grounds for the ban: incitement to discrimination and hatred, in this case against the population of Maghrebi origin." @franceinfo #RN
3.7K replies · 1.1K reposts · 3.7K likes · 765.9K views
Guillaume Sautière retweeted
kyutai@kyutai_labs·
Mimi is a neural audio codec that improves over SoundStream and EnCodec by jointly modeling semantic and acoustic information using distillation, inspired by SpeechTokenizer. Not only do its improved architecture and adversarial training make it outperform SpeechTokenizer, RVQGAN and SemantiCodec, but we also designed Mimi specifically for working with LLMs: it operates at 12.5 Hz and 1.1 kbps while being fully causal, and thus provides ideal tokens for a streaming Transformer. ⬇️
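The quoted frame rate and bitrate are consistent with a residual vector quantizer. A back-of-the-envelope check, where the codebook count (8) and codebook size (2048) are my own illustrative assumptions rather than figures from the tweet:

```python
import math

# Back-of-the-envelope bitrate check for a codec emitting tokens at 12.5 Hz.
# Assumed (not stated in the tweet): 8 RVQ codebooks of 2048 entries each.
frame_rate_hz = 12.5
n_codebooks = 8
codebook_size = 2048
bits_per_frame = n_codebooks * math.log2(codebook_size)   # 8 * 11 = 88 bits
bitrate_kbps = frame_rate_hz * bits_per_frame / 1000.0    # 12.5 * 88 / 1000
```

Under these assumptions the numbers line up exactly with the advertised 1.1 kbps.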
1 reply · 8 reposts · 87 likes · 14.1K views
Guillaume Sautière retweeted
Karen Hao@_KarenHao·
To the public, Microsoft uses its reputation as an AI & sustainability leader to tell a compelling story: AI will do wonders to help solve the climate crisis. To fossil-fuel firms, Microsoft has a different message: AI will help them drill, baby, drill. 1/ theatlantic.com/technology/arc…
17 replies · 788 reposts · 1.6K likes · 247K views
Guillaume Sautière retweeted
Sander Dieleman@sedielem·
Diffusion is the rising tide that eventually submerges all frequencies, high and low 🌊 Diffusion is the gradual decomposition into feature scales, fine and coarse 🗼 Diffusion is just spectral autoregression 🤷🌈
GIF
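One way to see the "spectral autoregression" claim numerically: natural signals have power spectra that decay with frequency while Gaussian noise is flat, so rising noise drowns out the highest frequencies first, and denoising recovers them coarse-to-fine. The 1/f^2 spectrum below is my own toy assumption, not from the thread:

```python
# Toy model: signal power at frequency f is 1/f^2 (decaying, like natural
# signals); additive Gaussian noise contributes a flat noise floor. As the
# noise level rises, the band of frequencies still above the floor shrinks
# from the top down -- the "rising tide" submerging high frequencies first.

def highest_clear_frequency(noise_power, max_f=100):
    """Largest f whose signal power 1/f^2 still exceeds the noise floor."""
    best = 0
    for f in range(1, max_f + 1):
        if 1.0 / f ** 2 > noise_power:
            best = f
    return best

low_noise = highest_clear_frequency(0.0001)   # almost all frequencies clear
high_noise = highest_clear_frequency(0.01)    # only the coarse band survives
```

Reversing the noise schedule then generates coarse frequencies before fine ones, which is the autoregression-over-frequency reading.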
33 replies · 155 reposts · 1.1K likes · 277.9K views
Guillaume Sautière retweeted
Johann Brehmer@johannbrehmer·
Come work with us at @cusp_ai! We're looking for engineers; experience with ML and materials is a plus. You'll join a small but great team in Amsterdam, Berlin, or Cambridge. If all goes well, your work here might contribute to capturing some carbon...
CuspAI@cusp_ai

🚀We're looking for ML Engineers to join our ambitious team led by @wellingmax. If you're passionate about machine learning and want to make an impact, we'd love to hear from you! Full Details: linkedin.com/jobs/view/4011… #MLEngineering #Hiring #Climate

2 replies · 6 reposts · 63 likes · 9.2K views
Guillaume Sautière retweeted
Johann Brehmer@johannbrehmer·
I have a new job: Today's my first day at @cusp_ai. We'll work on ML-based material discovery, in particular for carbon capture. I'm joining a great team led by @wellingmax and @ac_edwards_1, and a shiny new office in Amsterdam. I'm super excited.
11 replies · 5 reposts · 153 likes · 11.8K views
Guillaume Sautière retweeted
Aran Komatsuzaki@arankomatsuzaki·
Learning to (Learn at Test Time): RNNs with Expressive Hidden States - Performs Linear-time RNN by propagating the gradient to the next step, i.e., test-time training - Achieves better perplexity than Mamba arxiv.org/abs/2407.04620
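The "hidden state that learns at test time" idea can be caricatured in a few lines. The scalar model and squared-error loss below are my own illustration, not the paper's architecture: each token triggers one gradient step on a self-supervised reconstruction loss, so the state update is itself a training step.

```python
# Toy scalar sketch of test-time training: the RNN's hidden "state" is the
# weight w of a tiny inner model, and processing each input token performs
# one gradient step on the reconstruction loss (w*x - x)^2.

def ttt_step(w, x, lr=0.1):
    pred = w * x                   # inner model's reconstruction of x
    grad = 2.0 * (pred - x) * x    # d/dw of the squared error
    return w - lr * grad           # gradient step = hidden-state update

w = 0.0
for x in [1.0, 1.0, 1.0]:
    w = ttt_step(w, x)
# w moves from 0.0 toward 1.0, the weight that reconstructs the inputs
```

Because each update is a fixed-cost gradient step, the sequence is processed in linear time, which is the property the tweet highlights.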
7 replies · 75 reposts · 404 likes · 120.2K views
Guillaume Sautière retweeted
Josef Dean@JosefNDean·
Sure matplotlib is cool, but what if I want to load my loss curves into the 2006 hit Flash game LineRider?
50 replies · 807 reposts · 6.4K likes · 437.9K views
Guillaume Sautière retweeted
Soumith Chintala@soumithchintala·
@lucidrains joined the @PyTorch team at Meta this June, doing what they do best -- working on open source implementations of important AI work. So proud to have them on the team! (just announcing after realizing we never did)
@gazorp5

time to panic

23 replies · 45 reposts · 486 likes · 45.9K views
Guillaume Sautière retweeted
Arash Behboodi@behboodiarash·
Check out the workshop we are organizing on differentiable simulations and surrogate modeling! We have great invited talks. Also, submit your papers!
Kamyar Azizzadenesheli@Azizzadenesheli

At #NeurIPS2024, Dec 9-15th, 2024, an exciting workshop on #AI4Science, solvers, computational methods, PDEs, sciences, engineering, #NeuralOperators, PINNs, surrogate model, data-driven algorithmic computing, and MLonFunctionSpaces, d3s3workshop.github.io Deadline: late Aug

0 replies · 3 reposts · 13 likes · 1.6K views
Guillaume Sautière retweeted
AI at Meta@AIatMeta·
Introducing Meta Segment Anything Model 2 (SAM 2) — the first unified model for real-time, promptable object segmentation in images & videos. SAM 2 is available today under Apache 2.0 so that anyone can use it to build their own experiences. Details ➡️ go.fb.me/p749s5
152 replies · 1.4K reposts · 7K likes · 1.6M views
Guillaume Sautière retweeted
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·
Bad news (1/2): video taken down by ICML (brockmeyer@icml.cc) for copyright. While I can't agree (the consent I signed allows me to publish elsewhere), I will respect it to save time for more important things. Too bad I delayed many things and spent 20+ hrs preparing the video.
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu

Incredibly honored and humbled by the overwhelming response to my tutorial, and thank you everyone who attended in person. Truly heartwarming to hear how much you enjoyed it. Many have been asking for a recording, and I prepared one with my own subtitles youtu.be/yBL7J0kgldU

37 replies · 29 reposts · 473 likes · 237.8K views