Guillaume Sautière

975 posts


@gsautiere

machine learning researcher in video and audio compression at qualcomm ai research amsterdam. electronic music is my second life. he/him

Amsterdam, The Netherlands · Joined April 2013
663 Following · 331 Followers
Pinned Tweet
Guillaume Sautière@gsautiere·
The code for Clockwork is now open-source! It allows up to 40% time savings with close to no loss in perceptual quality, without any finetuning required. Add a single line to your pipeline's synthesis code and see for yourself! github.com/Qualcomm-AI-re…
Amir Habibian@amir_habibian

Do we need to run the whole UNet for all the diffusion steps? No! Accelerating your diffusion model with a simple trick, even without retraining! w/ Amir Ghodrati, @noor_fathima_, @gsautiere, Fatih Porikli and @peterjensen_ arxiv.org/abs/2312.08128 Code: Stay tuned
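The scheduling idea behind skipping full UNet evaluations can be caricatured in a few lines. This is a toy sketch under my own assumptions, not the actual Clockwork code or API: run the expensive computation only every k-th step and reuse its cached output in between.

```python
# Toy sketch of clockwork-style step skipping (hypothetical, not the real API):
# run the expensive part of the denoiser only on every k-th "clock tick" and
# reuse the cached result on the steps in between.

def expensive_features(x):
    return 0.5 * x                  # stand-in for the costly UNet computation

def sample_with_clockwork(x, n_steps, k=4):
    cached = None
    full_evals = 0
    for step in range(n_steps):
        if step % k == 0:           # clock tick: refresh the cache
            cached = expensive_features(x)
            full_evals += 1
        x = x - 0.1 * cached        # cheap update reusing cached features
    return x, full_evals

x, full_evals = sample_with_clockwork(10.0, 16, k=4)
# only 4 full evaluations instead of 16 in this toy setting
```

The trade-off is a small approximation error on the skipped steps in exchange for far fewer full evaluations.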

0 replies · 15 reposts · 43 likes · 7K views
Ilaria Manco@Ilaria__Manco·
I recently submitted my PhD thesis on “Learning Music Representations from Audio and Language”! A bit nervous to wrap up such an incredible period of my life, but super excited for what comes next. After the defence, I'll be joining @GoogleDeepMind, this time no longer a student!
24 replies · 8 reposts · 344 likes · 26.5K views
Guillaume Sautière@gsautiere·
@viggiebirodkar @pa9501460 Hello Vignesh, looking forward to reading your work. It seems quite similar to our work on image compression using diffusion. See "A Residual Diffusion Model for High Perceptual Quality Codec Augmentation" arxiv.org/abs/2301.05489. A difference is that we model the residual with diffusion.
0 replies · 0 reposts · 5 likes · 455 views
Guillaume Sautière retweeted
𝘼𝙈𝘼𝙂𝙄@amagitakayosi·
I wrote a datamosh camera using WebCodecs #Codepen
22 replies · 270 reposts · 3.3K likes · 342.1K views
Guillaume Sautière retweeted
Soumith Chintala@soumithchintala·
There are three parts:
1. Fitting as large a network and as large a batch size as possible onto the 10k/100k/1M H100s -- parallelizing and using memory-saving tricks.
2. Communicating state between these GPUs as quickly as possible.
3. Recovering from failures (hardware, software, etc.) as quickly as possible.

1. Fitting as large a network and batch size as possible onto the 10k H100s.

Parallelizing:
1. Parallelize over batches.
2. Parallelize within layers (i.e. split a single layer across GPUs).
3. Parallelize across layers (i.e. layers 1 to N are on GPU1, layers N+1 to N+10 are on GPU2).
Keep parallelizing until you are able to use all GPUs well, with maximum utilization.

Checkpointing / compute-vs-memory tradeoffs:
* You need to save certain terms from the forward pass to compute the backprop (save_for_backward). However, if the network is sufficiently large, it is more profitable to free these terms in order to fit a larger batch size, and recompute them when the backprop needs them.
* Tricks like FSDP discard the parts of the weights held on one GPU (to save memory), and request those weight shards from the other GPUs right before they are needed.

2. Communicating state between these GPUs as quickly as possible.

Communication overlap: when you need to communicate among GPUs, start the communication as soon as you can.
* Example: as soon as the Nth layer is done with its backward pass, all GPUs holding the Nth layer can all-reduce their gradients while the (N-1)th layer is still computing its backward pass.

Discover and leverage the underlying networking topology: communicating large amounts of state (gradients, optimizer state) across multiple nodes is complicated. With sync SGD, you have to communicate this state in a burst, as quickly as you can.
We might have multiple layers of switches, RDMA (the ability to copy GPU memory directly to the NIC, bypassing CPU RAM entirely), and separate frontend and backend NICs (the frontend connects to storage like NFS; the backend connects GPUs to other GPUs in the cluster). It's important to leverage all of this information when running communication collectives like all-reduce or scatter/gather. All-reduce, for example, can be done in log(n) steps algorithmically with a tree reduce, and the constant factors, which depend on the type of fiber connecting the nodes in the network tree, matter for the overall time and latency. Libraries like NCCL do sophisticated discovery of the underlying networking topology and leverage it when running all-reduce and other collectives.

3. Recovering from failures (hardware, software, etc.) as quickly as possible.

At 10k-GPU scale, things fail all the time -- GPUs, NICs, cables, etc. Some of these failures are easy to detect quickly; others you can only detect because one node isn't replying in time (say, an NCCL all-reduce is stuck). We build various tools to monitor fleet health and remove failed nodes from the fleet as quickly as possible. This is quite hard.

Separately, at this scale you can get silent data corruption from memory bits flipping randomly (basic physics, with the probability amplified at this scale), and you suddenly see loss explosions for no reason other than this random phenomenon. These happen at small scale too, but so infrequently that you barely notice. This is very hard to detect in software beforehand. Some hardware has built-in circuitry that computes checksums after each computation, so the hardware can throw an interrupt when a bit-flip occurs; H100s and previous NVIDIA GPUs don't have this feature.
To counter all these failures, you want to save your model state as frequently and as quickly as you can, and when a failure occurs, recover and continue as quickly as you can. Usually, we save model state very quickly to CPU memory in a separate thread, and in the background save it from CPU memory to disk or remote storage. We also save model state in shards (this is torch.distributed's checkpointing feature): not every GPU needs to save all of the model weights; each GPU saves only a portion, and the remaining shards can be recovered from the other GPUs' checkpoints.
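The two-stage save described above (fast snapshot into CPU memory, slow persist in the background) can be sketched without any framework. `checkpoint_async` and its JSON-on-disk format below are my own hypothetical stand-ins, not the actual torch.distributed checkpointing API:

```python
import copy
import json
import os
import tempfile
import threading

# Minimal framework-free sketch: stage 1 snapshots training state into host
# memory quickly; stage 2 persists the snapshot to disk in a background
# thread so the training loop can continue immediately.

def checkpoint_async(state, path):
    snapshot = copy.deepcopy(state)        # fast in-memory copy
    def persist():
        with open(path, "w") as f:         # slow write, off the critical path
            json.dump(snapshot, f)
    t = threading.Thread(target=persist)
    t.start()
    return t                               # join before starting the next save

state = {"step": 100, "weights": [0.1, 0.2]}
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
t = checkpoint_async(state, path)
state["step"] = 101                        # training continues during the save
t.join()
with open(path) as f:
    saved = json.load(f)
os.remove(path)
# saved["step"] is 100: the snapshot predates the later update
```

The deep copy is the analogue of the fast GPU-to-CPU transfer; the real systems additionally shard the state so no single worker persists the full model.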
24 replies · 167 reposts · 1.3K likes · 241.5K views
Guillaume Sautière retweeted
Maxime Labonne@maximelabonne·
This is the proudest release of my career :) At @liquidai, we're launching three LLMs (1B, 3B, 40B MoE) with SOTA performance, based on a custom architecture. Minimal memory footprint & efficient inference bring long context tasks to edge devices for the first time!
65 replies · 201 reposts · 1.7K likes · 166.8K views
Guillaume Sautière retweeted
Dr Thomas Guénolé@thomas_guenole·
"I think the Rassemblement National should be banned. This is provided for by Article L212-1 of the Code de sécurité intérieure. Grounds for the ban: incitement to discrimination and hatred, in this case against the population of Maghrebi origin." @franceinfo #RN
3.7K replies · 1.1K reposts · 3.7K likes · 765.9K views
Guillaume Sautière retweeted
kyutai@kyutai_labs·
Mimi is a neural audio codec that improves over SoundStream and EnCodec by jointly modeling semantic and acoustic information using distillation, inspired by SpeechTokenizer. Not only do its improved architecture and adversarial training make it outperform SpeechTokenizer, RVQGAN and SemantiCodec, but we also designed Mimi specifically for working with LLMs: it operates at 12.5 Hz and 1.1 kbps while being fully causal, and thus provides ideal tokens for a streaming Transformer. ⬇️
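The quoted frame rate and bitrate are consistent with a residual vector quantizer. A back-of-the-envelope check, where the codebook count (8) and codebook size (2048) are my own illustrative assumptions rather than figures from the tweet:

```python
import math

# Back-of-the-envelope bitrate check for a codec emitting tokens at 12.5 Hz.
# Assumed (not stated in the tweet): 8 RVQ codebooks of 2048 entries each.
frame_rate_hz = 12.5
n_codebooks = 8
codebook_size = 2048
bits_per_frame = n_codebooks * math.log2(codebook_size)   # 8 * 11 = 88 bits
bitrate_kbps = frame_rate_hz * bits_per_frame / 1000.0    # 12.5 * 88 / 1000
```

Under these assumptions the numbers line up exactly with the advertised 1.1 kbps.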
1 reply · 8 reposts · 87 likes · 14.1K views
Guillaume Sautière retweeted
Karen Hao@_KarenHao·
To the public, Microsoft uses its reputation as an AI & sustainability leader to tell a compelling story: AI will do wonders to help solve the climate crisis. To fossil-fuel firms, Microsoft has a different message: AI will help them drill, baby, drill. 1/ theatlantic.com/technology/arc…
17 replies · 788 reposts · 1.6K likes · 247K views
Guillaume Sautière retweeted
Sander Dieleman@sedielem·
Diffusion is the rising tide that eventually submerges all frequencies, high and low 🌊 Diffusion is the gradual decomposition into feature scales, fine and coarse 🗼 Diffusion is just spectral autoregression 🤷🌈
GIF
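One way to see the "spectral autoregression" claim numerically: natural signals have power spectra that decay with frequency while Gaussian noise is flat, so rising noise drowns out the highest frequencies first, and denoising recovers them coarse-to-fine. The 1/f^2 spectrum below is my own toy assumption, not from the thread:

```python
# Toy model: signal power at frequency f is 1/f^2 (decaying, like natural
# signals); additive Gaussian noise contributes a flat noise floor. As the
# noise level rises, the band of frequencies still above the floor shrinks
# from the top down -- the "rising tide" submerging high frequencies first.

def highest_clear_frequency(noise_power, max_f=100):
    """Largest f whose signal power 1/f^2 still exceeds the noise floor."""
    best = 0
    for f in range(1, max_f + 1):
        if 1.0 / f ** 2 > noise_power:
            best = f
    return best

low_noise = highest_clear_frequency(0.0001)   # almost all frequencies clear
high_noise = highest_clear_frequency(0.01)    # only the coarse band survives
```

Reversing the noise schedule then generates coarse frequencies before fine ones, which is the autoregression-over-frequency reading.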
33 replies · 155 reposts · 1.1K likes · 277.9K views
Guillaume Sautière retweeted
Johann Brehmer@johannbrehmer·
Come work with us at @cusp_ai! We're looking for engineers; experience with ML and materials is a plus. You'll join a small but great team in Amsterdam, Berlin, or Cambridge. If all goes well, your work here might contribute to capturing some carbon...
CuspAI@cusp_ai

🚀We're looking for ML Engineers to join our ambitious team led by @wellingmax. If you're passionate about machine learning and want to make an impact, we'd love to hear from you! Full Details: linkedin.com/jobs/view/4011… #MLEngineering #Hiring #Climate

2 replies · 6 reposts · 63 likes · 9.2K views
Guillaume Sautière retweeted
Johann Brehmer@johannbrehmer·
I have a new job: Today's my first day at @cusp_ai. We'll work on ML-based material discovery, in particular for carbon capture. I'm joining a great team led by @wellingmax and @ac_edwards_1, and a shiny new office in Amsterdam. I'm super excited.
11 replies · 5 reposts · 153 likes · 11.8K views
Guillaume Sautière retweeted
Aran Komatsuzaki@arankomatsuzaki·
Learning to (Learn at Test Time): RNNs with Expressive Hidden States - Performs Linear-time RNN by propagating the gradient to the next step, i.e., test-time training - Achieves better perplexity than Mamba arxiv.org/abs/2407.04620
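The "hidden state that learns at test time" idea can be caricatured in a few lines. The scalar model and squared-error loss below are my own illustration, not the paper's architecture: each token triggers one gradient step on a self-supervised reconstruction loss, so the state update is itself a training step.

```python
# Toy scalar sketch of test-time training: the RNN's hidden "state" is the
# weight w of a tiny inner model, and processing each input token performs
# one gradient step on the reconstruction loss (w*x - x)^2.

def ttt_step(w, x, lr=0.1):
    pred = w * x                   # inner model's reconstruction of x
    grad = 2.0 * (pred - x) * x    # d/dw of the squared error
    return w - lr * grad           # gradient step = hidden-state update

w = 0.0
for x in [1.0, 1.0, 1.0]:
    w = ttt_step(w, x)
# w moves from 0.0 toward 1.0, the weight that reconstructs the inputs
```

Because each update is a fixed-cost gradient step, the sequence is processed in linear time, which is the property the tweet highlights.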
7 replies · 75 reposts · 404 likes · 120.2K views
Guillaume Sautière retweeted
Josef Dean@JosefNDean·
Sure matplotlib is cool, but what if I want to load my loss curves into the 2006 hit Flash game LineRider?
50 replies · 807 reposts · 6.4K likes · 437.9K views
Guillaume Sautière retweeted
Soumith Chintala@soumithchintala·
@lucidrains joined the @PyTorch team at Meta this June, doing what they do best -- working on open source implementations of important AI work. So proud to have them on the team! (just announcing after realizing we never did)
@gazorp5

time to panic

23 replies · 45 reposts · 486 likes · 45.9K views
Guillaume Sautière retweeted
Arash Behboodi@behboodiarash·
Check out the workshop we are organizing on differentiable simulations and surrogate modeling! We have great invited talks. Also, submit your papers!
Kamyar Azizzadenesheli@Azizzadenesheli

At #NeurIPS2024, Dec 9-15th, 2024, an exciting workshop on #AI4Science, solvers, computational methods, PDEs, sciences, engineering, #NeuralOperators, PINNs, surrogate model, data-driven algorithmic computing, and MLonFunctionSpaces, d3s3workshop.github.io Deadline: late Aug

0 replies · 3 reposts · 13 likes · 1.6K views
Guillaume Sautière retweeted
AI at Meta@AIatMeta·
Introducing Meta Segment Anything Model 2 (SAM 2) — the first unified model for real-time, promptable object segmentation in images & videos. SAM 2 is available today under Apache 2.0 so that anyone can use it to build their own experiences. Details ➡️ go.fb.me/p749s5
152 replies · 1.4K reposts · 7K likes · 1.6M views
Guillaume Sautière retweeted
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu·
Bad news (1/2): video taken down by ICML (brockmeyer@icml.cc) for copyright. While I can't agree (the consent I signed allows me to publish elsewhere), I will respect it to save time for more important things. Too bad I delayed many things and spent 20+ hrs preparing the video.
Zeyuan Allen-Zhu, Sc.D.@ZeyuanAllenZhu

Incredibly honored and humbled by the overwhelming response to my tutorial, and thank you everyone who attended in person. Truly heartwarming to hear how much you enjoyed it. Many have been asking for a recording, and I prepared one with my own subtitles youtu.be/yBL7J0kgldU

37 replies · 29 reposts · 473 likes · 237.8K views