Loïck BOURDOIS

335 posts

@BdsLoick

FAT5 (Flash Attention T5) boy @huggingface Fellow 🤗

France · Joined October 2021
230 Following · 280 Followers
Pinned Tweet
Loïck BOURDOIS @BdsLoick
We introduce FAT5 (Flash Attention T5) ⚡ An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations. Unroll to find out more! 🧵👇 [1/8]
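The UL2 objective mentioned above builds on T5-style span corruption: contiguous spans of the input are replaced by sentinel tokens, and the model learns to reconstruct them. A toy sketch of that masking pattern (not FAT5's actual code; the `span_corrupt` helper and token lists are invented for illustration):

```python
SENTINELS = ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>"]

def span_corrupt(tokens, spans):
    """T5-style span corruption: replace each (start, length) span in
    `spans` (sorted, non-overlapping) with a sentinel in the input, and
    emit the sentinel-prefixed spans as the denoising target."""
    inp, tgt = [], []
    prev = 0
    for i, (start, length) in enumerate(spans):
        inp.extend(tokens[prev:start])   # keep text before the span
        inp.append(SENTINELS[i])         # mask the span in the input
        tgt.append(SENTINELS[i])         # mark the span in the target
        tgt.extend(tokens[start:start + length])
        prev = start + length
    inp.extend(tokens[prev:])            # keep trailing text
    return inp, tgt

inp, tgt = span_corrupt(["the", "cat", "sat", "on", "the", "mat"], [(1, 2)])
# inp == ["the", "<extra_id_0>", "on", "the", "mat"]
# tgt == ["<extra_id_0>", "cat", "sat"]
```

UL2 then mixes several such denoisers (short spans, long spans, prefix-LM) by varying the span lengths and corruption rates.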
Loïck BOURDOIS @BdsLoick
CuTeDSL is really nice. For those wishing to get into writing kernels in this language, github.com/b-albar/machete can be useful: Boris ALBAR reimplemented Flash Attention, RoPE, RMSNorm, etc. Everything is compatible with HF Transformers (tested on llama3, GLM4.7, Qwen3), TRL, and PEFT/LoRA.
maharshi @maharshii

CuTeDSL is my new favourite thing: I wrote a kernel for RMS norm after learning about layouts, tiling, copying tensors, reductions and so on, especially for inference, and it is about 2.13x faster than a Triton fused kernel for the given shape.

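For reference, the operation being fused in that kernel is small. A naive plain-Python sketch of RMSNorm (a reference definition only, nothing like the tiled CuTeDSL implementation described above):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm scales x by the reciprocal of its root-mean-square and
    # applies a learned per-element weight; unlike LayerNorm there is
    # no mean subtraction and no bias term.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

out = rms_norm([3.0, 4.0], [1.0, 1.0])
# rms = sqrt((9 + 16) / 2) ≈ 3.5355, so out ≈ [0.8485, 1.1314]
```

A fast GPU kernel computes this with one tiled reduction over the hidden dimension followed by an elementwise pass, which is why layouts, tiling, and reductions are exactly the concepts to learn first.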
Maziyar PANAHI @MaziyarPanahi
@BdsLoick @OpenMed_AI @huggingface This is just the first 6 months, up to the blog post. I wanted to share what happened so far. I haven’t counted after that. A lot of models; it takes time to count 😆
Maziyar PANAHI @MaziyarPanahi
I am telling you, it's a great time to be in healthcare AI!
Loïck BOURDOIS @BdsLoick
@gui_penedo I suppose all good things must come to an end. Thank you very much for the high-quality multilingual datasets
Guilherme Penedo @gui_penedo
Update: I’ve left Hugging Face 🤗

I spent the last ~2.5 years working on large-scale datasets like FineWeb🍷, FineWeb2 🥂, FineTranslations💬, and FinePDFs📄, and also got to contribute to exciting projects like SmolLM🤏 and Open-R1🐳. It’s pretty incredible to look back at the impact these projects have had, and to see how much the community now relies on them. The reception to FineWeb when it came out, in particular, was wild.

One thing I really appreciated was being able to share what we were learning as we went rather than keeping everything internal. Writing lengthy blog posts (books?) takes a lot of work but ends up being incredibly rewarding. Hugging Face’s culture actively encourages this, and I feel very lucky to have been able to work in that kind of environment.

I’m now starting a new project with @HKydlicek (also leaving HF), still fully focused on data and strongly shaped by what we learned building at scale. More soon 🫡
Quentin Lhoest 🤗 @lhoestq
Wow, there are now 600,000 public datasets on @huggingface! There were exactly 600 five years ago 😳 (@mervenoyann @abhi1thakur can testify) So, to summarize: 2020→2025: ×1000. 2025→2030: ×1000 too??? 😂 The open source community is crazy!
Loïck BOURDOIS @BdsLoick
@JFPuget @arcprize @kaggle It seemed to me that at one point on the public leaderboard, giotto ai was ranked between you and the ARChitects. Do we know what happened to them? I can't find their results on the Kaggle leaderboard.
JFPuget 🇺🇦🇨🇦🇬🇱 @JFPuget
Interesting read from the ARChitects, the second-place team in the @arcprize competition on @kaggle. Their combination of diffusion LLMs with iterative improvement is quite interesting; it has some ties with TRM and HRM models. There is some irony, though. They tried something different from their winning solution from last year because they thought it was not successful. The irony is that we won by reusing their last year's solution (with some improvements). Key for us was using better pretraining data.
Jan Disselhoff @JDisselh

ARC Prize 2025 is over: an amazing contest, with amazing people competing. This year our team "the ARChitects" managed to reach second place. We tried a lot of things; some thoughts and an explanation of our approach below!

Loïck BOURDOIS @BdsLoick
@MaziyarPanahi "he’s gonna work" 🤔 He has been working on it for at least 2 years and uses the term AMI in his lectures (in the one given at the Collège de France as part of Benoit Sagot's series, he said that he used this name with Joëlle Pineau because of the meaning it also has in French)
Julien Chaumond @julien_c
we shipped a new thing. like it / hate it?
Loïck BOURDOIS reposted
João Maria Janeiro @JoaoMJaneiro
🚨New Paper @AIatMeta 🚨 You want to train a largely multilingual model, but languages keep interfering and you can’t boost performance? Using a dense model is suboptimal when mixing many languages, so what can you do? You can use our new architecture Mixture of Languages! 🧵1/n
Loïck BOURDOIS @BdsLoick
@antoine_chaffin So much easier than downloading the dataset yourself and then searching through it for what you want
Antoine Chaffin @antoine_chaffin
@BdsLoick (You also reminded me we can run SQL queries on the hub datasets, here goes my afternoon)
Antoine Chaffin @antoine_chaffin
if NanoBEIR was the official bench, we would be beating SOTA by such a large margin 😭
Antoine Chaffin @antoine_chaffin
@BdsLoick FWIW I checked and it does not seem like those faulty documents are linked to any query in the qrels. Still bad, but a bit less bad than if they were. I wonder if they come from the original DBPedia dataset or were introduced in the Nano version.
Loïck BOURDOIS @BdsLoick
@antoine_chaffin Yeah, the corpus split of this subset. Note there is also a similar issue for NanoFiQA2018, NanoNQ and NanoSCIDOCS (so at least 30.7% of splits have a problem).
Antoine Chaffin @antoine_chaffin
@BdsLoick god I knew Quora had issues, but this seems very odd for DBPedia. Those are from the documents corpus, right?
Antoine Chaffin @antoine_chaffin
unfortunately growing is realizing that although NanoBEIR has some signal, it decorrelates largely from BEIR at times
Loïck BOURDOIS reposted
Lewis Tunstall @_lewtun
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover:
🧭 Strategy on whether to train your own LLM and burn all your VC money
🪨 Pretraining, aka turning a mountain of text into a fancy auto-completer
🗿 How to sculpt base models with post-training alchemy
🛠️ The underlying infra and how to debug your way out of NCCL purgatory
Highlights from the post-training chapter in the thread 👇
Loïck BOURDOIS reposted
Shayne Longpre @ShayneRedford
📢 Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality. The largest public, multilingual scaling study to date: we ran 774 experiments (10M-8B params, 400+ languages) to answer:
🌍 Are scaling laws different by language?
🧙‍♂️ Can we model the curse of multilinguality?
⚖️ Pretrain from scratch or finetune from a multilingual checkpoint?
🔀 Cross-lingual transfer scores for 1444 language pairs?
1/🧵
tomaarsen @tomaarsen
🤗 Sentence Transformers is joining @huggingface! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face. I'm super excited about the transfer! Details in 🧵
Loïck BOURDOIS @BdsLoick
@tomaarsen Tom, you told me to count timm as part of HF but not ST, because ST was just sponsorship in the form of maintenance from HF 😭 I'm going to have to redo all the graphs in my blog post
tomaarsen @tomaarsen
That choice ended up being very valuable for the embedding & IR community, and I think this choice of granting Hugging Face stewardship will be similarly successful. I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!