Raphaël Sourty

854 posts


@raphaelsrty

Language Models, Information Retrieval, Knowledge Distillation PhD | AI @LightonIO

Paris, France · Joined May 2020
837 Following · 892 Followers
Pinned Tweet
Raphaël Sourty @raphaelsrty
Releasing ColGREP and LateOn-Code models 🚀 ColGREP is a multi-vector search tool built in Rust for coding agents. It's a hybrid grep that supports both grep features and semantic retrieval. Runs 100% locally. You get two SOTA code retrieval models within ColGREP.
7 replies · 19 reposts · 128 likes · 9.5K views
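The thread doesn't show ColGREP's internals, so here is a minimal sketch of the hybrid-grep idea under stated assumptions: a regex pass prefilters candidate files, and a late-interaction (MaxSim) pass re-ranks the survivors. The toy `encode` stands in for a real multi-vector model; none of this is ColGREP's actual pipeline.

```python
import re
import hashlib
import numpy as np

def maxsim(q: np.ndarray, d: np.ndarray) -> float:
    # Late interaction: each query token embedding keeps its best match
    # among the document's token embeddings; the per-token maxima are summed.
    return float((q @ d.T).max(axis=1).sum())

def encode(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a real multi-vector encoder: one deterministic random
    # unit vector per whitespace token (illustration only).
    vecs = []
    for tok in text.split():
        seed = int.from_bytes(hashlib.md5(tok.encode()).digest()[:4], "little")
        v = np.random.default_rng(seed).standard_normal(dim)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

def hybrid_search(pattern: str, query: str, corpus: dict[str, str], top_k: int = 5):
    # Grep-style prefilter, then semantic re-rank of the surviving files.
    q = encode(query)
    hits = [(maxsim(q, encode(text)), path)
            for path, text in corpus.items() if re.search(pattern, text)]
    return sorted(hits, reverse=True)[:top_k]

corpus = {"db.rs": "fn connect pool retry backoff",
          "http.rs": "fn retry request with exponential backoff"}
print(hybrid_search(r"retry", "reconnect with backoff", corpus))
```

A real tool would index token embeddings ahead of time instead of encoding documents at query time; the sketch only shows how the two retrieval modes compose.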
Raphaël Sourty retweeted
Omar Khattab @lateinteraction
Antoine and team had trained a nice ColBERT late interaction model last year... Now they decided to try it on BrowseComp+, the canonical "deep research" task. Guess what: it's not only the strongest method by far, it basically solved the task (~90%). Who would have thunk!
Antoine Chaffin @antoine_chaffin

BrowseComp-Plus, perhaps the hardest popular deep research task, is now solved at nearly 90%... and all it took was a 150M model ✨ Thrilled to announce that Reason-ModernColBERT did it again and outperforms all models (including models 54× bigger) on all metrics

2 replies · 8 reposts · 53 likes · 4K views
Raphaël Sourty retweeted
Connor Shorten @CShorten30
Super exciting win for Agentic Search and Late Interaction! 🧬 GPT-5 + Reason-ModernColBERT (150M) reaches ~88% accuracy with an average of ~13 search calls. For reference, when BrowseComp-Plus was published in August 2025, the max accuracy reported was ~70% using GPT-5 + Qwen3-Embed-8B with ~22 search calls. Searching with reasoning 🤖💭 is a beast. 🔥 This is a huge advertisement for semantic search, and Late Interaction models particularly shine thanks to their effectiveness at long-input modeling with fine-grained similarity scores. 🛠️ Congratulations @antoine_chaffin and team! 🎉
Antoine Chaffin @antoine_chaffin

[quoting the same BrowseComp-Plus announcement as above]

3 replies · 9 reposts · 28 likes · 3.1K views
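BrowseComp-Plus couples a reasoning LLM with a retriever it can call repeatedly; the exact harness isn't shown in these tweets. Below is a minimal sketch of that loop, with `llm` and `retriever` as hypothetical callables (the names, fields, and budget are assumptions, not the benchmark's API).

```python
def agentic_search(question, llm, retriever, max_calls=20):
    """Hypothetical reason-then-search loop: the LLM alternates between
    reasoning over gathered evidence and issuing retrieval calls until
    it commits to an answer or exhausts its search budget."""
    evidence, calls = [], 0
    while calls < max_calls:
        step = llm(question=question, evidence=evidence)
        if step.get("answer"):                 # model decided it has enough evidence
            return step["answer"], calls
        evidence += retriever(step["query"], k=10)   # one search call
        calls += 1
    # Budget exhausted: force a best-effort answer from what was gathered.
    return llm(question=question, evidence=evidence, force_answer=True)["answer"], calls
```

Connor's numbers (~13 calls vs ~22) suggest a stronger retriever shortens this loop as well as raising final accuracy, since each search call surfaces better evidence.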
Raphaël Sourty retweeted
LightOn @LightOnIO
The multi-vector era is here and there is no going back. Reason-ModernColBERT tops BrowseComp-Plus, the hardest agentic search benchmark available, by 7.59 points on accuracy. 🥇 on accuracy. 🥇 on recall. 🥇 on calibration. 📉 Fewest search calls. The models it outperforms? Up to 54× larger. Reasoning-intensive retrieval (BRIGHT), code search (MTEB Code), agentic deep research (BrowseComp-Plus): the pattern is the same, late interaction dominates with a fraction of the parameters. 149M parameters. Open weights. Open code. Built with PyLate in a few hours. Full results, analysis, and recipe on the LightOn blog: lighton.ai/lighton-blogs/…
[image]
0 replies · 16 reposts · 96 likes · 13K views
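"Built with PyLate" refers to LightOn's late-interaction library. A sketch of what indexing and retrieval with it roughly look like; the API here is written from memory of PyLate's docs, so treat class names, signatures, and the model id as assumptions to verify.

```python
from pylate import indexes, models, retrieve

# Load the released checkpoint (model id as announced; verify on the Hub).
model = models.ColBERT(model_name_or_path="lightonai/Reason-ModernColBERT")

# Build a local multi-vector index and add documents to it.
index = indexes.Voyager(index_name="bcplus", override=True)
docs = ["First passage ...", "Second passage ..."]
docs_embeddings = model.encode(docs, is_query=False)
index.add_documents(documents_ids=["d1", "d2"], documents_embeddings=docs_embeddings)

# Retrieve with MaxSim scoring over the stored token embeddings.
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(["a hard research question"], is_query=True)
print(retriever.retrieve(queries_embeddings=queries_embeddings, k=2))
```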
Raphaël Sourty retweeted
Omar Khattab @lateinteraction
@antoine_chaffin Antoine and LightOn team right now, with their multi-vector magic
[image]
2 replies · 4 reposts · 26 likes · 1.3K views
Raphaël Sourty retweeted
Hanx @Hanx1664439
100%. We always find ways to scale gracefully (we saw this with LMs: n-grams → RNNs → LSTMs → Transformers). Efficiency isn't a static wall, and late interaction actually gives us an extra dimension to play with.
Omar Khattab @lateinteraction

Can we stop saying this stuff, folks? It's been complete nonsense since 2021, the age of the GPT-3 API. There is no world where 1M docs will use 40 GB of memory: the index size would be 3-8 GB under standard ColBERTv2 compression from the dark ages, let alone newer methods.

2 replies · 5 reposts · 35 likes · 4.1K views
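Omar's 3-8 GB figure is easy to sanity-check with back-of-envelope arithmetic; the per-doc token count and compression layout below are my assumptions for illustration, not numbers from the thread.

```python
# Back-of-envelope index sizes for 1M documents.
docs = 1_000_000
tokens_per_doc = 100      # assumed average stored tokens per passage
dim = 128                 # typical ColBERT projection dimension

naive = docs * tokens_per_doc * dim * 2            # fp16, no compression
print(f"uncompressed fp16: {naive / 1e9:.1f} GB")  # ~25.6 GB

# ColBERTv2-style residual compression keeps ~2 bits per dimension
# plus a small centroid id per token (assumed 4 bytes here).
compressed = docs * tokens_per_doc * (dim * 2 / 8 + 4)
print(f"compressed: {compressed / 1e9:.1f} GB")    # ~3.6 GB, inside the 3-8 GB range
```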
Raphaël Sourty retweeted
Omar Khattab @lateinteraction
A late interaction model (150M) beats the 54× larger Qwen3-8B-Embedding by... hmm, looks like up to a 34% relative increase :D Also really funny that the entire top section of the BC+ leaderboard, sorted by recall, is just late interaction models by @LightOnIO and @mixedbreadai
[image]
Antoine Chaffin @antoine_chaffin

[quoting the same BrowseComp-Plus announcement as above]

9 replies · 15 reposts · 179 likes · 12.4K views
Raphaël Sourty retweeted
Raphaël Sourty @raphaelsrty
On agentic code retrieval tasks, the LateOn model is built different: small while achieving strong results. Code projects are small and easy to index. You want to minimize time spent encoding by using a small model on CPU while maximizing accuracy. Multi-vector models are a good fit for this.
Antoine Chaffin @antoine_chaffin

Combined with the impressive results of LateOn-Code models and the ColGrep harness for coding agents, we can clearly see a trend where late interaction models seem particularly suited for this new agentic era. If you factor in the long-context and out-of-domain generalization capabilities, you know the future will be very bright (pun intended) x.com/antoine_chaffi…

1 reply · 2 reposts · 11 likes · 2.1K views
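A quick back-of-envelope on why CPU-only encoding is viable for code projects; every number below is an assumption for illustration, none come from the thread.

```python
# Rough cost of indexing a mid-sized repo with a small encoder on CPU.
files = 2_000               # assumed number of source files
tokens_per_file = 400       # assumed average tokens after chunking
cpu_tokens_per_s = 2_000    # assumed throughput of a ~150M encoder on CPU

seconds = files * tokens_per_file / cpu_tokens_per_s
print(f"~{seconds / 60:.0f} min for a full index")  # ~7 min, one-off and fully local
```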
Raphaël Sourty @raphaelsrty
Amazing work from @antoine_chaffin pushing the limits! Learned a lot by following Antoine's recent results; excited to see a 150M model being competitive on a deep research task. Still, it's not trivial, and Antoine used a few tricks detailed in this thread 👇
Antoine Chaffin @antoine_chaffin

[quoting the same BrowseComp-Plus announcement as above]

0 replies · 5 reposts · 11 likes · 1.1K views
Raphaël Sourty retweeted
Omar Khattab @lateinteraction
Can we stop saying this stuff, folks? It's been complete nonsense since 2021, the age of the GPT-3 API. There is no world where 1M docs will use 40 GB of memory: the index size would be 3-8 GB under standard ColBERTv2 compression from the dark ages, let alone newer methods.
[image]
14 replies · 10 reposts · 148 likes · 15.1K views
Haocheng Xi @HaochengXiUCB
K-means is simple. Making it fast on GPUs isn't. That's why we built Flash-KMeans, an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly, Flash-KMeans achieves a 30× speedup over cuML and a 200× speedup over FAISS, with the same exact algorithm, just engineered for today's hardware. At the million scale, Flash-KMeans can complete a k-means iteration in milliseconds. A classic algorithm, redesigned for modern GPUs. Paper: arxiv.org/abs/2603.09229 Code: github.com/svg-project/fl…
35 replies · 196 reposts · 1.7K likes · 277.7K views
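Flash-KMeans's kernels aren't reproduced here; the sketch below just illustrates the IO-aware idea in plain PyTorch: tile the assignment step and use the matmul expansion of squared distances so the full (N, K) distance matrix never has to live in slow memory all at once.

```python
import torch

def assign_chunked(x: torch.Tensor, centroids: torch.Tensor, tile: int = 65536):
    # ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, and ||x||^2 is constant
    # per row for the argmin, so only the matmul term and ||c||^2 matter.
    c_sq = (centroids ** 2).sum(dim=1)            # (K,), computed once
    labels = torch.empty(x.shape[0], dtype=torch.long, device=x.device)
    for start in range(0, x.shape[0], tile):
        xt = x[start:start + tile]
        dist = c_sq - 2.0 * (xt @ centroids.T)    # (tile, K), one tile at a time
        labels[start:start + tile] = dist.argmin(dim=1)
    return labels

x = torch.randn(200_000, 128)
c = torch.randn(1_024, 128)
print(assign_chunked(x, c).shape)   # torch.Size([200000])
```

The real speedups come from fusing these steps into custom kernels, but the bottleneck being attacked is the same: memory traffic, not FLOPs.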
Raphaël Sourty @raphaelsrty
@mixedbreadai It's great that you're showing that late interaction models are no joke and that SOTA will be multi-vector and production-ready, kudos
0 replies · 0 reposts · 2 likes · 193 views
Raphaël Sourty retweeted
Mixedbread @mixedbreadai
Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.
[image]
35 replies · 121 reposts · 944 likes · 183.6K views
Raphaël Sourty retweeted
Ben Clavié @bclavie
I'm so excited to introduce this! We've worked on a million different moving parts to produce this. I'm fairly confident it's the best multimodal model that exists, period, and it's not too shabby at pushing back the LIMITs of retrieval either...
Mixedbread @mixedbreadai

[quoting the same Wholembed v3 announcement as above]

37 replies · 40 reposts · 410 likes · 138.3K views