Raphaël Sourty

1K posts


@raphaelsrty

AI @LightonIO · Language Models, Information Retrieval, Knowledge Distillation · PhD

Paris, France · Joined May 2020
861 Following · 1K Followers
Pinned Tweet
Raphaël Sourty@raphaelsrty·
Releasing ColGREP and LateOn-Code models 🚀 ColGREP is a multi-vector search tool built in Rust for coding agents. It's a hybrid grep that supports both grep features and semantic retrieval. Runs 100% locally. You get two SOTA code retrieval models within ColGREP
7 replies · 19 reposts · 133 likes · 10.3K views
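The post doesn't show ColGREP's internals; as a minimal sketch of the hybrid lexical + semantic idea, here is a toy ranker (the character-level `embed` and the `hybrid_score` helper are illustrative assumptions, not ColGREP's actual code):

```python
import math
import re

def embed(text, dim=8):
    """Toy bag-of-characters embedding standing in for a real
    code-retrieval model; any code embedder could be slotted in."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def hybrid_score(query, snippet, alpha=0.5):
    """Blend an exact grep-style match signal with cosine similarity:
    the general shape of hybrid lexical + semantic code search."""
    lexical = 1.0 if re.search(re.escape(query), snippet) else 0.0
    q, s = embed(query), embed(snippet)
    semantic = sum(a * b for a, b in zip(q, s))
    return alpha * lexical + (1 - alpha) * semantic

corpus = ["def parse_config(path):", "fn read_settings(path: &str)"]
ranked = sorted(corpus, key=lambda s: hybrid_score("parse_config", s),
                reverse=True)
```

A real system would swap `embed` for the bundled code-retrieval models and the regex match for full grep semantics.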
Raphaël Sourty retweeted
Manuel Faysse@ManuelFaysse·
🚨 Do LLMs need to store everything they read in memory? To reduce KV cache size and improve decoding speeds, we propose Self-Pruned KV attention, a mechanism where the model learns to decide which KVs to write in the persistent KV cache, discarding all the rest! @AIatMeta🧵
[image]
5 replies · 30 reposts · 128 likes · 9.4K views
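The thread only states the high-level mechanism; a minimal NumPy sketch of a learned write-gate over incoming keys and values (the linear gate, threshold, and shapes are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def gated_kv_write(keys, values, gate_w, threshold=0.0):
    """For each incoming token, a learned gate scores its key; only
    keys/values scoring above the threshold are written to the
    persistent KV cache, and the rest are discarded."""
    scores = keys @ gate_w          # one gate logit per token, shape (seq,)
    keep = scores > threshold       # boolean write mask
    return keys[keep], values[keep], keep

rng = np.random.default_rng(0)
seq, d = 16, 4
keys = rng.normal(size=(seq, d))
values = rng.normal(size=(seq, d))
gate_w = rng.normal(size=d)         # gate parameters, learned in practice

cached_k, cached_v, keep = gated_kv_write(keys, values, gate_w)
# The cache holds only the selected subset, so memory shrinks by the drop rate.
```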
Raphaël Sourty retweeted
Manuel Faysse@ManuelFaysse·
I feel what these results (@mattjustram) mainly show is NOT that retrievers like BM25 are better than previously thought, NOT that GPT 5.5 is amazing at search orchestration, but rather that scaffolds matter a lot in search (and should be reported).

Pi-serini is one of the first non-trivial search-specific efforts to give proper tooling to LLMs (cached results, well-defined tool calls), and it smashes the baseline with the same retriever and baseline model. @antoine_chaffin's results with/without the "get_document" tool also vastly differ: purely scaffolding. CC and Codex are good scaffolds by design and also perform very well at search, as shown here (although they are not built specifically for this).

What does this mean: the biggest low-hanging fruit in search right now probably isn't better embedding models (or even better baseline LLMs) but smart scaffolding paradigms (think RLMs @a1zhang, Pi-serini @lintool, Deep Research systems @din0s_).

As LLMs get better, I would expect the inductive biases cooked into the scaffolds to become less important than they are now. However, scaffolds will still need to be expressive enough to let the baseline model "cook", and they still need to encode what matters to the end user (speed matters a lot; search is not just accuracy).

Importantly, I believe scaffolds are not an inference-time thing only, but should also serve during document indexing. To properly contextualize documents in a corpus, we need to move away from the "1 paragraph/webpage is 1 document" concept and use agents to properly map the corpus space, leveraging their priors to index and then retrieve. Test-time compute scaling is not reserved for querying agents!

Just a few thoughts on recent results. This direction will be well aided by novel datasets like @dianetc_'s Obliq-Bench, which will necessarily require a change in paradigms, the easiest surely being around scaffolding.
[2 images]
5 replies · 19 reposts · 105 likes · 6K views
Raphaël Sourty retweeted
Amélie Chatelain@AmelieTabatta·
ColBERT models continue to embarrass models 54× their size 😎; this is why we trust late interaction @LightOnIO. A 1-year-old ColBERT + 5 min of fine-tuning = a model which, coupled with gpt-oss-120B, surpasses qwen3-embed-8B + gpt-5. @antoine_chaffin keeps making it look easy.
Antoine Chaffin@antoine_chaffin

Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad for a 1-year-old model not optimized for deep research. What if we actually tried? Introducing Agent-ModernColBERT: adding another 10% on top with a 5 min training

2 replies · 12 reposts · 66 likes · 4.2K views
Raphaël Sourty retweeted
Antoine Chaffin@antoine_chaffin·
Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad for a 1-year-old model not optimized for deep research. What if we actually tried? Introducing Agent-ModernColBERT: adding another 10% on top with a 5 min training
[image]
11 replies · 44 reposts · 224 likes · 38.8K views
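The MaxSim scoring behind ColBERT-style late interaction is compact enough to sketch directly: each query token takes its maximum similarity over all document tokens, and the maxima are summed.

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """ColBERT-style late interaction: token-level similarity matrix,
    max over document tokens per query token, summed."""
    sims = query_emb @ doc_emb.T      # (n_query_tokens, n_doc_tokens)
    return sims.max(axis=1).sum()

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))           # 4 query token embeddings, dim 8
d = rng.normal(size=(10, 8))          # 10 document token embeddings
score = maxsim_score(q, d)
```

This is why such models punch above their weight: the query and document interact token-by-token at scoring time, while document embeddings can still be indexed offline.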
Raphaël Sourty retweeted
Amélie Chatelain@AmelieTabatta·
In Paris and into information retrieval, and what comes with it? The @ParisMLgroup is organizing a meetup in the coming weeks! Links below
2 replies · 3 reposts · 30 likes · 2.8K views
Raphaël Sourty retweeted
Amélie Chatelain@AmelieTabatta·
Half of 'our embedder is SOTA on BEIR' claims are measuring contamination, not retrieval quality. If any stage of your training mix touched MS MARCO, the leaderboard delta is noise. Hold-out benchmarks aren't optional anymore; run your own evals!
3 replies · 2 reposts · 26 likes · 2.5K views
Raphaël Sourty retweeted
bitwise@bitwise0X2A·
tested @LightOnIO's recent 149M denseOn inside kbolt (fast local retrieval engine), replacing the default 300M EmbeddingGemma on BEIR FiQA + SciFact.
- FiQA: nDCG@10 0.3695 → 0.4767, Recall@10 0.4218 → 0.5996.
- SciFact: essentially the same.
Latency-wise, denseOn was a bit slower, possibly because Gemma's inference path is more mature and both have ~100M non-embedding params?
[image]
1 reply · 5 reposts · 14 likes · 1.3K views
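For reference, a minimal implementation of the nDCG@10 metric quoted above, in the standard linear-gain form (evaluation toolkits may use graded-gain variants):

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k: DCG of the system's ranking divided by the DCG of the
    ideal (relevance-sorted) ranking of the same judged documents."""
    ideal = sorted(ranked_rels, reverse=True)
    denom = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_rels, k) / denom if denom else 0.0

# relevance grades of the retrieved documents, in ranked order
example = ndcg_at_k([0, 3, 2, 0, 1])
```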
Raphaël Sourty retweeted
Omri Uzan@omri_uzan·
Can we improve a black-box retriever’s performance without additional query-time latency? We introduce Document Optimization: documents are rewritten offline into retrieval-optimized surrogates by a language model. The LM/VLM is trained with RL, with rewards derived from the retriever's rankings. Work with Ron Polonsky, @douwekiela and @ChrisGPotts 👇🧵
[image]
2 replies · 18 reposts · 63 likes · 4.3K views
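The post doesn't specify the reward function; one natural rank-derived reward under these assumptions (reciprocal rank of the rewritten document's source in the black-box retriever's ranking, a hypothetical stand-in for the paper's actual reward) might look like:

```python
def rank_reward(retriever_scores, doc_id):
    """Reciprocal-rank reward for a rewritten surrogate: higher when the
    black-box retriever ranks the source document nearer the top for the
    training query. retriever_scores maps doc id -> retrieval score."""
    ranking = sorted(retriever_scores, key=retriever_scores.get, reverse=True)
    return 1.0 / (ranking.index(doc_id) + 1)

# Scores the frozen retriever assigns after indexing the rewritten docs:
scores = {"doc_a": 0.9, "doc_b": 0.4, "doc_c": 0.7}
```

Because the reward comes only from the retriever's rankings, the rewriting LM can be trained with RL without ever touching the retriever's weights, which is the appeal of the offline setup: no added query-time latency.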
Raphaël Sourty retweeted
Orion Weller@orionweller·
Awesome new dataset from @dianetc_ @lateinteraction et al! Would love to see Promptriever evaluated also, as promptable models would work well (arxiv.org/abs/2409.11136) and others have already shown it’s very effective on ToT style queries! Excited to see others build on this!
Diane@dianetc_

Descriptive queries target latent properties without naming them. Twitter-Conflict asks for tweets that imply a specific stance through framing or irony, with explicit/news tweets as hard distractors. WildChat Errors finds model failures that are never explicitly mentioned.

3 replies · 5 reposts · 25 likes · 3.6K views
Omar Khattab@lateinteraction·
it was so cool to see how @antoine_chaffin @raphaelsrty (@LightOnIO)'s tiny 0.1B parameter late interaction model beats orders-of-magnitude bigger dense models on this task, and to realize that for all that, it's still at 8% nDCG@10, with SO much headroom to 91% for stronger models!
Diane@dianetc_

Oblique tip-of-tongue queries match a fuzzy, lossy recollection to one obscure passage. Congress Hearings: "Someone sent me this clip where a senator starts off weirdly friendly, then flips and starts pinning the guy down…"

6 replies · 16 reposts · 106 likes · 17.9K views
Raphaël Sourty retweeted
Antoine Chaffin@antoine_chaffin·
Late interaction models break benchmarks released after they were trained. But that does not mean they are perfect, far from it actually. This work is very important because it shows that our first-stage retriever models are still very far from cross-encoders. Thanks for putting this out, I really think this can help get the next generation of models and kick off new initiatives
Diane@dianetc_

We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.

2 replies · 6 reposts · 22 likes · 2.4K views
Raphaël Sourty retweeted
Diane@dianetc_·
We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.
[image]
9 replies · 55 reposts · 268 likes · 118.4K views
Raphaël Sourty retweeted
Connor Shorten@CShorten30·
How do we train and evaluate Search Agents? 👾🔎 I am SUPER EXCITED to publish a new episode of the Weaviate Podcast with Nandan Thakur (@beirmug) on Search Agents! 🎙️💚

Firstly, congratulations to Nandan, who has just completed his Ph.D. at the University of Waterloo advised by Professor Jimmy Lin (@lintool)! 🎉 During this time, Nandan published several impactful works such as BEIR 🍻, MIRACL 🌍🙌🌏, FreshStack 🥞, and many more.

This podcast dives into his new work on ORBIT and the current state of Search Agents! ⚛️ ORBIT contains 20K training examples, each one a complex, multi-hop question paired with a short verifiable answer. For example, "What was the runtime of the 2017 animated film set inside a smartphone, directed by..." (Answer: 86 minutes). 🎬 This dataset is used to train Search Agents on queries that require, say, 4 to 5 searches to answer.

The crazy part is that ORBIT was generated entirely without paid Web Search APIs! The entire pipeline runs on a 2018 Linux laptop driving DeepSeek's free chat interface! 💻♻️

Trained on ORBIT, Qwen3-4B beats InfoSeeker-4B by 4.3 EM and Search-R1-4B by 9.0 EM across 7 Wikipedia QA benchmarks.

A lot of interesting nuggets in this one! As always, I hope you find it useful and happy to discuss further! 👋
[image]
5 replies · 15 reposts · 45 likes · 9.9K views
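The EM deltas above are plain exact match after answer normalization; a minimal sketch assuming SQuAD-style normalization (ORBIT's exact variant may differ):

```python
import re
import string

def normalize(text):
    """Common EM normalization: lowercase, strip punctuation,
    drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))
```

Short verifiable answers like "86 minutes" are what make this strict metric workable for training and evaluating search agents.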