Raphaël Sourty (@raphaelsrty) - Twitter Profile

Pinned Tweet

Releasing ColGREP and LateOn-Code models 🚀 ColGREP is a multi-vector search tool built in Rust and made for coding agents. It's a hybrid grep that supports both grep features and semantic retrieval. Runs 100% locally. You get two SOTA code retrieval models within ColGREP


Raphaël Sourty retweeted

Manuel Faysse@ManuelFaysse·10h

🚨 Do LLMs need to store everything they read in memory? To reduce KV cache size and improve decoding speeds, we propose Self-Pruned KV attention, a mechanism where the model learns to decide which KVs to write in the persistent KV cache, discarding all the rest! @AIatMeta🧵


Raphaël Sourty retweeted

Omar Khattab@lateinteraction·20h

The second effort promised above is now public. I think this paradigm might be a key step toward transforming how we teach new skills to LLMs. x.com/SOURADIPCHAKR1…

Souradip Chakraborty@SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL


Raphaël Sourty retweeted

Antoine Chaffin@antoine_chaffin·1d

Very good timing, as we just released yet another strong model on agentic search. Let’s discuss how we did it!

Doug Turnbull@softwaredoug

Excited to announce @antoine_chaffin will be a guest speaker at Cheat at Search w/ Agents. His SOTA Deep Research wins need no introduction :)


Raphaël Sourty retweeted

Manuel Faysse@ManuelFaysse·2d

I feel what these results (@mattjustram) mainly show is NOT that retrievers like BM25 are better than previously thought, NOT that GPT 5.5 is amazing at search orchestration, but rather that scaffolds matter a lot in search (and should be reported).

Pi-serini is one of the first non-trivial search specific efforts to give some proper tooling to LLMs (cached results, well defined tool calls) and smashes the baseline with the same retriever and baseline model. @antoine_chaffin 's results with/without the "get_document" tool also vastly differ: purely scaffolding. CC and Codex are good scaffolds by design and also perform very well at search as shown here (although they are not built specifically for this).

What does this mean: Biggest low hanging fruits right now in search probably aren't through better embedding models (or even better baseline LLMs) but smart scaffolding paradigms (think RLMs @a1zhang, Pi-serini @lintool, Deep Research systems @din0s_ ).

As LLMs get better, I would expect inductive biases cooked in the scaffolds to become less important than now. However, scaffolds will still need to be expressive enough to let the baseline model "cook" and still need to encode what matters to the end user (speed matters a lot, search is not just accuracy).

Importantly, I believe scaffolds are not an inference time thing only, but should serve during document indexing. To properly contextualize documents in a corpus, we need to move away from the "1 paragraph/webpage is 1 document" concept and use agents to properly map the corpus space, and leverage their priors to index then retrieve. Test time compute scaling is not reserved to querying agents!

Just a few thoughts on recent results, this direction will be well aided by novel datasets like @dianetc_ 's Obliq-Bench which will necessarily require a change in paradigms, the easiest surely being around scaffolding.


Raphaël Sourty retweeted

Igor Carron@IgorCarron·3d

Introducing @LightOnIO 's Agent-ModernColbert: the more you use it, the more you save.

Antoine Chaffin@antoine_chaffin

Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad for a 1-year-old model not optimized for deep research. What if we actually tried? Introducing Agent-ModernColBERT: adding another 10% on top with 5 minutes of training


Raphaël Sourty retweeted

Amélie Chatelain@AmelieTabatta·3d

ColBERT models continue to embarrass models 54× their size 😎, this is why we trust late interaction @LightOnIO . A 1-year-old ColBERT + 5 min of fine-tuning = a model which, coupled with gpt-oss-120B, surpasses qwen3-embed-8B + gpt-5. @antoine_chaffin keeps making it look easy.

Antoine Chaffin@antoine_chaffin

Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad for a 1-year-old model not optimized for deep research. What if we actually tried? Introducing Agent-ModernColBERT: adding another 10% on top with 5 minutes of training


Raphaël Sourty retweeted

Bo@bo_wangbo·3d

another great work from Antoine and LightOn folks! Guess it's time for a more challenging benchmark? @xueguang_ma

Antoine Chaffin@antoine_chaffin

Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad for a 1-year-old model not optimized for deep research. What if we actually tried? Introducing Agent-ModernColBERT: adding another 10% on top with 5 minutes of training


Raphaël Sourty@raphaelsrty·3d

@antoine_chaffin Ahah 😂


Antoine Chaffin@antoine_chaffin·3d

@raphaelsrty


Raphaël Sourty retweeted

Antoine Chaffin@antoine_chaffin·3d

Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad for a 1-year-old model not optimized for deep research. What if we actually tried? Introducing Agent-ModernColBERT: adding another 10% on top with 5 minutes of training


Raphaël Sourty retweeted

Amélie Chatelain@AmelieTabatta·30 Apr

In Paris and into information retrieval, and what comes with it? The @ParisMLgroup is organizing a meetup in the coming weeks! Links below


Raphaël Sourty retweeted

Amélie Chatelain@AmelieTabatta·4d

Half of 'our embedder SOTA on BEIR' claims are measuring contamination, not retrieval quality. If any stage of your training mix touched MS MARCO, the leaderboard delta is noise. Hold-out benchmarks aren't optional anymore, run your own evals!


Raphaël Sourty retweeted

bitwise@bitwise0X2A·6d

tested @LightOnIO's recent 149M denseOn inside kbolt (fast local retrieval engine), replacing the default 300M EmbeddingGemma, on BEIR FiQA + SciFact.
- FiQA: nDCG@10 0.3695 → 0.4767, Recall@10 0.4218 → 0.5996.
- SciFact: essentially the same
latency-wise, denseOn was a bit slower, possibly because Gemma's inference path is more mature and both have ~100M non-embed params?


Raphaël Sourty retweeted

Omri Uzan@omri_uzan·6 May

Can we improve a black-box retriever’s performance without additional query-time latency? We introduce Document Optimization: documents are rewritten offline into retrieval-optimized surrogates by a language model. The LM/VLM is trained with RL, with rewards derived from the retriever's rankings. Work with Ron Polonsky, @douwekiela and @ChrisGPotts 👇🧵


Raphaël Sourty retweeted

Orion Weller@orionweller·6 May

Awesome new dataset from @dianetc_ @lateinteraction et al! Would love to see Promptriever evaluated also, as promptable models would work well (arxiv.org/abs/2409.11136) and others have already shown it’s very effective on ToT style queries! Excited to see others build on this!

Diane@dianetc_

Descriptive queries target latent properties without naming them. Twitter-Conflict asks for tweets that imply a specific stance through framing or irony, with explicit/news tweets as hard distractors. WildChat Errors finds model failures that are never explicitly mentioned.


Omar Khattab@lateinteraction·6 May

it was so cool to see how @antoine_chaffin @raphaelsrty (@LightOnIO) 's tiny 0.1B parameter late interaction model beats orders of magnitude bigger dense models on this task and to realize that, for all that, it's still at 8% nDCG@10, with SO much headroom to 91% for stronger models!

Diane@dianetc_

Oblique tip-of-tongue queries match a fuzzy, lossy recollection to one obscure passage. Congress Hearings: "Someone sent me this clip where a senator starts off weirdly friendly, then flips and starts pinning the guy down…"


Raphaël Sourty@raphaelsrty·6 May

@lateinteraction @antoine_chaffin @LightOnIO Thank you @lateinteraction for including our model in the benchmark. There is a lot of room for improvement. I like this approach of creating benchmarks / tasks with near 0% accuracy on current SOTA to move research forward


Raphaël Sourty retweeted

Antoine Chaffin@antoine_chaffin·6 May

Late interaction models break benches released after they were trained. But it does not mean they are perfect, far from it actually. This work is very important because it shows that our first stage retriever models are still very far from cross-encoders. Thanks for putting this out, I really think this can help get the next generation of models and kick off new initiatives

Diane@dianetc_

We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.


Raphaël Sourty retweeted

Diane@dianetc_·6 May

We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.


Raphaël Sourty retweeted

Connor Shorten@CShorten30·5 May

How do we train and evaluate Search Agents? 👾🔎

I am SUPER EXCITED to publish a new episode of the Weaviate Podcast with Nandan Thakur (@beirmug) on Search Agents! 🎙️💚

Firstly, congratulations to Nandan who has just completed his Ph.D. at the University of Waterloo advised by Professor Jimmy Lin (@lintool)! 🎉 During this time, Nandan published several impactful works such as BEIR 🍻, MIRACL 🌍🙌🌏, FreshStack 🥞, and many more.

This podcast dives into his new work on ORBIT and the current state of Search Agents! ⚛️

ORBIT contains 20K training examples, each one a complex, multi-hop question paired with a short verifiable answer. For example, "What was the runtime of the 2017 animated film set inside a smartphone, directed by..." (Answer: 86 minutes). 🎬 This dataset is used to train Search Agents on queries that require say 4 to 5 searches in order to answer.

The crazy part is that ORBIT was generated entirely without paid Web Search APIs! The entire pipeline runs on a 2018 Linux laptop driving DeepSeek's free chat interface! 💻♻️

Trained on ORBIT, Qwen3-4B beats InfoSeeker-4B by 4.3 EM and Search-R1-4B by 9.0 EM across 7 Wikipedia QA benchmarks.

A lot of interesting nuggets in this one! As always, I hope you find it useful and happy to discuss further! 👋

