Aarush

89 posts

@Aarush1003

Searching for things to retrieve @LightOnIO · Master's @DIKU_Institut

Joined July 2024

397 Following · 31 Followers

Aarush @Aarush1003 · 3d

@capemox @LightOnIO @AmelieTabatta @raphaelsrty Thankss 😊

gautham @capemox · 3d

@Aarush1003 @LightOnIO @AmelieTabatta @raphaelsrty Congrats!

Aarush @Aarush1003 · 3d

Going to be joining @LightOnIO for the summer!! Will be working on some really cool search tools 🕵️‍♂️🌐 And thanks @AmelieTabatta & @raphaelsrty for the opportunity :) Search Models that will be trained by me⬇️⬇️

[GIF]

Aarush @Aarush1003 · 3d

@antoine_chaffin @LightOnIO @AmelieTabatta @raphaelsrty Thankss 😊🙌

Antoine Chaffin @antoine_chaffin · 3d

@Aarush1003 @LightOnIO @AmelieTabatta @raphaelsrty Welcome on board!

Aarush @Aarush1003 · 3d

@raphaelsrty @LightOnIO @AmelieTabatta

[GIF]

Raphaël Sourty @raphaelsrty · 3d

@Aarush1003 @LightOnIO @AmelieTabatta Very happy to work with you @Aarush1003, we will do great things !!! 😁

Aarush reposted

Omar Khattab @lateinteraction · 11 May

in case you missed it, OBLIQ-Bench is now on arXiv: arxiv.org/pdf/2605.06235

my hope is that this reduces the frequency of IR or search-agent papers that I discard immediately as a reader because, in 2026, they’re still evaluating on long-expired MS MARCO, NQ, HotPotQA, BEIR, etc.

Diane @dianetc_

We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.

Aarush @Aarush1003 · 28 Apr

Won the best poster award @ICBINBWorkshop!! What a great way to end a wonderful week in Rio 🎊

Aarush @Aarush1003

New Pre-Print!! LLMs are not good dataset generators for retrieval tasks...

Aarush @Aarush1003 · 15 Apr

@antoine_chaffin @lateinteraction Agreed 👀

Antoine Chaffin @antoine_chaffin · 15 Apr

Scaling laws of multi-vector models seem rather different from dense ones

Aarush @Aarush1003 · 14 Apr

I am also looking for visiting research positions for the summer; if anyone has any leads, please reach out :)

Aarush @Aarush1003 · 14 Apr

I will be in Rio presenting this paper at the @ICBINBWorkshop :)

Aarush @Aarush1003

New Pre-Print!! LLMs are not good dataset generators for retrieval tasks...

Aarush reposted

Guanya Shi @GuanyaShi · 25 Mar

I’m so tired of writing rebuttals to this kind of “lack of novelty” review: “This paper trivially combines A, B, and C, so the algorithmic novelty is limited.”

Technically, most (if not all) robotics papers are convex combinations of existing ideas. I still deeply appreciate A+B+C papers, especially when they deliver:
- New capabilities: the “trivial combination” unlocks behaviors we simply couldn’t achieve before
- Sensible & organic design: A+B+C is clearly the right composition, not some arbitrary A′+B+C′
- Nontrivial interactions: careful analysis of the dynamics, coupling, or failure modes between A, B, C
- Rehabilitating old ideas: A was dismissed for years, but paired with modern B/C, it suddenly works and teaches us why
- System-level & "interface" insight: the contribution is not any single piece, but how the pieces talk to each other
- Scaling laws or regimes: identifying when/why A+B+C works (and when it doesn’t)
- Engineering clarity: making something actually work robustly in the real world is not “trivial”
- New problem formulations: sometimes the real novelty is in the reformulation; only under this view does A+B+C make sense.

Maybe worth keeping these in mind when reviewing the next A+B+C paper : )

Aarush @Aarush1003 · 24 Mar

Thanks for sharing our work :)

Sumit @_reachsumit

ECI: Effective Contrastive Information to Evaluate Hard-Negatives

Introduces a training-free metric grounded in information theory to assess hard-negative quality before fine-tuning. 📝 arxiv.org/abs/2603.20990

Aarush @Aarush1003 · 23 Mar

@antoine_chaffin @AmelieTabatta Multi-Vector has been making everyone happy Late-ly

Antoine Chaffin @antoine_chaffin · 23 Mar

@AmelieTabatta made me realize I smiled quite a lot during the pod; that's my face when I talk about encoders and late interaction

Weaviate Podcast @weaviatepodcast

Weaviate Podcast #134 is live! Multi-Vector Search! 🎙️💚🔥

Aarush @Aarush1003 · 2 Mar

@cneuralnetwork @prajdabre @AdishPandya Same here, I also got a paper accepted :)

neural nets. @cneuralnetwork · 2 Mar

happy to announce that our paper from AI4Bharat has been accepted to the ICBINB workshop at ICLR 2026 🎊 work done with @prajdabre @AdishPandya

Aarush @Aarush1003 · 2 Mar

This work has been accepted to the @ICBINBWorkshop at @iclr_conf !! See you in Rio hopefully 🇧🇷🇧🇷

Aarush @Aarush1003

New Pre-Print!! LLMs are not good dataset generators for retrieval tasks...

Aarush @Aarush1003 · 18 Jan

Paper: arxiv.org/abs/2504.21015 HuggingFace: huggingface.co/collections/ch…

Aarush @Aarush1003 · 18 Jan

Combining LLM-generated hard negatives with those from BM25 or cross-encoders improves performance but still cannot outperform the baselines. More importantly, we find that Phi4 generates data that results in a better retriever than Qwen3-30B, a model twice the size of Phi4.

Aarush @Aarush1003 · 18 Jan

New Pre-Print!! LLMs are not good dataset generators for retrieval tasks...
