

Qingcheng Zeng
@SteveZeng7
PhD-ing with @rfpvjr and @kaize0409 / IR, search agent, LLM, social computing / Big fan of @Arsenal / Christian
About the results now: as the biggest fan of @orionweller on earth, I immediately thought about this when they released Promptriever (actually, I directly shot him a DM about it). Anytime people release a dataset on the Hub, it's a free opportunity to get SOTA by plugging it into PyLate.

So I went ahead and tried it (there is still a branch on the main repo for it, actually), and even coded an in-training evaluator running with a PLAID index and computing p-MRR! So why did I not release it? Well, I run these kinds of experiments on a daily basis and, as illustrated in the blog post, the results are good, but not insanely good. I was in a phase of beating B-scale models by a large margin, so being better on some metrics but not all was a good result, but not good enough for me to spend time digging into it.

Essentially, I was scared that prompting capabilities might require an LLM (or at least, starting from something that has already shown some prompting capabilities), so I was waiting to train some larger-scale ColBERT models before iterating on this (and I still believe we should see much better performance with those kinds of models; it's already pretty cool to hit these results with such small models!).

The work from @SteveZeng7 is a good reminder that sometimes sharing some cool results is enough, and we should aim to share as much as we can, not just the perfect shiny results. I should share more about all the exploration I do!

Finally, about the fact that it's better to start from GTE-ModernColBERT: I would say it's somewhat related to our ColBERT-Zero study (huggingface.co/blog/lightonai…), in the sense that it's important to be careful about training the model for late interaction and not take for granted that just taking a dense model as a basis is optimal. I am a bit surprised that the scale of this training is not enough, but as raised in the blog post, I suppose it's because it's more about learning "general" retrieval first!

Actually, it would be pretty cool to try this boilerplate with the ColBERT-Zero models; I wonder what the results would be! The main issue to me is that those models already leverage a "prompt" and it might conflict a bit, but it's an interesting avenue!
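For context on the "late interaction" discussed in the thread: ColBERT-style models score a query against a document by matching each query token embedding to its best-matching document token embedding (MaxSim) and summing over query tokens. A minimal pure-Python sketch with toy 2-d embeddings (an illustration of the scoring rule only, not PyLate's actual implementation):

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token embedding,
    take its maximum dot product over all document token embeddings,
    then sum those maxima over the query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy per-token embeddings (hypothetical values for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]  # strong match for the first query token
doc_b = [[0.0, 1.0], [0.0, 0.9]]  # strong match for the second query token

print(maxsim_score(query, doc_a))  # → 1.5  (1.0 + 0.5)
print(maxsim_score(query, doc_b))  # → 1.0  (0.0 + 1.0)
```

In a PLAID-style index, candidate documents retrieved from the compressed index are ultimately re-ranked by this same MaxSim score.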


We built the first production-ready multi-vector and multimodal search. We now serve over 1 billion documents at under 50 ms latency (p50). We are sharing how we built it.



📢 New Preprint 📢 💪 Current LLMs perform quite well at pragmatic reasoning 🧐 But how do they acquire this ability? Introducing AltPrag, a dataset motivated by the idea of "alternatives" in pragmatics, to trace the training phase in which LLMs learn pragmatic reasoning. [1/n]






Echoing the great work from @dongkeun_yoon, we also share our updated preprint! 🧐 Do reasoning models verbalize their confidence better than instruct models? 🧐 Does RL provide additional benefits? 🧐 We explore this using a series of instruct and reasoning models... [1/n]





