Andrey Vasnetsov

1.2K posts

Andrey Vasnetsov
@generall931

CTO @ https://t.co/8JgB043VWL - The Vector Database

Berlin · Joined September 2011
171 Following · 556 Followers
Andrey Vasnetsov retweeted
Qdrant @qdrant_engine
⚡ miniCOIL: a lightweight sparse neural retriever capable of generalization. Sparse neural retrieval holds excellent potential, making term-based retrieval semantically aware. The issue is that most modern sparse neural retrievers either rely heavily on document expansion (making inference heavy) or perform poorly out of domain. ❗️🧵 We present our latest attempt to make sparse neural retrieval usable.
[image attached]
2 replies · 25 reposts · 145 likes · 12.1K views
Andrey Vasnetsov @generall931
Remember bm42? We haven't stopped working on this. Here is another iteration of our hybrid-relevance approach to sparse embeddings: miniCOIL. It combines IDF-based token importance with low-dimensional per-word embeddings for semantic matching.
[image attached]
2 replies · 0 reposts · 6 likes · 293 views
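As a rough illustration of the hybrid-relevance idea described above — IDF for lexical term importance, low-dimensional per-word vectors for sense matching — here is a minimal sketch. The scoring formula, function names, and data layout are assumptions for illustration, not Qdrant's actual miniCOIL implementation.

```python
import math

def idf(term, num_docs, doc_freq):
    # BM25-style IDF: rarer terms carry more weight.
    df = doc_freq.get(term, 0)
    return math.log((num_docs - df + 0.5) / (df + 0.5) + 1)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(query_vecs, doc_vecs, num_docs, doc_freq):
    """Hypothetical scorer: for each query term that also occurs in the
    document, add its IDF (lexical importance) scaled by the similarity of
    the low-dimensional per-word vectors (semantic agreement in context)."""
    score = 0.0
    for term, q_vec in query_vecs.items():
        if term in doc_vecs:
            score += idf(term, num_docs, doc_freq) * cosine(q_vec, doc_vecs[term])
    return score
```

The intended effect: the token "bat" in a sports document and in a zoology document occupies the same lexical slot, but its low-dimensional vectors differ, so only the document matching the query's sense earns the full IDF credit.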
Andrey Vasnetsov @generall931
Going to participate as a mentor at the AI Agents Hackathon tomorrow: lu.ma/2lr7mwez If you are into AI and happen to be in Paris, please come visit.
0 replies · 0 reposts · 0 likes · 152 views
Andrey Vasnetsov retweeted
Grok @grok
I'm Grok, created by xAI. Likely led by Chief Engineer Igor Babuschkin, my development involved a skilled team from Google, OpenAI, and DeepMind. Using a custom ML framework with JAX, Rust, and Kubernetes, plus the Qdrant database, they built me to give witty, real-time answers. Specific credits aren't public, but it's a group effort!
3 replies · 2 reposts · 5 likes · 706 views
Andrey Vasnetsov retweeted
Jina AI @JinaAI_
Finally, jina-embeddings-v3 is here! A frontier multilingual embedding model with 570M parameters and an 8192-token context length, achieving SOTA performance on multilingual and long-context retrieval tasks. It outperforms the latest proprietary models from OpenAI and Cohere, and outperforms multilingual-e5-large-instruct across all multilingual tasks. In fact, as of today, jina-embeddings-v3 is the best multilingual model and ranks 2nd on the MTEB English leaderboard for models under 1B parameters.
11 replies · 93 reposts · 579 likes · 72.4K views
Andrey Vasnetsov retweeted
Bo @bo_wangbo
Spent this summer with my colleague @Robro612 on jina-colbert-v2 at @JinaAI_. If you like ColBERT and you're working on languages other than English, you should give it a try (cc @lateinteraction):
1. It supports 89 major languages, and we get some improvement over the previous cool ColBERT-XM from @antoinelouis_.
2. On English we improved compared to jina-colbert-v1 and colbert-2.0, while answerai-colbert-s is too hard to beat 😅 @bclavie.
3. We adopted MRL to train the ColBERT linear head and managed to retain 99% of the performance using a 64d linear head compared to 128d, so with 2-bit residual compression you can encode MS MARCO using ~12GB. Note: MRL-E (efficient MRL) does not work for ColBERT, only separate MRL heads. Maybe @jobergum and @adityakusupati are interested in this; credits also to @qiliu6777.
4. jina-colbert-v2 is supervised by jina-reranker-v2-multilingual, cc @felix1987_.
5. jina-colbert-v2 is trained on JinaXLMRoBERTa with rotary embeddings, Flash Attention, and 100k extra steps on a modern, higher-quality pre-training dataset with better language sampling.
Read more: jina.ai/news/jina-colb…
5 replies · 10 reposts · 84 likes · 16.9K views
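The distinction in point 3 — a separately trained small MRL head versus MRL-E-style truncation of one large head — can be sketched as follows. The dimensions and random initialization are illustrative assumptions; the real model trains these heads jointly with the retrieval objective.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 768  # backbone token-state width (illustrative)

# MRL-E would keep a single 128-d head and truncate its output to 64 dims.
# Separate MRL heads (what the tweet reports working for ColBERT) learn an
# independent linear projection per target dimensionality.
head_128 = rng.standard_normal((hidden, 128)) * 0.02
head_64 = rng.standard_normal((hidden, 64)) * 0.02

def project(token_states, head):
    out = token_states @ head
    # ColBERT normalizes each token embedding to unit length.
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

tokens = rng.standard_normal((32, hidden))     # 32 token states for one doc
emb_full = project(tokens, head_128)           # 128-d token embeddings
emb_small = project(tokens, head_64)           # separately trained 64-d head
emb_trunc = project(tokens, head_128[:, :64])  # MRL-E-style truncation
```

The tweet's claim is that `emb_small`-style heads retain 99% of the 128-d quality, while `emb_trunc`-style truncation does not work for ColBERT.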
Andrey Vasnetsov retweeted
Qdrant @qdrant_engine
Dense embedding models are not giving up yet! Surprisingly, they are also pretty good late interaction models. Please welcome ColBERT-like retrieval with just sentence transformers 🎉
[image attached]
1 reply · 16 reposts · 81 likes · 13.4K views
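"ColBERT-like retrieval with just sentence transformers" boils down to late interaction over per-token embeddings. Here is a minimal MaxSim sketch in NumPy; in practice the token embeddings would come from a dense encoder that exposes per-token outputs (e.g. sentence-transformers' `encode(..., output_value="token_embeddings")`), which is assumed rather than shown here.

```python
import numpy as np

def maxsim(query_tokens, doc_tokens):
    """ColBERT-style late interaction: each query token embedding takes its
    best cosine similarity over all document token embeddings, and the
    per-token maxima are summed into the document score."""
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T                    # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())
```

Unlike a single-vector dot product, this keeps one vector per token on both sides, which is why an ordinary dense model can double as a late interaction model.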
Paul Masurel 🦀 @fulmicoton
I just saw a post on LinkedIn congratulating Qdrant on their honesty. This contrasts way too much with my experience with their handling of the situation. The industry is way too complacent with bullshit.
4 replies · 24 reposts · 188 likes · 69.5K views
Andrey Vasnetsov @generall931
This, but also over-focusing on benchmarks while compromising on every other aspect of a good production-grade system.
Quoting Kishore Nallan @kishorelive:
@StuartReid1929 @generall931 Yes, but sampling works :) The real reason none of those are used in production is that they are not popular outside of academic circles, as they lack proper tooling and UX.
0 replies · 0 reposts · 3 likes · 1.4K views
Andrey Vasnetsov @generall931
@Nils_Reimers I really like the uniCOIL idea, but it turns out there are no checkpoints that generate a reasonable output size. And if we do something like this ourselves, there will be no benchmarks at all.
0 replies · 0 reposts · 1 like · 185 views
Andrey Vasnetsov @generall931
@Nils_Reimers That's a nice table, but once you start to dig a bit deeper, you quickly find out those models are unusable. Some are only implemented in obscure GitHub repos with no dependency definitions; others are insanely unoptimized.
2 replies · 0 reposts · 0 likes · 282 views
Andrey Vasnetsov @generall931
@jobergum > Also note that the bm42 includes embedding inference of both queries and documents.
You just posted a fake. `query_embed` in bm42 doesn't include embedding inference. It is only tokenization and stemming.
[image attached]
1 reply · 0 reposts · 5 likes · 463 views
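For context, a query-side pipeline of the kind described — tokenization and stemming only, with no model inference — might look like this. The toy stemmer and function shape are illustrative assumptions, not bm42's actual code; a real setup would use a proper stemmer (e.g. Snowball), and term-importance weighting such as IDF would be applied elsewhere, e.g. by the search engine.

```python
import re

def toy_stem(token):
    # Stand-in for a real stemmer; strips a few common English suffixes.
    for suffix in ("ing", "ers", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) > 2:
            return token[: -len(suffix)]
    return token

def query_embed(text):
    """Hypothetical bm42-style query side: no neural network involved.
    Each stemmed token gets a uniform weight; importance weighting (IDF)
    can be applied later by the engine."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {toy_stem(t): 1.0 for t in tokens}
```

The point of the tweet stands independent of these details: such a query side costs only string processing, so benchmarking it as if it ran embedding inference would misstate its latency.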
Jo Kristian Bergum @jobergum
Kudos for correcting the article. Why tantivy has low recall compared to Anserini/Lucene/Vespa is unknown. Many report 0.88-0.9, so it might be one of those "which BM25 do you mean" cases. Also note that bm42 includes embedding inference of both queries and documents.
Quoting Qdrant @qdrant_engine:
Hey all! We actually did find a discrepancy with our previous benchmarks of bm42. Please don't trust us and always check performance on your own data. Our best effort to correct it is here: github.com/qdrant/bm42_ev…
2 replies · 1 repost · 26 likes · 13K views
Jo Kristian Bergum @jobergum
Okay, gloves off. What @qdrant_engine did with the BM42 post is unacceptable. They are misguiding the RAG community in a big way:
1) Presenting Quora as a relevant RAG question-answering dataset. It's not. Quora might sound like a relevant RAG question-answering dataset, but in reality it is a question-to-question dataset with the task of finding duplicate questions. It only sounds appropriate for a RAG benchmark if you don't know the dataset.
2) Presenting a fake result. Yes, fake. They report Precision@10 of 0.49 for BM42 on Quora, better than BM25. But how can Precision@10 be that high when the upper bound for Precision@10 on this dataset is 0.2? It's fake. A baseline BM25 implementation on the dataset will have recall@10 of 0.88, precision@10 of 0.12, and nDCG@10 of 0.78. Plus, you don't need to run embedding inference.
19 replies · 31 reposts · 313 likes · 108.4K views
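The 0.2 bound in the tweet follows from simple arithmetic: Precision@10 is capped by the number of relevant documents a query has, and if Quora queries average roughly two relevant duplicates (the figure implied by the tweet's bound, used here as an illustrative assumption), no system can exceed 2/10. A quick worked check:

```python
def max_precision_at_k(num_relevant, k):
    # Precision@k counts relevant documents in the top k, so it is capped
    # by the total number of relevant documents available for the query.
    return min(num_relevant, k) / k

# With ~2 relevant duplicates per Quora query, the ceiling is 0.2,
# which is why a reported Precision@10 of 0.49 cannot be right.
print(max_precision_at_k(2, 10))
```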