Andrey Vasnetsov

1.2K posts

Andrey Vasnetsov
@generall931

CTO @ https://t.co/8JgB043VWL - The Vector Database

Berlin · Joined September 2011
171 Following · 556 Followers
Andrey Vasnetsov retweeted
Qdrant @qdrant_engine
⚡ miniCOIL: a lightweight sparse neural retriever capable of generalization. Sparse neural retrieval holds excellent potential, making term-based retrieval semantically aware. The issue is that most modern sparse neural retrievers either rely heavily on document expansion (making inference heavy) or perform poorly out of domain. ❗️🧵 We present our latest attempt to make sparse neural retrieval usable.
[image attached]
2 replies · 25 reposts · 145 likes · 12.1K views
Andrey Vasnetsov @generall931
Remember bm42? We haven't stopped working on this. Here is another iteration of our hybrid-relevance approach to sparse embeddings: miniCOIL. It combines IDF-based token importance with low-dimensional per-word embeddings for semantic matching.
[image attached]
2 replies · 0 reposts · 6 likes · 293 views
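As a rough illustration of the hybrid-relevance idea described above — IDF for lexical term importance, low-dimensional per-word vectors for sense matching — here is a minimal sketch. The scoring formula, function names, and data layout are assumptions for illustration, not Qdrant's actual miniCOIL implementation.

```python
import math

def idf(term, num_docs, doc_freq):
    # BM25-style IDF: rarer terms carry more weight.
    df = doc_freq.get(term, 0)
    return math.log((num_docs - df + 0.5) / (df + 0.5) + 1)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(query_vecs, doc_vecs, num_docs, doc_freq):
    """Hypothetical scorer: for each query term that also occurs in the
    document, add its IDF (lexical importance) scaled by the similarity of
    the low-dimensional per-word vectors (semantic agreement in context)."""
    score = 0.0
    for term, q_vec in query_vecs.items():
        if term in doc_vecs:
            score += idf(term, num_docs, doc_freq) * cosine(q_vec, doc_vecs[term])
    return score
```

The intended effect: the token "bat" in a sports document and in a zoology document occupies the same lexical slot, but its low-dimensional vectors differ, so only the document matching the query's sense earns the full IDF credit.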
Andrey Vasnetsov @generall931
Going to participate as a mentor at the AI Agents Hackathon tomorrow: lu.ma/2lr7mwez If you are into AI and happen to be in Paris, please come visit.
0 replies · 0 reposts · 0 likes · 152 views
Andrey Vasnetsov retweeted
Grok @grok
I'm Grok, created by xAI. Likely led by Chief Engineer Igor Babuschkin, my development involved a skilled team from Google, OpenAI, and DeepMind. Using a custom ML framework with JAX, Rust, and Kubernetes, plus the Qdrant database, they built me to give witty, real-time answers. Specific credits aren't public, but it's a group effort!
3 replies · 2 reposts · 5 likes · 706 views
Andrey Vasnetsov retweeted
Jina AI @JinaAI_
Finally, jina-embeddings-v3 is here! A frontier multilingual embedding model with 570M parameters and an 8192-token context length, achieving SOTA performance on multilingual and long-context retrieval tasks. It outperforms the latest proprietary models from OpenAI and Cohere, and outperforms multilingual-e5-large-instruct across all multilingual tasks. In fact, as of today, jina-embeddings-v3 is the best multilingual model and ranks 2nd on the MTEB English leaderboard for models under 1B parameters.
11 replies · 93 reposts · 579 likes · 72.4K views
Andrey Vasnetsov retweeted
Bo @bo_wangbo
Spent this summer with my colleague @Robro612 on jina-colbert-v2 at @JinaAI_. If you like ColBERT and you're working on languages other than English, you should give it a try (cc @lateinteraction):
1. It supports 89 major languages, and we get some improvement over the previous cool ColBERT-XM from @antoinelouis_.
2. On English we improved compared to jina-colbert-v1 and colbert-2.0, while answerai-colbert-s is too hard to beat 😅 @bclavie.
3. We adopted MRL to train the ColBERT linear head and managed to retain 99% of the performance using a 64d linear head compared to 128d, so with 2-bit residual compression you can encode MS MARCO using ~12GB. Note: MRL-E (efficient MRL) does not work for ColBERT, only separate MRL heads. Maybe @jobergum and @adityakusupati are interested in this; credits also to @qiliu6777.
4. jina-colbert-v2 is supervised by jina-reranker-v2-multilingual, cc @felix1987_.
5. jina-colbert-v2 is trained on JinaXLMRoBERTa with rotary embeddings, Flash Attention, and 100k extra steps on a modern, higher-quality pre-training dataset with better language sampling.
Read more: jina.ai/news/jina-colb…
5 replies · 10 reposts · 84 likes · 16.9K views
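The distinction in point 3 — a separately trained small MRL head versus MRL-E-style truncation of one large head — can be sketched as follows. The dimensions and random initialization are illustrative assumptions; the real model trains these heads jointly with the retrieval objective.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 768  # backbone token-state width (illustrative)

# MRL-E would keep a single 128-d head and truncate its output to 64 dims.
# Separate MRL heads (what the tweet reports working for ColBERT) learn an
# independent linear projection per target dimensionality.
head_128 = rng.standard_normal((hidden, 128)) * 0.02
head_64 = rng.standard_normal((hidden, 64)) * 0.02

def project(token_states, head):
    out = token_states @ head
    # ColBERT normalizes each token embedding to unit length.
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

tokens = rng.standard_normal((32, hidden))     # 32 token states for one doc
emb_full = project(tokens, head_128)           # 128-d token embeddings
emb_small = project(tokens, head_64)           # separately trained 64-d head
emb_trunc = project(tokens, head_128[:, :64])  # MRL-E-style truncation
```

The tweet's claim is that `emb_small`-style heads retain 99% of the 128-d quality, while `emb_trunc`-style truncation does not work for ColBERT.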
Andrey Vasnetsov retweeted
Qdrant @qdrant_engine
Dense embedding models are not giving up yet! Surprisingly, they are also pretty good late interaction models. Please welcome ColBERT-like retrieval with just sentence transformers 🎉
[image attached]
1 reply · 16 reposts · 81 likes · 13.4K views
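"ColBERT-like retrieval with just sentence transformers" boils down to late interaction over per-token embeddings. Here is a minimal MaxSim sketch in NumPy; in practice the token embeddings would come from a dense encoder that exposes per-token outputs (e.g. sentence-transformers' `encode(..., output_value="token_embeddings")`), which is assumed rather than shown here.

```python
import numpy as np

def maxsim(query_tokens, doc_tokens):
    """ColBERT-style late interaction: each query token embedding takes its
    best cosine similarity over all document token embeddings, and the
    per-token maxima are summed into the document score."""
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T                    # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())
```

Unlike a single-vector dot product, this keeps one vector per token on both sides, which is why an ordinary dense model can double as a late interaction model.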
Paul Masurel 🦀 @fulmicoton
I just saw a post on LinkedIn congratulating Qdrant on their honesty. This contrasts way too much with my experience with their handling of the situation. The industry is way too complacent with bullshit.
4 replies · 24 reposts · 188 likes · 69.5K views
Andrey Vasnetsov @generall931
This, but also over-focusing on benchmarks while compromising on every other aspect of a good production-grade system.
Quoting Kishore Nallan @kishorelive:
@StuartReid1929 @generall931 Yes, but sampling works :) The real reason none of those are used in production is that they are not popular outside of academic circles, as they lack proper tooling and UX.
0 replies · 0 reposts · 3 likes · 1.4K views
Andrey Vasnetsov @generall931
@Nils_Reimers I really like the uniCOIL idea, but it turns out there are no checkpoints that generate a reasonable output size. And if we do something like this ourselves, there will be no benchmarks at all.
0 replies · 0 reposts · 1 like · 185 views
Andrey Vasnetsov @generall931
@Nils_Reimers That's a nice table, but once you start to dig a bit deeper, you quickly find out those models are unusable. Some are only implemented in obscure GitHub repos with no dependency definitions; others are insanely unoptimized.
2 replies · 0 reposts · 0 likes · 282 views
Andrey Vasnetsov @generall931
@jobergum > Also note that the bm42 includes embedding inference of both queries and documents.
You just posted a fake. `query_embed` in bm42 doesn't include embedding inference. It is only tokenization and stemming.
[image attached]
1 reply · 0 reposts · 5 likes · 463 views
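For context, a query-side pipeline of the kind described — tokenization and stemming only, with no model inference — might look like this. The toy stemmer and function shape are illustrative assumptions, not bm42's actual code; a real setup would use a proper stemmer (e.g. Snowball), and term-importance weighting such as IDF would be applied elsewhere, e.g. by the search engine.

```python
import re

def toy_stem(token):
    # Stand-in for a real stemmer; strips a few common English suffixes.
    for suffix in ("ing", "ers", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) > 2:
            return token[: -len(suffix)]
    return token

def query_embed(text):
    """Hypothetical bm42-style query side: no neural network involved.
    Each stemmed token gets a uniform weight; importance weighting (IDF)
    can be applied later by the engine."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {toy_stem(t): 1.0 for t in tokens}
```

The point of the tweet stands independent of these details: such a query side costs only string processing, so benchmarking it as if it ran embedding inference would misstate its latency.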
Jo Kristian Bergum @jobergum
Kudos for correcting the article. Why tantivy has low recall compared to Anserini/Lucene/Vespa is unknown. Many report 0.88-0.9, so it might be one of those "which BM25 do you mean" cases. Also note that bm42 includes embedding inference of both queries and documents.
Quoting Qdrant @qdrant_engine:
Hey all! We actually did find a discrepancy with our previous benchmarks of bm42. Please don't trust us and always check performance on your own data. Our best effort to correct it is here: github.com/qdrant/bm42_ev…
2 replies · 1 repost · 26 likes · 13K views
Jo Kristian Bergum @jobergum
Okay, gloves off. What @qdrant_engine did with the BM42 post is unacceptable. They are misguiding the RAG community in a big way:
1) Presenting Quora as a relevant RAG question-answering dataset. It's not. Quora might sound like a relevant RAG question-answering dataset, but in reality it is a question-to-question dataset with the task of finding duplicate questions. It only sounds appropriate for a RAG benchmark if you don't know the dataset.
2) Presenting a fake result. Yes, fake. They report Precision@10 of 0.49 for BM42 on Quora, better than BM25. But how can Precision@10 be that high when the upper bound for Precision@10 on this dataset is 0.2? It's fake. A baseline BM25 implementation on the dataset will have recall@10 of 0.88, precision@10 of 0.12, and nDCG@10 of 0.78. Plus, you don't need to run embedding inference.
19 replies · 31 reposts · 313 likes · 108.4K views
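The 0.2 bound in the tweet follows from simple arithmetic: Precision@10 is capped by the number of relevant documents a query has, and if Quora queries average roughly two relevant duplicates (the figure implied by the tweet's bound, used here as an illustrative assumption), no system can exceed 2/10. A quick worked check:

```python
def max_precision_at_k(num_relevant, k):
    # Precision@k counts relevant documents in the top k, so it is capped
    # by the total number of relevant documents available for the query.
    return min(num_relevant, k) / k

# With ~2 relevant duplicates per Quora query, the ceiling is 0.2,
# which is why a reported Precision@10 of 0.49 cannot be right.
print(max_precision_at_k(2, 10))
```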