OpenSource Connections

2.9K posts

@o19s

We can help you Own Your Search by empowering your search team to succeed! Team Mentorship, Consulting: Solr, Elasticsearch, OpenSearch. On Fosstodon as o19s.

Joined May 2009
244 Following · 1.3K Followers
Pinned Tweet
OpenSource Connections
PLEASE NOTE! We're now also on Fosstodon with the same handle o19s if you'd like to follow us there.
OpenSource Connections retweeted
Charlie Hull (@FlaxSearch)
My journey with @o19s has come to an end - I’m looking forward to finding new challenges & opportunities in the world of search and AI. It’s been an amazing 6 years. I’ll be looking for new things to do from 30th Nov - get in touch if you’d like to discuss something exciting!
OpenSource Connections retweeted
Eric Pugh (@dep4b)
“Get your changes into production quick to reduce testing uncertainty”. Sound familiar David Fisher???
Qdrant (@qdrant_engine)
🎙 #VectorWeekly: On ColPali and Graph-Based Adaptive Re-Ranking

This Monday-Tuesday, we visited the @Haystack_AI conference in Berlin and gathered insights for you, part of which formed this #VectorWeekly. Kudos to all the fantastic speakers and organizers!

📄 Graph-Based Adaptive Re-Ranking (talk by @macavaney)

Reranking typically works like this: retrieve X documents using a simple retriever (dense, sparse, or lexical), then rerank them with a more sophisticated heavy model, like a late interaction one or even a cross-encoder. However, if your first-stage retriever misses the right document, no matter how good your reranker is, that document’s gone.

What if, at the reranking stage, we could get access to documents outside of the initially retrieved set? Inspired by the Battleship game, adaptive reranking uses a Hierarchical Navigable Small World (HNSW) graph. It is built at indexing time; therefore, it can be accessed with a simple lookup.

ℹ️ Here’s the process:
1. Retrieve the top-X results.
2. Rerank them, and take the top-Z (Z < X) results (e.g., top 10).
3. Look up their neighbours in the HNSW graph.
4. If their similarity scores improve, continue exploring, staying within a chosen budget.
5. If not, stop.

Essentially, you’re adapting your reranking based on the discovered neighbours, like zeroing in on a target in Battleship.

🔗 More details here: arxiv.org/pdf/2405.01122

📄 ColPali (talk by @jobergum)

PDF retrieval is messy: good OCR, crafting heuristics to combine text-heavy and image-heavy parts, tricks to capture context... Contextualized Late Interaction over PaliGemma (ColPali) could be a way out of this maze of despair. It’s based on the visual language model PaliGemma and the late interaction approach (covered in #VectorWeekly two weeks ago).

ℹ️ ColPali directly embeds PDFs into vector representations, regardless of what’s on the page: if it can be printed, it can be embedded.

Each page is represented by a contextualized set of 128-dimensional vectors, each encoding an image patch of 32x32 pixels. The query is also encoded as a set of vectors, one per token. To compute similarity, for each query token, the most similar image patch on a page is found, and the resulting dot products are summed across all query tokens, as in any late interaction model.

ColPali has impressive results on the Visual Document Retrieval Benchmark (ViDoRe), which is reliable for evaluating retrieval not just on text-heavy documents but also on those with visual and tabular data.

🔗 More details here: arxiv.org/pdf/2407.01449

✍ Written by @krotenWanderung
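
The five-step adaptive re-ranking process described in this post can be sketched roughly as follows. This is a simplified illustration, not the algorithm from the linked paper: the function name `adaptive_rerank`, the plain dictionary standing in for the HNSW neighbour lookup, the `score` callable standing in for the heavy reranker, and the choice to count every scorer call against the budget are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


def adaptive_rerank(
    initial: List[str],
    neighbours: Dict[str, List[str]],
    score: Callable[[str], float],
    top_z: int = 2,
    budget: int = 10,
) -> List[Tuple[str, float]]:
    """Rerank `initial`, then keep exploring graph neighbours of the current
    top-Z while that improves the top-Z and the scoring budget is not spent."""
    scored: Dict[str, float] = {}

    def score_new(doc_ids: List[str]) -> bool:
        # Score only unseen documents; every scorer call counts against the budget.
        added = False
        for doc_id in doc_ids:
            if len(scored) >= budget:
                break
            if doc_id not in scored:
                scored[doc_id] = score(doc_id)
                added = True
        return added

    def top(k: int) -> List[Tuple[str, float]]:
        return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]

    score_new(initial)  # steps 1-2: rerank the first-stage results
    while len(scored) < budget:
        before = top(top_z)
        # Step 3: look up the neighbours of the current top-Z documents.
        frontier = [n for doc_id, _ in before for n in neighbours.get(doc_id, [])]
        if not score_new(frontier):
            break  # no unseen neighbours left
        if top(top_z) == before:
            break  # step 5: the neighbours did not improve the top-Z
        # Step 4: the top-Z improved, so explore again from the new top-Z.
    return top(top_z)


# Hypothetical toy data: "d" and "e" were missed by the first-stage retriever.
relevance = {"a": 0.2, "b": 0.5, "c": 0.1, "d": 0.9, "e": 0.95, "f": 0.0}
graph = {"a": ["f"], "b": ["d"], "d": ["e"]}
results = adaptive_rerank(["a", "b", "c"], graph, relevance.__getitem__)
print(results)
```

In this toy run the documents `d` and `e` are reached only through the neighbour graph, which is the point of the technique: a good document missed by the first stage can still surface at reranking time.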
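
The late interaction similarity described for ColPali (for each query token, take the best-matching image patch, then sum over query tokens) can be illustrated with a minimal sketch. The function name `maxsim_score` and the toy 2-dimensional vectors are assumptions for readability; per the post, real ColPali vectors are 128-dimensional, and this is not the model's actual implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors: List[Vector], patch_vectors: List[Vector]) -> float:
    """For each query token vector, take its highest dot product with any
    patch vector of the page, then sum those maxima over all query tokens."""
    return sum(
        max(dot(q, p) for p in patch_vectors)
        for q in query_vectors
    )


# Hypothetical toy embeddings: two query tokens, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],  # has a good patch for each query token
    "page_b": [[0.5, 0.5]],              # one mediocre patch for everything
}

ranking = sorted(pages, key=lambda name: maxsim_score(query, pages[name]), reverse=True)
print(ranking)
```

Because each query token picks its own best patch, a page only needs one well-matching patch per token to score highly, regardless of how many other patches it contains.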
OpenSource Connections retweeted
Jo Kristian Bergum (@jobergum)
Recording and slides will be available after the conference! Great fun to talk about a topic I’m passionate about!
Aditya Varun Chadha (@adichad)

Directly indexing documents as visual artifacts to support multimodal search with vision language models #VLM, especially #ColPali in a #BiEncoder multivector late interaction architecture. Naturally repped as a Vespa tensor. @jobergum’s motivating masterclass at #haystackconf

OpenSource Connections retweeted
Charlie Hull (@FlaxSearch)
First up after lunch on day 2 of #HaystackConf is Pallavi Patil of @Yelp on LLM powered annotations
OpenSource Connections retweeted
Charlie Hull (@FlaxSearch)
Gregor and Alexandra from @knowunity telling us about their mission to change education at #HaystackConf