OpenSource Connections
2.9K posts

OpenSource Connections
@o19s
We can help you Own Your Search by empowering your search team to succeed! Team Mentorship, Consulting: Solr, Elasticsearch, OpenSearch. On fosstodon as o19s
Joined May 2009
244 Following, 1.3K Followers

User Behavior Insights, a new & open standard way to record how users interact with search results and use this data to improve relevance, continues to advance with plugins for @OpenSearchProj @elastic & @ApacheSolr now available - and a shiny new website! ubisearch.dev
OpenSource Connections retweeted

My journey with @o19s has come to an end - I’m looking forward to finding new challenges & opportunities in the world of search and AI. It’s been an amazing 6 years. I’ll be looking for new things to do from 30th Nov - get in touch if you’d like to discuss something exciting!

All the talk videos from #HaystackConf Europe 2024 are now publicly available - some amazing insights here from leading minds in search & AI - check it out! youtube.com/playlist?list=…
OpenSource Connections retweeted

Get early access to the rest of the talk videos from #HaystackConf EU - including our great lightning talks - by joining the @aiPoweredSearch community! community.aipoweredsearch.com/share/pZl9sxrV…
OpenSource Connections retweeted

My talk on ColPali (What You See Is What You Search: Vision Language Models for PDF Retrieval) from #Haystack2024 is now online.
Excellent conference by @o19s, thank you @FlaxSearch for publishing so quickly youtube.com/watch?v=lURz8T…


First #HaystackConf EU'24 talk videos are available! youtube.com/playlist?list=… featuring @treygrainger @jobergum and many other search experts - more to come soon!
OpenSource Connections retweeted

If you couldn't make it to Haystack Europe, but you'd like to get early access to the talk videos, head over to the @aiPoweredSearch Community community.aipoweredsearch.com/share/pZl9sxrV… #HaystackConf
OpenSource Connections retweeted

Some people blog about K8s, I prefer to blog about K9. opensourceconnections.com/blog/2024/10/1… #solr #elasticsearch #opensearch #opensource #lucene
OpenSource Connections retweeted

During "Scaling @ApacheSolr: From Desktop to Cloud Scale" today at #CommunityOverCode NA 2024 (2 PM MDT), @dep4b chats with our #Search Infrastructure Team Lead Andrey Ukhanov about a few challenges our team has encountered in scaling #Solr
bloom.bg/3TX9X36
#opensource
OpenSource Connections retweeted

I'll be at the inaugural #OpenSearch London Meetup next week - I can see a few familiar faces are coming! meetup.com/opensearch-pro…

Join @wrigley_dan at the Munich OpenSearch Meetup this Thursday October 10th to hear more about User Behaviour Insights, our joint project with AWS to create a shared, open way to record how users interact with search opensourceconnections.com/event/improve-…
OpenSource Connections retweeted

@qdrant_engine @Haystack_AI Hey @qdrant_engine would you please take down and fix this post? The Haystack conference was nothing to do with Deepset's Haystack product. Correct attribution would be nice!

🎙 #VectorWeekly On ColPali and Graph-Based Adaptive Re-Ranking
This Monday-Tuesday, we visited the @Haystack_AI conference in Berlin and gathered insights for you, part of which formed this #VectorWeekly. Kudos to all the fantastic speakers and organizers!
📄 Graph-Based Adaptive Re-Ranking (talk by @macavaney)
Reranking typically works like this: retrieve X documents using a simple retriever (dense, sparse, or lexical), then rerank them with a more sophisticated heavy model, like a late interaction one or even a cross-encoder.
However, if your first-stage retriever misses the right document, no matter how good your reranker is, that document’s gone. What if, at the reranking stage, we could reach documents outside the initially retrieved set?
Inspired by the Battleship game, adaptive reranking uses a Hierarchical Navigable Small World graph. It is built at the moment of indexing; therefore, it can be accessed with a simple lookup.
ℹ️ Here’s the process:
1. Retrieve the top-X results
2. Rerank them, and take the top-Z (Z < X) results (e.g., top 10).
3. Look up their neighbours in the HNSW graph.
4. If their similarity scores improve, continue exploring, staying within a chosen budget.
5. If not, stop.
Essentially, you’re adapting your reranking based on the discovered neighbours, like zeroing in on a target in Battleship.
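The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm from the talk or the paper: `score` stands in for the heavy reranker, `neighbours` is a hypothetical precomputed lookup table standing in for the HNSW graph, the budget counts reranker calls, and "scores improve" is read as "a newly scored neighbour beats the weakest document in the current top-Z".

```python
from typing import Callable, Dict, List, Tuple


def adaptive_rerank(
    initial: List[str],
    score: Callable[[str], float],
    neighbours: Dict[str, List[str]],
    top_z: int = 10,
    budget: int = 100,
) -> List[Tuple[str, float]]:
    """Rerank `initial`, then explore graph neighbours of the current top-Z."""
    if top_z < 1:
        raise ValueError("top_z must be at least 1")
    scores: Dict[str, float] = {}

    def score_new(doc_ids: List[str]) -> List[str]:
        # Score unseen documents until the budget of reranker calls is spent.
        added = []
        for doc_id in doc_ids:
            if len(scores) >= budget:
                break
            if doc_id not in scores:
                scores[doc_id] = score(doc_id)
                added.append(doc_id)
        return added

    score_new(initial)  # steps 1-2: rerank the first-stage results
    while scores and len(scores) < budget:
        ranked = sorted(scores, key=scores.get, reverse=True)
        top = ranked[:top_z]
        # Score a neighbour must beat to enter a full top-Z (assumed rule).
        floor = scores[top[-1]] if len(top) == top_z else float("-inf")
        # Step 3: look up the neighbours of the current top-Z.
        frontier = [n for d in top for n in neighbours.get(d, [])]
        added = score_new(frontier)
        # Steps 4-5: keep exploring only if a new neighbour improved the top-Z.
        if not any(scores[n] > floor for n in added):
            break
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With a toy graph where the first stage returns only "a" and "b", and "b" links to "c", which links to "d", the loop reaches "d" even though the retriever never returned it. A document linked only from a low-scoring result is never scored, which is where the budget savings come from.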
🔗 More details here: arxiv.org/pdf/2405.01122
📄 ColPali (talk by @jobergum)
PDF retrieval is messy: good OCR, crafting heuristics to combine text-heavy and image-heavy parts, tricks to capture context... Contextualized Late Interaction over PaliGemma (ColPali) could be a way out of this maze of despair. It’s based on the visual language model PaliGemma and the late interaction approach (covered in #VectorWeekly two weeks ago).
ℹ️ ColPali directly embeds PDFs into vector representations, regardless of what’s on the page — if it can be printed, it can be embedded. Each page is represented by a contextualized set of 128-dimensional vectors, each encoding one image patch from a 32x32 grid of patches laid over the page image.
The query is also encoded as a set of vectors, one per token. To compute similarity, the most similar image patch on a page is found for each query token, and these maximum dot products are summed across all query tokens, as in any late interaction model.
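The scoring rule described here (often called MaxSim) can be written down directly. The sketch below is a plain-Python illustration, not ColPali's actual code: the function names are hypothetical, and the toy vectors are 2-dimensional rather than 128-dimensional to keep the example readable.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors: Sequence[Vector], page_vectors: Sequence[Vector]) -> float:
    """Late interaction score: for each query token vector, take the best
    matching patch vector on the page, then sum those maxima."""
    if not page_vectors:
        raise ValueError("a page needs at least one patch vector")
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


# Toy example: two query tokens, two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a strong match for each token
page_b = [[0.5, 0.5]]                          # one patch, a weak match for both
ranking = sorted(
    {"page_a": page_a, "page_b": page_b}.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
```

Because every query token picks its own best patch, a page is rewarded for covering all parts of the query somewhere on the page, instead of being summarized by a single vector.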
ColPali has impressive results on the Visual Document Retrieval Benchmark (ViDoRe), which is reliable for evaluating retrieval not just on text-heavy documents but also on those with visual and tabular data.
🔗 More details here: arxiv.org/pdf/2407.01449
✍ Written by @krotenWanderung

OpenSource Connections retweeted

@qdrant_engine @Haystack_AI Actually the event is nothing to do with deepset's HaystackAI ! We use the tag #HaystackConf
OpenSource Connections retweeted

Here we are with a blog post about #VectorSearch and how to evaluate it!
sease.io/2024/10/evalua…
OpenSource Connections retweeted

Recording and slides will be available after the conference! Great fun to talk about a topic I’m passionate about!
Aditya Varun Chadha@adichad
Directly indexing documents as visual artifacts to support multimodal search with vision language models #VLM, especially #ColPali in a #BiEncoder multivector late interaction architecture. Naturally repped as a Vespa tensor. @jobergum’s motivating masterclass at #haystackconf
OpenSource Connections retweeted

First up after lunch on day 2 of #HaystackConf is Pallavi Patil of @Yelp on LLM powered annotations

OpenSource Connections retweeted

Gregor and Alexandra from @knowunity telling us about their mission to change education at #HaystackConf


