vespa.ai

592 posts

@vespaengine

https://t.co/abkb8IjPSH - the open source platform for combining data and AI, online. Vectors/tensors, full-text, structured data; ML model inference at scale.

Joined September 2017

3 Following · 3.6K Followers

vespa.ai reposted

Jon Bratseth@jonbratseth·5d

Your agent wants to search like a 2010 quant.

Retrieval for AI has passed through several stages of enlightenment. After the vector database craze, and the absorption of learnings from human information retrieval over the last half century, we might be officially entering the third stage with Perplexity's announcement of Search as Code.

Search for humans has over the years been dumbed down to accommodate the average user's capacity for formulating advanced queries, to the point where the words typed into query boxes are merely treated as vague indicators of what the user might want. I see many replicating this way of thinking with AI agents, which leaves a lot of quality improvement on the table, as can be seen in Perplexity's benchmarks (ignore the "code" aspect, as code execution is generally useful and where it runs doesn't matter for quality).

Models are quite capable and there's no reason to limit their options to those of a casual human user! They should be able to search for the names of those involved near each other in text when researching a legal case, choose a pure semantic search prioritizing high-quality sources when seeking a broad overview of a topic, or select a year range and group by month when constructing a timeline of some events, and so on.

An agent will typically string together many of these queries to reach its goal: first gaining an overview, then researching more specific topics, forming hypotheses, and verifying important details in them. In short, search like an expert who knows what they are doing and really cares about the results, like a quant doing financial analysis.

Doing this in practice, with your own data, is actually quite easy. The models already know how to write complex queries in the languages of well-known AI search engines; they just need to be told that they can, what fields are available and what they mean, and what choices they have in ranking the results. How you tell them doesn’t actually matter that much; any simple textual description of what fields and ranking options are available will do. Models today are smart enough to use this effectively to connect their intents to specific YQL queries.

When creating search for humans, developers need to implement solutions that work well across a broad set of use cases, which involves making trade-offs where some types of queries cannot be improved because doing so would impact other types. When implementing for agents, the focus shifts to providing a wide toolbox for the agent to use to address their varied informational needs: broad and highly specific lexical recall, metadata attributes for filtering, grouping, and aggregation, as well as different ranking methods suited to different needs.

Developers working on agentic search should shift their focus from replicating search for average humans to the much richer capabilities traditionally provided by solutions for competent professionals. It’s time to let your agents search like a 2010 quant.
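
To make the "simple textual description" point concrete, here is a minimal Python sketch that renders a field list and a set of ranking options into plain text an agent could be given before it writes YQL. The schema name, field names, and rank-profile names are hypothetical placeholders chosen for illustration; nothing here is taken from a real application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    name: str
    type: str
    meaning: str


@dataclass
class RankProfile:
    name: str
    when_to_use: str


def describe_search_tool(schema: str, fields: List[Field], profiles: List[RankProfile]) -> str:
    """Render a plain-text description an agent can read before writing YQL."""
    lines = [f"You can query the '{schema}' schema with YQL.", "Fields:"]
    lines += [f"- {f.name} ({f.type}): {f.meaning}" for f in fields]
    lines.append("Ranking options (set with the 'ranking.profile' query parameter):")
    lines += [f"- {p.name}: {p.when_to_use}" for p in profiles]
    return "\n".join(lines)


# Hypothetical schema, fields, and rank profiles, for illustration only.
description = describe_search_tool(
    "cases",
    [
        Field("body", "string", "full text of the ruling"),
        Field("year", "int", "year the ruling was issued"),
        Field("embedding", "tensor", "semantic embedding of the body"),
    ],
    [
        RankProfile("lexical", "exact names, citations, and phrases"),
        RankProfile("semantic_authority", "broad overviews favoring authoritative sources"),
    ],
)
print(description)
```

The output is ordinary text, so it can be pasted into a system prompt or a tool description as-is.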

vespa.ai@vespaengine·25 Jun

vespaai.live

vespa.ai@vespaengine·25 Jun

Who’s Vespa.ai Live for? If you’re building search, RAG, personalization, or recommendation systems at scale, this one’s for you. Engineers, architects, researchers, and practitioners are heading to London this September to swap notes on what’s actually working (and what isn’t) in production AI search, ranking, recommendations, and retrieval. Real talks. Real production lessons. No fluff. Hear real production stories from teams at Walmart, Etsy, RavenPack, Sease, and Kleinanzeigen. 📍 London, Lumiere London 📅 9–10 September 2026

vespa.ai@vespaengine·22 Jun

A nice overview of the features that make Vespa the AI search platform: sease.io/2026/06/the-ai…

vespa.ai@vespaengine·22 Jun

Check out the demo here: vespa-demos.ai

vespa.ai@vespaengine·22 Jun

Image search usually means stitching together a vector database, a keyword engine, and an inference service: three systems and a pile of glue. We built a demo that does it in one Vespa application instead. Type a phrase, get the right photos! 💥

It's deliberately simple, and that's the point. Vespa does the whole pipeline in one place:

🔤 BM25 matches the literal terms in captions: "Nikon", "red", "Berlin"
🧠 SigLIP embeddings match meaning: "cozy reading nook" finds the vibe, no keywords needed
⚖️ Blended in a single query: semantic for intent, lexical for precision, fused into one ranking
🧩 Embeddings generated inside Vespa: text in, vectors out, no separate inference service
⚡ ~25k photos, sub-second, end to end
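
The post does not say how the demo fuses the lexical and semantic signals, so the following is only an illustration of the general idea of "fused into one ranking". It uses reciprocal rank fusion, a common rank-combination technique swapped in here as an assumption, and runs in plain Python outside Vespa. The document ids are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the order deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from lexical (BM25-style) matching, one from embeddings.
lexical = ["photo_nikon", "photo_berlin", "photo_red"]
semantic = ["photo_nook", "photo_berlin", "photo_nikon"]

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Documents that appear in both lists rise to the top, while documents found by only one signal are still kept.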

vespa.ai@vespaengine·17 Jun

@themintsv Yes, we do. Please take a look at docs.vespa.ai/en/reference/a… (section #ranking.matching.anntimeout.enable) and docs.vespa.ai/en/reference/q…

The Mint@themintsv·16 Jun

@vespaengine Do you also return a flag that you returned an incomplete result? Otherwise, it will be hard to debug failure cases (you will not know whether the matching or kNN look-up failed).

vespa.ai@vespaengine·16 Jun

Stop tuning your vector search. Start budgeting it ⏱️

Vespa's HNSW search can now terminate early - and it changes how you think about latency vs. recall! HNSW keeps a queue of the best results found so far. So instead of waiting for it to finish, you can just take the queue early. Vespa now exposes this in two independent ways:

🎯 ANN time budget: stop hand-tuning targetHits and exploration params. Set them generously, cap search time with ranking.matching.anntimebudget, and get the best recall your latency budget allows.
🚨 ANN timeout: a safety net that cuts ANN short before the soft timeout, leaving time for matching. You return degraded results instead of no results. And when it fires, you know; it's flagged in the query response and counted in a new metric. (Heads up: this will soon be on by default.)
📊 Bonus: the new query_approximate_nns_time metric tracks time spent in ANN, and its count dimension tells you exactly how many queries actually ran ANN vs. falling back to exact search.

One is for tweaking. One is for rescue. Both mean no more parameter whack-a-mole. ⚡️ Start budgeting your ANN searches.
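
The "take the queue early" idea can be sketched in a few lines of Python. This is a toy, not Vespa's implementation: it scans candidates linearly instead of walking an HNSW graph, and the function name, the `clock` parameter, and the returned `timed_out` flag are assumptions made for the example. What it does show is the shape of the technique: keep the best results found so far, stop when the budget is spent, and report that the result is degraded.

```python
import heapq
import time
from typing import Callable, Iterable, List, Sequence, Tuple


def budgeted_top_k(
    query: Sequence[float],
    candidates: Iterable[Tuple[str, Sequence[float]]],
    k: int,
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Tuple[float, str]], bool]:
    """Return the k nearest candidates seen before the budget ran out, plus a timeout flag."""
    deadline = clock() + budget_seconds
    best: List[Tuple[float, str]] = []  # max-heap on distance, stored as (-distance, id)
    timed_out = False
    for doc_id, vector in candidates:
        if clock() >= deadline:
            timed_out = True  # budget spent: stop and return what is in the queue
            break
        distance = sum((a - b) ** 2 for a, b in zip(query, vector))
        if len(best) < k:
            heapq.heappush(best, (-distance, doc_id))
        elif distance < -best[0][0]:
            heapq.heapreplace(best, (-distance, doc_id))  # evict the current worst
    # Closest first.
    return sorted((-neg, doc_id) for neg, doc_id in best), timed_out


docs = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [0.1, 0.1])]
results, timed_out = budgeted_top_k([0.0, 0.0], docs, k=2, budget_seconds=1.0)
print([doc_id for _, doc_id in results], timed_out)
```

Passing the clock in as a parameter keeps the cutoff testable without real sleeps.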

vespa.ai@vespaengine·16 Jun

Doc link: docs.vespa.ai/en/performance… (section #early-termination-of-approximate-nearest-neighbor-search)

vespa.ai reposted

Jon Bratseth@jonbratseth·5 Jun

Perplexity is taking this to its logical conclusion and then some, but the core idea here is important and not yet widely understood: Agents are able to tailor queries in detail to what they are trying to do. This makes a huge difference in cost and quality. For example:

- When researching case law, search for the names of those involved near each other in text.
- When seeking an overview of a broader topic, do a pure vector search prioritizing authoritative sources.
- When making a timeline of some historic development, limit by year range and group by month.

And so on. Models already know YQL; all you need is to tell them what they have to work with.
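
As a rough illustration of the three query shapes listed above, the Python helpers below assemble YQL strings of the kind an agent might write. The schema and field names (`cases`, `body`, `embedding`, `year`, `timestamp`) and the query tensor name `q` are hypothetical, and the strings are sketches to be checked against the YQL and grouping references before use. Prioritizing authoritative sources would be done in a rank profile, which is not shown.

```python
from typing import List


def near_query(schema: str, field: str, terms: List[str], distance: int) -> str:
    """Terms must occur within `distance` positions of each other in `field`."""
    quoted = ", ".join(f'"{term}"' for term in terms)
    return (
        f"select * from {schema} "
        f"where {field} contains ({{distance: {distance}}}near({quoted}))"
    )


def semantic_query(schema: str, embedding_field: str, query_tensor: str, target_hits: int) -> str:
    """Pure vector search over an embedding field."""
    return (
        f"select * from {schema} "
        f"where {{targetHits: {target_hits}}}nearestNeighbor({embedding_field}, {query_tensor})"
    )


def timeline_query(schema: str, year_field: str, timestamp_field: str, first: int, last: int) -> str:
    """Filter on a year range, then count documents per year and month."""
    grouping = (
        f"all(group(time.year({timestamp_field})) "
        f"each(group(time.monthofyear({timestamp_field})) "
        "each(output(count()))))"
    )
    return (
        f"select * from {schema} "
        f"where {year_field} >= {first} and {year_field} <= {last} "
        f"limit 0 | {grouping}"
    )


print(near_query("cases", "body", ["smith", "jones"], 10))
print(semantic_query("cases", "embedding", "q", 100))
print(timeline_query("cases", "year", "timestamp", 1990, 1999))
```

Grouping by year and then by month keeps the same month in different years apart, which is what a timeline needs.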

Aravind Srinivas@AravSrinivas

We’re moving away from search as a web fetch tool call to search as codegen to be future proof in a world where code execution inside agent harnesses is the way to do almost all of our knowledge work. Doing this lets you compose multi-step primitives far more naturally and be much more adaptable to changes made to the agent harness, as well as benefit from improvements in coding capabilities that are guaranteed to come from the next generation of frontier models.

vespa.ai@vespaengine·29 May

Vespa skills for coding agents: Claude Code and competitors are very useful for working on Vespa applications. Everything is always completely specified in application packages, which are in their favorite form factor. Still, they do better with a good collection of skills. Skills are a dime a dozen though; what you want are skills that are proven by evals to carry more than their own weight. We're maintaining a collection of optimized and evaluated skills you can grab here: github.com/vespaai-playgr…

vespa.ai reposted

Haystack@Haystack_AI·1 Jun

Vespa is now available as a Document Store in Haystack. Use VespaDocumentStore for hybrid and semantic search with a powerful, production-ready engine, and pair it with VespaEmbeddingRetriever to index and retrieve documents directly in your pipelines. Metadata filtering included. @vespaengine excels at large-scale information retrieval with advanced features like real-time indexing, multi-modal search, and distributed document management - ideal for applications that demand both speed and sophistication. 🐍 pip install vespa-haystack 🔗 haystack.deepset.ai/integrations/v…

vespa.ai reposted

vespa.ai@vespaengine·29 May

Detailed metric dashboards: Some people like to dig deep, and using a cloud solution shouldn't prevent them. We've made all the dashboards used by Vespa operators available in the Vespa Cloud console out of the box.

vespa.ai@vespaengine·29 May

@MistralAI Docs to get started: docs.mistral.ai/studio-api/kno…

vespa.ai@vespaengine·29 May

Vespa isn't only powering American AI; now it's also @MistralAI's retrieval solution.

vespa.ai reposted

Thomas Thoresen@thomas_thoresen·29 May

bm25 is nice and all, but you won't believe how easy it is to improve upon it and how much more you can squeeze from lexical features in @vespaengine

vespa.ai@vespaengine·29 May

Cluster-size independent config of relevance effort: Vespa has several parameters that let you influence how much effort should be spent on relevance quality for a query, such as ANN and WeakAnd targetHits, and second-phase rerank-count. These have been set in terms of amounts per node, which is impractical when the number of nodes changes, and doesn't work well with autoscaling. Now we have added variants prefixed by "total" which set the effort over the whole cluster instead.
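
A small worked example shows why per-node settings drift with cluster size. The numbers are invented, and the even split used for the "total" case is an assumption made for illustration; the post does not say how Vespa divides a total across nodes.

```python
import math


def cluster_effort_from_per_node(per_node: int, nodes: int) -> int:
    """With a per-node setting, cluster-wide effort grows with the node count."""
    return per_node * nodes


def per_node_share_of_total(total: int, nodes: int) -> int:
    """With a total setting, each node gets a share (assumed here: even split, rounded up)."""
    return math.ceil(total / nodes)


# Per-node setting of 100: doubling the cluster doubles the total effort.
print(cluster_effort_from_per_node(100, 4))  # 400
print(cluster_effort_from_per_node(100, 8))  # 800

# Total setting of 400: the cluster-wide effort stays fixed as nodes are added.
print(per_node_share_of_total(400, 4))  # 100
print(per_node_share_of_total(400, 8))  # 50
```

With autoscaling, the second form keeps the cost and quality of a query stable as nodes come and go, which is the motivation given in the post.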

vespa.ai@vespaengine·28 May

The May Vespa.ai Newsletter is out! This month we’re announcing updates focused on retrieval quality, ranking flexibility, and developer productivity (agents: try out the skills and let us know).

- Vespa Cloud: Detailed metric dashboards
- Vespa Cloud: Index backup
- Vespa Cloud: Fine-grained maintenance controls
- Vespa Cloud: Custom resource tags
- Vespa skills for agents
- @VoyageAI, @OpenAI, and @MistralAI embedders
- A new query operator for text matching
- Cluster-size independent config of relevance effort
- Boolean array fields
- Match specific array elements
- In-memory document ids
- Search group pinning
- Near matching aware ranking
- Detect ignored write operations
- Accessing the max first phase score in re-ranking
- Geo distance in grouping

Read it here: blog.vespa.ai/vespa-newslett…

vespa.ai@vespaengine·29 May

One skill that will particularly accelerate the move to modern AI search is our Elasticsearch migration skill: github.com/vespaai-playgr… We're already getting feedback saying this enabled people to migrate complex ES applications to Vespa in less than a day.

vespa.ai@vespaengine·29 May

Fine-grained maintenance controls: One consequence of Mythos-like capabilities soon being broadly available is that we'll need to get used to upgrading all parts of the software stack much more frequently. You're already working on a plan for this, right? Those running on Vespa Cloud are already in great shape here since platform and OS upgrades just happen automatically. We'll need to make OS upgrades more frequent though, which means they'll be more intrusive. That's why we have added controls that allow you to specify in which time windows they can happen.
