Prajjwal Yadav

237 posts

Prajjwal Yadav

@PrajjwalYd

Developer Advocate @weaviate_io | UX Approver @KnativeProject

India · Joined July 2018

231 Following · 82 Followers

Pinned Tweet
Prajjwal Yadav@PrajjwalYd·
Just told Claude to import a PDF into Weaviate. It figured out the schema, vectorised every page, batched the whole thing. I typed one sentence (works with CSV and JSON/JSONL too, but that feels less impressive to say).

This is the part where I'm supposed to feel useful.

Try the Weaviate agent skills: github.com/weaviate/agent…
1
6
16
1.6K
Kenton Parton@kenton_parton·
@victorialslocum When using this retrieval approach, how do you recommend handling the QA/generation stage? Am I right that you'd need a multimodal LLM, feeding the image/audio into it and having it generate an answer?
1
0
1
161
Prajjwal Yadav retweeted
Victoria Slocum@victorialslocum·
We've been cramming podcasts, PDFs, and videos through a text converter for years. Every single conversion loses something critical.

Got a podcast? Transcribe it. A PDF with diagrams? OCR it and hope for the best. A video tutorial? Pray someone wrote good captions. Every conversion came with a tax - distortion, loss, a little less of the original thing.

But what if we could work with data in its native form and still search across all of it? That's exactly what 𝗺𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀 make possible. They map text, images, audio, and video into the same embedding space - meaning you can search across all of them 𝘸𝘪𝘵𝘩𝘰𝘶𝘵 converting anything to text first. Query with text, get back the relevant audio clip. Search with an image, retrieve similar video moments. The format doesn't matter anymore because semantically similar content lands near each other in vector space, regardless of modality.

This is enabled through 𝗰𝗼𝗻𝘁𝗿𝗮𝘀𝘁𝗶𝘃𝗲 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴. You train encoders for different modalities simultaneously - paired inputs (like an image and its caption) should end up close in embedding space, unpaired inputs should land far apart. Run this over hundreds of millions of pairs, and the encoders converge on a shared geometry where meaning dominates over format.

CLIP proved this at scale for image-text back in 2021. ImageBind extended it to six modalities. But there was a persistent problem: training separate encoders for each modality created gaps in the embedding space that degraded accuracy. The latest generation of models (like Gemini Embedding 2) solve this by training all modalities jointly from scratch in a single unified architecture. And 𝘵𝘩𝘢𝘵'𝘴 what makes the examples below practical rather than theoretical.

𝗧𝗵𝗿𝗲𝗲 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 𝘆𝗼𝘂 𝗰𝗮𝗻 𝗯𝘂𝗶𝗹𝗱 𝘁𝗼𝗱𝗮𝘆:
1️⃣ 𝗔𝘂𝗱𝗶𝗼 𝘀𝗲𝗮𝗿𝗰𝗵 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝘁𝗿𝗮𝗻𝘀𝗰𝗿𝗶𝗽𝘁𝘀: Split audio into chunks, embed them natively, query in text or audio. The generation model listens to retrieved clips and answers based on what it 𝘩𝘦𝘢𝘳𝘴 - breath, pauses, emphasis - not just words.
2️⃣ 𝗣𝗗𝗙𝘀 𝗮𝘀 𝘃𝗶𝘀𝘂𝗮𝗹 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁𝘀: Convert each page to an image and index it. Diagrams, tables, complex layouts stay intact. The LLM reads the page as an image.
3️⃣ 𝗙𝗶𝗻𝗱𝗶𝗻𝗴 𝗺𝗼𝗺𝗲𝗻𝘁𝘀 𝗶𝗻 𝘃𝗶𝗱𝗲𝗼: 15-second video chunks get indexed as raw video bytes. Query retrieves segments where the right thing 𝘩𝘢𝘱𝘱𝘦𝘯𝘦𝘥, not just where the right words were spoken.

𝗧𝗵𝗶𝘀 𝗶𝘀𝗻'𝘁 𝗮 𝘂𝗻𝗶𝘃𝗲𝗿𝘀𝗮𝗹 𝘂𝗽𝗴𝗿𝗮𝗱𝗲. Text embeddings are still better (and cheaper) for pure text retrieval. But multimodal models enable working with data in its native form - and they're only getting better.

Check out this blog by @PrajjwalYd for more, and links to all the notebooks! weaviate.io/blog/multimoda…
1
38
187
6.7K
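The "shared geometry" idea in the retweet above can be sketched with plain cosine similarity: once every modality lands in one vector space, retrieval is the same nearest-neighbor problem regardless of format. The vectors, item names, and `cosine` helper below are illustrative stand-ins, not Weaviate or Gemini API calls - in a real system a single multimodal encoder would produce the embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-d vectors standing in for real multimodal embeddings: in a shared
# space an audio clip, a PDF page, and a video segment are all just points.
index = {
    "audio_clip_7": [0.9, 0.1, 0.0],
    "pdf_page_3":   [0.1, 0.9, 0.1],
    "video_seg_12": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # embedding of a text query

# Rank every item, regardless of modality, by similarity to the query.
ranked = sorted(index, key=lambda k: cosine(query, index[k]), reverse=True)
print(ranked)  # the audio clip and video segment land closest
```

The point is that nothing in the ranking loop knows or cares which modality each item came from - that is exactly what "query with text, get back the relevant audio clip" relies on.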
Prajjwal Yadav retweeted
Arindam Majumder 𝕏@Arindam_1729·
Most AI coding agents are bad at vector DB queries. They write code that looks right, but the retrieval strategy is wrong (semantic vs hybrid, filters, limits, schema assumptions).

@weaviate_io Agent Skills fixes this. It gives the agent structured guidance for:
→ schema inspection
→ ingestion patterns
→ semantic + hybrid search
→ query planning

So tools like Claude Code generate RAG pipelines you can actually ship, not vibes-based queries. Result: faster builds + far fewer broken retrieval systems.
3
6
20
3.8K
Prajjwal Yadav retweeted
Victoria Slocum@victorialslocum·
If you're building a PDF RAG pipeline: Should you be using OCR and 𝘁𝗲𝘅𝘁-𝗯𝗮𝘀𝗲𝗱 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 methods, or just 𝗲𝗺𝗯𝗲𝗱 𝗶𝗺𝗮𝗴𝗲𝘀 𝗱𝗶𝗿𝗲𝗰𝘁𝗹𝘆 using late interaction models? This paper says the answer might actually be 𝘣𝘰𝘵𝘩.

My colleagues at Weaviate released IRPAPERS, a benchmark comparing 𝗶𝗺𝗮𝗴𝗲-𝗯𝗮𝘀𝗲𝗱 and 𝘁𝗲𝘅𝘁-𝗯𝗮𝘀𝗲𝗱 retrieval over 3,230 pages from 166 scientific papers.

The setup: Take the same PDFs and process them two ways. For text, run OCR with GPT-4.1 and embed with Arctic 2.0 + BM25 hybrid search. For images, embed raw page images with ColModernVBERT multi-vector embeddings. Test both on 180 needle-in-the-haystack questions.

𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀:
Text edges out images at the top rank: 46% vs 43% Recall@1
But images match or exceed text at deeper recall: 93% vs 91% Recall@20

And text- and image-based methods actually fail on 𝘥𝘪𝘧𝘧𝘦𝘳𝘦𝘯𝘵 𝘲𝘶𝘦𝘳𝘪𝘦𝘴. At Recall@1:
• 22 queries succeed with text but fail with images
• 18 queries succeed with images but fail with text

This complementarity is what makes 𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗛𝘆𝗯𝗿𝗶𝗱 𝗦𝗲𝗮𝗿𝗰𝗵 work. By fusing scores from both text and image retrieval, they achieved:
• 49% Recall@1 (beating either modality alone)
• 81% Recall@5
• 95% Recall@20

More in the video below 🔽
Dataset: huggingface.co/datasets/mteb/…
Paper: arxiv.org/abs/2602.17687
Code: github.com/weaviate/IRPAP…
18
102
747
43K
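The fusion step behind those combined numbers can be illustrated with Reciprocal Rank Fusion, a standard way to merge rankings from two retrievers. The tweet doesn't say which fusion method IRPAPERS actually uses, so the `rrf` function and page IDs below are a hedged sketch, not the paper's implementation.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank)
    for every document it returns; documents ranked well by both lists win."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top hits from the two retrievers over the same PDF pages.
text_hits  = ["page_12", "page_7", "page_3"]   # OCR + hybrid text retrieval
image_hits = ["page_7", "page_9", "page_12"]   # multi-vector page-image retrieval

fused = rrf([text_hits, image_hits])
print(fused)  # page_7 ranks first: it appears high in both lists
```

This is also why complementary failure modes matter: a page that only one retriever finds still makes the fused list, which is how the combined Recall@1 can beat either modality alone.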
Prajjwal Yadav@PrajjwalYd·
@kamtybor I think this happened because you forgot to add "make no mistakes"
0
0
1
15
Prajjwal Yadav@PrajjwalYd·
@kamtybor sorry, my Claude got confused and started practising on the wrong platform
0
0
1
58
Kamil Tyborowski@kamtybor·
if you are not cross posting to linkedin like this you ngmi
1
0
2
34
Prajjwal Yadav retweeted
Bob van Luijt@bobvanluijt·
👀 And what did my eyes see during Jensen's #GTC keynote...?
2
7
30
5.5K
Prajjwal Yadav@PrajjwalYd·
Working with rich media often means turning everything into text. OCR for docs. Whisper for audio. Captioning for video. Basically, flattening everything into sad little strings and hoping the meaning carries through. (Often, it doesn't.)

Because real-world knowledge doesn't really live as clean strings.

@googleaidevs recently released Gemini Embedding 2, which maps text, images, audio, and video into a single embedding space, enabling multimodal retrieval and classification across different types of media. We wired it up with the multi2vec_google_gemini module in @weaviate_io, and now the pipeline for embedding multimodal data looks a bit different:

- For PDFs: You can skip the text parsing. Convert pages directly to images, embed the visual layout, and let Gemini 3 Flash extract the answers.
- For Audio: Slice mp3s, embed the raw audio chunks, and run semantic search directly over your sound files.
- For Video: Drop frame captioning. Slice and chunk the mp4, index the clips, and pass the retrieved context directly to the generative model.

Wrote three notebooks mapping out the exact steps:
- PDF: github.com/weaviate/recip…
- Video: github.com/weaviate/recip…
- Audio: github.com/weaviate/recip…
0
2
6
1.1K
Prajjwal Yadav retweeted
Weaviate AI Database@weaviate_io·
The era of juggling 5 different embedding models is over. Google just unified text, images, video, audio, and PDFs into one vector space.

𝗢𝗻𝗲 𝗺𝗼𝗱𝗲𝗹, 𝗺𝘂𝗹𝘁𝗶𝗽𝗹𝗲 𝗺𝗼𝗱𝗮𝗹𝗶𝘁𝗶𝗲𝘀: Text, images, video, audio, and PDFs all mapped into a single unified vector space. No more juggling different embedding models or complex preprocessing pipelines.

𝗕𝘂𝗶𝗹𝘁 𝗼𝗻 𝗚𝗲𝗺𝗶𝗻𝗶 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 with support for 100+ languages and some impressive specs:
• 8192 max input tokens
• Flexible output dimensions (128-3072)
• Top 5 performance on MTEB Multilingual leaderboard
• SOTA among proprietary models across most modalities

𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗳𝗼𝗿 𝘆𝗼𝘂𝗿 𝗥𝗔𝗚 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀: By natively handling interleaved data without intermediate processing steps, Gemini Embedding 2 simplifies complex pipelines. You can now build semantic search and recommendation systems that seamlessly work across text documents, images, videos, and audio files.

The model is available now via Gemini API and Vertex AI, and works with Weaviate's existing text2vec-google integration 💚

Check out these recipes to get started 👇
Semantic search/RAG over video: github.com/weaviate/recip…
Semantic search/RAG over audio: github.com/weaviate/recip…
Multimodal PDF RAG: github.com/weaviate/recip…
3
18
81
4.3K
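"Flexible output dimensions (128-3072)" usually implies Matryoshka-style embeddings, where a prefix of the full vector is itself a usable embedding. That isn't confirmed in the post, so treat the snippet below as a toy illustration of the general truncate-and-renormalize trick, not documented Gemini Embedding 2 behavior.

```python
import math

def truncate_and_renormalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length,
    trading a little accuracy for much cheaper storage and search."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.8, 0.0, 0.0]  # stand-in for a full 3072-d embedding
small = truncate_and_renormalize(full, 2)
print(small)
```

With embeddings trained this way, you can index the short prefix for fast first-pass retrieval and keep the full vector only for reranking.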
Prajjwal Yadav@PrajjwalYd·
First I accidentally wrote a for loop by hand. Then I accidentally understood it. It's getting worse.
0
0
1
39
Prajjwal Yadav@PrajjwalYd·
There is a version of me that clicked option 3. I don't think about him much. He's probably fine, and his database is definitely fine.
1
0
1
114
Prajjwal Yadav@PrajjwalYd·
Talk is cheap. Show me your CLAUDE.md
0
0
2
79
Prajjwal Yadav@PrajjwalYd·
I keep re-introducing myself to my agents. Every new session, I’m back to explaining how I like my code structured, which libraries I’m avoiding, and how I prefer explanations before approving any action. It’s a weird, repetitive dance. At some point, the act of "reminding" the system becomes more work than the actual value it provides.

We call this “The Limit in the Loop”. It’s the wall you hit when an interaction feels disposable (because without continuity, it is). And while it’s annoying for a human, it’s also not great for an agent. At machine speed, forgetfulness becomes churn: agents burning cycles re-deriving the same conclusions, stacking new outputs on top of stale facts, creating contradictions faster than anyone can catch.

Memory isn't just storage, and if we embed everything without any curation and hope for the best, we're simply building a landfill. Because facts change, preferences evolve, and reality drifts. A useful memory system requires custodianship:
- Write Control: Not every passing message deserves to be a permanent fact.
- Deduplication: You don't need ten versions of a user’s favourite format, you need one stable preference.
- Reconciliation: When new info contradicts the old, the system needs to resolve it.
- Amendment: Correcting a wrong fact rather than just appending newer versions.
- Purposeful Forgetting: Useful memory is as much about what you delete as what you keep.

Without these mechanisms, memory becomes an ever-growing pile of notes. That's why it can't simply be a feature slapped on top of an app. It needs to be maintained to be useful and trustworthy at any scale. Or in other words, it has to become infrastructure - baked into the storage layer, inheriting the isolation, durability, and guarantees we expect from the database itself.

Explore this topic further in our blog: weaviate.io/blog/limit-in-…
0
1
7
520
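The custodianship checklist in the thread above can be sketched as a tiny in-memory store. The `MemoryStore` class, its `write`/`forget` methods, and the `important` flag are hypothetical illustrations of the five rules, not Weaviate's memory API.

```python
class MemoryStore:
    """Toy sketch of memory custodianship: one stable fact per key."""

    def __init__(self):
        self.facts = {}  # key -> current value

    def write(self, key, value, important=True):
        # Write control: only persist messages flagged as worth keeping.
        if not important:
            return
        # Deduplication + reconciliation + amendment: a new value for an
        # existing key replaces the stale one instead of stacking on top.
        self.facts[key] = value

    def forget(self, key):
        # Purposeful forgetting: deletion is a first-class operation.
        self.facts.pop(key, None)

mem = MemoryStore()
mem.write("favourite_format", "JSON")
mem.write("favourite_format", "YAML")            # reconciled, not appended
mem.write("small_talk", "hello", important=False)  # filtered at write time
print(mem.facts)  # one stable preference survives
```

A real system would back this with durable storage and smarter conflict resolution, but the invariant is the same: reads should only ever see one current version of a fact.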
Prajjwal Yadav@PrajjwalYd·
It's not vibe coding until you're scared of your own repo
0
0
3
68
Prajjwal Yadav@PrajjwalYd·
The crab🦀 is for people who fear undefined behaviour. The lobster🦞 is for people who fear doing things themselves. And, this is the entire spectrum of tech Twitter.
0
1
2
54
Prajjwal Yadav retweeted
Femke Plantinga@femke_plantinga·
Your AI coding agent is giving you broken code. We’ve all been there: You’re "Vibe Coding" with Claude Code, Cursor, or Copilot. You describe a feature, and the agent blueprints the logic. It feels like magic... until it crashes.

The problem? Most agents hallucinate legacy v3 Weaviate syntax or guess at hybrid search parameters. Speed meets a brick wall of "undefined" errors.

That’s why we’re launching 𝗪𝗲𝗮𝘃𝗶𝗮𝘁𝗲 𝗔𝗴𝗲𝗻𝘁 𝗦𝗸𝗶𝗹𝗹𝘀 - a direct bridge between your favorite coding agents and @weaviate_io infrastructure. Now, instead of your agent guessing, it has a dedicated toolkit that includes:
1️⃣ Weaviate Skills: Scripts for schema inspection, data ingestion, and precision search.
2️⃣ Cookbooks: End-to-end blueprints for FastAPI + Next.js apps.

What can your agent do with these skills?
✅ Automated schema & collection creation
✅ Streamlined CSV/JSON imports
✅ Advanced hybrid and semantic search
✅ Natural language querying

Getting started is simple:
npx skills add weaviate/agent-skills
Or if you're using the Claude Code plugin manager:
/plugin marketplace add weaviate/agent-skills
/plugin install weaviate@weaviate-plugins
Then set your environment variables and run /weaviate:quickstart for full setup instructions.

Check out our step-by-step guide + GH repo to Weaviate Skills here: weaviate.io/blog/weaviate-…
6
6
31
2.5K