Richie Caputo
736 posts









🚨 This Python tool just made vector databases optional for RAG.

It's called PageIndex. It reads documents the way you do. No embeddings. No chunking. No vector database needed.

Here's the problem with normal RAG: It takes your document, cuts it into tiny pieces, turns those pieces into numbers, and searches for the closest match. But closest match doesn't mean best answer.

PageIndex works completely differently.

→ It reads your full document
→ Builds a tree structure like a table of contents
→ When you ask a question, the AI walks through that tree
→ It thinks step by step until it finds the exact right section

Same way you'd find an answer in a textbook. You don't read every page. You check the chapters, pick the right one, and go straight to the answer. That's exactly what PageIndex teaches AI to do.

Here's the wildest part: It scored 98.7% accuracy on FinanceBench. That's a test where AI answers real questions from SEC filings and earnings reports. Most traditional RAG systems can't touch that number.

Works with PDFs, markdown, and even raw page images without OCR.

100% Open Source. MIT License.
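To make the tree walk concrete, here's a minimal sketch of the idea. It is not PageIndex's actual API: the `Node` class, `overlap_score`, `walk`, and the sample `REPORT` tree are all hypothetical names made up for illustration. In the real thing a language model would decide which branch to follow, so this sketch swaps in a crude word-overlap score for that step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a table-of-contents style tree (hypothetical structure)."""
    title: str
    summary: str = ""
    text: str = ""
    children: list[Node] = field(default_factory=list)


def overlap_score(question: str, node: Node) -> int:
    """Stand-in for the model's judgment: count words shared with title + summary."""
    q_words = set(question.lower().split())
    n_words = set((node.title + " " + node.summary).lower().split())
    return len(q_words & n_words)


def walk(root: Node, question: str) -> tuple[list[str], Node]:
    """Descend from the root, picking the best-scoring child at each level."""
    path = [root.title]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: overlap_score(question, c))
        path.append(node.title)
    return path, node


# Toy document tree; the section text is placeholder content, not real filing data.
REPORT = Node("Annual Report", children=[
    Node("Business Overview", summary="products markets strategy", children=[
        Node("Products", summary="product lines", text="(products section text)"),
    ]),
    Node("Financial Statements", summary="revenue income cash flow balance sheet", children=[
        Node("Income Statement", summary="revenue cost net income",
             text="(income statement text)"),
        Node("Balance Sheet", summary="assets liabilities equity",
             text="(balance sheet text)"),
    ]),
])

if __name__ == "__main__":
    path, leaf = walk(REPORT, "what was net income and revenue")
    print(" > ".join(path))
    print(leaf.text)
```

The point of the sketch is the shape of the search: one decision per level of the tree instead of a similarity lookup over every chunk. The quality of the answer depends entirely on how good the branch-picking step is, which is where the word-overlap stand-in would fall short on real questions.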









🎶 I've been using voice mode to write much of my CLI code this last week. Can't wait to hear what you think.



We've raised $6.5M to kill vector databases.

Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never.

Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file.

Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit.

We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time.

So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94.

More below ⬇️
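The "similar but not relevant" problem is easy to reproduce in a toy setting. The sketch below uses plain word counts as a crude stand-in for learned embeddings (an assumption for illustration; real embedding models behave differently, and this is not how HydraDB or any particular vector DB works). The two clauses and the names `bow`, `cosine`, `RENEWAL`, and `TERMINATION` are made up.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: word -> count. A crude stand-in for an embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Two made-up clauses that mean opposite things but share most of their wording.
RENEWAL = "the agreement renews on the first day of q3 unless notice is given"
TERMINATION = "the agreement terminates on the first day of q1 unless notice is given"

if __name__ == "__main__":
    # Prints a score of about 0.87 despite the opposite meanings.
    print(round(cosine(bow(RENEWAL), bow(TERMINATION)), 2))
```

Only two words differ between the clauses, and they are the two that matter. A similarity score alone gives no signal about that.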









