Alex Alexapolsky
@TheWake
Python developer/contractor
Ukraine · Joined January 2009
636 Following · 119 Followers
551 posts

Pinned Tweet
Alex Alexapolsky@TheWake·
Built an open-source tool for debugging LLM agents:
- Record runs with full traces
- Replay w/o API calls
- Diff to see what changed
"Agent worked yesterday, broke today" - now you can see exactly why. Works with PydanticAI, LangGraph, CrewAI, etc. github.com/metawake/work-…
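A minimal sketch of that record/replay/diff loop in Python. The `Ledger` class and its methods are hypothetical stand-ins for illustration, not work-ledger's actual API:

```python
# Hypothetical sketch of a record/replay/diff workflow for agent runs.
# The Ledger class below is illustrative only, NOT work-ledger's real API.
import json
from difflib import unified_diff


class Ledger:
    def __init__(self, path):
        self.path = path

    def record(self, run_id, steps):
        """Persist every step (prompt, tool call, response) of a run."""
        with open(f"{self.path}/{run_id}.json", "w") as f:
            json.dump(steps, f, indent=2)

    def replay(self, run_id):
        """Load a recorded run from disk; no API calls are made."""
        with open(f"{self.path}/{run_id}.json") as f:
            return json.load(f)

    def diff(self, run_a, run_b):
        """Line-level diff between two recorded runs."""
        a = json.dumps(self.replay(run_a), indent=2).splitlines()
        b = json.dumps(self.replay(run_b), indent=2).splitlines()
        return "\n".join(unified_diff(a, b, run_a, run_b, lineterm=""))


ledger = Ledger(".")
ledger.record("yesterday", [{"step": "search", "args": {"q": "GDPR"}}])
ledger.record("today", [{"step": "search", "args": {"q": "gdpr art 17"}}])
print(ledger.diff("yesterday", "today"))  # shows exactly what changed
```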
Alex Alexapolsky@TheWake·
Built a chunker that knows where documents actually break: articles, sections, tables stay intact instead of getting sliced at character 600. Now I can stuff good ingredients into LangChain! pip install chunkweaver github.com/metawake/chunk…
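To picture what "knows where documents actually break" means, here is a sketch of structure-aware chunking that splits on section boundaries instead of a fixed character count. The heading heuristic is a stand-in, not chunkweaver's implementation:

```python
# Illustrative structure-aware chunking: split on Markdown headings
# instead of every N characters. Stand-in logic, not chunkweaver's code.
import re


def structural_chunks(text: str) -> list[str]:
    """Keep each section (heading + body) intact as one chunk."""
    parts = re.split(r"(?m)^(?=#{1,6} )", text)  # split just before headings
    return [p.strip() for p in parts if p.strip()]


doc = """# Intro
Short overview.

## Details
A table or long section stays whole instead of being
sliced at character 600.
"""
for chunk in structural_chunks(doc):
    print(repr(chunk[:40]))
```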
Alex Alexapolsky@TheWake·
Built ragprobe — pre-deployment domain difficulty diagnostic for RAG. One specificity score tells you whether your benchmark will transfer to your actual domain. No embeddings. No API keys. Just: is this domain easy or hard to retrieve? pip install ragprobe
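ragprobe's actual metric isn't shown in the tweet; as a sketch of the general idea, a purely lexical specificity signal can be computed with no embeddings and no API keys, e.g. by measuring how much of a domain's vocabulary is absent from generic text:

```python
# Sketch of a lexical "specificity" signal: how distinctive is a
# domain's vocabulary vs. generic text? Illustrative only; this is
# not ragprobe's actual scoring.
from collections import Counter


def specificity(domain_docs, generic_docs):
    dom = Counter(w for d in domain_docs for w in d.lower().split())
    gen = Counter(w for d in generic_docs for w in d.lower().split())
    # Share of domain tokens that never occur in generic text:
    # higher means more domain-specific vocabulary.
    rare = sum(c for w, c in dom.items() if gen[w] == 0)
    return rare / max(1, sum(dom.values()))


legal = ["the controller shall perform erasure pursuant to article 17"]
generic = ["the cat sat on the mat and the dog ran"]
print(f"specificity: {specificity(legal, generic):.2f}")
```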
Alex Alexapolsky@TheWake·
@svpino "Skill issue" is unfalsifiable without tooling. When a web server breaks you get a stack trace. When an agent breaks you get vibes. Record the run, replay it, diff it against the one that worked. That's what work-ledger does. github.com/metawake/work-…
Santiago@svpino·
People are lying to you. These agents don't work as promised.
Yongrui Su@ysu_ChatData·
I buy the retrieval-as-a-decision framing. In practice, what signal do you use to decide when to retrieve again without spiraling cost? Also, are you evaluating with answer attribution and citation faithfulness, not just accuracy? Would love to see a simple baseline recipe that is robust under noisy corpora.
Ihtesham Ali@ihtesham2005·
RAG is dead. I just tested Modular RAG and it's making AI systems 30-40% more accurate on real production tasks.

The accuracy gains made me question everything I thought I knew about retrieval. And the core insight destroyed my mental model in the best way possible.

Naive RAG forces a fixed pipeline. Retrieve → Stuff → Generate. Every time. But that's not how expert researchers actually find answers. Analysts don't retrieve everything upfront. They decide what's worth pulling, when to pull more, and when they already have enough.

Modular RAG finally matches that. Instead of a pipeline, you build decisions. The system asks whether to retrieve at all. How many times. From where. In what format. Self-RAG lets models critique their own outputs and pull more context when confidence drops. One bad retrieval doesn't collapse the entire answer.

The numbers from the paper broke me: 30% accuracy boost from adding controlled noise that teaches models to filter signal. Modular systems beating Advanced RAG on complex multi-hop questions. Performance gaps widening on tasks requiring synthesis across sources.

The prompt shift is embarrassingly simple: stop treating retrieval as a step. Start treating it as a decision the model makes dynamically. That's it. That's the whole unlock.

I've been applying this to production pipelines for 2 weeks. The output quality difference is not subtle. Naive RAG made AI retrieve like a search engine. Modular RAG makes it retrieve like a researcher.
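The "retrieval as a decision" idea fits in a few lines. A sketch of a Self-RAG-style loop where the model retrieves again only while its own confidence is low; the `generate`, `confidence`, and `retrieve` helpers are hypothetical:

```python
# Sketch of retrieval-as-a-decision (Self-RAG style): retrieve only
# while the model's self-assessed confidence is low. The generate(),
# confidence(), and retrieve() callables are hypothetical stand-ins.
def answer(query, retrieve, generate, confidence,
           max_rounds=3, threshold=0.8):
    context = []
    draft = generate(query, context)      # may answer with no retrieval at all
    for _ in range(max_rounds):
        if confidence(query, draft, context) >= threshold:
            break                          # enough evidence: stop pulling
        context += retrieve(query, exclude=context)  # pull more, skip dupes
        draft = generate(query, context)   # critique and regenerate
    return draft
```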
Alex Alexapolsky@TheWake·
"RAG is dead"? I tested PageIndex on GDPR. 44%. Same as vector RAG. Maybe RAG is undead, like Raggie. See why: @TheWake/three-rag-architectures-one-legal-document-25-needles-none-found-more-than-half-cebdc7ab3a90" target="_blank" rel="nofollow noopener">medium.com/@TheWake/three…
Alex Alexapolsky@TheWake·
@idzikbartosz or if too low (out-of-domain query). Basically tells you whether it's a chunking problem, an embedding problem, or a prompt problem.
Bartosz Idzik@idzikbartosz·
@TheWake This is such a needed tool. Been stuck wondering if my chunks are even getting retrieved or if it's a prompt issue. Does it surface the similarity scores alongside the actual chunks being pulled?
Alex Alexapolsky@TheWake·
I built "EXPLAIN ANALYZE" for RAG retrieval. After debugging LLM pipelines blind for months, I made a CLI to see what's actually happening. Works with pgvector, Qdrant, Weaviate, Pinecone — real infrastructure, not toy demos. github.com/metawake/ragtu…
Alex Alexapolsky@TheWake·
@idzikbartosz Yes -- that's the core of "ragtune explain". It shows each chunk with its similarity score, source doc, and text. Plus it runs score diagnostics automatically: distribution shape, spread, top-gap between #1 and #2, and warns you if scores are too tight (chunks indistinguishable).
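The kind of checks described above are simple to sketch. This is illustrative, not ragtune's actual implementation; the thresholds are made-up defaults:

```python
# Sketch of retrieval-score diagnostics like those described above:
# spread and the #1-vs-#2 "top gap". Illustrative, not ragtune's code.
from statistics import mean, pstdev


def diagnose(scores, gap_min=0.05, spread_min=0.02):
    s = sorted(scores, reverse=True)
    top_gap = s[0] - s[1]          # gap between best and runner-up chunk
    spread = pstdev(s)             # how spread out the whole score set is
    warnings = []
    if top_gap < gap_min:
        warnings.append("top-1 and top-2 nearly tied: reranking may help")
    if spread < spread_min:
        warnings.append("scores too tight: chunks indistinguishable")
    return {"mean": mean(s), "spread": spread, "top_gap": top_gap,
            "warnings": warnings}


print(diagnose([0.83, 0.82, 0.81, 0.80]))  # tight cluster -> both warnings
```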
Alex Alexapolsky@TheWake·
@akshay_pachaar It's just not correct. PageIndex is a good thing, but not a full replacement. Let's be thoughtful and correct, like old-time engineers. We still need to use our brains and not jump to conclusions.
Akshay 🚀@akshay_pachaar·
Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves: traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance.

When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. The model doesn't ask "What text looks similar to this query?" Instead, it asks "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to follow in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue that it solves: vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case, since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis.

Everything is fully open-source, so you can see the full implementation on GitHub and try it yourself. I have shared the GitHub repo in the replies!
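A sketch of what reasoning-driven tree traversal looks like: descend a table-of-contents tree, picking the most promising child at each level. The `choose` stub (naive keyword overlap) stands in for an LLM's reasoning step; this is not PageIndex's actual code:

```python
# Sketch of tree-based retrieval: walk a table-of-contents tree,
# choosing the most promising child at each level. choose() stands in
# for an LLM's reasoning step; this is not PageIndex's implementation.
def traverse(node, query, choose):
    """Descend until a leaf: 'where would an expert look for this?'"""
    while node.get("children"):
        node = choose(query, node["children"])
    return node["text"]


def choose(query, children):
    # Naive stand-in: pick the child whose title shares the most words
    # with the query. A real system would let an LLM reason here.
    overlap = lambda n: len(set(query.lower().split())
                            & set(n["title"].lower().split()))
    return max(children, key=overlap)


toc = {"title": "10-K", "children": [
    {"title": "Risk Factors", "children": [], "text": "..."},
    {"title": "Debt and Liquidity", "children": [],
     "text": "Total debt declined in 2023..."},
]}
print(traverse(toc, "debt trends 2023", choose))
```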
Alex Alexapolsky@TheWake·
Benchmarked structural vs naive chunking for RAG on GDPR + RFCs. Both find the right doc. Only one gives the LLM enough context to cite the right section. Recall@K is incomplete; we need context-quality metrics. Full experiment: linkedin.com/posts/alexey-a…
Alex Alexapolsky@TheWake·
@goyalshaliniuk Great overview. One thing that cuts across all these RAG types is evaluation. As systems become agentic, reproducibility and retrieval matter more than new architectures. That’s exactly the gap we’re exploring with github.com/metawake/ragtu… — query-level, LLM-free RAG benchmarking.
Shalini Goyal@goyalshaliniuk·
RAG is not just one technique, it is an entire ecosystem of intelligence. From context-aware assistants to domain-specific systems, here are 16 types of RAG models shaping the next wave of AI innovation:

1. Standard RAG - The foundation of all RAG systems; combines retrieval and generation for question answering and knowledge synthesis.
2. Agentic RAG - Empowers AI agents to retrieve and act autonomously; perfect for assistants that need dynamic, tool-based reasoning.
3. Graph RAG - Uses knowledge graphs for relational reasoning; ideal for expert systems in law, medicine, and semantic search.
4. Modular RAG - Breaks retrieval, reasoning, and generation into independent components, enabling collaborative, scalable AI workflows.
5. Memory-Augmented RAG - Adds persistent external memory for context retention, powering long-term chatbots and personalized experiences.
6. Multi-Modal RAG - Processes text, images, and audio together; perfect for video summarization, captioning, and multi-modal AI tools.
7. Federated RAG - Enables privacy-preserving retrieval from decentralized sources; used in healthcare and secure enterprise systems.
8. Streaming RAG - Performs real-time retrieval and generation; ideal for financial dashboards, live feeds, and social media monitoring.
9. ODQA RAG (Open-Domain QA) - Handles large, diverse datasets; ideal for search engines and intelligent virtual assistants.
10. Contextual Retrieval RAG - Maintains session-level awareness; great for conversational AI and customer support chatbots.
11. Knowledge-Enhanced RAG - Integrates structured domain data; useful for legal, educational, and professional knowledge applications.
12. Domain-Specific RAG - Custom-tailored for specific industries like finance, healthcare, or legal analytics.
13. Hybrid RAG - Combines multiple retrieval approaches, bridging structured and unstructured data for high precision.
14. Self-RAG - Introduces self-reflection to refine its own answers, enabling AI models to fact-check and improve reasoning autonomously.
15. HyDE RAG (Hypothetical Document Embeddings) - Generates hypothetical documents to guide retrieval; excellent for complex or niche query contexts.
16. Recursive / Multi-Step RAG - Performs multiple retrieval-generation loops, enabling advanced problem-solving and reasoning chains.

From simple retrievals to self-improving AI reasoning loops, RAG is evolving fast. Which type do you think will dominate enterprise AI systems in 2026?
Alex Alexapolsky@TheWake·
@martinfowler "There are no unit tests for context engineering" - love this framing. But there could be: record a run, replay deterministically, diff against the next run. That's how you test probabilistic systems.
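One way such a "unit test" could look, sketched with pytest. The `replay` helper and the trace file paths are hypothetical stand-ins, not a real framework's API:

```python
# Sketch of a "unit test for context engineering": replay a recorded
# run deterministically (no live API calls) and diff it against a
# known-good baseline. replay() and the trace paths are hypothetical.
import json


def replay(path):
    with open(path) as f:
        return json.load(f)  # pure replay from disk: fully deterministic


def test_agent_trace_is_stable():
    baseline = replay("traces/known_good.json")
    current = replay("traces/latest.json")
    assert current == baseline, "trace drifted from the known-good run"
```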
Martin Fowler@martinfowler·
NEW POST: Powerful context engineering is becoming a huge part of the developer experience of modern LLM tools. Birgitta Böckeler explains the current state of context configuration features, using Claude Code as an example. martinfowler.com/articles/explo…
Alex Alexapolsky@TheWake·
An LLM call is a superposition - infinite possible outputs exist simultaneously. Recording the call and its result is the observation that collapses it to one.
Alex Alexapolsky@TheWake·
@agrover112 Exactly. Retrieval is fundamental - it's just specializing by domain. Code: agentic search. Structured data: tree-based. Semantic: embeddings. "RAG is dead" really means "naive RAG is dead." The retrieval problem didn't go away.
agrover112@agrover112·
I remember a while back someone said RAG is dead. Honestly, biggest L take ever. For any problem, unless a model has infinite computation power and infinite reasoning power, retrieval will always help.
Alex Alexapolsky@TheWake·
@dr_cintas "Vector DBs disrupted" is a fun headline, but the facts are: PageIndex uses document trees, great for structured docs (finance, legal). But embeddings still dominate for unstructured text, semantic similarity, and cross-modal retrieval. The right answer is: benchmark both on YOUR data.
Alvaro Cintas@dr_cintas·
Vector databases just got disrupted 🤯

You can now build RAG without Vector DBs. PageIndex is a new open-source library that uses document trees instead of embeddings. It achieves 98.7% on FinanceBench by letting LLMs reason over structure rather than matching keywords.

→ No Embeddings
→ No Chunking

100% Open Source.
Raj S 🇦🇺@rajshetgar·
@TheWake @techwith_ram @iPullRank Retrieval alone does not work well, as giving an LLM more options makes things worse. Best is to retrieve top-3 and re-rank to extract the top-1 only; LLMs work best with fewer options. Setting temperature close to the lowest level helps (never the lowest).
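A sketch of that top-3-then-top-1 pattern. The `retriever` and `rerank` callables are hypothetical stand-ins; in practice `rerank` would be a cross-encoder or LLM judge:

```python
# Sketch of the pattern described above: retrieve a cheap top-3, then
# re-rank and hand the LLM only the single best chunk. retriever() and
# rerank() are hypothetical stand-ins for real components.
def best_chunk(query, retriever, rerank, k=3):
    candidates = retriever(query, k=k)  # cheap first-stage top-3
    return max(candidates, key=lambda c: rerank(query, c))  # keep top-1


# Low-but-nonzero temperature, per the advice above:
GENERATION_KWARGS = {"temperature": 0.2}
```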
Raj S 🇦🇺@rajshetgar·
@techwith_ram @iPullRank Real-time applications with RAG are harder, especially for Q&A voice agents. High-confidence retrievals are fine, low-confidence retrievals are fine too, but everything in between is hard for LLMs to deal with.
Aditya Rajagopal@adityaraja0·
we're introducing the concept of diffs to trace data - pick two traces, run the diff and get information on changes in durations, launch args, renames, additions and deletions of events in the trace. check out how you can view diffs between @vllm_project 0.12.0 and 0.13.0 on the same workload - docs.ncompass.tech/tracediff
Alex Alexapolsky@TheWake·
@hasantoxr The interesting question isn't auto-fixing - it's figuring out *what* to fix. Most agent failures are silent. Output looks wrong but you don't know which step broke. Without good failure forensics, the RL signal is noisy. Anyone combining this with structured trace diffing?
Hasan Toor@hasantoxr·
🚨BREAKING: Microsoft just solved the "Agent Loop" problem. Agent Lightning is an open-source framework that lets agents learn from their own mistakes using Reinforcement Learning. Your agent fails a task → Agent Lightning analyzes why → Updates the prompt automatically → Next run succeeds. 100% Open source.
Alex Alexapolsky@TheWake·
@evilmartians Nice! Visualization is huge for debugging. We've been focused on the replay/diff side - record a working trace, then diff against a broken one to pinpoint what changed. Curious if AgentPrism could render diffs between two traces?
Evil Martians@evilmartians·
Agentic traces contain perfect information about an agent's behavior with every plan, action, and retry. But that information gets lost in a sea of JSON.

So we built AgentPrism: open source React components that turn traces into visual diagrams for debugging AI agents. You can plug in your OpenTelemetry data and see your agent's process unfold: messages, tool calls, retries.

@QuotientAI automatically monitors, analyzes, and improves AI agents. Being able to review traces quickly is paramount for their research, and for their customers.

"Dealing with agent traces was one of the biggest frustrations for our researchers and a huge time sink. All of that has gone away since adding AgentPrism, and we're excited to bring that functionality to our users." — @julianeagu