Alex Alexapolsky
@TheWake
Python developer/contractor
Ukraine · Joined January 2009
636 Following · 119 Followers
551 posts

Pinned Tweet
Alex Alexapolsky@TheWake·
Built an open-source tool for debugging LLM agents:
- Record runs with full traces
- Replay w/o API calls
- Diff to see what changed
"Agent worked yesterday, broke today" - now you can see exactly why. Works with PydanticAI, LangGraph, CrewAI, etc. github.com/metawake/work-…
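A minimal sketch of that record/replay/diff loop in Python. The `Ledger` class and its methods are hypothetical stand-ins for illustration, not work-ledger's actual API:

```python
# Hypothetical sketch of a record/replay/diff workflow for agent runs.
# The Ledger class below is illustrative only, NOT work-ledger's real API.
import json
from difflib import unified_diff


class Ledger:
    def __init__(self, path):
        self.path = path

    def record(self, run_id, steps):
        """Persist every step (prompt, tool call, response) of a run."""
        with open(f"{self.path}/{run_id}.json", "w") as f:
            json.dump(steps, f, indent=2)

    def replay(self, run_id):
        """Load a recorded run from disk; no API calls are made."""
        with open(f"{self.path}/{run_id}.json") as f:
            return json.load(f)

    def diff(self, run_a, run_b):
        """Line-level diff between two recorded runs."""
        a = json.dumps(self.replay(run_a), indent=2).splitlines()
        b = json.dumps(self.replay(run_b), indent=2).splitlines()
        return "\n".join(unified_diff(a, b, run_a, run_b, lineterm=""))


ledger = Ledger(".")
ledger.record("yesterday", [{"step": "search", "args": {"q": "GDPR"}}])
ledger.record("today", [{"step": "search", "args": {"q": "gdpr art 17"}}])
print(ledger.diff("yesterday", "today"))  # shows exactly what changed
```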
Alex Alexapolsky@TheWake·
Built a chunker that knows where documents actually break: articles, sections, tables stay intact instead of getting sliced at character 600. Now I can stuff good ingredients into LangChain! pip install chunkweaver github.com/metawake/chunk…
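To picture what "knows where documents actually break" means, here is a sketch of structure-aware chunking that splits on section boundaries instead of a fixed character count. The heading heuristic is a stand-in, not chunkweaver's implementation:

```python
# Illustrative structure-aware chunking: split on Markdown headings
# instead of every N characters. Stand-in logic, not chunkweaver's code.
import re


def structural_chunks(text: str) -> list[str]:
    """Keep each section (heading + body) intact as one chunk."""
    parts = re.split(r"(?m)^(?=#{1,6} )", text)  # split just before headings
    return [p.strip() for p in parts if p.strip()]


doc = """# Intro
Short overview.

## Details
A table or long section stays whole instead of being
sliced at character 600.
"""
for chunk in structural_chunks(doc):
    print(repr(chunk[:40]))
```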
Alex Alexapolsky@TheWake·
Built ragprobe — pre-deployment domain difficulty diagnostic for RAG. One specificity score tells you whether your benchmark will transfer to your actual domain. No embeddings. No API keys. Just: is this domain easy or hard to retrieve? pip install ragprobe
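ragprobe's actual metric isn't shown in the tweet; as a sketch of the general idea, a purely lexical specificity signal can be computed with no embeddings and no API keys, e.g. by measuring how much of a domain's vocabulary is absent from generic text:

```python
# Sketch of a lexical "specificity" signal: how distinctive is a
# domain's vocabulary vs. generic text? Illustrative only; this is
# not ragprobe's actual scoring.
from collections import Counter


def specificity(domain_docs, generic_docs):
    dom = Counter(w for d in domain_docs for w in d.lower().split())
    gen = Counter(w for d in generic_docs for w in d.lower().split())
    # Share of domain tokens that never occur in generic text:
    # higher means more domain-specific vocabulary.
    rare = sum(c for w, c in dom.items() if gen[w] == 0)
    return rare / max(1, sum(dom.values()))


legal = ["the controller shall perform erasure pursuant to article 17"]
generic = ["the cat sat on the mat and the dog ran"]
print(f"specificity: {specificity(legal, generic):.2f}")
```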
Alex Alexapolsky@TheWake·
@svpino "Skill issue" is unfalsifiable without tooling. When a web server breaks you get a stack trace. When an agent breaks you get vibes. Record the run, replay it, diff it against the one that worked. That's what work-ledger does. github.com/metawake/work-…
Santiago@svpino·
People are lying to you. These agents don't work as promised.
Yongrui Su@ysu_ChatData·
I buy the retrieval-as-a-decision framing. In practice, what signal do you use to decide when to retrieve again without spiraling cost? Also, are you evaluating with answer attribution and citation faithfulness, not just accuracy? Would love to see a simple baseline recipe that is robust under noisy corpora.
Ihtesham Ali@ihtesham2005·
RAG is dead. I just tested Modular RAG and it's making AI systems 30-40% more accurate on real production tasks.

The accuracy gains made me question everything I thought I knew about retrieval. And the core insight destroyed my mental model in the best way possible.

Naive RAG forces a fixed pipeline. Retrieve → Stuff → Generate. Every time. But that's not how expert researchers actually find answers. Analysts don't retrieve everything upfront. They decide what's worth pulling, when to pull more, and when they already have enough.

Modular RAG finally matches that. Instead of a pipeline, you build decisions. The system asks whether to retrieve at all. How many times. From where. In what format. Self-RAG lets models critique their own outputs and pull more context when confidence drops. One bad retrieval doesn't collapse the entire answer.

The numbers from the paper broke me: 30% accuracy boost from adding controlled noise that teaches models to filter signal. Modular systems beating Advanced RAG on complex multi-hop questions. Performance gaps widening on tasks requiring synthesis across sources.

The prompt shift is embarrassingly simple: stop treating retrieval as a step. Start treating it as a decision the model makes dynamically. That's it. That's the whole unlock.

I've been applying this to production pipelines for 2 weeks. The output quality difference is not subtle. Naive RAG made AI retrieve like a search engine. Modular RAG makes it retrieve like a researcher.
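The "retrieval as a decision" idea fits in a few lines. A sketch of a Self-RAG-style loop where the model retrieves again only while its own confidence is low; the `generate`, `confidence`, and `retrieve` helpers are hypothetical:

```python
# Sketch of retrieval-as-a-decision (Self-RAG style): retrieve only
# while the model's self-assessed confidence is low. The generate(),
# confidence(), and retrieve() callables are hypothetical stand-ins.
def answer(query, retrieve, generate, confidence,
           max_rounds=3, threshold=0.8):
    context = []
    draft = generate(query, context)      # may answer with no retrieval at all
    for _ in range(max_rounds):
        if confidence(query, draft, context) >= threshold:
            break                          # enough evidence: stop pulling
        context += retrieve(query, exclude=context)  # pull more, skip dupes
        draft = generate(query, context)   # critique and regenerate
    return draft
```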
Alex Alexapolsky@TheWake·
"RAG is dead"? I tested PageIndex on GDPR. 44%. Same as vector RAG. Maybe RAG is undead, like Raggie. See why: @TheWake/three-rag-architectures-one-legal-document-25-needles-none-found-more-than-half-cebdc7ab3a90" target="_blank" rel="nofollow noopener">medium.com/@TheWake/three…
Alex Alexapolsky@TheWake·
@idzikbartosz or if too low (out-of-domain query). Basically tells you whether it's a chunking problem, an embedding problem, or a prompt problem.
Bartosz Idzik@idzikbartosz·
@TheWake This is such a needed tool. Been stuck wondering if my chunks are even getting retrieved or if it's a prompt issue. Does it surface the similarity scores alongside the actual chunks being pulled?
Alex Alexapolsky@TheWake·
I built "EXPLAIN ANALYZE" for RAG retrieval. After debugging LLM pipelines blind for months, I made a CLI to see what's actually happening. Works with pgvector, Qdrant, Weaviate, Pinecone — real infrastructure, not toy demos. github.com/metawake/ragtu…
Alex Alexapolsky@TheWake·
@idzikbartosz Yes -- that's the core of "ragtune explain". It shows each chunk with its similarity score, source doc, and text. Plus it runs score diagnostics automatically: distribution shape, spread, top-gap between #1 and #2, and warns you if scores are too tight (chunks indistinguishable).
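The kind of checks described above are simple to sketch. This is illustrative, not ragtune's actual implementation; the thresholds are made-up defaults:

```python
# Sketch of retrieval-score diagnostics like those described above:
# spread and the #1-vs-#2 "top gap". Illustrative, not ragtune's code.
from statistics import mean, pstdev


def diagnose(scores, gap_min=0.05, spread_min=0.02):
    s = sorted(scores, reverse=True)
    top_gap = s[0] - s[1]          # gap between best and runner-up chunk
    spread = pstdev(s)             # how spread out the whole score set is
    warnings = []
    if top_gap < gap_min:
        warnings.append("top-1 and top-2 nearly tied: reranking may help")
    if spread < spread_min:
        warnings.append("scores too tight: chunks indistinguishable")
    return {"mean": mean(s), "spread": spread, "top_gap": top_gap,
            "warnings": warnings}


print(diagnose([0.83, 0.82, 0.81, 0.80]))  # tight cluster -> both warnings
```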
Alex Alexapolsky@TheWake·
@akshay_pachaar It's just not correct. PageIndex is a good thing, but not a full replacement. Let's be thoughtful and correct, like old-time engineers. We still need to use our brains and not jump to conclusions.
Akshay 🚀@akshay_pachaar·
Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves: traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance.

When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree. The model doesn't ask "What text looks similar to this query?" Instead, it asks "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to follow in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue that it solves: vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case, since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines. For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis.

Everything is fully open-source, so you can see the full implementation on GitHub and try it yourself. I have shared the GitHub repo in the replies!
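A sketch of what reasoning-driven tree traversal looks like: descend a table-of-contents tree, picking the most promising child at each level. The `choose` stub (naive keyword overlap) stands in for an LLM's reasoning step; this is not PageIndex's actual code:

```python
# Sketch of tree-based retrieval: walk a table-of-contents tree,
# choosing the most promising child at each level. choose() stands in
# for an LLM's reasoning step; this is not PageIndex's implementation.
def traverse(node, query, choose):
    """Descend until a leaf: 'where would an expert look for this?'"""
    while node.get("children"):
        node = choose(query, node["children"])
    return node["text"]


def choose(query, children):
    # Naive stand-in: pick the child whose title shares the most words
    # with the query. A real system would let an LLM reason here.
    overlap = lambda n: len(set(query.lower().split())
                            & set(n["title"].lower().split()))
    return max(children, key=overlap)


toc = {"title": "10-K", "children": [
    {"title": "Risk Factors", "children": [], "text": "..."},
    {"title": "Debt and Liquidity", "children": [],
     "text": "Total debt declined in 2023..."},
]}
print(traverse(toc, "debt trends 2023", choose))
```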
Alex Alexapolsky@TheWake·
Benchmarked structural vs naive chunking for RAG on GDPR + RFCs. Both find the right doc. Only one gives the LLM enough context to cite the right section. Recall@K is incomplete; we need context-quality metrics. Full experiment: linkedin.com/posts/alexey-a…
Alex Alexapolsky@TheWake·
@goyalshaliniuk Great overview. One thing that cuts across all these RAG types is evaluation. As systems become agentic, reproducibility and retrieval matter more than new architectures. That’s exactly the gap we’re exploring with github.com/metawake/ragtu… — query-level, LLM-free RAG benchmarking.
Shalini Goyal@goyalshaliniuk·
RAG is not just one technique, it is an entire ecosystem of intelligence. From context-aware assistants to domain-specific systems, here are 16 types of RAG models shaping the next wave of AI innovation:

1. Standard RAG - The foundation of all RAG systems; combines retrieval and generation for question answering and knowledge synthesis.
2. Agentic RAG - Empowers AI agents to retrieve and act autonomously; perfect for assistants that need dynamic, tool-based reasoning.
3. Graph RAG - Uses knowledge graphs for relational reasoning; ideal for expert systems in law, medicine, and semantic search.
4. Modular RAG - Breaks retrieval, reasoning, and generation into independent components, enabling collaborative, scalable AI workflows.
5. Memory-Augmented RAG - Adds persistent external memory for context retention, powering long-term chatbots and personalized experiences.
6. Multi-Modal RAG - Processes text, images, and audio together; perfect for video summarization, captioning, and multi-modal AI tools.
7. Federated RAG - Enables privacy-preserving retrieval from decentralized sources; used in healthcare and secure enterprise systems.
8. Streaming RAG - Performs real-time retrieval and generation; ideal for financial dashboards, live feeds, and social media monitoring.
9. ODQA RAG (Open-Domain QA) - Handles large, diverse datasets; ideal for search engines and intelligent virtual assistants.
10. Contextual Retrieval RAG - Maintains session-level awareness; great for conversational AI and customer support chatbots.
11. Knowledge-Enhanced RAG - Integrates structured domain data; useful for legal, educational, and professional knowledge applications.
12. Domain-Specific RAG - Custom-tailored for specific industries like finance, healthcare, or legal analytics.
13. Hybrid RAG - Combines multiple retrieval approaches, bridging structured and unstructured data for high precision.
14. Self-RAG - Introduces self-reflection to refine its own answers, enabling AI models to fact-check and improve reasoning autonomously.
15. HyDE RAG (Hypothetical Document Embeddings) - Generates hypothetical documents to guide retrieval; excellent for complex or niche query contexts.
16. Recursive / Multi-Step RAG - Performs multiple retrieval-generation loops, enabling advanced problem-solving and reasoning chains.

From simple retrievals to self-improving AI reasoning loops, RAG is evolving fast. Which type do you think will dominate enterprise AI systems in 2026?
Alex Alexapolsky@TheWake·
@martinfowler "There are no unit tests for context engineering" - love this framing. But there could be: record a run, replay deterministically, diff against the next run. That's how you test probabilistic systems.
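One way such a "unit test" could look, sketched with pytest. The `replay` helper and the trace file paths are hypothetical stand-ins, not a real framework's API:

```python
# Sketch of a "unit test for context engineering": replay a recorded
# run deterministically (no live API calls) and diff it against a
# known-good baseline. replay() and the trace paths are hypothetical.
import json


def replay(path):
    with open(path) as f:
        return json.load(f)  # pure replay from disk: fully deterministic


def test_agent_trace_is_stable():
    baseline = replay("traces/known_good.json")
    current = replay("traces/latest.json")
    assert current == baseline, "trace drifted from the known-good run"
```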
Martin Fowler@martinfowler·
NEW POST: Powerful context engineering is becoming a huge part of the developer experience of modern LLM tools. Birgitta Böckeler explains the current state of context configuration features, using Claude Code as an example. martinfowler.com/articles/explo…
Alex Alexapolsky@TheWake·
An LLM call is a superposition - infinite possible outputs exist simultaneously. Recording the call and its result is the observation that collapses it to one.
Alex Alexapolsky@TheWake·
@agrover112 Exactly. Retrieval is fundamental - it's just specializing by domain. Code: agentic search. Structured data: tree-based. Semantic: embeddings. "RAG is dead" really means "naive RAG is dead." The retrieval problem didn't go away.
agrover112@agrover112·
I remember a while back someone said RAG is dead. Honestly, biggest L take ever. For any problem, unless a model has infinite computation power and infinite reasoning power, retrieval will always help.
Alex Alexapolsky@TheWake·
@dr_cintas "Vector DBs disrupted" is a fun headline, but the facts are: PageIndex uses document trees, great for structured docs (finance, legal). But embeddings still dominate for unstructured text, semantic similarity, and cross-modal retrieval. The right answer is: benchmark both on YOUR data.
Alvaro Cintas@dr_cintas·
Vector databases just got disrupted 🤯

You can now build RAG without Vector DBs. PageIndex is a new open-source library that uses document trees instead of embeddings. It achieves 98.7% on FinanceBench by letting LLMs reason over structure rather than matching keywords.

→ No Embeddings
→ No Chunking

100% Open Source.
Raj S 🇦🇺@rajshetgar·
@TheWake @techwith_ram @iPullRank Retrieval alone does not work well, as giving an LLM more options makes things worse. Best is to retrieve top-3 and re-rank to extract the top-1 only; LLMs work best with fewer options. Setting temperature close to the lowest level helps (never the lowest).
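A sketch of that top-3-then-top-1 pattern. The `retriever` and `rerank` callables are hypothetical stand-ins; in practice `rerank` would be a cross-encoder or LLM judge:

```python
# Sketch of the pattern described above: retrieve a cheap top-3, then
# re-rank and hand the LLM only the single best chunk. retriever() and
# rerank() are hypothetical stand-ins for real components.
def best_chunk(query, retriever, rerank, k=3):
    candidates = retriever(query, k=k)  # cheap first-stage top-3
    return max(candidates, key=lambda c: rerank(query, c))  # keep top-1


# Low-but-nonzero temperature, per the advice above:
GENERATION_KWARGS = {"temperature": 0.2}
```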
Raj S 🇦🇺@rajshetgar·
@techwith_ram @iPullRank Real-time applications with RAG are harder, especially for Q&A voice agents. High-confidence retrievals are fine, low-confidence retrievals are fine too, but everything in between is hard for LLMs to deal with.
Aditya Rajagopal@adityaraja0·
we're introducing the concept of diffs to trace data - pick two traces, run the diff and get information on changes in durations, launch args, renames, additions and deletions of events in the trace. check out how you can view diffs between @vllm_project 0.12.0 and 0.13.0 on the same workload - docs.ncompass.tech/tracediff
Alex Alexapolsky@TheWake·
@hasantoxr The interesting question isn't auto-fixing - it's figuring out *what* to fix. Most agent failures are silent. Output looks wrong but you don't know which step broke. Without good failure forensics, the RL signal is noisy. Anyone combining this with structured trace diffing?
Hasan Toor@hasantoxr·
🚨BREAKING: Microsoft just solved the "Agent Loop" problem. Agent Lightning is an open-source framework that lets agents learn from their own mistakes using Reinforcement Learning. Your agent fails a task → Agent Lightning analyzes why → Updates the prompt automatically → Next run succeeds. 100% Open source.
Alex Alexapolsky@TheWake·
@evilmartians Nice! Visualization is huge for debugging. We've been focused on the replay/diff side - record a working trace, then diff against a broken one to pinpoint what changed. Curious if AgentPrism could render diffs between two traces?
Evil Martians@evilmartians·
Agentic traces contain perfect information about an agent's behavior with every plan, action, and retry. But that information gets lost in a sea of JSON.

So we built AgentPrism: open source React components that turn traces into visual diagrams for debugging AI agents. You can plug in your OpenTelemetry data and see your agent's process unfold: messages, tool calls, retries.

@QuotientAI automatically monitors, analyzes, and improves AI agents. Being able to review traces quickly is paramount for their research, and for their customers.

"Dealing with agent traces was one of the biggest frustrations for our researchers and a huge time sink. All of that has gone away since adding AgentPrism, and we're excited to bring that functionality to our users." — @julianeagu