Pranjal Verma


“design a RAG pipeline for 10M docs with zero hallucination”
apparently this was asked in a Google L5 interview round. came across it somewhere on the internet and honestly it’s a way more interesting system design problem than most classic distributed systems questions
1. ingest + normalize docs
- remove duplicates, standardize formats, extract metadata, maintain version history
2. hybrid retrieval (BM25 + embeddings)
- BM25 handles exact keyword matching while embeddings capture semantic meaning
- embedding-only search tends to lose precision on exact terms (IDs, error codes, product names) as the corpus grows
3. ANN retrieval + reranking
- ANN (approximate nearest neighbor) search quickly pulls the top candidate chunks from millions of docs
- then a reranker rescores those candidates, comparing the query against each retrieved chunk directly to improve relevance
4. source confidence scoring
- every retrieved chunk gets scored based on freshness, trust level, overlap and retrieval consistency
- low-confidence context should never heavily influence generation
5. constrained generation
- the model may only answer from the retrieved context (it must not invent anything beyond it)
6. citation-backed responses
- every major claim links back to exact chunks, documents or timestamps
7. hallucination fallback layer
- if retrieval confidence drops below a threshold, return “insufficient evidence found” instead of an answer
8. continuous evals
- run adversarial queries, retrieval recall benchmarks and hallucination tests continuously
9. caching + memory layer
- cache high-frequency enterprise queries and retrieval paths (cuts latency and avoids recomputing the same answers)
10. observability everywhere
- trace retrieval paths, chunk rankings, token attribution and failure points
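to make steps 4 and 7 concrete, here's a minimal sketch of confidence scoring plus the fallback. the signal names come from the list above, but the weights, the 0.6 threshold, the `Chunk` shape and the function names are all assumptions for illustration, not a reference implementation:

```python
from dataclasses import dataclass

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"freshness": 0.2, "trust": 0.3, "overlap": 0.3, "consistency": 0.2}
MIN_CONFIDENCE = 0.6
FALLBACK = "insufficient evidence found"


@dataclass
class Chunk:
    chunk_id: str
    text: str
    signals: dict  # each signal is assumed to be normalized to [0, 1]


def confidence(chunk):
    """Weighted sum of the chunk's signals; missing signals count as 0."""
    return sum(w * chunk.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def select_context(chunks):
    """Keep only chunks at or above the threshold, highest confidence first."""
    kept = [c for c in chunks if confidence(c) >= MIN_CONFIDENCE]
    return sorted(kept, key=confidence, reverse=True)


def answer_or_fallback(chunks):
    """Return the fallback message when no chunk is trustworthy enough."""
    context = select_context(chunks)
    if not context:
        return FALLBACK
    # A real pipeline would call the generator here, passing only `context`.
    return "answer grounded in: " + ", ".join(c.chunk_id for c in context)
```

the point of gating before generation is that low-confidence chunks never reach the prompt at all, so they can't influence the answer.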
Also at 10M docs, retrieval quality matters more than the frontier model itself.
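the list doesn't say how the BM25 and embedding results in step 2 get merged. one common option is reciprocal rank fusion (RRF), which only needs each retriever's ranked list of ids. this is a sketch under that assumption: the doc ids are made up, `k=60` is the constant commonly used with RRF, and the two input rankings stand in for real BM25 and vector search output:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by doc id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword retriever and an embedding retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
embedding_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

docs that both retrievers agree on (`doc_a`, `doc_b`) float to the top, and the fused list is what you'd hand to the reranker in step 3.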









