
Your LLM application is probably answering the same question dozens of times a day. It just doesn't realize it, because the words are different each time.
"What was Q4 revenue?" / "Show me last quarter's sales" / "Q4 2024 figures": three strings, three LLM calls, one answer.
🧠 pg_semantic_cache is a PostgreSQL extension from pgEdge (open source, PostgreSQL License) that fixes that. It caches LLM and query results keyed by vector embeddings, then serves cached answers when the embedding for a new query is within a similarity threshold (default 0.95) of one already stored.
⚙️ How it works in one line: store each result alongside its query embedding; when the next query arrives, compute its embedding, let pgvector cosine distance find the closest stored match, and return the cached result if similarity is above the threshold, or fall through to a miss.
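To make that lookup concrete, here is a minimal in-memory sketch in plain Python. It is not the extension's API: the class `ToySemanticCache`, its method names, and the three-dimensional toy vectors are all hypothetical, and a linear scan stands in for the pgvector search. Treating "above threshold" as greater-than-or-equal is also an assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical illustration only; not pg_semantic_cache's interface."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], result: str) -> None:
        # Store the result together with the embedding of the query that produced it.
        self.entries.append((list(embedding), result))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        # Find the closest stored embedding, then apply the similarity threshold.
        best_result: Optional[str] = None
        best_similarity = -1.0
        for stored_embedding, result in self.entries:
            similarity = cosine_similarity(embedding, stored_embedding)
            if similarity > best_similarity:
                best_similarity = similarity
                best_result = result
        if best_similarity >= self.threshold:
            return best_result  # hit: serve the cached answer
        return None  # miss: the caller falls through to the LLM


cache = ToySemanticCache(threshold=0.95)
cache.put([1.0, 0.0, 0.0], "cached answer")
near_duplicate = cache.get([0.99, 0.05, 0.0])  # close to the stored vector
unrelated = cache.get([0.0, 1.0, 0.0])         # orthogonal to the stored vector
```

In the real extension the nearest-match search runs inside Postgres via pgvector rather than in application code; the sketch only shows the decision logic.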
📊 When to use it: LLM-backed chatbots, RAG pipelines, analytics assistants, anything where 40-70% of incoming queries are semantic duplicates. Production semantic caches typically reach hit rates of 60-80%, against 15-25% for exact-match caching, and hits return in 2-3 ms instead of the 500 ms to 2 s a real LLM round trip costs.
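A quick way to see what those numbers mean for average latency is a weighted average of hit and miss times. The helper below is a back-of-the-envelope sketch; the function name is hypothetical, and it assumes a miss costs only the LLM round trip, ignoring the cache lookup overhead.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency per query for a given cache hit rate (weighted average)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# Low end of the ranges quoted above: 60% hits at 3 ms, misses at 500 ms.
average_ms = expected_latency_ms(hit_rate=0.6, hit_ms=3.0, miss_ms=500.0)
```

With those inputs the average comes out at roughly 202 ms per query, compared with 500 ms when every query goes to the LLM.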
🧰 What ships in the extension: tags for grouping, eviction policies (LRU, LFU, TTL, auto), cache_stats and cost_savings views, pg_cron-friendly maintenance functions, HNSW indexes for caches over 100K entries, and ACID semantics because the cache lives in your Postgres database.
⚠️ Worth knowing: not every query should be cached. Volatile queries ("what's the current stock price?") have stable embeddings but changing answers, so they need an application-layer classifier ahead of the cache lookup. Part 3 of the series below walks through a regex + LLM pattern for that.
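As a rough idea of what the regex half of such a classifier might look like, here is a small sketch. The patterns and the name `is_volatile` are hypothetical and not taken from the blog series; a production version would need a pattern list tuned to its own traffic, plus the LLM stage for queries the regexes cannot decide.

```python
import re

# Hypothetical patterns hinting that the answer changes over time.
VOLATILE_PATTERNS = [
    re.compile(r"\b(current|currently|right now|latest|live|today)\b", re.IGNORECASE),
    re.compile(r"\b(stock|share)\s+price\b", re.IGNORECASE),
]


def is_volatile(query: str) -> bool:
    """Return True if the query looks time-sensitive and should bypass the cache."""
    return any(pattern.search(query) for pattern in VOLATILE_PATTERNS)


# Run the check before the cache lookup: volatile queries go straight to the
# live data source or LLM, everything else is eligible for a cached answer.
bypass_cache = is_volatile("what's the current stock price?")
cacheable = not is_volatile("What was Q4 revenue?")
```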
⭐ Star or clone the repo on GitHub: hubs.la/Q04hCzS30
📚 The three-part blog series:
Hands-on intro: hubs.la/Q04hCKv20
Production ops (tags, eviction, monitoring, Python): hubs.la/Q04hCxqy0
Volatile queries done right: hubs.la/Q04hCLG_0
#postgres #postgresql #opensource #ai #llm #rag #vectorsearch #pgvector #caching #semanticcache #pgEdge
