LLM memory systems can store facts.
They can't reason about what changes when one of those facts updates.
We tested 6 systems across 3 paradigms. All collapse on dependency reasoning: Cascade 3%, Absence 1%.
📜 MEME: Multi-entity & Evolving Memory Evaluation 🧵 1/n
The failure isn't storage. The rule and the change event are both written to memory. Retrieval just doesn't surface them together, so the answering LLM reports the old value. We tried 5 fixes: prompt optimization, deeper retrieval, a stronger answer LLM, less noise, and a stronger internal LLM.
Only one helps: MD-flat + Claude Opus 4.7 (Cascade 32%, Absence 59%), at ~70× the cost. No practical-cost architecture handles this today. We need memory that propagates updates at maintenance time, rather than relying on a frontier LLM at ingest.
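A toy sketch of the gap, not MEME's code or any tested system: every name here (`Memory`, `write_and_propagate`, the `alice.*` keys, `OFFICE_BY_TEAM`) is made up for illustration. It assumes one rule ("office follows team") and contrasts a store-only write, which leaves the dependent fact stale, with a write that propagates at maintenance time.

```python
# Illustrative only: hypothetical names, not taken from MEME or any memory system.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memory:
    facts: dict[str, str] = field(default_factory=dict)
    # derived key -> (source key, function that computes the derived value)
    rules: dict[str, tuple[str, Callable[[str], str]]] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        """Store-only update: facts that depend on `key` are left untouched."""
        self.facts[key] = value

    def write_and_propagate(self, key: str, value: str) -> None:
        """Maintenance-time update: recompute every fact derived from `key`.

        Assumes the rule graph has no cycles; a cycle would recurse forever.
        """
        self.facts[key] = value
        for derived, (source, fn) in self.rules.items():
            if source == key:
                self.write_and_propagate(derived, fn(value))


# Assumed rule for the example: a person's office is determined by their team.
OFFICE_BY_TEAM = {"payments": "Berlin", "search": "Lisbon"}


def build() -> Memory:
    m = Memory()
    m.rules["alice.office"] = ("alice.team", lambda team: OFFICE_BY_TEAM[team])
    m.write_and_propagate("alice.team", "payments")
    return m


stale = build()
stale.write("alice.team", "search")  # change event stored, rule stored
print(stale.facts["alice.office"])   # still the old office

fresh = build()
fresh.write_and_propagate("alice.team", "search")
print(fresh.facts["alice.office"])   # updated office
```

In the store-only case both the rule and the change are present, yet a lookup of `alice.office` still returns the old value. That mirrors the failure described above, in a deliberately simplified form.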
2/3