

Hanqi Yan
@yan_hanqi
Lecturer (Assistant Professor) @kclinformatics | Interpretable & Reliable Language Models



Our work on agent memory, xMemory (Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation), has been discussed by projects such as Claude Memory, PageIndex, and OpenClaw. Glad to see this direction resonating. @HZhanghao @yulanhe @LinGui_KCL @dair_ai @VentureBeat


Exciting PhD Internships in London:
- Large-scale GPU cluster, high-quality data, and research alongside world experts in ML.
- Focus on publications at top venues.
- Train/eval models at 100B+ scale.
- Path to full-time roles + competitive comp.
thomsonreuters.wd5.myworkdayjobs.com/External_Caree…

// Beyond RAG for Agent Memory //

RAG wasn't designed for agent memory. And it shows.

The default approach to agent memory today is still the standard RAG pipeline: embed stored memories, retrieve a fixed top-k by similarity, concatenate them into context, and generate an answer. Every major agent memory system follows this base pattern.

But agent memory is fundamentally different from a document corpus. It's a bounded, coherent dialogue stream whose candidate spans are highly correlated and often near-duplicates. Fixed top-k similarity retrieval collapses into a single dense region and returns redundant evidence, while post-hoc pruning breaks temporally linked evidence chains rather than removing redundancy.

This new research introduces xMemory, a hierarchical retrieval framework that replaces similarity matching with structured component-level selection. Agent memory needs redundancy control without fragmenting evidence chains; structured retrieval over semantic components achieves both, consistently outperforming standard RAG and pruning approaches across multiple LLM backbones.

The key idea: xMemory decouples memories into semantic components, organizes them into a four-level hierarchy (original messages, episodes, semantics, themes), and uses this structure to drive retrieval top-down. A sparsity-semantics objective guides split and merge operations to keep the high-level organization both searchable and semantically faithful. At retrieval time, xMemory first selects a compact, diverse set of relevant themes and semantics, then expands to episodes and raw messages only when doing so measurably reduces the reader's uncertainty.

On LoCoMo with Qwen3-8B, xMemory achieves 34.48 BLEU and 43.98 F1 while using only 4,711 tokens per query, compared to the next-best baseline, Nemori, at 28.51 BLEU and 40.45 F1 with 7,755 tokens. With GPT-5 nano, it reaches 38.71 BLEU and 50.00 F1, improving over Nemori while cutting token usage from 9,155 to 6,581.

xMemory retrieves contexts that cover all answer tokens in 5.66 blocks and 975 tokens, versus 10.81 blocks and 1,979 tokens for naive RAG. Higher accuracy, half the tokens.

Paper: arxiv.org/abs/2602.02007
Learn to build effective AI agents in our academy: academy.dair.ai
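To make the top-down idea concrete, here is a minimal toy sketch of structured, hierarchy-first retrieval in the spirit described above. Everything here (the `MemoryNode` class, the token-overlap scorer, the `expand_threshold` heuristic standing in for "reduces the reader's uncertainty") is a hypothetical illustration, not the paper's actual algorithm:

```python
# Toy sketch of top-down hierarchical memory retrieval.
# Assumptions: MemoryNode, relevance(), and retrieve() are invented names;
# token overlap is a crude stand-in for a learned relevance model.
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    level: str                         # "theme" | "semantic" | "episode" | "message"
    text: str
    children: list = field(default_factory=list)

def relevance(query: str, node: MemoryNode) -> float:
    # Placeholder scorer: fraction of query tokens appearing in the node.
    q, t = set(query.lower().split()), set(node.text.lower().split())
    return len(q & t) / (len(q) or 1)

def retrieve(query: str, themes: list, k: int = 2, expand_threshold: float = 0.5):
    """Select the top-k themes, then expand a node's children only when a
    child is itself relevant enough (a crude proxy for the paper's
    'expand only if it measurably reduces the reader's uncertainty')."""
    selected = sorted(themes, key=lambda n: relevance(query, n), reverse=True)[:k]
    context, frontier = [], list(selected)
    while frontier:
        node = frontier.pop()
        context.append(node.text)
        for child in node.children:
            if relevance(query, child) >= expand_threshold:
                frontier.append(child)
    return context
```

The point of the sketch: the context budget is spent only on branches that keep earning their way in, instead of a fixed top-k of near-duplicate spans.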

We are very happy to have Prof @yan_hanqi from @KingsCollegeLon as our seminar speaker @UQSchoolEECS, talking about her work on Structured Representation Learning for Latent Thinking in LLMs (uq-ds-seminar.github.io/latentLLM-hanq…)!




⚡ Faster than Fast. Designed for Agentic AI.

Introducing Xiaomi MiMo-V2-Flash — our new open-source MoE model: 309B total params, 15B active. Blazing speed meets frontier performance.

🔥 Highlights:
🏗️ Hybrid Attention: 5:1 interleaved 128-window SWA + Global | 256K context
📈 Performance:
⚔️ Matches DeepSeek-V3.2 on general benchmarks — at a fraction of the latency
🏆 SWE-Bench Verified: 73.4% | SWE-Bench Multilingual: 71.7% — new SOTA for open-source models
🚀 Speed: 150 output tokens/s with Day-0 support from @lmsysorg 🤝

🤗 Model: hf.co/XiaomiMiMo/MiM…
📝 Blog Post: mimo.xiaomi.com/blog/mimo-v2-f…
📄 Technical Report: github.com/XiaomiMiMo/MiM…
🎨 AI Studio: aistudio.xiaomimimo.com
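A "5:1 interleaved" hybrid means roughly five sliding-window-attention (SWA) layers for every one global-attention layer. The sketch below shows one plausible way to lay out such a schedule; the layer count, the placement of the global layer, and the function name are assumptions for illustration, not Xiaomi's actual implementation:

```python
# Hypothetical layout of a 5:1 interleaved SWA/global attention schedule.
# ratio=5, window=128 mirror the numbers in the announcement; which layer
# in each group is global is an assumption.
def attention_schedule(num_layers: int, ratio: int = 5, window: int = 128):
    """Return per-layer attention config: `ratio` local (SWA) layers
    followed by one global layer, repeated across the stack."""
    schedule = []
    for layer in range(num_layers):
        if (layer + 1) % (ratio + 1) == 0:   # every (ratio+1)-th layer is global
            schedule.append({"type": "global", "window": None})
        else:
            schedule.append({"type": "swa", "window": window})
    return schedule
```

The design intent of such interleaving: the cheap 128-token windows handle most layers, while the occasional global layer propagates information across the full 256K context.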







there’s only one right answer here, the @ylecun definition, and everyone should be able to recite it word for word

