Kai Mei retweetledi

👀 Can multimodal agents truly remember what they saw? Do they just rely on caption-level shortcuts?
MemEye: a visual-centric benchmark for multimodal agent memory, can test whether agents can preserve visual evidence and track evolving visual states.
📄 arxiv.org/pdf/2605.15128

English















