Explicitly separates stable LLM reasoning from a plastic, evolving memory, effectively addressing the stability-plasticity dilemma to allow for continuous runtime improvement and avoid catastrophic forgetting.
MEMRL: SELF-EVOLVING AGENTS VIA RUNTIME REINFORCEMENT LEARNING ON EPISODIC MEMORY
1. Introduces a two-phase retrieval system that first filters memory candidates by semantic relevance and then selects the final ones based on learned utility (Q-values), moving beyond purely semantic matching.
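A minimal sketch of the two-phase retrieval idea, not the paper's implementation. It assumes memories carry an embedding and a scalar learned utility; the names `MemoryItem`, `q_value`, `two_phase_retrieve`, `k_candidates`, and `k_final` are hypothetical, and cosine similarity stands in for whatever relevance measure the authors use.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    embedding: list[float]
    q_value: float  # hypothetical field: learned utility estimate for this memory


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_phase_retrieve(query_emb, memory, k_candidates=3, k_final=1):
    # Phase 1: keep the k_candidates memories most semantically similar to the query.
    candidates = sorted(
        memory, key=lambda m: cosine(query_emb, m.embedding), reverse=True
    )[:k_candidates]
    # Phase 2: among those candidates, pick the ones with the highest learned utility.
    return sorted(candidates, key=lambda m: m.q_value, reverse=True)[:k_final]


# Toy data (made up for illustration): C has the highest utility but is
# unrelated to the query, so phase 1 filters it out before phase 2 runs.
memory = [
    MemoryItem("A", [1.0, 0.0], q_value=0.2),
    MemoryItem("B", [0.9, 0.1], q_value=0.8),
    MemoryItem("C", [0.0, 1.0], q_value=0.9),
]
selected = two_phase_retrieve([1.0, 0.0], memory, k_candidates=2, k_final=1)
print(selected[0].text)
```

With these toy values, a purely semantic ranking would return A, while the utility-aware second phase prefers B among the relevant candidates.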
• Capability Saturation
Once a single agent reaches a task-performance baseline of roughly 45%, adding more agents often degrades results because coordination overhead dominates.
Towards a Science of Scaling Agent Systems
This paper aims to move agentic AI systems (LLM-powered systems that reason, plan, and act) from heuristic practice toward a principled, quantitative science, especially for scaling those systems effectively.
The authors introduce the Infinity-Chat dataset of 26K real-world open-ended questions. They also provide human annotations, including absolute ratings and pairwise preferences.
arxiv.org/abs/2510.22954