🔗 Source: arXiv

Are We Ready For An Agent-Native Memory System?

Mechanism: Decomposes agent memory into four modular components (representation/storage, extraction, retrieval/routing, maintenance) and conducts fine-grained ablation studies across 12 representative systems and 11 datasets.
Nuance: Shifts evaluation from monolithic end-to-end task metrics to a granular data-management perspective, explicitly isolating operational costs, dynamic update robustness, and long-horizon stability rather than relying solely on final accuracy scores.
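
Sketch: one way to picture the four-component decomposition and a single-component ablation is shown below. This is a minimal illustration, not the paper's code. Every name in it (`ModularMemory`, `sentence_extractor`, `overlap_retriever`, `dedup_maintainer`, `no_op_maintainer`) is hypothetical, the storage layer is assumed to be a flat list of strings, and the two toy turns are made up. Write and query time are tracked separately so that operational cost can be read apart from answer quality.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical component signatures; none of these names come from the paper.
Extractor = Callable[[str], List[str]]                   # raw turn -> memory items
Retriever = Callable[[List[str], str, int], List[str]]   # (store, question, k) -> hits
Maintainer = Callable[[List[str]], List[str]]            # store -> cleaned store


@dataclass
class ModularMemory:
    extract: Extractor
    retrieve: Retriever
    maintain: Maintainer
    store: List[str] = field(default_factory=list)  # representation/storage (assumed flat list)
    write_seconds: float = 0.0                      # cumulative ingestion cost
    query_seconds: float = 0.0                      # cumulative retrieval cost

    def write(self, turn: str) -> None:
        start = time.perf_counter()
        self.store.extend(self.extract(turn))
        self.store = self.maintain(self.store)
        self.write_seconds += time.perf_counter() - start

    def query(self, question: str, k: int = 1) -> List[str]:
        start = time.perf_counter()
        hits = self.retrieve(self.store, question, k)
        self.query_seconds += time.perf_counter() - start
        return hits


def sentence_extractor(turn: str) -> List[str]:
    # Split a turn into sentence-level items, dropping empty fragments.
    return [s.strip() for s in turn.split(".") if s.strip()]


def overlap_retriever(store: List[str], question: str, k: int) -> List[str]:
    # Rank items by the number of lowercase words shared with the question.
    q = set(question.lower().split())
    ranked = sorted(store, key=lambda item: len(q & set(item.lower().split())), reverse=True)
    return ranked[:k]


def dedup_maintainer(store: List[str]) -> List[str]:
    # Drop exact duplicates while preserving insertion order.
    return list(dict.fromkeys(store))


def no_op_maintainer(store: List[str]) -> List[str]:
    return store


# Ablation: vary only the maintenance component, hold the other three fixed.
turns = ["Alice moved to Paris. Alice likes tea.", "Alice moved to Paris. Bob likes coffee."]
variants: Dict[str, ModularMemory] = {
    "dedup": ModularMemory(sentence_extractor, overlap_retriever, dedup_maintainer),
    "no-op": ModularMemory(sentence_extractor, overlap_retriever, no_op_maintainer),
}
answers: Dict[str, List[str]] = {}
for name, memory in variants.items():
    for turn in turns:
        memory.write(turn)
    answers[name] = memory.query("Who moved to Paris", k=1)
    print(name, len(memory.store), answers[name], memory.write_seconds, memory.query_seconds)
```

Because only `maintain` differs between the two variants, any gap in store size, retrieved items, or timing can be attributed to the maintenance component alone, which is the kind of isolation an ablation is meant to provide.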

Composite hybrid systems dominate conversational QA, while graph-based architectures excel in single-hop factual recall but struggle with temporal reasoning.
Conservative memory consolidation significantly outperforms delayed flushing or aggressive summarization for preserving answer-relevant context over extended interactions.
Highly structured memory systems incur orders-of-magnitude higher latency for both index construction and querying without delivering proportional accuracy gains, highlighting critical efficiency bottlenecks.

Evaluation primarily targets textual and structured memory paradigms, potentially overlooking parametric or multimodal memory extensions.
Benchmark workloads may not fully capture the complexity of real-world, open-ended agent interactions and dynamic environment shifts.
Findings are highly workload-dependent, making universal architectural recommendations difficult without context-specific tuning.