🔗 Source: arXiv

MEMORY CACHING: RNNS WITH GROWING MEMORY

Mechanism: Segments input sequences and caches compressed hidden states at segment boundaries, enabling subsequent tokens to directly attend to historical memory checkpoints rather than relying solely on a fixed-size recurrent state.
Nuance: Interpolates between the fixed-size memory of RNNs (O(L) total cost) and the growing KV cache of Transformers (O(L²) total cost) via controllable aggregation strategies (e.g., gated residual connections, sparse selective routing), avoiding full quadratic attention costs while mitigating RNN memory overflow.
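
The mechanism above can be illustrated with a minimal sketch. This is not the paper's implementation: the linear recurrence, the `decay` parameter, the softmax weighting over checkpoints, and the function name `run_with_memory_cache` are all assumptions chosen for readability. The softmax read stands in for whichever aggregation strategy (gated residual, sparse routing) is actually used.

```python
import math

def run_with_memory_cache(xs, segment_size, decay=0.9):
    """Toy recurrence h_t = decay * h_{t-1} + x_t with cached checkpoints.

    Hypothetical illustration: each output mixes the online state with
    all states cached at earlier segment boundaries, weighted by a
    softmax over their dot-product similarity to the current input.
    """
    dim = len(xs[0])
    h = [0.0] * dim
    cache = []      # hidden states saved at segment boundaries
    outputs = []
    for t, x in enumerate(xs):
        # Fixed-size recurrent update (the plain-RNN part).
        h = [decay * hi + xi for hi, xi in zip(h, x)]

        # Read from the online state plus every cached checkpoint.
        candidates = cache + [h]
        scores = [sum(ci * xi for ci, xi in zip(c, x)) for c in candidates]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        y = [sum(w * c[i] for w, c in zip(weights, candidates)) / norm
             for i in range(dim)]
        outputs.append(y)

        # Checkpoint the compressed state when a segment ends.
        if (t + 1) % segment_size == 0:
            cache.append(list(h))
    return outputs, cache

xs = [[1.0, 0.0], [0.0, 1.0]] * 4          # 8 toy tokens
outs, cache = run_with_memory_cache(xs, segment_size=4)
print(len(outs), len(cache))
```

With a segment size larger than the sequence, the cache stays empty and the sketch reduces to the plain recurrence; with a segment size of 1, every past state is cached and the read resembles attention over the full history.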

Achieves competitive accuracy on long-context understanding and in-context recall tasks, closely matching Transformers while outperforming state-of-the-art RNNs like Titans and Atlas.
Delivers significantly higher training throughput and lower memory overhead than Transformers as sequence length scales, with sparse variants adding minimal computational overhead to base models.

Design choices were intentionally simplified for demonstration; more expressive pooling or routing mechanisms could further boost performance in future iterations.
Effective complexity scales as O(NL), where N is the number of cached segments and L is the sequence length, rather than strictly O(L), so segment size must be tuned carefully to balance memory retention against computational speed.
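
The segment-size trade-off can be made concrete with a rough cost count. The sketch below is an assumption-laden illustration, not a measurement: it counts one read per token for the online state plus one read per checkpoint cached so far, assumes dense (non-sparse) reads, and uses the hypothetical name `read_cost`.

```python
def read_cost(seq_len, segment_size):
    """Count state reads under a dense-read assumption.

    Each token reads its online state (1 read) plus every checkpoint
    cached at an earlier segment boundary.
    """
    total = 0
    for t in range(seq_len):
        cached = t // segment_size   # checkpoints completed before token t
        total += 1 + cached
    return total

L = 8
print(read_cost(L, segment_size=L))  # 8: no checkpoints yet, linear in L
print(read_cost(L, segment_size=4))  # 12: one checkpoint for the second half
print(read_cost(L, segment_size=1))  # 36: L(L+1)/2, quadratic in L
```

Under these assumptions the count is bounded by L(1 + N) with N = L / segment_size, which matches the O(NL) scaling stated above and shows how the two extremes recover the RNN-like and Transformer-like regimes.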