🔗 Source: arXiv

MEMORYCACHING: RNNS WITH GROWING MEMORY

🚀 Technical Novelty

  • Mechanism: Segments input sequences and caches compressed hidden/memory states at boundaries, enabling subsequent tokens to attend to a growing set of past checkpoints via controllable aggregation functions.
  • Nuance: Interpolates between fixed-memory RNNs (O(L)) and full-attention Transformers (O(L²)) by dynamically scaling memory capacity with sequence length, avoiding KV-caching bottlenecks while preserving recurrence efficiency.
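
A minimal sketch of the mechanism described above, not the paper's implementation. It assumes a linear-attention style matrix memory, a fresh memory at the start of each segment, and mean pooling as the aggregation function; the name `memory_caching_forward` and all three of those choices are illustrative assumptions.

```python
import numpy as np

def memory_caching_forward(q, k, v, segment_len):
    """Toy memory-caching pass over one sequence.

    q, k, v: arrays of shape (L, d).
    Returns (outputs of shape (L, d), list of cached memory checkpoints).
    """
    L, d = q.shape
    memory = np.zeros((d, d))   # fixed-size online memory (linear-attention state)
    cache = []                  # grows by one checkpoint per completed segment
    out = np.zeros_like(v)

    for t in range(L):
        # At a segment boundary, checkpoint the compressed state.
        if t > 0 and t % segment_len == 0:
            cache.append(memory.copy())
            memory = np.zeros((d, d))   # assumption: each segment starts fresh

        # Standard linear-attention write into the online memory.
        memory = memory + np.outer(k[t], v[t])

        # Read from every cached checkpoint plus the online memory,
        # then aggregate. Mean pooling is an assumed, simple choice.
        reads = [q[t] @ m for m in cache] + [q[t] @ memory]
        out[t] = np.mean(reads, axis=0)

    return out, cache
```

With `segment_len` at least the sequence length, the cache stays empty and the sketch reduces to a plain fixed-memory linear-attention recurrence; shrinking `segment_len` grows the set of checkpoints each token reads from.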

💡 Yield

  • Achieves competitive long-context QA and recall performance against Transformers on LongBench; delivers significant training throughput gains over attention-based models at scale across linear attention and deep memory architectures.

⚠️ Limitations

  • Simple pooling/routing choices were used to isolate the effect of Memory Caching (MC), leaving room for more expressive aggregation mechanisms. Cost scales with the number of cached segments N, giving O(NL) rather than strictly O(L).
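
To make the O(NL) cost concrete, here is a small counting sketch. It assumes every token reads all previously cached checkpoints plus its own online memory, with fixed-length segments; `total_memory_reads` is a hypothetical helper, not something from the paper.

```python
def total_memory_reads(seq_len, segment_len):
    """Count memory reads when each token reads every cached checkpoint
    plus the online memory (assumed read pattern, fixed-length segments)."""
    reads = 0
    for t in range(seq_len):
        cached = t // segment_len   # checkpoints available at position t
        reads += cached + 1         # +1 for the online memory
    return reads
```

Under these assumptions, and when the sequence length L is a multiple of the segment length, the total is L(N+1)/2 for N segments. A single segment gives L reads, the fixed-memory RNN regime, while one-token segments give L(L+1)/2, the quadratic attention regime.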