Memory Caching for RNNs
Source: arXiv
MEMORY CACHING: RNNS WITH GROWING MEMORY
Technical Novelty
- Mechanism: Splits the input sequence into segments and caches the compressed hidden (memory) state at each segment boundary, so that later tokens can attend to a growing set of past checkpoints through a controllable aggregation function.
- Nuance: Interpolates between fixed-memory RNNs (O(L) in sequence length L) and full-attention Transformers (O(L²)) by scaling memory capacity with sequence length, avoiding KV-caching bottlenecks while preserving the efficiency of recurrence.
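
The mechanism above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a linear-attention-style memory (a d×d matrix updated with the outer product of key and value), assumes the memory is reset after each checkpoint so every cached entry summarizes one segment, and uses two illustrative aggregation choices (`"mean"` and a norm-based `"softmax"`). The function name `memory_caching_forward` and all parameters are hypothetical.

```python
import numpy as np

def memory_caching_forward(q, k, v, segment_len, aggregate="mean"):
    """Toy recurrent layer with cached segment memories (illustrative only).

    q, k, v: arrays of shape (L, d). Returns (outputs, cache), where cache
    holds one d x d memory per completed segment.
    """
    L, d = q.shape
    memory = np.zeros((d, d))   # fixed-size online memory (assumed linear-attention form)
    cache = []                  # grows by one checkpoint per completed segment
    outputs = np.zeros((L, d))

    for t in range(L):
        # Recurrent update of the online memory with the current token.
        memory = memory + np.outer(k[t], v[t])

        # Read from every cached checkpoint and from the online memory.
        reads = np.stack([q[t] @ m for m in cache] + [q[t] @ memory])

        if aggregate == "mean":
            outputs[t] = reads.mean(axis=0)
        elif aggregate == "softmax":
            # Assumed gating rule: weight each read by the softmax of its norm.
            scores = np.linalg.norm(reads, axis=1)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            outputs[t] = weights @ reads
        else:
            raise ValueError(f"unknown aggregate: {aggregate}")

        # At a segment boundary, checkpoint the memory and start a fresh one
        # (the reset is an assumption of this sketch).
        if (t + 1) % segment_len == 0:
            cache.append(memory.copy())
            memory = np.zeros((d, d))

    return outputs, cache

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
outputs, cache = memory_caching_forward(q, k, v, segment_len=2)
print(outputs.shape, len(cache))
```

With `segment_len` larger than the sequence, nothing is ever cached and the sketch reduces to a plain fixed-memory recurrence; shorter segments give each token more checkpoints to read from.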
Yield
- Achieves competitive long-context QA and recall performance against Transformers on LongBench; delivers significant training throughput gains over attention-based models at scale across linear attention and deep memory architectures.
Limitations
- Simplified pooling and routing choices were used to isolate the effect of memory caching (MC), leaving room for more expressive aggregation mechanisms. Complexity scales with the number of cached segments, O(NL) for N segments over a sequence of length L, rather than strictly O(L).
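
To make the O(NL) scaling concrete, the following sketch counts memory reads per sequence under a simple cost model. The model is an assumption, not taken from the paper: each token costs one read per cached checkpoint plus one read of the online memory, and a checkpoint is written after every `segment_len` tokens. The function name `read_counts` is hypothetical.

```python
def read_counts(seq_len, segment_len):
    """Count memory reads under an assumed one-read-per-memory cost model.

    Returns (fixed_rnn, memory_caching, full_attention) read counts.
    """
    # Fixed-memory RNN: one read of the single state per token, O(L).
    fixed_rnn = seq_len
    # Memory caching: token t sees t // segment_len checkpoints plus the
    # online memory, which totals roughly N * L / 2 for N segments.
    memory_caching = sum(t // segment_len + 1 for t in range(seq_len))
    # Causal full attention: token t reads all t + 1 positions, O(L^2).
    full_attention = sum(t + 1 for t in range(seq_len))
    return fixed_rnn, memory_caching, full_attention

print(read_counts(8, 2))  # (8, 20, 36)
print(read_counts(8, 8))  # (8, 8, 36): one segment, same as the fixed RNN
print(read_counts(8, 1))  # (8, 36, 36): one token per segment, same as attention
```

Under this model the segment length acts as the knob between the two extremes: long segments approach the fixed-memory cost, and single-token segments match causal attention.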