Global Regression KV Cache
🔗 Source: arXiv
GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs
🚀 Technical Novelty
- Mechanism: Formulates KV-cache merging as a global ridge-regression problem that directly minimizes the discrepancy between the attention outputs of the compressed and full caches, distributing the information recovered from evicted tokens across all retained tokens.
- Nuance: Unlike prior local or key-similarity heuristics that funnel merges onto a few span-boundary tokens (causing over-merging and semantic blurring), GRKV treats every retained token as an active carrier and applies ridge regularization to prevent over-smoothing.
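The regression view can be illustrated with a minimal sketch. This is not the paper's procedure. It assumes that only the retained values are updated, that a closed-form ridge solve stands in for iterative update steps, and that a small set of surrogate queries `Q` is already available. The function names and the `lam` parameter are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_output(Q, K, V):
    # Single-head scaled dot-product attention, no masking.
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def ridge_merge_values(Q, K, V, keep_idx, lam=1e-2):
    """Adjust retained values so compressed attention matches the full cache.

    Solves, in closed form (an assumption of this sketch):
        min_D ||A_s (V_s + D) - O_full||^2 + lam * ||D||^2
    where A_s holds attention weights over the retained keys only.
    Every retained token receives a correction, not just a few neighbors.
    """
    K_s, V_s = K[keep_idx], V[keep_idx]
    target = attention_output(Q, K, V)                 # full-cache output
    A_s = softmax(Q @ K_s.T / np.sqrt(Q.shape[-1]))    # weights on retained keys
    residual = target - A_s @ V_s                      # what plain eviction loses
    gram = A_s.T @ A_s + lam * np.eye(len(keep_idx))   # ridge-regularized normal matrix
    delta = np.linalg.solve(gram, A_s.T @ residual)
    return K_s, V_s + delta

# Toy usage with random tensors (illustrative data only).
rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 16))     # surrogate queries
K = rng.normal(size=(64, 16))    # full key cache
V = rng.normal(size=(64, 16))    # full value cache
keep_idx = np.arange(0, 64, 4)   # hypothetical retained positions
K_s, V_new = ridge_merge_values(Q, K, V, keep_idx)
```

The `lam` term pulls the corrected values toward the originals, so a larger `lam` keeps the retained tokens closer to their unmerged state, while a smaller one fits the surrogate queries more tightly.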
💡 Yield
- Reports the highest overall performance across 16 LongBench and 13 RULER tasks when paired with modern span-based eviction methods (SnapKV, CriticalKV). Prior merging baselines, by contrast, typically degrade retrieval and QA accuracy because of information loss.
⚠️ Limitations
- Optimization is sensitive to hyperparameters: increasing the number of update steps or the surrogate window size beyond the defaults degrades performance, because the solution overfits the local attention surrogate.
- Relies on a fixed fraction of high-attention tokens as anchors, which may limit adaptability in highly dynamic or non-stationary context distributions.
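To make the fixed-fraction point concrete, here is a hypothetical sketch of anchor selection by attention mass. The function name, the `fraction` parameter, and the use of a single per-token score are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def select_anchor_tokens(attn_scores, fraction=0.1):
    """Return indices of the top `fraction` of tokens by attention score.

    The anchor count depends only on sequence length, not on how the
    attention mass is actually distributed, which is the rigidity the
    limitation above describes.
    """
    n = attn_scores.shape[0]
    k = max(1, int(round(fraction * n)))          # fixed budget of anchors
    top = np.argpartition(attn_scores, -k)[-k:]   # unordered top-k indices
    return np.sort(top)                           # restore positional order

# Toy usage: per-token scores, e.g. attention averaged over recent queries.
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.05, 0.2, 0.7, 0.15, 0.25, 0.4])
anchors = select_anchor_tokens(scores, fraction=0.3)
```

Whether the scores are sharply peaked or nearly flat, this selects the same number of anchors, so a context whose important tokens shift over time gets no extra budget.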