🔗 Source: arXiv

GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs

Mechanism: Formulates KV-cache merging as a global ridge-regression optimization problem that directly minimizes the attention-output discrepancy between compressed and full caches, distributing recovered information across all retained tokens.
Nuance: Unlike prior local or key-similarity heuristics that funnel merges onto a few span-boundary tokens (causing over-merging and semantic blurring), GRKV treats every retained token as an active carrier and applies ridge regularization to prevent over-smoothing.
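
To make the regression idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes a single attention head, keeps the retained keys unchanged, and adjusts only the retained values so that attention over the compressed cache approximates the full-cache output for a small set of surrogate queries. The function name `ridge_merge_values`, the penalty `lam`, and the closed-form solve are illustrative assumptions; the summary's mention of update steps suggests GRKV itself optimizes iteratively.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ridge_merge_values(Q, K, V, keep_idx, lam=1e-2):
    """Hypothetical helper: refit retained values by ridge regression.

    Minimizes ||A @ V_new - target||^2 + lam * ||V_new - V[keep_idx]||^2,
    where A is attention over the retained keys only and target is the
    full-cache attention output for the surrogate queries Q.
    """
    d = K.shape[1]
    target = softmax(Q @ K.T / np.sqrt(d)) @ V       # full-cache output
    A = softmax(Q @ K[keep_idx].T / np.sqrt(d))      # compressed attention weights
    V_keep = V[keep_idx]
    residual = target - A @ V_keep                   # what eviction lost
    gram = A.T @ A + lam * np.eye(len(keep_idx))     # ridge-regularized normal matrix
    delta = np.linalg.solve(gram, A.T @ residual)    # correction spread over all kept tokens
    return V_keep + delta, target, A

# Toy data (assumed sizes): 64 cached tokens, 16 surrogate queries, 16 kept tokens.
rng = np.random.default_rng(0)
n, m, d, k = 64, 16, 8, 16
Q = rng.standard_normal((m, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
keep_idx = np.sort(rng.choice(n, size=k, replace=False))

V_new, target, A = ridge_merge_values(Q, K, V, keep_idx)
err_before = np.linalg.norm(A @ V[keep_idx] - target)  # plain eviction
err_after = np.linalg.norm(A @ V_new - target)         # after ridge refit
print(err_before, err_after)
```

Because the penalty is zero at the original retained values, the refit can never have a larger output error on the surrogate queries than plain eviction. A larger `lam` keeps each value closer to its original and limits smoothing, while a smaller `lam` fits the surrogate more aggressively.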

Result: GRKV achieves the highest overall performance across 16 LongBench and 13 RULER tasks when paired with span-based eviction methods (SnapKV, CriticalKV). Prior merging baselines, by contrast, typically degrade retrieval and QA accuracy through information loss.

Limitation: The optimization is sensitive to hyperparameters. Increasing the number of update steps or the surrogate window size beyond the defaults degrades performance, because the solution overfits the local attention surrogate.
Limitation: The method relies on a fixed fraction of high-attention tokens as anchors, which may limit adaptability when the context distribution is highly dynamic or non-stationary.
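
The fixed-fraction concern can be illustrated with a small sketch. It assumes one plausible reading of anchor selection, namely taking the top share of tokens by accumulated attention score; the helper `select_anchors` and the 25% fraction are hypothetical and not taken from the paper.

```python
import numpy as np

def select_anchors(attn_scores, fraction=0.25):
    """Hypothetical helper: indices of the top `fraction` of tokens by score."""
    n = attn_scores.shape[0]
    k = max(1, int(fraction * n))                    # anchor count depends only on n
    top = np.argpartition(attn_scores, -k)[-k:]      # unordered top-k indices
    return np.sort(top)

# Peaked attention: two tokens clearly dominate.
peaked = np.array([0.02, 0.30, 0.05, 0.08, 0.25, 0.10, 0.15, 0.05])
anchors = select_anchors(peaked)

# Flat attention: no token stands out, yet the anchor count is the same.
flat = np.full(8, 1.0 / 8)
flat_anchors = select_anchors(flat)
print(anchors, len(flat_anchors))
```

In this sketch the number of anchors is set by sequence length alone, so a flat attention profile yields as many anchors as a sharply peaked one. That is the kind of rigidity the limitation describes.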