Efficient Long Context Compression
🔗 Source: arXiv
Adapting Language Models to Compress Contexts
🚀 Technical Novelty
- Mechanism: Introduces AutoCompressors, which pair “summary accumulation” with randomized segment training: the model processes a long document segment by segment, recursively generating compact summary vectors that serve as soft prompts for all subsequent segments.
- Nuance: Unlike prior compression methods that only retain the most recent summary or require per-context optimization, AutoCompressors concatenate all historical summaries and learn a unified compression policy via unsupervised language modeling objectives, avoiding expensive distillation loops.
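The mechanism above can be sketched as a simple loop. This is a minimal illustration only, assuming a hypothetical `toy_compress` function that stands in for the fine-tuned language model (it uses mean pooling, not the learned compression the paper describes) and a hypothetical `random_segments` helper to illustrate randomized segment lengths. All names and the value of `k` are assumptions made for the example.

```python
import random
from typing import List

Vector = List[float]


def embed(token: int, dim: int = 4) -> Vector:
    # Toy embedding (assumption): a deterministic vector derived from the token id.
    return [float(token + i) for i in range(dim)]


def random_segments(tokens: List[int], min_len: int, max_len: int, seed: int = 0) -> List[List[int]]:
    # Split a document into segments of randomly chosen lengths,
    # loosely mirroring the idea of randomized segment training.
    rng = random.Random(seed)
    segments, i = [], 0
    while i < len(tokens):
        n = rng.randint(min_len, max_len)
        segments.append(tokens[i:i + n])
        i += n
    return segments


def toy_compress(soft_prompt: List[Vector], segment: List[int], k: int) -> List[Vector]:
    # Hypothetical stand-in for the model: it sees the accumulated summaries
    # (as a soft prompt) plus the new segment, and emits k summary vectors.
    inputs = soft_prompt + [embed(t) for t in segment]
    chunk = max(1, len(inputs) // k)
    summaries = []
    for i in range(k):
        part = inputs[i * chunk:(i + 1) * chunk] or inputs[-1:]
        summaries.append([sum(col) / len(part) for col in zip(*part)])
    return summaries


def accumulate_summaries(segments: List[List[int]], k: int = 2) -> List[Vector]:
    # Summary accumulation: every earlier summary is kept and concatenated,
    # rather than retaining only the most recent one.
    accumulated: List[Vector] = []
    for segment in segments:
        new_summaries = toy_compress(accumulated, segment, k)
        accumulated = accumulated + new_summaries
    return accumulated


summaries = accumulate_summaries([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], k=2)
print(len(summaries))  # 6: three segments times k=2 vectors each
```

Note that the accumulated list grows by `k` vectors per segment, which is the source of the linear growth discussed under Limitations.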
💡 Yield
- Extends effective context windows up to 30,720 tokens while preserving pre-trained capabilities on standard benchmarks.
- Compressed in-context demonstrations outperform plain-text few-shot prompting on 8 of 11 classification tasks while lowering inference cost.
- Fused summary vectors yield larger perplexity gains and 1.7× higher throughput than full-passage retrieval-augmented baselines such as REPLUG.
⚠️ Limitations
- Fine-tuning induces domain mismatch, causing zero-shot accuracy degradation on certain Llama-2 tasks compared to the base model.
- The accumulated summary-vector sequence grows linearly with document length (a fixed number of vectors is added per segment), which increases memory overhead on extremely long inputs.
- Still underperforms full-context retrieval methods in complex multi-passage reasoning scenarios where higher-fidelity compression is needed.
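The linear-growth limitation can be made concrete with a little arithmetic. The sketch below is illustrative only: the segment length of 2,048 tokens and the 50 summary vectors per segment are assumed values chosen for the example, not figures stated in this summary, and `summary_vector_count` is a hypothetical helper.

```python
import math


def summary_vector_count(doc_tokens: int, segment_len: int, vectors_per_segment: int) -> int:
    # Each segment contributes a fixed number of summary vectors,
    # so the total grows linearly with the number of segments.
    segments = math.ceil(doc_tokens / segment_len)
    return segments * vectors_per_segment


# Assumed values (illustrative): 2,048-token segments, 50 vectors per segment.
print(summary_vector_count(30_720, 2_048, 50))  # 750
print(summary_vector_count(61_440, 2_048, 50))  # 1500
```

Under these assumptions, doubling the document length doubles the number of summary vectors the model must attend to, even though each vector replaces many raw tokens.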