🔗 Source: arXiv

Adapting Language Models to Compress Contexts

Mechanism: Introduces AutoCompressors, which pair “summary accumulation” with randomized segmenting during training to recursively generate compact summary vectors that act as soft prompts for subsequent text segments.
Nuance: Unlike prior compression methods that only retain the most recent summary or require per-context optimization, AutoCompressors concatenate all historical summaries and learn a unified compression policy via unsupervised language modeling objectives, avoiding expensive distillation loops.
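
The accumulation loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `random_segments`, `toy_compress`, and `accumulate_summaries` are hypothetical names, and `toy_compress` is a placeholder for the language model forward pass that would actually produce summary vectors.

```python
import random

def random_segments(tokens, min_len, max_len, rng):
    """Split tokens into consecutive segments of random length
    (a stand-in for randomized segmenting during training)."""
    segments, i = [], 0
    while i < len(tokens):
        n = rng.randint(min_len, max_len)
        segments.append(tokens[i:i + n])
        i += n
    return segments

def toy_compress(soft_prompt, segment, k, dim=4):
    """Hypothetical placeholder for the model: given the accumulated
    summaries (used as a soft prompt) and one segment, return k vectors."""
    base = sum(segment) / len(segment)
    ctx = len(soft_prompt)  # the toy output depends on prior summaries
    return [[base + j + ctx] * dim for j in range(k)]

def accumulate_summaries(segments, k):
    """Summary accumulation: each segment is compressed conditioned on ALL
    earlier summaries, and its new summaries are appended, never overwritten."""
    accumulated = []
    for seg in segments:
        new = toy_compress(accumulated, seg, k)
        accumulated = accumulated + new
    return accumulated

rng = random.Random(0)
segs = random_segments(list(range(100)), 10, 30, rng)
summaries = accumulate_summaries(segs, k=2)
```

The key design point is the concatenation in `accumulate_summaries`: a method that kept only the most recent summary would replace `accumulated` at each step instead of extending it.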

Extends effective context windows up to 30,720 tokens while preserving pre-trained capabilities on standard benchmarks.
Compressed in-context demonstrations outperform plain-text few-shot prompting on 8/11 classification tasks with significantly lower inference costs.
Fused summary vectors yield larger perplexity gains and 1.7× higher throughput than full-passage retrieval-augmented baselines such as REPLUG.

Fine-tuning induces domain mismatch, causing zero-shot accuracy degradation on certain Llama-2 tasks compared to the base model.
The number of summary vectors grows linearly with document length, which can increase memory overhead for extremely long inputs.
Still underperforms full-context retrieval methods in complex multi-passage reasoning scenarios where higher-fidelity compression is needed.
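
The linear growth of summary vectors noted above can be made concrete with a short calculation. This is a sketch only: `summary_vector_count` is a hypothetical helper, and the segment length and per-segment vector count are illustrative assumptions rather than values taken from this summary.

```python
def summary_vector_count(doc_tokens, segment_len, vectors_per_segment):
    """Accumulated summary vectors after compressing a whole document,
    assuming fixed-length segments and a fixed number of vectors each."""
    num_segments = -(-doc_tokens // segment_len)  # ceiling division
    return num_segments * vectors_per_segment

# Illustrative values (assumed): 2,048-token segments, 50 vectors per segment.
print(summary_vector_count(30_720, 2_048, 50))  # 750
```

Under these assumptions a 30,720-token document is represented by 750 vectors rather than 30,720 tokens, but doubling the document length doubles the vector count as well.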