GenerativeAdapter Test-Time Adaptation
🔗 Source: arXiv
GENERATIVE ADAPTER: CONTEXTUALIZING LANGUAGE MODELS IN PARAMETERS WITH A SINGLE FORWARD PASS
🚀 Technical Novelty
- Mechanism: A lightweight adapter generator network takes the frozen base LM's hidden states for a streamed context and, in a single forward pass, produces layer-wise additive delta weights that are added directly to the base model's weights for inference.
- Nuance: Unlike prompting (which suffers from quadratic attention overhead with long contexts) or fine-tuning/continual pretraining (which requires expensive gradient-based updates), this method achieves on-the-fly parameter adaptation using only forward passes, decoupling context length from inference cost.
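The mechanism above can be sketched for a single linear layer. This is a toy illustration under stated assumptions, not the paper's generator architecture: the names `G_out`, `G_in`, `generate_delta`, and `adapted_forward` are hypothetical, the random arrays stand in for real hidden states, and the accumulation of outer products is a simplification chosen because it lets context arrive in chunks.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # toy hidden size, far smaller than a real LM

# Frozen base weight of one linear projection (never updated).
W_base = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

# Hypothetical generator parameters: two learned projections of the
# context hidden states (random here, trained in a real system).
G_out = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
G_in = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)


def generate_delta(context_chunks):
    """Build an additive delta for W_base from chunks of hidden states.

    Each chunk has shape (tokens_in_chunk, d_model). Only matrix
    products are used: no gradients, no optimizer steps.
    """
    delta = np.zeros_like(W_base)
    n_tokens = 0
    for h in context_chunks:
        # Sum of per-token outer products, written as one matmul.
        delta += (h @ G_out).T @ (h @ G_in)
        n_tokens += h.shape[0]
    return delta / max(n_tokens, 1)  # average so scale ignores length


def adapted_forward(x, delta):
    """Apply the layer with the delta added to the frozen weight."""
    return x @ (W_base + delta).T


# Stand-in for hidden states of a 12-token context.
context = rng.normal(size=(12, d_model))
delta_streamed = generate_delta([context[:5], context[5:]])
delta_one_shot = generate_delta([context])

# At inference the context is no longer in the prompt; only the query is.
query = rng.normal(size=(3, d_model))
output = adapted_forward(query, delta_streamed)
```

In this sketch, feeding the context in two chunks yields the same delta as feeding it at once, which is the property that makes a streaming update possible. The delta has the same shape as the base weight regardless of how many context tokens were consumed.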
💡 Yield
- Improves F1 score by 63.5% over supervised fine-tuning on StreamingQA, for contexts of up to 32K tokens.
- Achieves 44.9% average accuracy across 26 MetaICL tasks, outperforming the base model's in-context learning.
- Reduces computation and memory costs by 4x compared to prompting with the full conversation history in user personalization scenarios.
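To see why memory can diverge between the two approaches, the following sketch compares a key-value cache, which grows with the number of prompt tokens, against an adapter whose size does not depend on context length. It is a back-of-the-envelope model, not the paper's measurement, and it does not attempt to reproduce the 4x figure. The default layer and head counts are illustrative 7B-scale values, and `ADAPTER_BYTES` is a hypothetical placeholder.

```python
def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for n_tokens of context.

    The factor 2 covers one key tensor and one value tensor per layer.
    Defaults are illustrative assumptions (16-bit values).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


def memory_comparison(context_lengths, adapter_bytes):
    """Map each context length to (KV-cache bytes, adapter bytes)."""
    return {n: (kv_cache_bytes(n), adapter_bytes) for n in context_lengths}


# Hypothetical fixed adapter footprint; the real value depends on the
# generator design and is not taken from the paper.
ADAPTER_BYTES = 64 * 1024 * 1024

table = memory_comparison([1024, 8192, 32768], ADAPTER_BYTES)
for n, (kv, adapter) in table.items():
    print(f"{n:>6} tokens: KV cache {kv / 2**20:8.1f} MiB, "
          f"adapter {adapter / 2**20:6.1f} MiB")
```

The cache column scales linearly with context length while the adapter column stays constant, so the gap between prompting and parameter adaptation widens as the conversation history grows.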
⚠️ Limitations
- Evaluated primarily on 7B-scale models (Mistral-7B-Instruct, Llama2-7B-Chat), leaving scalability to larger architectures unverified.
- Adapter generation is restricted to linear projection layers (attention and FFN down/up projections) rather than full network parameter updates.
- Performance depends heavily on the quality of the data used for the adapter generator's self-supervised pretraining, which may limit zero-shot generalization to highly specialized domains without instruction tuning.