🔗 Source: arXiv

GENERATIVE ADAPTER: CONTEXTUALIZING LANGUAGE MODELS IN PARAMETERS WITH A SINGLE FORWARD PASS

Mechanism: A lightweight adapter generator network encodes the hidden states of a streaming context, taken from a frozen base LM, and produces layer-wise delta weights in a single forward pass; these deltas are added directly to the base model's weights for inference.
Nuance: Unlike prompting (which suffers from quadratic attention overhead with long contexts) or fine-tuning/continual pretraining (which requires expensive gradient-based updates), this method achieves on-the-fly parameter adaptation using only forward passes, decoupling context length from inference cost.
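
The mechanism above can be illustrated with a toy sketch. It is not the paper's generator architecture: the running outer-product statistic, the projections `A` and `B`, the class name `StreamingGenerator`, and all sizes are hypothetical choices made only to show the shape of the idea (context is consumed chunk by chunk with forward computations only, a delta is produced without gradients, and the delta is added to a frozen weight).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 8, 2  # toy sizes, chosen only for illustration

# Frozen base weight of one linear projection (stand-in for one LM layer).
W_base = rng.normal(size=(d_model, d_model))

# Hypothetical generator parameters; in a real system these would be learned.
A = rng.normal(size=(d_model, rank)) / np.sqrt(d_model)
B = rng.normal(size=(d_model, rank)) / np.sqrt(d_model)


class StreamingGenerator:
    """Accumulates a running statistic over context hidden states (assumed design)."""

    def __init__(self, d_model):
        self.S = np.zeros((d_model, d_model))
        self.n_tokens = 0

    def update(self, hidden):
        # hidden: (chunk_len, d_model) hidden states for one context chunk.
        self.S += hidden.T @ hidden
        self.n_tokens += hidden.shape[0]

    def delta(self):
        # Low-rank additive update built from the accumulated statistic.
        S_mean = self.S / max(self.n_tokens, 1)
        return (S_mean @ A) @ B.T  # shape (d_model, d_model), rank <= `rank`


# Stand-in for hidden states that the frozen LM would emit for the context.
context = rng.normal(size=(32, d_model))

gen = StreamingGenerator(d_model)
for chunk in np.array_split(context, 4):  # stream the context in 4 chunks
    gen.update(chunk)

# No gradient step: the generated delta is simply added to the frozen weight.
W_adapted = W_base + gen.delta()

# At inference only the query passes through the adapted layer, so the cost of
# this step does not grow with the length of the context consumed above.
query = rng.normal(size=(1, d_model))
out = query @ W_adapted.T
```

In this sketch the generator state has a fixed size regardless of how many context tokens are streamed in, which is the property that lets the context length drop out of the inference-time cost.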

Achieves a 63.5% improvement in F1 score over supervised fine-tuning on StreamingQA, for contexts as long as 32K tokens.
Reaches 44.9% average accuracy across 26 tasks on MetaICL, outperforming the base model's in-context learning.
Reduces computation and memory costs by 4x compared to full conversation history prompting in user personalization scenarios.

Evaluated primarily on 7B-scale models (Mistral-7B-Instruct, Llama2-7B-Chat), leaving scalability to larger architectures unverified.
Adapter generation is restricted to linear projection layers (attention and FFN down/up projections) rather than full network parameter updates.
Performance depends heavily on the quality of the data used for the adapter generator's self-supervised pretraining, which may limit zero-shot generalization to highly specialized domains without instruction tuning.