Context-to-Weight Equivalence
🔗 Source: arXiv
Equivalence of Context and Parameter Updates in Modern Transformer Blocks
🚀 Technical Novelty
- Mechanism: Derives exact analytical formulas for rank-1 weight patches and RMSNorm scale updates that absorb the effect of context into the parameters of a Gemma-style transformer block, then generalizes the result through input/output controllability properties.
- Nuance: Extends prior proofs for vanilla transformers to modern bias-free architectures by handling gating (SwiGLU) and RMSNorm and by inducting over layers; the equivalence is exact, not an approximation.
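To make the rank-1 idea concrete, here is a minimal toy sketch, not the paper's Gemma-style construction. It assumes a single bias-free linear map W that consumes one activation vector, and it ignores RMSNorm, SwiGLU, and residual connections. The names `a_plain` (activation without context) and `a_ctx` (activation with context), the helper functions, and all numeric values are hypothetical. The patch used is delta_W = W (a_ctx - a_plain) a_plain^T / ||a_plain||^2, which satisfies (W + delta_W) a_plain = W a_ctx by direct substitution.

```python
# Toy rank-1 weight patch for one bias-free linear map (illustrative assumption,
# not the paper's full block-level formulas).

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rank1_patch(W, a_plain, a_ctx):
    """Return delta_W = (W @ (a_ctx - a_plain)) a_plain^T / ||a_plain||^2."""
    delta_a = [c - p for c, p in zip(a_ctx, a_plain)]
    u = matvec(W, delta_a)                  # column factor of the outer product
    norm_sq = sum(p * p for p in a_plain)   # row factor is a_plain / norm_sq
    if norm_sq == 0.0:
        raise ValueError("a_plain must be nonzero")
    return [[u_i * p_j / norm_sq for p_j in a_plain] for u_i in u]

def mat_add(A, B):
    """Elementwise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical weights and activations.
W = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
a_plain = [1.0, 2.0, 2.0]   # query-token activation, no context
a_ctx = [0.5, 1.0, 4.0]     # same token's activation, context present

W_patched = mat_add(W, rank1_patch(W, a_plain, a_ctx))

with_context = matvec(W, a_ctx)                # original weights, with context
without_context = matvec(W_patched, a_plain)   # patched weights, no context
max_err = max(abs(x - y) for x, y in zip(with_context, without_context))
print(with_context, without_context, max_err)
```

The two outputs agree up to floating-point rounding, which is the sense in which the context has been moved into the weights for this one activation.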
💡 Yield
- Provides a constructive proof and an algorithm for computing implicit weight patches across multi-layer modern LLMs. Experiments on Gemma 3 1B show near-perfect logit matching and identical token generation between the patched model run without context and the original model run with context.
⚠️ Limitations
- Updates are strictly token-dependent and must be recomputed at every inference step; the framework is descriptive and theoretical, not a prescriptive algorithm for efficient inference or global context absorption.
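The token dependence can be seen in a small toy sketch, under the simplifying assumption of a single bias-free linear map W patched by delta_W = W (a_ctx - a_plain) a_plain^T / ||a_plain||^2. This is an illustration only, not the paper's multi-layer algorithm; the function names and all numbers are hypothetical. A patch built from token 1's activations generally fails to reproduce the context effect for token 2, while a patch recomputed for token 2 does.

```python
# Toy demonstration that a rank-1 patch is specific to the token it was built for.
# Single bias-free linear map; all values are hypothetical.

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def patched_weights(W, a_plain, a_ctx):
    """Return W + (W @ (a_ctx - a_plain)) a_plain^T / ||a_plain||^2."""
    u = matvec(W, [c - p for c, p in zip(a_ctx, a_plain)])
    norm_sq = sum(p * p for p in a_plain)
    return [[w + u_i * p / norm_sq for w, p in zip(row, a_plain)]
            for row, u_i in zip(W, u)]

def max_abs_diff(x, y):
    """Largest absolute elementwise difference between two vectors."""
    return max(abs(a - b) for a, b in zip(x, y))

W = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

# Token 1 and token 2 activations, without and with context.
a1_plain, a1_ctx = [1.0, 2.0, 2.0], [0.5, 1.0, 4.0]
a2_plain, a2_ctx = [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]

target = matvec(W, a2_ctx)  # what token 2 should produce with context

# Reusing the patch computed for token 1 on token 2: mismatch.
stale = matvec(patched_weights(W, a1_plain, a1_ctx), a2_plain)
stale_err = max_abs_diff(stale, target)

# Recomputing the patch for token 2: match up to rounding.
fresh = matvec(patched_weights(W, a2_plain, a2_ctx), a2_plain)
fresh_err = max_abs_diff(fresh, target)

print(stale_err, fresh_err)
```

In this toy setting the stale patch is off by a large margin while the recomputed one is exact up to floating-point error, which is why the cost of recomputation grows with every generated token.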