🔗 Source: arXiv

Learning, Fast and Slow: Towards LLMs That Adapt Continually

🚀 Technical Novelty

  • Mechanism: Treats optimized prompts and context as trainable “fast weights” that co-evolve in real time with slow parametric updates made via RL, so adaptation is distributed across both channels.
  • Nuance: Breaks the traditional sequential pipeline of fine-tuning followed by prompt tuning. Textual scaffolds and model parameters are optimized jointly against verifiable rewards, rather than as disjoint or post-hoc steps.
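
The interleaving described above can be sketched with a toy problem. This is only an illustration under stated assumptions, not the paper's implementation: the "model" is a single scalar logit `theta`, the "prompt" is a set of hints from a hypothetical `HINT_BONUS` table, the fast step is a simple accept-if-better edit standing in for GEPA, and the slow step is plain REINFORCE standing in for CISPO/RLVR.

```python
import math
import random

# Hypothetical toy setup: each hint shifts the policy logit by a fixed amount.
HINT_BONUS = {"show your work": 0.8, "check units": 0.4, "be brief": -0.5}


def success_prob(theta, prompt):
    """Probability of a correct answer given slow weight theta and fast-weight prompt."""
    logit = theta + sum(HINT_BONUS[h] for h in prompt)
    return 1.0 / (1.0 + math.exp(-logit))


def rollout(theta, prompt, rng):
    """Sample one attempt; the verifiable reward is 1.0 on success, else 0.0."""
    return 1.0 if rng.random() < success_prob(theta, prompt) else 0.0


def mean_reward(theta, prompt, rng, n_eval):
    return sum(rollout(theta, prompt, rng) for _ in range(n_eval)) / n_eval


def fast_step(theta, prompt, rng, n_eval=200):
    """Fast channel: toggle one hint and keep the edit only if mean reward improves."""
    hint = rng.choice(sorted(HINT_BONUS))
    candidate = prompt - {hint} if hint in prompt else prompt | {hint}
    if mean_reward(theta, candidate, rng, n_eval) > mean_reward(theta, prompt, rng, n_eval):
        return candidate
    return prompt


def slow_step(theta, prompt, rng, lr=0.5, n_batch=32):
    """Slow channel: REINFORCE update on theta with a mean-reward baseline."""
    p = success_prob(theta, prompt)
    rewards = [rollout(theta, prompt, rng) for _ in range(n_batch)]
    baseline = sum(rewards) / n_batch
    # For a Bernoulli policy, d log pi(a) / d theta = a - p; here the action a equals the reward.
    grad = sum((r - baseline) * (r - p) for r in rewards) / n_batch
    return theta + lr * grad


def train(steps=50, seed=0):
    """Alternate fast (prompt) and slow (parameter) updates against the same reward."""
    rng = random.Random(seed)
    theta, prompt = -2.0, frozenset()
    for _ in range(steps):
        prompt = fast_step(theta, prompt, rng)
        theta = slow_step(theta, prompt, rng)
    return theta, prompt


if __name__ == "__main__":
    theta, prompt = train()
    print(f"theta={theta:.3f}, prompt={sorted(prompt)}")
```

Both channels see the same reward signal in every iteration, which is the point of the sketch: the prompt can absorb part of the adaptation, so the parameter has less distance to travel. Whether that carries over to real models is the paper's claim, not something this toy demonstrates.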

💡 Yield

  • Achieves up to 3× higher sample efficiency than RL-only training on math, code, and reasoning tasks, while also reaching higher performance ceilings.
  • Reduces KL divergence from the base model by up to 70%, effectively mitigating catastrophic forgetting and preserving plasticity for downstream task shifts.
  • Demonstrates robust continual learning capabilities, successfully adapting to sequential domain changes where parameter-only RL stalls or collapses.
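
To make the KL figure above concrete, here is a minimal sketch of how a relative reduction in divergence from the base model could be computed. The direction KL(tuned || base), the three-outcome distributions, and the function names are all assumptions chosen for illustration; the numbers are made up and are not results from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def relative_reduction(kl_baseline, kl_method):
    """Fraction by which kl_method is smaller than kl_baseline (0.7 means 70% lower)."""
    return 1.0 - kl_method / kl_baseline


# Made-up next-token distributions over three outcomes, for illustration only.
base = [0.5, 0.3, 0.2]
rl_only = [0.8, 0.15, 0.05]   # hypothetical policy that drifted far from the base
joint = [0.6, 0.27, 0.13]     # hypothetical policy that stayed closer to the base

kl_rl = kl_divergence(rl_only, base)
kl_joint = kl_divergence(joint, base)

if __name__ == "__main__":
    print(f"KL(rl_only || base) = {kl_rl:.4f}")
    print(f"KL(joint || base)   = {kl_joint:.4f}")
    print(f"relative reduction  = {relative_reduction(kl_rl, kl_joint):.0%}")
```

In practice the divergence would be estimated over sampled token sequences rather than a single small distribution, but the ratio is read the same way.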

⚠️ Limitations

  • Computational overhead stems from maintaining a diverse population of candidate prompts and running interleaved optimization loops.
  • Fast-to-slow distillation alone cannot replicate joint reward optimization, confirming that both channels must be actively trained together for peak performance.
  • Relies on specific instantiations (GEPA for prompts, CISPO/RLVR for weights); broader optimizer ablations and efficiency improvements are deferred to future work.