Layer-Wise LoRA Expert Allocation

🔗 Source: arXiv

Higher Layers Need More LoRA Experts

🚀 Technical Novelty

Mechanism: Introduces MoLA, a PEFT framework that replaces the uniform expert count of standard LoRA-MoE setups with flexible, layer-wise expert allocation across Transformer blocks.
Nuance: Departs from prior LoRA-MoE work, which fixes the same number of experts in every layer. The authors report empirical evidence that experts in lower layers are more similar to one another, and therefore more redundant, than those in higher layers, which motivates concentrating experts in the higher layers.
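
To make the mechanism concrete, here is a minimal pure-Python sketch of a single LoRA-MoE layer whose expert count is a constructor argument, so different layers can hold different numbers of experts. The class name `LoRAMoELayer`, the softmax top-k router, the rank, and all sizes are assumptions chosen for illustration; the summary above does not specify the routing scheme, so this should not be read as the authors' implementation.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    top = max(zs)
    es = [math.exp(z - top) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def _rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class LoRAMoELayer:
    """Hypothetical layer: one frozen weight W, n_experts LoRA pairs, one router."""

    def __init__(self, d_in, d_out, n_experts, rank=2, top_k=2, seed=0):
        rng = random.Random(seed)
        self.W = _rand_matrix(rng, d_out, d_in)           # frozen pretrained weight
        self.router = _rand_matrix(rng, n_experts, d_in)  # trainable gate
        self.A = [_rand_matrix(rng, rank, d_in) for _ in range(n_experts)]
        # B starts at zero (the usual LoRA init), so every expert is a no-op at first.
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]
        self.top_k = min(top_k, n_experts)

    def forward(self, x):
        gates = softmax(matvec(self.router, x))
        # Keep the top_k experts by gate weight (assumed routing scheme).
        chosen = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)
        chosen = chosen[: self.top_k]
        norm = sum(gates[i] for i in chosen)
        out = matvec(self.W, x)
        for i in chosen:
            delta = matvec(self.B[i], matvec(self.A[i], x))
            out = [o + (gates[i] / norm) * d for o, d in zip(out, delta)]
        return out, chosen


# Hypothetical 4-layer stack with more experts in the higher layers.
stack = [LoRAMoELayer(d_in=4, d_out=4, n_experts=n) for n in (2, 2, 8, 8)]
```

Because only `n_experts` changes between layers, a layer-wise allocation reduces to choosing one integer per layer; the rest of the layer definition stays the same.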

💡 Yield

While using fewer total parameters, matches or exceeds every PEFT baseline on six NLP and commonsense QA benchmarks.
Demonstrates that asymmetric allocation (e.g., 2-4-6-8 experts, from the lowest to the highest group of layers) outperforms uniform configurations, which the authors attribute to reduced lower-layer redundancy.
Exhibits strong continual learning capabilities, limiting the forgetting of domain knowledge during sequential fine-tuning across multiple subjects.
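
The allocation and parameter claims above can be checked with simple arithmetic. The sketch below expands a per-group configuration such as 2-4-6-8 into per-layer expert counts and tallies trainable parameters. The helper names, the 32-layer depth, the hidden size of 4096, rank 8, four adapted modules per layer, and the 5-5-5-5 and 8-8-8-8 comparison points are all assumptions made for illustration; modules are treated as square, which real models do not always satisfy.

```python
def expand_allocation(group_counts, n_layers):
    """Spread per-group expert counts evenly over n_layers, lowest layers first."""
    if n_layers % len(group_counts) != 0:
        raise ValueError("n_layers must divide evenly into the groups")
    per_group = n_layers // len(group_counts)
    return [count for count in group_counts for _ in range(per_group)]


def trainable_params(experts_per_layer, d_model, rank, modules_per_layer):
    """Count LoRA-expert plus router parameters, assuming square d_model x d_model modules."""
    per_expert = rank * d_model + d_model * rank  # A is rank x d, B is d x rank
    per_gate_row = d_model                        # one router row per expert
    per_slot = (per_expert + per_gate_row) * modules_per_layer
    return sum(n * per_slot for n in experts_per_layer)


# Assumed model shape, for illustration only.
N_LAYERS, D_MODEL, RANK, MODULES = 32, 4096, 8, 4

asymmetric = expand_allocation([2, 4, 6, 8], N_LAYERS)
uniform_same_total = expand_allocation([5, 5, 5, 5], N_LAYERS)
uniform_large = expand_allocation([8, 8, 8, 8], N_LAYERS)

for name, alloc in [("2-4-6-8", asymmetric),
                    ("5-5-5-5", uniform_same_total),
                    ("8-8-8-8", uniform_large)]:
    print(name, sum(alloc), trainable_params(alloc, D_MODEL, RANK, MODULES))
```

Under these assumptions, 2-4-6-8 and 5-5-5-5 hold the same number of experts, so any difference between them comes from where the experts sit, while 2-4-6-8 uses five eighths of the trainable parameters of 8-8-8-8.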

⚠️ Limitations

Relies on static, pre-defined layer-wise expert configurations rather than fully dynamic, input-aware allocation during training.
The computational overhead of maintaining routers and multiple expert pairs in every layer is neither optimized nor benchmarked against standard LoRA inference latency.
Validation is limited to decoder-only LLMs and instruction-tuning benchmarks, leaving encoder-decoder and multimodal extensions unexplored.
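
As a rough illustration of the overhead named in the second limitation, the sketch below counts per-token multiply-accumulates for a single LoRA pair versus a routed set of experts. The function names, the top-k routing assumption, and the sizes are hypothetical, and the count ignores memory traffic and expert dispatch, which may well dominate real latency.

```python
def lora_macs(d_in, d_out, rank):
    """Multiply-accumulates for one LoRA pair: A (rank x d_in), then B (d_out x rank)."""
    return rank * d_in + d_out * rank


def lora_moe_macs(d_in, d_out, rank, n_experts, top_k):
    """Router logits for every expert, plus the top_k experts that actually run."""
    router = n_experts * d_in
    return router + top_k * lora_macs(d_in, d_out, rank)


# Assumed sizes, for illustration only.
single = lora_macs(4096, 4096, 8)
routed_low = lora_moe_macs(4096, 4096, 8, n_experts=2, top_k=2)
routed_high = lora_moe_macs(4096, 4096, 8, n_experts=8, top_k=2)
print(single, routed_low, routed_high)
```

A further difference this count does not capture: a single LoRA update can be merged into the frozen weight before inference, whereas input-dependent routing prevents merging the experts ahead of time, so the routed path stays in the forward pass.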