Sparse High Rank Adapters
Source: arXiv
Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters
Technical Novelty
- Mechanism: Directly finetunes 1–2% of pretrained weights using extremely sparse masks (98–99% zeros) during backpropagation, storing adapters as sparse weight-index pairs instead of dense low-rank matrices.
- Nuance: Unlike LoRA's dense matrix addition, which overwrites all base weights upon fusion, SHiRA overwrites only a tiny fraction of indices at inference time, eliminating latency spikes and drastically reducing cross-concept interference during multi-adapter use.
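The mechanism above can be illustrated with a minimal, pure-Python sketch. It is not the paper's implementation: the names `SparseAdapter`, `masked_sgd_step`, `extract_adapter`, `apply_adapter`, and `remove_adapter` are hypothetical, weights are modeled as a flat list, and plain SGD stands in for whatever optimizer is actually used. The point is only that training touches the masked positions and that switching adapters rewrites just those indices.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SparseAdapter:
    """Hypothetical container: flat weight index -> finetuned value."""
    values: Dict[int, float]


def masked_sgd_step(weights: List[float], grads: List[float],
                    mask: List[int], lr: float) -> None:
    """Plain SGD step that updates only positions where mask is 1."""
    for i, (g, m) in enumerate(zip(grads, mask)):
        if m:
            weights[i] -= lr * g


def extract_adapter(weights: List[float], mask: List[int]) -> SparseAdapter:
    """Store only the trained positions as index-value pairs."""
    return SparseAdapter({i: w for i, (w, m) in enumerate(zip(weights, mask)) if m})


def apply_adapter(weights: List[float], adapter: SparseAdapter) -> Dict[int, float]:
    """Overwrite only the adapter's indices in place; return the originals for undo."""
    saved = {}
    for idx, val in adapter.values.items():
        saved[idx] = weights[idx]
        weights[idx] = val
    return saved


def remove_adapter(weights: List[float], saved: Dict[int, float]) -> None:
    """Restore the base weights at the indices an adapter had overwritten."""
    for idx, val in saved.items():
        weights[idx] = val


# Toy usage: 100 base weights, a mask that selects 2 of them (2% density).
base = [0.0] * 100
mask = [0] * 100
mask[3] = mask[42] = 1

finetuned = list(base)
masked_sgd_step(finetuned, grads=[1.0] * 100, mask=mask, lr=0.5)
adapter = extract_adapter(finetuned, mask)   # holds 2 index-value pairs

saved = apply_adapter(base, adapter)         # switch the adapter on: 2 writes
remove_adapter(base, saved)                  # switch it off: 2 writes
```

In this sketch, the cost of switching scales with the number of stored pairs rather than with the size of the weight tensor, which is the intuition behind the fast-switching claim.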
Yield
- Achieves up to 2.7% higher average accuracy than LoRA on LLaMA-7B commonsense reasoning tasks while modifying only 1% of parameters.
- Enables up to 10× faster CPU weight loading for rapid adapter switching and significantly reduces concept loss/artifacts in multi-adapter image generation (Stable Diffusion).
- Consumes ~16.6% lower peak GPU memory during training than LoRA and remains fully orthogonal/composable with advanced adapters like DoRA.
Limitations
- Structured masking variants (SHiRA-Struct) underperform on complex LLM tasks due to inherent rank limitations.
- Performance heavily depends on mask selection strategy (e.g., SNIP, Grad, or Weight Magnitude), requiring task-specific calibration or ablation.
- Does not address test-time distribution shifts; remains a parameter-efficient fine-tuning method rather than a dynamic adaptation framework.
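To make the dependence on mask selection concrete, here is a small sketch of three scoring rules feeding a shared top-k selector. The function names are hypothetical, and the scoring definitions are assumptions based on common usage rather than taken from the source: weight magnitude as |w|, gradient magnitude as |g| (with gradients imagined to come from a calibration set), and SNIP as |w · g|.

```python
from typing import List, Sequence


def top_k_mask(scores: Sequence[float], density: float) -> List[int]:
    """Return a 0/1 mask keeping the highest-scoring fraction `density` of positions."""
    k = max(1, round(density * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def weight_magnitude_scores(weights: Sequence[float]) -> List[float]:
    """Assumed weight-magnitude criterion: |w|."""
    return [abs(w) for w in weights]


def grad_scores(grads: Sequence[float]) -> List[float]:
    """Assumed gradient criterion: |g|."""
    return [abs(g) for g in grads]


def snip_scores(weights: Sequence[float], grads: Sequence[float]) -> List[float]:
    """Assumed SNIP-style criterion: |w * g|."""
    return [abs(w * g) for w, g in zip(weights, grads)]


# Toy example: the same weights and gradients yield three different masks.
weights = [0.1, -2.0, 0.9, 0.05]
grads = [3.0, 0.1, 1.0, 0.01]

wm_mask = top_k_mask(weight_magnitude_scores(weights), density=0.25)
grad_mask = top_k_mask(grad_scores(grads), density=0.25)
snip_mask = top_k_mask(snip_scores(weights, grads), density=0.25)
```

In this toy case each criterion selects a different position, so the trainable subset, and hence the resulting adapter, differs by strategy.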