Inference Layer Skipping
Source: arXiv
Skip the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs
Technical Novelty
- Mechanism: Static, task-agnostic inference-time layer skipping that bypasses high-similarity early layers in native diffusion LLMs without KV-cache sharing or architectural modifications.
- Nuance: Exploits objective-induced hierarchical abstraction and representational redundancy rather than cache-centric optimizations, while revealing persistent initialization bias when AR models are adapted to diffusion training.
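The mechanism above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes that "high-similarity" means high cosine similarity between a layer's input and output on a small calibration set, and that "early" means the first half of the stack; the source states neither detail. The toy residual layers and every name here (`make_layer`, `layer_similarities`, `choose_skips`, `early_fraction`) are hypothetical stand-ins for real transformer blocks.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def make_layer(scale):
    """Toy residual block (assumption): h + scale * cyclic_shift(h).
    A small scale barely changes the representation, i.e. the layer is redundant."""
    def layer(h):
        n = len(h)
        return [h[i] + scale * h[(i + 1) % n] for i in range(n)]
    return layer

def layer_similarities(layers, calib_inputs):
    """Mean input/output cosine similarity per layer over the calibration set."""
    sims = [0.0] * len(layers)
    for h in calib_inputs:
        for i, layer in enumerate(layers):
            out = layer(h)
            sims[i] += cosine(h, out) / len(calib_inputs)
            h = out
    return sims

def choose_skips(sims, budget, early_fraction=0.5):
    """Static, task-agnostic choice: the `budget` most similar early layers."""
    n_early = max(1, int(len(sims) * early_fraction))
    ranked = sorted(range(n_early), key=lambda i: sims[i], reverse=True)
    return frozenset(ranked[:budget])

def forward(layers, h, skip=frozenset()):
    """Run the stack, treating skipped layers as the identity."""
    for i, layer in enumerate(layers):
        if i not in skip:
            h = layer(h)
    return h

# Hypothetical 8-layer stack: layers 0 and 1 are nearly redundant.
scales = [0.01, 0.02, 0.9, 0.8, 0.03, 0.7, 0.6, 0.5]
layers = [make_layer(s) for s in scales]
calib = [[1.0, 0.0, 0.0, 0.0]]

sims = layer_similarities(layers, calib)
skip = choose_skips(sims, budget=2)
full = forward(layers, calib[0])
skipped = forward(layers, calib[0], skip)
agreement = cosine(full, skipped)
```

The skip set is computed once and reused for every input, which is what makes the scheme static. Nothing about the model is changed beyond omitting the chosen blocks at inference time.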
Yield
- Native dLLMs achieve up to 18.75% FLOPs reduction while retaining >90% performance on reasoning and code benchmarks, whereas AR models degrade sharply under comparable skipping.
- First systematic layer- and token-wise analysis showing that diffusion objectives induce global abstraction with minimal recency bias, in contrast to AR's incremental, depth-dependent refinement.
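As a rough sanity check on the 18.75% figure, the arithmetic below assumes that FLOPs are spread evenly across transformer layers, so the saving equals the fraction of layers skipped. The 6-of-32 configuration is a hypothetical example that happens to reproduce the number; the source does not state the layer counts involved.

```python
def flops_reduction_pct(skipped_layers, total_layers):
    """Percent FLOPs saved, assuming every layer costs the same (an assumption)."""
    if total_layers <= 0 or not 0 <= skipped_layers <= total_layers:
        raise ValueError("need 0 <= skipped_layers <= total_layers and total_layers > 0")
    return 100.0 * skipped_layers / total_layers

# Hypothetical configuration: skipping 6 of 32 layers.
reduction = flops_reduction_pct(6, 32)
```

Under this uniform-cost assumption any 3-in-16 ratio gives the same result. Embedding and output-head costs are ignored, so the real saving for a given model would be somewhat smaller.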
Limitations
- Autoregressive and AR-initialized models exhibit brittle performance drops when layers are skipped, limiting cross-architecture applicability.
- Aggressive consecutive layer skipping causes steeper accuracy degradation, and safety/out-of-distribution behaviors remain unverified beyond standard benchmarks.