🔗 Source: arXiv

Skip the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs

🚀 Technical Novelty

  • Mechanism: Static, task-agnostic inference-time layer skipping that bypasses high-similarity early layers in native diffusion LLMs without KV-cache sharing or architectural modifications.
  • Nuance: Exploits objective-induced hierarchical abstraction and representational redundancy rather than cache-centric optimizations, while revealing persistent initialization bias when AR models are adapted to diffusion training.
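The mechanism above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: the toy layers, the helper names (`make_toy_layer`, `pick_skip_set`, `forward`), the cosine-similarity threshold, and the rule that only the first half of the stack is eligible. The idea shown is that a skip set is chosen once from a calibration pass, based on how little each early layer changes its input, and is then applied unchanged at inference time.

```python
import math
from typing import Callable, FrozenSet, List, Sequence, Set

Vector = List[float]
Layer = Callable[[Vector], Vector]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical residual-style layer: h + scale * (cyclic shift of h)."""
    def layer(h: Vector) -> Vector:
        n = len(h)
        return [h[i] + scale * h[(i + 1) % n] for i in range(n)]
    return layer


def pick_skip_set(
    layers: Sequence[Layer],
    calib_input: Sequence[float],
    threshold: float = 0.999,      # assumed similarity cutoff
    early_fraction: float = 0.5,   # assumed: only early layers are eligible
) -> Set[int]:
    """One offline pass: mark early layers whose output stays close to their input."""
    cutoff = int(len(layers) * early_fraction)
    skip: Set[int] = set()
    h = list(calib_input)
    for i, layer in enumerate(layers):
        out = layer(h)
        if i < cutoff and cosine(h, out) >= threshold:
            skip.add(i)
        h = out  # calibration always runs the full stack
    return skip


def forward(
    layers: Sequence[Layer],
    x: Sequence[float],
    skip: FrozenSet[int] = frozenset(),
) -> Vector:
    """Run the stack, passing the hidden state through skipped layers unchanged."""
    h = list(x)
    for i, layer in enumerate(layers):
        if i in skip:
            continue
        h = layer(h)
    return h


# Two near-identity early layers followed by six layers that change the state more.
layers = [make_toy_layer(s) for s in (0.01, 0.01, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)]
x = [1.0, 2.0, 3.0, 4.0]
skip = pick_skip_set(layers, x)
full = forward(layers, x)
skipped = forward(layers, x, skip)
print(sorted(skip), round(cosine(full, skipped), 4))
```

Because the skip set is fixed ahead of time, the inference path needs no cache sharing and no change to the layers themselves. A real model would operate on per-token hidden-state tensors rather than a single toy vector.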

💡 Yield

  • Native dLLMs achieve up to 18.75% FLOPs reduction while retaining >90% performance on reasoning and code benchmarks, whereas AR models degrade sharply under comparable skipping.
  • First systematic layer- and token-wise analysis showing that diffusion objectives induce global abstraction with minimal recency bias, in contrast to AR’s incremental, depth-dependent refinement.
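As a rough sanity check on the 18.75% figure, the sketch below works through the arithmetic under a simplifying assumption that is not taken from the paper: every layer costs the same number of FLOPs, and embeddings and the output head are ignored. Under that assumption the FLOPs reduction equals the fraction of layers skipped. The function name and the 32-layer search bound are illustrative choices, and the summary does not state the actual layer counts.

```python
from fractions import Fraction


def flops_reduction(num_layers: int, num_skipped: int) -> Fraction:
    """Fraction of per-layer FLOPs saved, assuming all layers cost the same."""
    if num_layers <= 0 or not 0 <= num_skipped <= num_layers:
        raise ValueError("need 0 <= num_skipped <= num_layers and num_layers > 0")
    return Fraction(num_skipped, num_layers)


TARGET = Fraction(1875, 10000)  # 18.75%, the reduction quoted above

# Which (skipped, total) pairs with at most 32 layers give exactly 18.75%?
candidates = [
    (k, n)
    for n in range(1, 33)
    for k in range(1, n)
    if flops_reduction(n, k) == TARGET
]
print(candidates)
print(float(flops_reduction(32, 6)))
```

Since 18.75% reduces to 3/16, the uniform-cost assumption only admits stack depths that are multiples of 16. Real savings would differ slightly once non-layer costs are counted.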

⚠️ Limitations

  • Autoregressive and AR-initialized models exhibit brittle performance drops when layers are skipped, limiting cross-architecture applicability.
  • Aggressive consecutive layer skipping causes steeper accuracy degradation, and safety/out-of-distribution behaviors remain unverified beyond standard benchmarks.