Spiffy Accelerates Diffusion LLMs
🔗 Source: arXiv
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
🚀 Technical Novelty
- Mechanism: Auto-speculative drafting using the dLLM’s own distribution to generate candidate states, verified via a novel directed draft graph that respects bidirectional token dependencies.
- Nuance: Needs no auxiliary draft model, so it avoids that overhead. It adapts speculative decoding to block-wise, bidirectional unmasking: draft states are verified in parallel and the output distribution is provably preserved, using a draft graph in place of the draft trees of AR-LLM speculative decoding.
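To make the draft-then-verify loop above concrete, here is a minimal Python sketch. It is not the paper's algorithm: `toy_model`, `step`, and both decode functions are hypothetical stand-ins, decoding is greedy (one token unmasked per step), and the "draft graph" is reduced to a single chain of states. What it illustrates is the lossless property: a drafted state is accepted only if the model, run on the exact preceding state, would have produced it.

```python
from typing import Dict, Optional, Tuple

State = Tuple[Optional[int], ...]   # None marks a still-masked position
Preds = Dict[int, Tuple[int, float]]  # position -> (token, confidence)


def toy_model(state: State) -> Preds:
    """Hypothetical stand-in for a dLLM forward pass (bidirectional context)."""
    preds: Preds = {}
    n = len(state)
    for i, tok in enumerate(state):
        if tok is not None:
            continue
        left = state[i - 1] if i > 0 else None
        right = state[i + 1] if i + 1 < n else None
        token = (3 * i + (left or 0) + 2 * (right or 0)) % 10
        conf = ((7 * i + 3) % 11) / 11.0
        preds[i] = (token, conf)
    return preds


def step(state: State, preds: Preds, k: int = 1) -> State:
    """Unmask the k most confident masked positions using `preds`."""
    order = sorted(preds, key=lambda p: (-preds[p][1], p))
    new = list(state)
    for p in order[:k]:
        new[p] = preds[p][0]
    return tuple(new)


def decode_sequential(state: State) -> Tuple[State, int]:
    """Baseline: one model call per unmasked token."""
    calls = 0
    while None in state:
        state = step(state, toy_model(state))
        calls += 1
    return state, calls


def decode_speculative(state: State, depth: int = 3) -> Tuple[State, int]:
    """Auto-speculative decoding over a chain of draft states (assumed design)."""
    preds = toy_model(state)
    rounds = 1
    while None in state:
        # Draft states come from the model's own current predictions.
        drafts = {step(state, preds, k) for k in range(1, depth + 1)}
        # Stands in for ONE batched forward pass over all draft states.
        batch = {d: toy_model(d) for d in drafts}
        rounds += 1
        # The 1-step draft is exact, so it is always accepted.
        state = step(state, preds, 1)
        preds = batch[state]
        # Walk deeper while the model's true next state was drafted.
        while None in state:
            nxt = step(state, preds, 1)
            if nxt not in batch:
                break
            state, preds = nxt, batch[nxt]
    return state, rounds


start: State = (None,) * 8
seq_out, seq_calls = decode_sequential(start)
spec_out, spec_rounds = decode_speculative(start, depth=3)
print(seq_out == spec_out, seq_calls, spec_rounds)
```

Because every accepted state is re-derived from a model output on the exact previous state, the speculative path cannot diverge from sequential decoding; the only thing that varies is how many drafted states are accepted per batched pass.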
💡 Yield
- Achieves a 2.8–3.1× speedup on its own and up to 7.9× when combined with parallel decoding schemes, with output quality unchanged on GSM8K, HumanEval, MBPP, and MATH. Offline calibration converges with just 20–50 samples.
⚠️ Limitations
- Requires an offline calibration phase and keeps the draft graph structure static during generation, which limits adaptation to prompts of widely varying complexity. Standalone speedups plateau without complementary optimizations such as KV-caching.