TurboQuant Online Vector Quantization
🔗 Source: arXiv
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
🚀 Technical Novelty
- Mechanism: Randomly rotates high-dimensional input vectors to induce a concentrated Beta distribution on coordinates, applies optimal scalar quantizers per coordinate, and combines them with a 1-bit Quantized Johnson-Lindenstrauss (QJL) transform on the residual to produce an unbiased inner-product estimator.
- Nuance: Unlike product quantization or learned schemes that require heavy codebook storage or offline training, TurboQuant is data-oblivious, online-capable, and provably matches information-theoretic distortion bounds across all bit-widths without sacrificing accelerator vectorization compatibility.
💡 Yield
- Proves information-theoretic lower bounds on achievable distortion rate, demonstrating TurboQuant closely matches them within a ~2.7x constant factor.
- Empirically achieves absolute quality neutrality at 3.5 bits/channel for LLM KV cache compression and outperforms Product Quantization/RabitQ in nearest-neighbor recall while reducing indexing time to virtually zero.
⚠️ Limitations
- Optimized for worst-case/general data distributions rather than dataset-specific adaptations, potentially leaving gains on the table for highly structured inputs.
- Theoretical distortion guarantees differ by a small constant factor from absolute information-theoretic limits, and the two-stage residual pipeline introduces minor computational overhead compared to single-pass scalar quantization.