CALM: Next-Vector Language Models
🔗 Source: arXiv
Continuous Autoregressive Language Models
🚀 Technical Novelty
- Mechanism: A lightweight autoencoder compresses each chunk of K discrete tokens into a single dense continuous vector, so the model predicts the next vector instead of the next token and needs K times fewer autoregressive steps.
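- Nuance: Unlike standard discrete language models, CALM drops the softmax over a large vocabulary, which has no counterpart when the prediction target is a continuous vector, and instead relies on a likelihood-free framework (Energy Transformers for generation, BrierLM for evaluation). The scaling lever is semantic bandwidth per step (larger K), not parameter count.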
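The chunk-to-vector interface can be illustrated with a toy sketch. Everything in it is an assumption for illustration: the hypothetical `encode_chunk` simply concatenates fixed random token embeddings and `decode_chunk` recovers each token by nearest-neighbour search, whereas the paper's autoencoder is a learned network. What the sketch shows is the bookkeeping: with K=4, a 64-token sequence becomes 16 next-vector prediction targets.

```python
import random

# Toy sizes: hypothetical values chosen for illustration, not from the paper.
VOCAB_SIZE = 50
EMBED_DIM = 16
K = 4  # tokens per chunk

rng = random.Random(0)
# Fixed random embedding table standing in for a learned encoder (assumption).
EMBED = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_chunk(tokens):
    """Map K token ids to one continuous vector (toy: concatenated embeddings)."""
    assert len(tokens) == K
    vec = []
    for t in tokens:
        vec.extend(EMBED[t])
    return vec


def decode_chunk(vec):
    """Recover K token ids by nearest embedding in each EMBED_DIM-sized slot."""
    tokens = []
    for slot in range(K):
        part = vec[slot * EMBED_DIM:(slot + 1) * EMBED_DIM]
        tokens.append(min(range(VOCAB_SIZE), key=lambda t: sq_dist(part, EMBED[t])))
    return tokens


def chunk_sequence(tokens):
    """Split a sequence into K-token chunks (length assumed divisible by K)."""
    assert len(tokens) % K == 0
    return [tokens[i:i + K] for i in range(0, len(tokens), K)]


def reconstruction_accuracy(tokens):
    """Fraction of tokens recovered exactly after an encode/decode round trip."""
    recovered = [t for c in chunk_sequence(tokens) for t in decode_chunk(encode_chunk(c))]
    return sum(a == b for a, b in zip(tokens, recovered)) / len(tokens)


seq = [rng.randrange(VOCAB_SIZE) for _ in range(64)]
vectors = [encode_chunk(c) for c in chunk_sequence(seq)]
print(len(seq), "tokens ->", len(vectors), "next-vector steps")  # 64 tokens -> 16 next-vector steps
print(reconstruction_accuracy(seq))  # 1.0 (concatenation is lossless)
```

Concatenation is trivially lossless, so the 1.0 accuracy here says nothing about the >99.9% figure reported for the trained autoencoder. In the real model the encoder is learned rather than fixed, and the latent it produces is what the generative head has to predict.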
💡 Yield
- Achieves >99.9% token reconstruction accuracy at K=4; establishes BrierLM as a strictly proper, likelihood-free evaluation metric; shows a better performance-compute trade-off than discrete baselines, reaching comparable quality at lower compute.
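Why a Brier-based metric can be likelihood-free: the Brier score has an unbiased estimator built purely from model samples. For a predictive distribution P and observed outcome y, the Brier score oriented so that higher is better is 2P(y) - Σ_x P(x)², which is 1 minus the usual squared-error Brier loss. With two independent draws x1, x2 ~ P, the quantity 1[x1 = y] + 1[x2 = y] - 1[x1 = x2] has exactly that expectation. The sketch below checks this on a toy three-token distribution; `exact_brier`, `brier_estimate`, `monte_carlo_brier`, and the distribution are hypothetical, and how the paper aggregates such estimates into BrierLM is not reproduced here.

```python
import random


def exact_brier(probs, y):
    """Brier score oriented so higher is better: 2*P(y) - sum_x P(x)^2.

    Equals 1 minus the usual squared-error Brier loss. Needs explicit probabilities.
    """
    return 2 * probs.get(y, 0.0) - sum(p * p for p in probs.values())


def brier_estimate(sample_pair, y):
    """Unbiased single-trial estimate from two independent model samples."""
    x1, x2 = sample_pair
    return (x1 == y) + (x2 == y) - (x1 == x2)


def monte_carlo_brier(probs, y, trials, seed=0):
    """Average the two-sample estimator; `probs` is used only to draw samples."""
    rng = random.Random(seed)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    total = 0
    for _ in range(trials):
        total += brier_estimate(rng.choices(tokens, weights=weights, k=2), y)
    return total / trials


probs = {"a": 0.5, "b": 0.3, "c": 0.2}  # toy next-token distribution (hypothetical)
print(exact_brier(probs, "a"))                 # 0.62 up to floating-point rounding
print(monte_carlo_brier(probs, "a", 200_000))  # close to the exact value, from samples only
```

The sampled estimate converges to the exact score without ever reading a probability, which is what lets the same kind of evaluation apply to a model that only exposes a sampler.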
⚠️ Limitations
- The current autoencoder is optimized mainly for reconstruction rather than semantic structure, which makes its latent space brittle; a performance gap remains between CALM at K=1 and a standard Transformer; context-aware autoencoders and semantically grounded latent spaces are left to future work.