Latent Thought Bootstrapping
🔗 Source: arXiv
Reasoning to Learn from Latent Thoughts
🚀 Technical Novelty
- Mechanism: Uses an Expectation-Maximization (EM) algorithm to iteratively synthesize “latent thoughts” that decompress highly compressed web text and then train on them, creating a self-improving pretraining loop.
- Nuance: Unlike reward-based RL or fixed teacher-student distillation, it frames reasoning as a latent variable optimization problem, enabling scalable, task-agnostic data augmentation without external supervision.
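One way to write down the latent-variable view described above is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $X$ is an observed text chunk, $Z$ its latent thought, $p_\theta$ the language model, $q$ the proposal used to generate candidate latents, and $K$ the number of Monte Carlo samples.

```latex
% Marginal likelihood of observed text X with latent thought Z,
% and the standard variational lower bound on it
\log p_\theta(X) = \log \sum_{Z} p_\theta(Z)\, p_\theta(X \mid Z)
  \;\ge\; \mathbb{E}_{q(Z \mid X)}\big[\log p_\theta(Z, X) - \log q(Z \mid X)\big]

% E-step (approximate): sample K candidate latents and weight them
Z^{(k)} \sim q(\cdot \mid X), \qquad
w^{(k)} = \frac{p_\theta(Z^{(k)}, X)}{q(Z^{(k)} \mid X)}, \qquad k = 1, \dots, K

% M-step: train on the selected latent \hat{Z} together with the text,
% where \hat{Z} is drawn from the candidates with probability proportional to w^{(k)}
\theta \leftarrow \arg\max_{\theta'} \sum_{X} \log p_{\theta'}(\hat{Z}, X)
```

Under this reading, no reward signal or fixed teacher is needed: the same model supplies both the proposal and the likelihood, which is what makes the loop self-bootstrapping.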
💡 Yield
- A 1B-parameter LM bootstraps its own performance over at least three iterations using only self-generated latents, reaching 25.4% on MATH versus 5.74% for raw-data baselines.
- Performance gains scale linearly with additional inference compute (Monte Carlo samples) during the E-step, indicating that inference-time scaling directly improves pretraining data efficiency.
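The role of Monte Carlo samples in the E-step can be illustrated with a minimal sketch of self-normalized importance resampling. This is an assumption about how candidate latents might be selected, not the paper's implementation: the function name `select_latent` and the log-probability inputs are hypothetical, and in practice the scores would come from the language model itself.

```python
import math
import random


def select_latent(candidates, log_joint, log_proposal, rng):
    """Pick one candidate latent by self-normalized importance resampling.

    candidates:   list of K candidate latents (hypothetical, e.g. strings)
    log_joint:    log p(z, x) for each candidate (assumed to be given)
    log_proposal: log q(z | x) for each candidate (assumed to be given)
    rng:          a random.Random instance, for reproducibility
    """
    # Log importance weights: log p(z, x) - log q(z | x)
    log_w = [lj - lq for lj, lq in zip(log_joint, log_proposal)]

    # Subtract the max before exponentiating for numerical stability
    max_log_w = max(log_w)
    weights = [math.exp(lw - max_log_w) for lw in log_w]

    # Normalize so the weights form a probability distribution
    total = sum(weights)
    probs = [w / total for w in weights]

    # Resample a single latent in proportion to its normalized weight
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx], probs


if __name__ == "__main__":
    rng = random.Random(0)
    cands = ["thought A", "thought B", "thought C"]
    chosen, probs = select_latent(
        cands,
        log_joint=[-1.0, -2.0, -50.0],
        log_proposal=[-1.0, -1.0, -1.0],
        rng=rng,
    )
    print(chosen, [round(p, 3) for p in probs])
```

Drawing more candidates (a larger K) gives the resampling step more chances to find a latent that explains the text well, which is the intuition behind spending extra inference compute in the E-step.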
⚠️ Limitations
- Validated only on a 1B-parameter model and on reasoning-intensive math text because of strict compute budgets, leaving general-domain scaling unproven.
- Self-bootstrapping may amplify dataset biases or degrade unrelated capabilities (e.g., observed GSM8K few-shot CoT degradation).
- Current autoregressive latent structure is locally myopic and cannot capture hierarchical, long-form planning processes.