🔗 Source: arXiv

Scaling Latent Reasoning via Looped Language Models

Mechanism: Introduces a weight-tied recurrent transformer stack (LoopLM) that iteratively updates latent states, coupled with an entropy-regularized gating mechanism that dynamically allocates computational depth via learned early-exit probabilities.
Nuance: Unlike standard transformers that scale via parameter count or explicit token generation (CoT), LoopLM decouples compute depth from model size through architectural recurrence, avoiding context-length bloat while enabling input-adaptive latent reasoning.
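The loop-and-exit mechanism described above can be sketched as a toy in plain Python. Everything here is an assumption for illustration, not Ouro's code. `shared_block` is a hypothetical scalar stand-in for the weight-tied transformer stack and `exit_gate` is a hypothetical sigmoid gate. The exit distribution uses the standard construction p(t) = λ_t ∏_{j<t}(1 − λ_j), with leftover mass placed on the last step. The loss form (expected per-step loss minus β times the entropy of the exit distribution) is one plausible way to apply the entropy regularizer.

```python
import math

# Hypothetical toy "weights": one shared set, reused at every loop step.
W_BLOCK = 0.5
W_GATE = 1.2
B_GATE = -1.0


def shared_block(h: float) -> float:
    """Toy stand-in for the weight-tied stack (same weights every step)."""
    return math.tanh(W_BLOCK * h + 1.0)


def exit_gate(h: float) -> float:
    """Toy per-step exit probability lambda_t in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(W_GATE * h + B_GATE)))


def looped_forward(h0: float, max_loops: int):
    """Apply the shared block max_loops times and build the exit distribution.

    p[t] = lambda_t * prod_{j<t} (1 - lambda_j); the final step absorbs
    whatever probability mass has not exited yet, so the p[t] sum to 1.
    """
    states, probs = [], []
    h, survive = h0, 1.0
    for t in range(max_loops):
        h = shared_block(h)
        states.append(h)
        lam = 1.0 if t == max_loops - 1 else exit_gate(h)
        probs.append(survive * lam)
        survive *= 1.0 - lam
    return states, probs


def entropy(probs):
    """Shannon entropy of the exit distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(step_losses, probs, beta):
    """Expected loss over exit steps minus beta * entropy (assumed form).

    Subtracting the entropy term rewards spreading probability across
    depths instead of concentrating it on a single step.
    """
    expected = sum(p * l for p, l in zip(probs, step_losses))
    return expected - beta * entropy(probs)


states, probs = looped_forward(h0=0.0, max_loops=4)
loss = regularized_loss([2.0, 1.5, 1.2, 1.1], probs, beta=0.05)
print([round(p, 3) for p in probs], round(loss, 4))
```

Because the same `W_BLOCK`, `W_GATE`, and `B_GATE` are reused at every step, raising `max_loops` adds compute but no parameters, which is the decoupling the Nuance line describes.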

After pretraining on 7.7T tokens, the Ouro 1.4B and 2.6B models match or exceed standard 4B–8B LLMs across math, science, and language benchmarks.
Recurrence dramatically improves knowledge manipulation and multi-hop composition but does not raise raw knowledge storage capacity, which stays at roughly 2 bits/parameter.
Latent iterative updates yield reasoning traces with higher causal faithfulness and improved safety alignment compared to explicit CoT methods.
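The ~2 bits/parameter figure supports a quick back-of-envelope check. The sketch below assumes that rate applies uniformly across the whole model and uses the common dense-transformer estimate of about 2 FLOPs per parameter per token for one pass, ignoring attention costs. The loop counts are illustrative, not Ouro's actual configuration.

```python
# Approximate figures; BITS_PER_PARAM is the ~2 bits/parameter rate quoted above.
BITS_PER_PARAM = 2.0
MODELS = {"Ouro 1.4B": 1.4e9, "Ouro 2.6B": 2.6e9}


def capacity_mb(num_params: float, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Rough storable knowledge in megabytes: params * bits, / 8 bits per byte."""
    return num_params * bits_per_param / 8 / 1e6


def forward_flops_per_token(num_params: float, loops: int) -> float:
    """~2 FLOPs per parameter per pass, repeated once per loop (weights reused)."""
    return 2 * num_params * loops


for name, params in MODELS.items():
    for loops in (1, 4):  # illustrative loop counts
        gflops = forward_flops_per_token(params, loops) / 1e9
        print(f"{name}, {loops} loop(s): ~{capacity_mb(params):.0f} MB capacity, "
              f"~{gflops:.1f} GFLOPs/token")
```

For the 1.4B model this gives roughly 350 MB of storable knowledge whether it loops once or four times, while per-token compute quadruples: capacity tracks parameters, compute tracks loops.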

Requires careful calibration of exit thresholds and entropy regularization coefficients to prevent early-exit collapse or over-computation on simple inputs.
Gains stem from architectural compute allocation rather than expanded knowledge capacity, so they transfer less directly to tasks dominated by massive factual recall unless parameter count is scaled as well.
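To see why the exit threshold needs calibration, consider a toy inference rule, assumed here for illustration rather than taken from the paper: stop at the first loop step whose cumulative exit probability reaches a threshold `q`. The helper `exit_step` and both distributions are hypothetical.

```python
from itertools import accumulate


def exit_step(probs, q):
    """Return the 1-based loop step at which to stop (hypothetical rule).

    Stops at the first step whose cumulative exit probability reaches q;
    falls back to the last step if the threshold is never reached.
    """
    for t, cdf in enumerate(accumulate(probs), start=1):
        if cdf >= q:
            return t
    return len(probs)


# Spread-out exit distribution: depth depends strongly on q.
balanced = [0.25, 0.25, 0.25, 0.25]
# Gate has concentrated nearly all mass on step 1 (early-exit collapse).
collapsed = [0.97, 0.01, 0.01, 0.01]

for q in (0.2, 0.5, 0.9):
    print(f"q={q}: balanced exits at {exit_step(balanced, q)}, "
          f"collapsed exits at {exit_step(collapsed, q)}")
```

With the spread-out distribution, `q` moves the exit from step 1 to step 4, so a high threshold spends full depth even on inputs that may not need it. With the collapsed distribution every threshold exits at step 1, so no choice of `q` recovers deeper computation once the gate itself has collapsed.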