Recursive Long Context Scaling
🔗 Source: arXiv
RECURSIVE LANGUAGE MODELS
🚀 Technical Novelty
- Mechanism: Exposes the input prompt as a variable in an external REPL environment, so the LLM can write code that inspects the context, splits it into chunks, and recursively invokes itself on sub-snippets.
- Nuance: Differs from prior context compaction or retrieval scaffolds by avoiding lossy summarization/truncation; instead, it preserves full information density through dynamic, model-driven recursive decomposition and selective context loading during inference.
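
The mechanism above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: in the described system the model itself writes the decomposition code inside the REPL, whereas here a fixed chunk-and-recurse policy stands in for it. The names `recursive_answer`, `chunk_lines`, `keyword_stub`, the `PROMPT` template, and the `max_chars` and `max_depth` parameters are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical model interface: prompt text in, answer text out.
LLM = Callable[[str], str]

# Hypothetical prompt template; the real system prompt is not given in this summary.
PROMPT = "Context:\n{context}\n\nQuestion: {query}"


def chunk_lines(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole lines into chunks of roughly max_chars characters."""
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks


def recursive_answer(query: str, context: str, llm: LLM,
                     max_chars: int = 200, depth: int = 0,
                     max_depth: int = 3) -> str:
    """Answer `query` over `context`, recursing over chunks when it is too long."""
    chunks = chunk_lines(context, max_chars)
    # Base case: the context fits in one chunk, or the depth guard has tripped.
    if len(chunks) <= 1 or depth >= max_depth:
        return llm(PROMPT.format(context=context, query=query))
    # Recursive case: one sub-call per chunk, each seeing only its own snippet.
    partials = [
        recursive_answer(query, chunk, llm, max_chars, depth + 1, max_depth)
        for chunk in chunks
    ]
    # Merge the non-empty partial answers and answer over them.
    merged = "\n".join(p for p in partials if p)
    return recursive_answer(query, merged, llm, max_chars, depth + 1, max_depth)


def keyword_stub(prompt: str) -> str:
    """Toy stand-in for a model: returns the context lines that contain 'KEY'."""
    body = prompt.split("\n\nQuestion:")[0]
    return "\n".join(line for line in body.splitlines() if "KEY" in line)
```

With `keyword_stub` in place of a real model, no single call receives the whole context, yet every chunk is read in full rather than summarized or truncated before the model sees it.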
💡 Yield
- Processes inputs of 10M+ tokens (two orders of magnitude beyond standard context windows) with minimal performance degradation on dense reasoning tasks such as OOLONG and BrowseComp-Plus.
- Outperforms direct LLM calls, context compaction, and retrieval agents by double-digit percentage gains while maintaining comparable or lower median inference costs per query.
- Demonstrates emergent, model-driven context management behaviors without explicit training for them, including regex-based filtering, uniform chunking for sub-calls, and long outputs stitched together from variables.
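
The three emergent behaviors named in the last bullet correspond to small pieces of code a model might plausibly write in the REPL. The sketch below is an assumption about their general shape, not a transcript of actual model trajectories; `regex_prefilter`, `uniform_chunks`, `stitched_output`, and the `window` parameter are hypothetical names.

```python
import re
from typing import Callable, List


def regex_prefilter(context: str, pattern: str, window: int = 0) -> List[str]:
    """Keep lines matching `pattern`, plus `window` neighboring lines on each side."""
    lines = context.splitlines()
    rx = re.compile(pattern, re.IGNORECASE)
    keep = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            lo, hi = max(0, i - window), min(len(lines), i + window + 1)
            keep.update(range(lo, hi))
    # Return the surviving lines in their original order.
    return [lines[i] for i in sorted(keep)]


def uniform_chunks(items: List[str], size: int) -> List[List[str]]:
    """Split `items` into consecutive groups of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def stitched_output(chunks: List[List[str]],
                    sub_call: Callable[[str], str]) -> str:
    """Run one sub-call per chunk and join the stored results into one long output."""
    parts = [sub_call("\n".join(chunk)) for chunk in chunks]
    return "\n".join(parts)
```

Because the final answer is assembled from stored parts rather than emitted in one generation, its length is not bounded by what a single call can produce.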
⚠️ Limitations
- On shorter prompts, performance is slightly below that of the base model called directly, indicating a crossover point below which direct inference is preferable.
- High variance in inference cost and runtime due to unpredictable trajectory lengths; outlier runs can be significantly more expensive than base model queries.
- Behavior and success rates vary substantially across base models (e.g., GPT-5 vs. Qwen3-Coder) when no task-specific prompt tuning or system prompt adjustment is applied.
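
One way to act on the first two limitations is to route short prompts to a direct call and to cap the number of sub-calls on recursive runs. The sketch below is purely illustrative and is not something the paper proposes: `choose_strategy`, `CallBudget`, and the `crossover_fraction` default of 0.5 are assumptions, and a real crossover point would need to be measured per model and task.

```python
def choose_strategy(prompt_tokens: int, window_tokens: int,
                    crossover_fraction: float = 0.5) -> str:
    """Pick 'direct' for short prompts and 'recursive' otherwise.

    `crossover_fraction` is a hypothetical tuning knob, not a value from the paper.
    """
    if prompt_tokens < 0 or window_tokens <= 0:
        raise ValueError("token counts must be non-negative and the window positive")
    if prompt_tokens <= crossover_fraction * window_tokens:
        return "direct"
    return "recursive"


class CallBudget:
    """Hard cap on sub-calls, bounding the cost of outlier trajectories."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def spend(self) -> None:
        """Record one sub-call, raising once the cap is reached."""
        if self.used >= self.max_calls:
            raise RuntimeError("sub-call budget exhausted")
        self.used += 1
```

A driver loop would call `budget.spend()` before each sub-call and fall back to a best-effort answer when the `RuntimeError` is raised.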