📚 Centralized Reading List & Field Advancements
🔄 Test-Time Adaptation
The field is shifting from optimizing model weights to spending computation at inference time. New capabilities are acquired without touching the weights, either by optimizing auxiliary state (external memory, text-space skills) or by running iterative numerical procedures during the forward pass. How runtime compute is allocated across external memory updates, text-space gradient optimization, and orchestration of worker models is becoming both the main bottleneck and the main frontier for scaling frozen models without changing their parameters or architecture.
| Date | Paper | Core Takeaway |
|---|---|---|
| 2026-05 | SkillOpt Text Space Agent Optimization | Treats agent skills as trainable external state optimized via text-space gradients, enabling weight-free procedural adaptation across models and harnesses. |
| 2026-05 | Compact Test-Time Memory | An 8×8 online memory matrix dynamically updates during inference, enabling frozen LLMs to efficiently retain long-term context without fine-tuning or context expansion. |
| 2026-05 | Training-Free Looped Transformers | Retrofitting frozen LLMs with inference-time layer looping and numerical integration yields significant accuracy gains without any training or architectural changes. |
| 2025-12 | RL Conductor Agent Orchestration | A 7B RL-trained model dynamically orchestrates and recursively scales worker LLMs at inference time to achieve state-of-the-art reasoning performance. |

🧠 In-Context Learning
Current work centers on dynamic prompt optimization that combines reinforcement learning with iterative natural-language feedback. It targets two bottlenecks of static or sparse-gradient adaptation: sample inefficiency and catastrophic forgetting. The trend is toward hybrid methods that pair evolutionary search with reflection, searching over prompts through semantic self-correction rather than raw gradients, so that fast task-specific adaptation does not erode the base model's general capabilities.
| Date | Paper | Core Takeaway |
|---|---|---|
| 2026-05 | Fast-Slow LLM Adaptation | Interleaving prompt optimization with reinforcement learning enables rapid task adaptation while preserving model plasticity and preventing catastrophic forgetting. |
| 2025-07 | Reflective Prompt Evolution | GEPA replaces sparse RL gradients with iterative natural language reflection and evolutionary search to optimize LLM prompts with drastically higher sample efficiency. |

⚡ Efficient Architectures
Work in this area is moving from attention over discrete tokens toward compressed representations of context. Attention is lifted to function spaces through structured linear operators, and context is condensed recursively in latent space, so that cost depends on a compressed representation rather than on raw sequence length. The listed papers combine sparsity, modular memory, and system-level routing to relieve KV-cache memory and I/O bottlenecks. Reported gains reach an order of magnitude or more, obtained through hardware-aware kernels and parameter-efficient methods that aim to preserve near-frontier quality without an architectural overhaul.
| Date | Paper | Core Takeaway |
|---|---|---|
| 2026-05 | Functional Attention Architecture | Lifts attention from discrete tokens to functional spaces via structured linear operators for efficient, resolution-invariant operator learning. |
| 2026-05 | Global Regression KV Cache | Training-free global ridge regression aligns compressed KV caches with full-cache attention, eliminating over-merging while preserving long-context performance. |
| 2026-05 | Self-Regulated Simulative Planning | Decomposing agentic reasoning into a self-regulated configurator and simulative planner slashes token consumption by up to 95% while matching trillion-parameter model performance. |
| 2026-05 | Modular Memory Architecture | Plug-and-play modular memory enables efficient, noise-robust knowledge integration in frozen LLMs without retraining or context expansion. |
| 2026-05 | Agentic Workflow Compilation | Fine-tuning small LLMs to internalize agentic workflows cuts inference costs by two orders of magnitude while maintaining near-frontier quality. |
| 2026-04 | Latent Condensed Attention | LCA natively condenses context within MLA’s latent space, slashing KV cache by 90% and accelerating prefilling by 2.5× without extra parameters. |
| 2026-03 | Efficient Sparse LLM Kernels | Custom CUDA kernels and sparse packing formats unlock >99% unstructured sparsity in LLMs for major throughput and memory gains with negligible accuracy loss. |
| 2026-02 | DualPath KV Cache Optimization | DualPath eliminates KV-cache I/O bottlenecks in agentic LLM inference by dynamically routing cache loads across prefill and decode engines to double system throughput. |
| 2026-02 | Progressive Thought Encoding | Encodes evicted KV cache tokens into LoRA adapters, enabling large reasoning models to train and infer under strict memory constraints without sacrificing accuracy. |
| 2025-10 | Tandem S2S-LLM Architecture | A tandem architecture injects real-time LLM knowledge into a speech-to-speech model via oracle tokens, achieving cascaded-system quality without latency penalties. |
| 2025-06 | DiffusionBlocks Block-Wise Training | Recasting residual networks as diffusion processes enables memory-efficient, independent block training that matches end-to-end performance. |
| 2023-05 | Efficient Long Context Compression | Teaching LMs to recursively compress long contexts into accumulated soft prompts enables efficient window extension and faster inference without architectural overhaul. |
| 2023-04 | Gist Token Prompt Compression | Compresses arbitrary LLM prompts into cached gist tokens via modified attention masks, slashing inference compute by up to 40% with minimal quality loss. |

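
As a rough illustration of the "modified attention masks" mentioned in the Gist Token Prompt Compression entry, the sketch below builds a decoder-style mask in which tokens that follow the gist tokens can no longer attend to the raw prompt, so any prompt information they use has to pass through the gist positions. The function name `gist_attention_mask`, the layout (prompt, then gist tokens, then the remaining tokens), and the sizes are assumptions chosen for illustration; this is not the paper's code.

```python
import numpy as np

def gist_attention_mask(n_prompt: int, n_gist: int, n_rest: int) -> np.ndarray:
    """Return a boolean mask where mask[i, j] is True if query position i
    may attend to key position j.

    Assumed (hypothetical) layout: [prompt tokens][gist tokens][remaining tokens].
    """
    n = n_prompt + n_gist + n_rest
    # Start from a standard causal mask: each position sees itself and the past.
    mask = np.tril(np.ones((n, n), dtype=bool))
    # Positions after the gist tokens lose direct access to the prompt tokens,
    # which forces prompt information to flow through the gist positions.
    gist_end = n_prompt + n_gist
    mask[gist_end:, :n_prompt] = False
    return mask

# Illustrative sizes only: 3 prompt tokens, 2 gist tokens, 2 remaining tokens.
mask = gist_attention_mask(n_prompt=3, n_gist=2, n_rest=2)
print(mask.astype(int))
```

Under this masking, only the gist positions need to be kept once the prompt has been processed, which is the idea behind caching the gist tokens in place of the full prompt.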
👁️ Multimodal & Vision
This vector points toward multimodal architectures that decouple knowledge acquisition from real-time generation by running retrieval asynchronously, rather than inside a single synchronous generation pipeline. The central difficulty is moving information between components that run at different rates under strict causal latency bounds, without introducing lag or degrading the coherence of the stream.
| Date | Paper | Core Takeaway |
|---|---|---|
| 2026-04 | Asynchronous RAG for Speech | Integrates asynchronous retrieval into full-duplex speech models to boost factuality without sacrificing real-time conversational latency. |

🤖 Embodied AI & Robotics
The frontier here is minimal, stable training objectives for end-to-end latent world models. Constraints built into the objective replace ad hoc anti-collapse heuristics, so a model can map raw pixels directly to predictive representations that support fast planning and physical reasoning. The broader direction is to train representation learning and control under a single objective, reducing the loss to its essential terms rather than relying on auxiliary regularizers or modular decomposition.
| Date | Paper | Core Takeaway |
|---|---|---|
| 2026-03 | Stable End-to-End World Models | LeWorldModel enables stable, end-to-end latent world modeling from raw pixels using only two loss terms, eliminating collapse heuristics while enabling fast planning and physical reasoning. |

📐 Theory & Optimization
Work in this vector targets stable, time-parallel training of recurrent systems. Unstable backpropagation through time (BPTT) is replaced with supervised targets on memory states and with temporal derivative learning, which connects biologically plausible circuits to error-driven gradient approximation. A second thread concerns generalization and representation quality: continuous latent-space supervision augments discrete next-token prediction to prevent representation collapse, and an analysis of signal-reservoir output dynamics motivates a signal-to-noise ratio (SNR) preconditioner intended to enable exact population-risk training.
| Date | Paper | Core Takeaway |
|---|---|---|
| 2026-06 | Temporal Derivative Learning | Temporal derivative learning bridges biologically plausible neural circuits with the computational power of error-driven gradient approximation. |
| 2026-06 | Pretraining RNNs Without Recurrence | SMT replaces unstable BPTT with time-parallel supervised learning on Transformer-generated memory states, enabling stable O(1) gradient paths for RNN pretraining. |
| 2026-05 | Next Implicit Token Prediction | Augments discrete next-token prediction with continuous latent-space supervision to prevent representation collapse and boost downstream performance. |
| 2026-05 | Deep Learning Generalization Theory | Unifies generalization phenomena through signal-reservoir output dynamics and enables exact population-risk training via a lightweight SNR preconditioner. |