🔗 Source: arXiv

RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning

Mechanism: Models in-context learning (ICL) example selection as a Markov Decision Process (MDP) and trains an LSTM-based retriever with reinforcement learning to pick examples sequentially. The reward includes a novel confidence term derived from the perplexity of the LLM-generated solution.
Nuance: Unlike prior methods that score examples independently or rely on static semantic similarity, RetICL dynamically conditions each selection on previously chosen examples and their ordering, implicitly capturing inter-example dependencies and reasoning strategies.
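Illustration: the sequential selection described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: a plain tanh recurrent cell stands in for the LSTM, candidates are scored by a dot product with the current state, the hidden size equals the embedding size, and every name (`select_examples`, `w_in`, `w_rec`) is hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def select_examples(query_emb, corpus_embs, w_in, w_rec, k, greedy=True, rng=None):
    """Pick k distinct example indices one at a time.

    State: a hidden vector summarizing the query plus the examples chosen so far.
    Action: the index of the next example.
    The state update is a plain tanh RNN cell, a simplified stand-in for an LSTM.
    """
    if k > len(corpus_embs):
        raise ValueError("k cannot exceed the number of candidate examples")
    rng = rng or random.Random(0)
    # The initial state depends only on the query (the problem to solve).
    hidden = [math.tanh(x) for x in matvec(w_in, query_emb)]
    chosen = []
    for _ in range(k):
        # Score every remaining candidate against the current state.
        candidates = [i for i in range(len(corpus_embs)) if i not in chosen]
        scores = [dot(hidden, corpus_embs[i]) for i in candidates]
        if greedy:
            pick = candidates[scores.index(max(scores))]
        else:
            # Sample from a softmax over scores, as an RL policy would in training.
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        # Fold the chosen example into the state so the next pick depends on it.
        pre = [
            a + b
            for a, b in zip(matvec(w_in, corpus_embs[pick]), matvec(w_rec, hidden))
        ]
        hidden = [math.tanh(x) for x in pre]
    return chosen
```

Because the state is updated after every pick, the same candidate can score differently depending on what was chosen before it and in what order, which is the contrast with methods that score each example independently.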

Consistently outperforms or matches heuristic and learnable baselines across math word problem (TabMWP, GSM8K) and scientific QA (QASC) benchmarks.
Qualitative latent space analysis reveals the model clusters examples by solution complexity and reasoning steps rather than superficial semantic features.

Uses a fixed number of in-context examples, preventing dynamic prompt length adaptation during inference.
Training rewards rely on LLM-generated solutions, which can introduce noise from arithmetic errors or flawed logic.
Primarily validated on structured math/QA tasks; generalization to open-ended generation or real-world educational settings remains unproven.
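
Illustration: the reward-noise limitation above is easier to see with a concrete reward. The summary only states that the confidence reward is based on the perplexity of the generated solution, so the combination below (a +1/-1 correctness term plus weighted inverse perplexity) is an assumption for illustration, and `episode_reward` and `confidence_weight` are hypothetical names.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def episode_reward(is_correct, token_logprobs, confidence_weight=1.0):
    """Correctness term plus a confidence bonus from solution perplexity."""
    # Assumption: +1 / -1 for a right / wrong final answer.
    goal = 1.0 if is_correct else -1.0
    # Inverse perplexity lies in (0, 1] for log-probabilities <= 0:
    # 1.0 means the LLM was certain of every generated token.
    confidence = 1.0 / perplexity(token_logprobs)
    return goal + confidence_weight * confidence
```

Under this assumed form, a solution that reaches the wrong answer with high token-level confidence still earns most of the confidence bonus, and a correct final answer reached through flawed steps earns the full correctness term, which is one way the noise described above can enter training.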