Reflective Prompt Evolution
🔗 Source: arXiv
GEPA: REFLECTIVE PROMPT EVOLUTION CAN OUTPERFORM REINFORCEMENT LEARNING
🚀 Technical Novelty
- Mechanism: Integrates multi-objective evolutionary search with natural language reflection on AI system trajectories (reasoning chains, tool calls, outputs) to iteratively mutate, diagnose, and refine prompts without weight updates.
- Nuance: Replaces the sparse scalar rewards that drive policy-gradient methods (GRPO/RLVR) with dense, interpretable textual feedback, and maintains a Pareto front of diverse prompt candidates to avoid local optima and converge faster than greedy or gradient-based optimizers.
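
The mechanism above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's implementation: `pareto_wins`, `sample_parent`, `evolve`, `toy_evaluate`, and `toy_reflect` are hypothetical names, the reflection LM is replaced by a deterministic stub, and the acceptance rule (keep a child only if its total score beats its parent's) is a simplification chosen for brevity.

```python
import random
from typing import Callable, Dict, List

Scores = List[float]  # one score per validation instance


def pareto_wins(pool: Dict[str, Scores]) -> Dict[str, int]:
    """Count, per prompt, the instances on which it ties for the best score.

    Prompts that are best on no instance are left out of the result.
    """
    n_instances = len(next(iter(pool.values())))
    wins: Dict[str, int] = {}
    for i in range(n_instances):
        best = max(scores[i] for scores in pool.values())
        for prompt, scores in pool.items():
            if scores[i] == best:
                wins[prompt] = wins.get(prompt, 0) + 1
    return wins


def sample_parent(pool: Dict[str, Scores], rng: random.Random) -> str:
    """Sample a parent prompt, weighted by how many instances it wins."""
    wins = pareto_wins(pool)
    prompts = sorted(wins)
    return rng.choices(prompts, weights=[wins[p] for p in prompts], k=1)[0]


def evolve(
    seed: str,
    evaluate: Callable[[str], Scores],
    reflect: Callable[[str, Scores], str],
    budget: int,
    rng: random.Random,
) -> str:
    """Run `budget` reflect-and-mutate steps; return the best prompt found."""
    pool: Dict[str, Scores] = {seed: evaluate(seed)}
    for _ in range(budget):
        parent = sample_parent(pool, rng)
        child = reflect(parent, pool[parent])  # textual "mutation"
        if child in pool:
            continue  # nothing new was proposed
        child_scores = evaluate(child)
        # Assumed acceptance rule: keep the child only if it beats its parent.
        if sum(child_scores) > sum(pool[parent]):
            pool[child] = child_scores
    return max(pool, key=lambda p: sum(pool[p]))


# Toy stand-ins (assumptions): each "instance" passes if the prompt
# contains the matching hint; reflection appends the first missing hint.
HINTS = ["show your work", "cite the source", "answer in JSON"]


def toy_evaluate(prompt: str) -> Scores:
    return [1.0 if hint in prompt else 0.0 for hint in HINTS]


def toy_reflect(prompt: str, scores: Scores) -> str:
    for hint, score in zip(HINTS, scores):
        if score < 1.0:
            return f"{prompt} Always {hint}."
    return prompt


best = evolve("Answer the question.", toy_evaluate, toy_reflect, 20, random.Random(0))
print(best, toy_evaluate(best))
```

Sampling parents by per-instance wins, rather than always taking the prompt with the highest total, is what keeps candidates that excel on only a few instances in play. In a real setup `toy_reflect` would be a call to a reflection LM that reads the rollout trajectory and proposes a revised prompt.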
💡 Yield
- Outperforms GRPO by 6% on average across six tasks (and by up to 20%) while using up to 35× fewer rollouts, demonstrating superior sample efficiency.
- Surpasses the leading prompt optimizer, MIPROv2, by more than 10% in aggregate (e.g., +12% accuracy on AIME-2025) across both open and proprietary models.
- Shows promising results as an inference-time search strategy for code optimization (NPUEval, KernelBench) and adversarial prompt generation.
⚠️ Limitations
- Heavily dependent on the reflection LM’s diagnostic accuracy; hallucinations or misinterpretations in natural language feedback can propagate flawed mutations.
- Evolving prompts via text may struggle with highly constrained, non-textual, or strictly typed system components where linguistic abstraction loses precision.
- Maintaining a Pareto front and executing multiple reflection calls per rollout introduces computational overhead, though still significantly lower than RL rollout budgets.