🔗 Source: arXiv

GEPA: REFLECTIVE PROMPT EVOLUTION CAN OUTPERFORM REINFORCEMENT LEARNING

Mechanism: Integrates multi-objective evolutionary search with natural language reflection on AI system trajectories (reasoning chains, tool calls, outputs) to iteratively mutate, diagnose, and refine prompts without weight updates.
Nuance: Replaces policy-gradient updates driven by sparse scalar rewards (as in GRPO/RLVR) with dense, interpretable textual feedback, while maintaining a Pareto front of diverse prompt candidates to avoid local optima and converge faster than greedy or gradient-based optimizers.
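
The Pareto-front idea above can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes each candidate prompt has one score per task instance, treats a candidate as being on the front if it is best (or tied for best) on at least one instance, and samples a parent in proportion to the number of instances it wins. The names `pareto_wins`, `sample_parent`, and `evolve_step` are hypothetical. `reflect` and `evaluate` are caller-supplied stand-ins for the reflection LM and the rollout evaluator, and the "keep the child only if its total score improves" rule is a simplifying assumption.

```python
import random
from typing import Callable, Dict, List, Optional


def pareto_wins(scores: Dict[str, List[float]]) -> Dict[str, int]:
    """Map each candidate that is best on >= 1 instance to its win count.

    Ties count as a win for every tied candidate. Candidates that are
    never best on any instance are left out of the result.
    """
    n_instances = len(next(iter(scores.values())))
    wins: Dict[str, int] = {}
    for i in range(n_instances):
        best = max(s[i] for s in scores.values())
        for name, s in scores.items():
            if s[i] == best:
                wins[name] = wins.get(name, 0) + 1
    return wins


def sample_parent(scores: Dict[str, List[float]], rng: random.Random) -> str:
    """Sample a front candidate with probability proportional to its wins."""
    wins = pareto_wins(scores)
    names = sorted(wins)  # sorted for reproducibility under a seeded rng
    weights = [wins[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def evolve_step(
    prompts: Dict[str, str],
    scores: Dict[str, List[float]],
    reflect: Callable[[str], str],           # hypothetical reflection-LM call
    evaluate: Callable[[str], List[float]],  # hypothetical rollout evaluator
    rng: random.Random,
) -> Optional[str]:
    """Run one mutate-and-evaluate step; return the new name if accepted."""
    parent = sample_parent(scores, rng)
    child_prompt = reflect(prompts[parent])
    child_scores = evaluate(child_prompt)
    # Simplifying assumption: accept only if the total score improves.
    if sum(child_scores) <= sum(scores[parent]):
        return None
    child_name = f"p{len(prompts)}"
    prompts[child_name] = child_prompt
    scores[child_name] = child_scores
    return child_name
```

Because selection is driven by per-instance wins rather than by the mean score alone, a prompt that solves only a few instances, but solves them better than any other candidate, stays eligible as a parent. This is the diversity-preserving behavior the summary attributes to the Pareto front.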

Outperforms GRPO by up to 20% (+6% on average across six tasks) while using up to 35× fewer rollouts, demonstrating superior sample efficiency.
Surpasses the leading prompt optimizer, MIPROv2, by more than 10% in aggregate (e.g., +12% accuracy on AIME-2025) across both open and proprietary models.
Shows promising results as an inference-time search strategy for code optimization (NPUEval, KernelBench) and adversarial prompt generation.

Heavily dependent on the reflection LM’s diagnostic accuracy; hallucinations or misinterpretations in natural language feedback can propagate flawed mutations.
Evolving prompts via text may struggle with highly constrained, non-textual, or strictly typed system components where linguistic abstraction loses precision.
Maintaining a Pareto front and executing multiple reflection calls per rollout introduces computational overhead, though still significantly lower than RL rollout budgets.