Weight-Compiled Agentic Workflows
🔗 Source: arXiv
Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost
🚀 Technical Novelty
- Mechanism: Fine-tuning a base LLM on synthetic, flowchart-derived conversations to embed decision trees and tool-use protocols directly into parameters.
- Nuance: Replaces transient, prompt-heavy external orchestrators with persistent “subterranean agents,” decoupling workflow complexity from context window consumption and runtime latency.
💡 Yield
- 8B compiled models reach 87–98% of frontier in-context quality; cuts per-conversation costs by 128–462× and latency by 2.8×; enables 30–50 minute workflow recompilation cycles.
⚠️ Limitations
- Slightly lower naturalness/graceful handling scores than frontier baselines (~82% for 3B); requires synthetic data generation via a larger model during training; workflow updates demand full recompile rather than instant prompt edits.