SkillOrchestra is a novel orchestration framework for compound AI systems that reframes multi-turn agent routing as skill-grounded decision making. It introduces a structured Skill Handbook to decouple abstract capability requirements from specific agent identities, achieving SOTA results across 10 benchmarks.
TL;DR
The transition from single LLM calls to Compound AI Systems has introduced a massive orchestration challenge: how do we route tasks to the right agent without going broke or losing accuracy? SkillOrchestra moves away from expensive, black-box Reinforcement Learning (RL) and instead utilizes a structured Skill Handbook. It achieves a 22.5% performance boost over the previous SOTA (Router-R1) while slashing training costs by up to 700x.
The "Routing Collapse" Crisis
Most current orchestration methods rely on end-to-end RL (like PPO or GRPO) to teach a router which model to call. However, the authors identify a fatal flaw: Routing Collapse.
Because the RL reward signal is often sparse or performance-heavy, the router becomes "lazy" and learns to call the most powerful/expensive model (e.g., Llama-3-70B) for every step, even for simple tasks like date formatting or basic retrieval. This leads to:
- Prohibitive Costs: Over-reliance on frontier models.
- Brittle Adaptability: If you add a new model to your pool, you have to retrain the whole RL policy.
- Coarse Decisions: Treating a multi-step problem as a single unit rather than a sequence of distinct skill requirements.
Methodology: The Skill Handbook
Instead of training router weights, SkillOrchestra builds a knowledge base. The core engine is the Skill Handbook, which acts as an intermediate abstraction layer between the User Query and the Agent Pool.
1. Skill Discovery & Refinement
The system analyzes execution traces. If Agent A succeeds where Agent B fails, the system abstracts the difference into a "Skill" (e.g., Symbolic Logic Manipulation or Harmonic Denominator Analysis).
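The contrastive step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Trace` record, the `contrastive_pairs` helper, and the agent and query names are all hypothetical. It only finds the queries where one agent succeeded and another failed. The step that abstracts each difference into a named skill is left out because the source does not specify it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One execution trace (hypothetical schema for illustration)."""
    query_id: str
    agent: str
    success: bool


def contrastive_pairs(traces):
    """Return (query_id, winner, loser) triples where agents disagreed."""
    by_query = defaultdict(list)
    for t in traces:
        by_query[t.query_id].append(t)

    pairs = []
    for qid, group in by_query.items():
        winners = [t.agent for t in group if t.success]
        losers = [t.agent for t in group if not t.success]
        # Each winner/loser pair marks a capability gap worth naming as a skill.
        pairs.extend((qid, w, l) for w in winners for l in losers)
    return pairs


# Made-up traces: the agents disagree on q1 only.
traces = [
    Trace("q1", "agent_a", True),
    Trace("q1", "agent_b", False),
    Trace("q2", "agent_a", True),
    Trace("q2", "agent_b", True),
]
pairs = contrastive_pairs(traces)
print(pairs)
```

Queries where every agent succeeds (or every agent fails) produce no pairs, since they carry no signal about what separates one agent from another.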
2. Agent Profiling
Each agent is assigned a "Profile" in the handbook. Using Beta distributions, the system tracks the estimated success probability of every agent for every specific skill.
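One way to picture a Beta-based profile is as a pair of pseudo-counts per (agent, skill) entry, updated after each observed outcome. The sketch below assumes a uniform Beta(1, 1) prior and a simple success/failure update. The `BetaProfile` class, the `record` helper, and the agent and skill names are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BetaProfile:
    """Beta posterior over an agent's success rate on one skill."""
    alpha: float = 1.0  # assumed prior pseudo-count of successes
    beta: float = 1.0   # assumed prior pseudo-count of failures

    def update(self, success: bool) -> None:
        # A success increments alpha, a failure increments beta.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


handbook = {}  # maps (agent, skill) -> BetaProfile


def record(agent: str, skill: str, success: bool) -> None:
    handbook.setdefault((agent, skill), BetaProfile()).update(success)


# Made-up outcomes: three successes and one failure on one skill.
for outcome in [True, True, True, False]:
    record("agent_a", "symbolic_logic", outcome)

estimate = handbook[("agent_a", "symbolic_logic")].mean
print(round(estimate, 3))
```

Keeping counts rather than a single point estimate means an entry backed by few observations stays close to the prior, which is one plausible reason to prefer a Beta distribution here.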

3. Pareto-Optimal Selection
Not every orchestrator is smart enough to handle 100+ fine-grained skills. SkillOrchestra performs a Pareto-optimal validation to select the right "granularity" of skills for a specific orchestrator backbone (e.g., a 3B model gets a simpler handbook than a 70B model) to ensure the system stays on the performance-cost frontier.
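The selection idea can be illustrated with a generic Pareto filter over candidate handbook granularities. This is a sketch under stated assumptions: the candidate names and the accuracy and cost numbers are made up, and `pareto_frontier` is a hypothetical helper rather than the paper's validation procedure.

```python
def pareto_frontier(candidates):
    """Keep candidates that no other candidate dominates.

    candidates: list of (name, accuracy, cost) tuples.
    A candidate is dominated if another is at least as accurate and
    at least as cheap, and strictly better on one of the two.
    """
    frontier = []
    for name, acc, cost in candidates:
        dominated = any(
            (a >= acc and c <= cost) and (a > acc or c < cost)
            for _, a, c in candidates
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical validation results for three handbook granularities.
candidates = [
    ("coarse", 0.70, 10.0),
    ("medium", 0.80, 15.0),
    ("fine", 0.78, 30.0),
]
frontier = pareto_frontier(candidates)
print(frontier)
```

In this made-up example the fine-grained handbook is dropped because the medium one is both more accurate and cheaper for this orchestrator, which mirrors the point that more skills do not automatically help a smaller backbone.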
Quantitative Powerhouse
The results are striking. Across general QA, Multi-hop reasoning, and Math, SkillOrchestra consistently finds the "sweet spot" that RL routers miss.

- Efficiency: On the FRAMES benchmark, SkillOrchestra achieved 84.3% accuracy at a cost of 120.4.
- Transferability: Perhaps most impressively, a Skill Handbook trained using a Qwen-3B orchestrator can be "plugged into" a Llama-8B or Mistral-7B model without any retraining, resulting in immediate performance gains of 20%+.
Deep Insight: Why This Works
The fundamental "Why" behind SkillOrchestra's success is its Inductive Bias. RL tries to learn a mapping from State -> Action (Model). SkillOrchestra learns a mapping from State -> Skill and Skill -> Model (Competence).
By making the capability requirements explicit, the orchestrator doesn't have to "guess" if a model is good at math; it reads it in the handbook. This prevents the "Collapse" because the cost of each agent is explicitly subtracted in the utility function used for the routing decision, so the most expensive model is chosen only when its extra competence outweighs its extra cost.
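A minimal sketch of a cost-aware routing rule of this kind is shown below. The utility form (mean competence on the required skills minus a weighted cost), the `cost_weight` parameter, the `route` function, and all numbers are assumptions for illustration. The source does not give the exact utility function.

```python
def route(required_skills, competence, cost, cost_weight):
    """Pick the agent with the highest competence-minus-cost utility.

    competence: agent -> {skill: estimated success probability}
    cost: agent -> per-call cost (arbitrary units)
    cost_weight: assumed trade-off between success and cost
    """
    def utility(agent):
        scores = [competence[agent].get(s, 0.0) for s in required_skills]
        expected_success = sum(scores) / len(scores)
        # The agent's cost is subtracted explicitly from its expected success.
        return expected_success - cost_weight * cost[agent]

    return max(competence, key=utility)


# Hypothetical handbook entries for two agents.
competence = {
    "small_model": {"date_formatting": 0.95, "multi_hop_math": 0.40},
    "large_model": {"date_formatting": 0.97, "multi_hop_math": 0.85},
}
cost = {"small_model": 1.0, "large_model": 10.0}

easy = route(["date_formatting"], competence, cost, cost_weight=0.02)
hard = route(["multi_hop_math"], competence, cost, cost_weight=0.02)
print(easy, hard)
```

With these made-up numbers the simple step goes to the small model and the hard step goes to the large one: the large model's small edge on date formatting does not pay for its cost, while its large edge on multi-hop math does.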
Conclusion & Future Outlook
SkillOrchestra suggests that the future of compound AI systems isn't just "bigger models" or "more RL," but smarter knowledge management. By decoupling the Orchestration Knowledge from the Model Parameters, we can build agentic systems that are cheaper, more interpretable, and easier to extend with new agents.
Limitations: The framework currently relies on an "exploratory phase" to build the handbook, which requires some initial diverse execution traces. Future work could likely automate this into an "on-the-fly" learning system.
