SkillOrchestra: Learning to Route Agents via Skill Transfer

[CVPR 2026] SkillOrchestra: Breaking the Efficiency Barrier in Agent Orchestration

Abstract

SkillOrchestra is a novel orchestration framework for compound AI systems that reframes multi-turn agent routing as skill-grounded decision making. It introduces a structured Skill Handbook to decouple abstract capability requirements from specific agent identities, achieving SOTA results across 10 benchmarks.

TL;DR

The transition from single LLM calls to Compound AI Systems has introduced a massive orchestration challenge: how do we route tasks to the right agent without going broke or losing accuracy? SkillOrchestra moves away from expensive, black-box Reinforcement Learning (RL) and instead utilizes a structured Skill Handbook. It achieves a 22.5% performance boost over the previous SOTA (Router-R1) while slashing training costs by up to 700x.

The "Routing Collapse" Crisis

Most current orchestration methods rely on end-to-end RL (like PPO or GRPO) to teach a router which model to call. However, the authors identify a fatal flaw: Routing Collapse.

Because the RL reward signal is often sparse or dominated by task performance, the router becomes "lazy" and learns to call the most powerful and most expensive model (e.g., Llama-3-70B) at every step, even for simple tasks like date formatting or basic retrieval. This leads to:

Prohibitive Costs: Over-reliance on frontier models.
Brittle Adaptability: If you add a new model to your pool, you have to retrain the whole RL policy.
Coarse Decisions: Treating a multi-step problem as a single unit rather than a sequence of distinct skill requirements.

Methodology: The Skill Handbook

Instead of training weights, SkillOrchestra trains a Knowledge Base. The core engine is the Skill Handbook, which acts as an intermediate abstraction layer between the User Query and the Agent Pool.

1. Skill Discovery & Refinement

The system analyzes execution traces. If Agent A succeeds where Agent B fails, the system abstracts the difference into a "Skill" (e.g., Symbolic Logic Manipulation or Harmonic Denominator Analysis).
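The first half of this step, finding tasks where one agent succeeded and another failed, can be sketched as below. This is a minimal illustration, not the authors' implementation: the `Trace` record, the agent names, and the task IDs are hypothetical, and the step that abstracts each contrast into a named skill is not shown because the article does not specify how it is done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One execution record (hypothetical schema for illustration)."""
    task_id: str
    agent: str
    success: bool


def contrastive_pairs(traces):
    """Return (task_id, winner, loser) triples for tasks where
    one agent succeeded and another agent failed."""
    by_task = {}
    for t in traces:
        by_task.setdefault(t.task_id, []).append(t)

    pairs = []
    for task_id, group in by_task.items():
        winners = [t.agent for t in group if t.success]
        losers = [t.agent for t in group if not t.success]
        pairs.extend((task_id, w, l) for w in winners for l in losers)
    return pairs


# Hypothetical traces: the agents disagree on q1 and both succeed on q2.
traces = [
    Trace("q1", "agent_a", True),
    Trace("q1", "agent_b", False),
    Trace("q2", "agent_a", True),
    Trace("q2", "agent_b", True),
]
pairs = contrastive_pairs(traces)
print(pairs)
```

Only q1 yields a contrast here, so it is the only task that would be passed on as a candidate for skill abstraction.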

2. Agent Profiling

Each agent is assigned a "Profile" in the handbook. Using Beta distributions ( $α, β$ ), the system tracks the estimated success probability of every agent for every specific skill.
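A Beta-distributed success estimate of this kind can be kept with two counters per (agent, skill) pair. The sketch below assumes a uniform Beta(1, 1) prior and a simple one-count-per-outcome update. The article states only that Beta distributions with parameters α and β are used, so the prior, the update rule, and all names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BetaProfile:
    """Beta(alpha, beta) belief about an agent's success rate on one skill."""
    alpha: float = 1.0  # assumption: uniform Beta(1, 1) prior
    beta: float = 1.0

    def update(self, success: bool) -> None:
        # Standard Beta-Bernoulli update: count successes and failures.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean, used as the estimated success probability.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical handbook: (agent, skill) -> BetaProfile
handbook = {}


def record(agent, skill, success):
    handbook.setdefault((agent, skill), BetaProfile()).update(success)


# Three successes and one failure for a hypothetical agent and skill.
for outcome in [True, True, True, False]:
    record("agent_a", "symbolic_logic", outcome)

profile = handbook[("agent_a", "symbolic_logic")]
print(profile.alpha, profile.beta, round(profile.mean, 3))
```

Keeping the full (α, β) pair rather than a single accuracy number also preserves how much evidence backs each estimate: an agent with few observations stays close to the prior.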

Figure: Overall Architecture

3. Pareto-Optimal Selection

Not every orchestrator is smart enough to handle 100+ fine-grained skills. SkillOrchestra performs a Pareto-optimal validation to select the right "granularity" of skills for a specific orchestrator backbone (e.g., a 3B model gets a simpler handbook than a 70B model) to ensure the system stays on the performance-cost frontier.
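The selection idea can be illustrated with a plain Pareto filter over candidate handbook granularities, each scored by validation cost and accuracy. This is a sketch under stated assumptions: the candidate names and all numbers are invented for illustration, and the article does not describe how the authors choose a final point on the frontier.

```python
def pareto_frontier(candidates):
    """Keep candidates that no other candidate dominates.

    candidates: list of (name, cost, accuracy) tuples.
    A candidate is dominated if another one is no more expensive and
    no less accurate, and strictly better on at least one of the two.
    """
    frontier = []
    for name, cost, acc in candidates:
        dominated = any(
            (c2 <= cost and a2 >= acc) and (c2 < cost or a2 > acc)
            for _, c2, a2 in candidates
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical validation results for four handbook granularities.
candidates = [
    ("coarse", 1.0, 0.60),
    ("medium", 2.0, 0.75),
    ("fine", 3.0, 0.74),       # costs more than "medium" but is less accurate
    ("very_fine", 4.0, 0.80),
]
frontier = pareto_frontier(candidates)
print(frontier)
```

In this toy data the "fine" handbook drops out because "medium" is both cheaper and more accurate, which mirrors the case where a small orchestrator cannot make use of extra skill detail.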

Quantitative Powerhouse

The results are striking. Across general QA, Multi-hop reasoning, and Math, SkillOrchestra consistently finds the "sweet spot" that RL routers miss.

Figure: Performance Comparison

Efficiency: In the FRAMES benchmark, SkillOrchestra achieved 84.3% accuracy at a cost of $72.7, while GPT-5 attained only 74.6% at $120.4.
Transferability: Perhaps most impressively, a Skill Handbook trained using a Qwen-3B orchestrator can be "plugged into" a Llama-8B or Mistral-7B model without any retraining, resulting in immediate performance gains of 20%+.

Deep Insight: Why This Works

The fundamental "Why" behind SkillOrchestra's success is its Inductive Bias. RL tries to learn a mapping from State -> Action (Model). SkillOrchestra learns a mapping from State -> Skill and Skill -> Model (Competence).

By making the capability requirements explicit, the orchestrator doesn't have to "guess" if a model is good at math; it reads it in the handbook. This prevents the "Collapse" because the cost of each agent is explicitly subtracted from the utility function during the routing decision: $A_{t}^{*} = \arg\max_{A \in \mathcal{A}_{\psi_{t}}} \left[ \text{EstimatedCompetence} - \lambda_{c} \cdot \text{Cost} \right]$
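A small numeric sketch of the routing rule above shows how the cost term steers easy steps away from the expensive agent. Everything here is assumed for illustration: the agent names, competence values, costs, and the value of λ are invented, and the candidate set is taken to be every agent that has a profile for the required skill, since the article does not define the set the argmax ranges over.

```python
def route(required_skill, competence, cost, lam):
    """Pick the agent maximizing competence - lam * cost for one skill.

    competence: dict mapping (agent, skill) -> estimated success probability
    cost:       dict mapping agent -> per-call cost (arbitrary units)
    lam:        cost penalty weight (lambda_c in the formula)
    """
    # Assumption: candidates are the agents profiled for this skill.
    eligible = [agent for (agent, skill) in competence if skill == required_skill]
    return max(
        eligible,
        key=lambda agent: competence[(agent, required_skill)] - lam * cost[agent],
    )


# Hypothetical handbook entries and costs.
competence = {
    ("small", "date_formatting"): 0.95,
    ("large", "date_formatting"): 0.97,
    ("small", "multi_hop"): 0.40,
    ("large", "multi_hop"): 0.85,
}
cost = {"small": 0.1, "large": 1.0}

easy = route("date_formatting", competence, cost, lam=0.2)
hard = route("multi_hop", competence, cost, lam=0.2)
no_penalty = route("date_formatting", competence, cost, lam=0.0)
print(easy, hard, no_penalty)
```

With the penalty in place, the small agent handles date formatting and the large agent is reserved for multi-hop reasoning. Setting λ to zero sends even the easy step to the large agent, which is the collapse behavior described earlier.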

Conclusion & Future Outlook

SkillOrchestra makes the case that the future of compound AI systems isn't just "bigger models" or "more RL," but smarter knowledge management. By decoupling the Orchestration Knowledge from the Model Parameters, we can build agentic systems that are cheaper, more interpretable, and more scalable.

Limitations: The framework currently relies on an "exploratory phase" to build the handbook, which requires some initial diverse execution traces. Future work could likely automate this into an "on-the-fly" learning system.
