This paper introduces AgentSkillOS, a principled framework for managing and orchestrating large-scale AI agent skill ecosystems (up to 200,000 skills). It employs a hierarchical "Capability Tree" for efficient skill discovery and a DAG-based orchestration engine to compose multiple skills for complex task execution, achieving SOTA performance in artifact-rich benchmarks.
TL;DR
AgentSkillOS is presented as the first framework designed to handle the "ecosystem-scale" explosion of agent skills. By organizing up to 200,000 skills into a Capability Tree and executing them via Directed Acyclic Graphs (DAGs), it allows agents to solve complex, artifact-rich tasks (videos, web pages, professional docs) that "flat" single-skill agents handle poorly.
Positioning: This work is a foundational infrastructure piece. It moves the conversation from "how to build a skill" to "how to manage and compose 200k skills."
The "Flat" Invocation Wall
Current LLM agents typically interact with tools or "skills" in a flat manner: the model is given a list, and it chooses one. However, as of February 2026, there are over 280,000 publicly available skills.
The authors identify two fatal flaws in prior work:
- Discovery Failure: LLMs cannot "reason" through 200k skill descriptions at once. Semantic search (RAG) often misses non-obvious but functionally superior skills.
- Orchestration Failure: Native agents struggle to manage data flow between multiple tools. They lose track of dependencies, leading to "fragmented" outputs rather than a cohesive project (like a full presentation with custom animations).
Methodology: Managed Discovery and Graph Execution
1. Capability Tree Construction (Manage Skills)
Instead of a flat list, AgentSkillOS recursively partitions skills into a hierarchy. Starting from five root categories (Content Creation, Data Processing, etc.), it uses an LLM to discover sub-groups and assign skills until leaf nodes are reached. This supports coarse-to-fine localization, allowing the agent to "zoom in" on a capability field.
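The recursive partitioning and coarse-to-fine descent described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `CapabilityNode`, `build_tree`, `locate`, the `max_leaf` threshold, and the slash-separated skill names are all hypothetical, and the two places where the framework would call an LLM (grouping skills, choosing a branch) are replaced by plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CapabilityNode:
    name: str
    skills: list[str] = field(default_factory=list)  # populated only at leaves
    children: list["CapabilityNode"] = field(default_factory=list)

# Stand-in for the LLM grouping step: (skills, depth) -> {group name: members}.
Partition = Callable[[list[str], int], dict[str, list[str]]]
# Stand-in for the LLM navigation step: (task, child names) -> chosen child name.
Choose = Callable[[str, list[str]], str]

def build_tree(name: str, skills: list[str], partition: Partition,
               max_leaf: int = 3, depth: int = 0) -> CapabilityNode:
    """Recursively split skills into sub-groups until a group is small enough."""
    node = CapabilityNode(name)
    groups = partition(skills, depth) if len(skills) > max_leaf else {}
    if len(groups) < 2:  # small enough, or cannot be split further: leaf
        node.skills = list(skills)
        return node
    node.children = [
        build_tree(group, members, partition, max_leaf, depth + 1)
        for group, members in groups.items()
    ]
    return node

def locate(node: CapabilityNode, task: str, choose: Choose) -> list[str]:
    """Coarse-to-fine descent: pick one child per level until a leaf is reached."""
    while node.children:
        chosen = choose(task, [c.name for c in node.children])
        node = next(c for c in node.children if c.name == chosen)
    return node.skills

# Toy stand-ins (assumptions): group by path segment, choose by keyword match.
def prefix_partition(skills: list[str], depth: int) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = {}
    for s in skills:
        parts = s.split("/")
        groups.setdefault(parts[min(depth, len(parts) - 1)], []).append(s)
    return groups

def keyword_choose(task: str, names: list[str]) -> str:
    return next((n for n in names if n in task.lower()), names[0])

SKILLS = [
    "content/video/manim-animate", "content/video/ffmpeg-cut",
    "content/video/caption-burn", "content/docs/pptx-build",
    "data/clean/csv-dedupe", "data/plot/chart-render",
]
tree = build_tree("root", SKILLS, prefix_partition)
video_skills = locate(tree, "Make a video for our content team", keyword_choose)
```

With this toy data the descent visits `content`, then `video`, and returns only the three video skills, so the model never sees the other candidates. The point of the design is that each decision is made over a handful of group names rather than the full pool.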

2. DAG-based Orchestration (Solve Tasks)
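Once skills are retrieved, the framework doesn't just hand them to the model. It builds a Directed Acyclic Graph (DAG) in which nodes are skill invocations and edges are the data dependencies between them, following one of three orchestration strategies: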
- Quality-First: Adds stages for preparation and refinement.
- Efficiency-First: Maximizes parallelism (e.g., generating 5 images simultaneously).
- Simplicity-First: Minimizes the footprint for speed.
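To make the idea of graph execution concrete, here is a minimal sketch of running a skill DAG with Python's standard library, where every node whose dependencies are satisfied runs in parallel. It is an illustration under assumptions, not the framework's engine: `run_dag`, the node names, and the string-returning "skills" are hypothetical, and the wave-by-wave scheduling is a simplification (a node waits for its whole wave, not only for its own inputs).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

def run_dag(deps, skills):
    """Execute a skill graph.

    deps:   node -> set of nodes it depends on
    skills: node -> callable taking {dependency name: its output}
    Returns (results per node, list of parallel waves in execution order).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    results, waves = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # all nodes runnable right now
            waves.append(ready)
            futures = {
                n: pool.submit(skills[n], {d: results[d] for d in deps[n]})
                for n in ready
            }
            for n, fut in futures.items():
                results[n] = fut.result()  # hand outputs to downstream nodes
                sorter.done(n)
    return results, waves

def make_image_skill(k):
    # Hypothetical skill: renders image k from the shared outline.
    return lambda inputs: f"image{k}[{inputs['outline']}]"

deps = {
    "outline": set(),
    "image_1": {"outline"},
    "image_2": {"outline"},
    "image_3": {"outline"},
    "assemble": {"image_1", "image_2", "image_3"},
}
skills = {
    "outline": lambda inputs: "outline",
    "image_1": make_image_skill(1),
    "image_2": make_image_skill(2),
    "image_3": make_image_skill(3),
    "assemble": lambda inputs: " | ".join(inputs[k] for k in sorted(inputs)),
}
results, waves = run_dag(deps, skills)
```

Here the three image nodes form a single wave and run concurrently, which is the kind of parallelism the Efficiency-First strategy aims for. A Quality-First graph would add preparation and refinement nodes around the same core, and a Simplicity-First graph would keep the node count minimal.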
Experimental Proof: Orchestration is the Key
The team constructed a benchmark of 30 "artifact-rich" tasks. Rather than measuring only "Pass/Fail," they used a Bradley-Terry Model to turn pairwise comparisons of result quality into a score per method, which suits creative work where quality is relative rather than binary.
| Ecosystem Size | Method | Bradley-Terry Score |
| :--- | :--- | :--- |
| 200 | Quality-First | 100.0 |
| 200 | w/ Full Pool (Flat) | 24.3 |
| 200K | Quality-First | 100.0 |
| 200K | w/ Full Pool (Flat) | 17.2 |
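The gap is large: against Quality-First at 100.0, the flat baseline scores 24.3 with 200 skills and drops to 17.2 with 200K. Even with all 200K skills available, the "Flat" agent performed poorly because it became "blind" to the right tools.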
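For readers unfamiliar with Bradley-Terry scoring, the sketch below shows how pairwise preferences become per-method scores. The model assumes method i beats method j with probability p_i / (p_i + p_j) and fits the strengths p from win counts. The fitting loop is the standard iterative (minorization-maximization) update. Rescaling so that the best method scores 100 is an assumption made to resemble the table's scale, and the win counts are invented for illustration, not taken from the paper.

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[a][b] = number of comparisons in which a was preferred over b.
    Returns scores rescaled so the strongest method gets 100.0 (assumed scale).
    """
    names = list(wins)
    p = {n: 1.0 for n in names}
    for _ in range(iters):
        new = {}
        for i in names:
            total_wins = sum(wins[i].get(j, 0) for j in names if j != i)
            # Each pair contributes (comparisons between i and j) / (p_i + p_j).
            denom = sum(
                (wins[i].get(j, 0) + wins[j].get(i, 0)) / (p[i] + p[j])
                for j in names if j != i
            )
            new[i] = total_wins / denom if denom else p[i]
        top = max(new.values())
        p = {n: v / top for n, v in new.items()}  # keep the scale fixed
    return {n: 100.0 * v for n, v in p.items()}

# Invented counts: 30 comparisons per pair of methods.
wins = {
    "quality_first": {"efficiency_first": 18, "flat": 27},
    "efficiency_first": {"quality_first": 12, "flat": 25},
    "flat": {"quality_first": 3, "efficiency_first": 5},
}
scores = bradley_terry(wins)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the scores are relative strengths, a value such as 24.3 is best read as "preferred far less often than the top method in head-to-head comparisons," not as a percentage of tasks passed.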
The radar charts show that AgentSkillOS variants (large polygons) maintain balanced capabilities across Data, Document, Video, Visual, and Web tasks, while flat baselines collapse as complexity grows.
Qualitative Leap
The difference isn't just numerical; it's visual.
- Vanilla agent: Produces basic Matplotlib plots.
- AgentSkillOS: Invokes internal Manim skills to produce high-fidelity mathematical animations with smooth transitions and professional annotations.

Critical Insights
The most profound takeaway is that structured composition is the real "Intelligence" multiplier. Even when the agent was given the "Oracle" (perfect) set of skills, if it invoked them in a flat sequence, it failed to match the quality of the DAG-orchestrated approach.
Limitations & Future Work
- Skill Quality: The framework assumes skills are high-quality. Autonomously evaluating third-party skills before inclusion remains an open challenge.
- Self-Evolution: The authors suggest that because skills are Markdown-based, agents should eventually start editing their own skill files to optimize execution.
Conclusion
AgentSkillOS provides the "Operating System" for the agent era. Its results suggest that as AI tools become more decentralized and numerous, the winner won't be the one with the most tools, but the one with the best "Manager" to organize and orchestrate them.
