AgentFactory: Building a Self-Evolving AI Workforce through Executable Code Accumulation
Abstract

AgentFactory is a self-evolving framework that builds LLM-based agents by decomposing complex tasks into executable Python subagents. Unlike previous methods that store experiences as textual prompts, AgentFactory accumulates a library of reusable code modules that are continuously refined through execution feedback, achieving SOTA efficiency in task re-execution.

TL;DR

AgentFactory is a paradigm shift in agent autonomy. Instead of just "remembering" what it did via text logs, it writes, saves, and refines its own Python tools (subagents). This creates a library of robust, reusable, and portable capabilities that let the agent solve similar tasks in the future while spending over 60% fewer orchestration tokens.

Background: Beyond Verbal Reflection

Most current LLM agents are "forgetful." Even those with "self-evolution" capabilities (like Reflexion) usually store their lessons as text prompts. While useful for high-level reasoning, text prompts are fragile for complex, multi-step procedures like "booking a meeting via Playwright" or "scraping specific data."

AgentFactory treats agent capabilities as software engineering artifacts. If an agent solves a problem once, it shouldn't just talk about it; it should ship a working, documented Python module that can be called by itself or other agents later.

Methodology: The Three-Phase Lifecycle

The core of AgentFactory lies in its ability to transform raw task requirements into a permanent library of skills. It operates in three distinct phases:

1. Phase 1: Install (From Scratch to Code)

When faced with a novel task, the Meta-Agent decomposes it into sub-tasks. It doesn't just execute them; it generates a specialized Python script for each. These are saved in a "Subagent Pool" with a SKILL.md file for documentation.
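
As a rough illustration of the Install phase, the sketch below saves one generated script together with its SKILL.md into a pool directory. The function name `install_subagent`, the `main.py` filename, and the SKILL.md layout are assumptions made for this example; the summary does not say how AgentFactory actually organizes its Subagent Pool on disk.

```python
import tempfile
from pathlib import Path


def install_subagent(pool_dir: Path, name: str, code: str, description: str) -> Path:
    """Save a generated script and its documentation under pool_dir/name.

    Hypothetical layout: one folder per subagent holding main.py and SKILL.md.
    """
    skill_dir = pool_dir / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    (skill_dir / "main.py").write_text(code, encoding="utf-8")
    skill_md = f"# {name}\n\n{description}\n\nUsage: python main.py <args>\n"
    (skill_dir / "SKILL.md").write_text(skill_md, encoding="utf-8")
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = install_subagent(
            Path(tmp),
            "echo_upper",
            "import sys\nprint(sys.argv[1].upper())\n",
            "Uppercases its first command-line argument.",
        )
        # List the files that make up the installed subagent.
        print(sorted(p.name for p in saved.iterdir()))
```

Keeping the documentation next to the script means a later reader, human or agent, can find out what the subagent does without opening the code.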

2. Phase 2: Self-Evolve (The Feedback Loop)

This is where the "growth" happens. If a saved subagent fails or hits an edge case (e.g., a website layout has changed), the Meta-Agent analyzes the error logs and rewrites the subagent's code. Hardcoded fallbacks give way to more robust logic (for example, replacing ad hoc string parsing with regular expressions), so the subagent becomes stronger over time.
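
The feedback loop can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `self_evolve`, `load_subagent`, and the `rewrite` callback (which stands in for the Meta-Agent's LLM call) are hypothetical names, and the two subagent versions are toy examples of the string-parsing-to-regex change mentioned above.

```python
import traceback
from typing import Callable, Tuple


def load_subagent(source: str) -> Callable[[str], str]:
    """Execute subagent source code and return the `run` function it defines."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["run"]


def self_evolve(
    source: str,
    task_input: str,
    rewrite: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, str]:
    """Run the subagent; on failure, ask `rewrite` for new source and retry.

    Returns (final_source, result). `rewrite(source, error_log)` is a stand-in
    for the Meta-Agent analyzing the traceback and producing revised code.
    """
    for _ in range(max_rounds):
        try:
            return source, load_subagent(source)(task_input)
        except Exception:
            error_log = traceback.format_exc()
            source = rewrite(source, error_log)
    raise RuntimeError("subagent still failing after max_rounds rewrites")


# Version 1: brittle, assumes the exact prefix "Price: $".
V1 = '''
def run(text):
    return text.split("Price: $")[1].strip()
'''

# Version 2: the "evolved" subagent, using a regular expression instead.
V2 = r'''
import re

def run(text):
    match = re.search(r"\d+\.\d+", text)
    if match is None:
        raise ValueError("no price found")
    return match.group(0)
'''

if __name__ == "__main__":
    # The rewrite callback here simply returns V2; a real system would call an LLM.
    final_source, result = self_evolve(V1, "Total price = 12.50 USD", lambda src, log: V2)
    print(result, final_source == V2)
```

In a real deployment the rewritten source would be written back to the Subagent Pool, so the next task starts from the improved version.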

3. Phase 3: Deploy (Portability)

Because the subagents are "Pure Python," they are framework-agnostic. You can train an agent in AgentFactory and then "export" its skills to Claude Code or LangChain simply by letting the new system read the generated documentation and execute the scripts.
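
To make the portability claim concrete, here is a minimal sketch of how a different host system might consume an exported skill: read the documentation, then run the script as an ordinary Python process. The folder layout (`SKILL.md` plus `main.py`) and the helper names `read_skill_doc` and `call_skill` are assumptions for this example, not details confirmed by the paper summary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def read_skill_doc(skill_dir: Path) -> str:
    """Return the documentation a host agent would read before calling the skill."""
    return (skill_dir / "SKILL.md").read_text(encoding="utf-8")


def call_skill(skill_dir: Path, *args: str) -> str:
    """Run the skill's script in a fresh interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, str(skill_dir / "main.py"), *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a toy skill folder so the example is self-contained.
        skill = Path(tmp) / "shout"
        skill.mkdir()
        (skill / "main.py").write_text(
            "import sys\nprint(sys.argv[1].upper())\n", encoding="utf-8"
        )
        (skill / "SKILL.md").write_text(
            "# shout\n\nUppercases its first argument.\n", encoding="utf-8"
        )
        print(read_skill_doc(skill).splitlines()[0])
        print(call_skill(skill, "hello"))
```

Because the only interface is a documentation file and a command line, nothing in the host needs to know about the framework that produced the skill.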

Figure 1: The AgentFactory pipeline, showing how Q1 (a new task) creates new code while Q2 (a similar task) evolves existing code.

Experiments: Measuring Efficiency

The researchers tested AgentFactory against standard ReAct and "Textual Experience" baselines.

  • The Batch Test: They created two batches of tasks. Tasks in the second batch were similar in structure to those in the first, but with different details.
  • Performance: Using Claude Opus 4.6, AgentFactory reduced the "Orchestration Tokens" (the effort the brain has to spend) from 8298 tokens (ReAct) to just 2971 tokens on the second batch.
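
For a sense of scale, the two token counts quoted above can be turned into a relative reduction. The helper name `token_reduction` is ours; the figures are the ones reported in the text.

```python
def token_reduction(baseline: int, improved: int) -> float:
    """Fractional reduction in tokens relative to the baseline."""
    return (baseline - improved) / baseline


react_tokens = 8298         # ReAct orchestration tokens on the second batch
agentfactory_tokens = 2971  # AgentFactory orchestration tokens on the second batch

print(f"reduction: {token_reduction(react_tokens, agentfactory_tokens):.1%}")
print(f"ratio: {react_tokens / agentfactory_tokens:.2f}x")
```

This works out to a reduction of roughly 64%, or about 2.8 times fewer orchestration tokens than ReAct on the second batch.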

Table 1: Token consumption comparison across different models and task settings.

Cross-System Reuse

One of the most impressive demonstrations showed an agent in AgentFactory creating a "QQ Music Player" tool, which was then successfully utilized by Claude Code—a completely different agent system—just by reading the automatically generated SKILL.md.

Figure 2: Trajectory showing how subagents created in one system are successfully deployed and reused in another.

Critical Insights & Future Outlook

The genius of AgentFactory is its alignment with the Agent Skills open standard. It treats the development of AI agents like the development of a software library.

Key Takeaways:

  • Inductive Bias: By forcing the agent to output Python code rather than just text, the framework introduces a strong inductive bias toward structured, logical, and debuggable problem-solving.
  • Scalability: As the library grows, the cost of solving a new task approaches a constant "lookup and call" cost rather than the "reason and act" cost of zero-shot prompting.
  • Limitation: Currently, the framework is heavily focused on web and shell-based tasks. The authors suggest that integrating Vision-Language Models (VLMs) will be necessary to expand this self-evolution into complex Graphical User Interfaces (GUIs).
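
The scalability point can be pictured as a cache-like dispatch: look for an existing skill first, and pay the expensive build cost only on a miss. The sketch below is a deliberately simplified assumption. `SkillLibrary`, `build_skill`, and exact-match lookup by task kind are hypothetical; a real system would more plausibly match tasks against skill documentation.

```python
from typing import Callable, Dict

Skill = Callable[[str], str]


class SkillLibrary:
    """Toy library: reuse a stored skill when one exists, build it otherwise."""

    def __init__(self, build_skill: Callable[[str], Skill]) -> None:
        self._skills: Dict[str, Skill] = {}
        self._build_skill = build_skill  # stand-in for the costly reason-and-act path
        self.builds = 0                  # how many times the costly path was taken

    def solve(self, task_kind: str, payload: str) -> str:
        skill = self._skills.get(task_kind)
        if skill is None:
            # Miss: pay the full cost once, then keep the result for later tasks.
            skill = self._build_skill(task_kind)
            self._skills[task_kind] = skill
            self.builds += 1
        return skill(payload)


if __name__ == "__main__":
    def build(kind: str) -> Skill:
        # Hypothetical builder: returns a trivial skill that tags its input.
        return lambda payload: f"{kind}:{payload}"

    library = SkillLibrary(build)
    library.solve("summarize", "doc A")
    library.solve("summarize", "doc B")
    print(library.builds)
```

Only the first task of each kind takes the expensive path; later tasks of that kind cost a dictionary lookup plus the call itself.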

In conclusion, AgentFactory moves us closer to a future where AI doesn't just represent knowledge—it accumulates executable wisdom.
