WisPaper
[System Intelligence] Externalization: Why the Next Breakthrough in AI Agents Isn't a Bigger Model
Abstract

This paper presents a unified systems-level review of Large Language Model (LLM) agents through the lens of externalization. It identifies three core dimensions—Memory, Skills, and Protocols—and introduces Harness Engineering as the integration layer that transforms internal cognitive burdens into reliable external structures.

TL;DR

The secret to reliable AI agents is not more parameters, but Externalization. This paper argues that modern agent design is shifting from "Model-Centric" to "Harness-Centric," moving cognitive burdens—like memory, procedural expertise, and interaction rules—out of the model’s weights and into a managed infrastructure called the Harness.

Background: The Outward Migration of Intelligence

For years, the industry assumed that "smarter weights = better agents." Yet even the most powerful models struggle to stay consistent over long-horizon tasks. The researchers trace a historical arc in where capability resides:

  1. Weights (2022-2023): Knowledge is frozen in parameters, which makes it hard to update and prone to hallucination.
  2. Context (2023-2024): Retrieval-augmented generation (RAG) and prompting techniques such as Chain-of-Thought "stage" cognition inside the context window.
  3. Harness (2025-2026): Intelligence is distributed across persistent memory, skill registries, and standardized protocols.

Figure 1 (Externalization Evolution): Just as humans moved from internal thought to writing and digital computation, LLM agents are moving from internal weights to external harnesses.


The Core Architecture: Memory, Skills, and Protocols

The paper breaks "Agency" down into three externalized modules, each of which changes the kind of task the model must perform:

1. Externalized State: Memory

Instead of cramming everything into a fragile context window, the harness uses a Hierarchical Memory Architecture.

  • Transformation: From Recall (hard) to Recognition/Retrieval (reliable).
  • The Design: Modern systems use "OS-style" memory management, paging information between "hot" context (the active working set) and "cold" storage (historical episodes), much as an operating system moves pages between RAM and disk.
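The hot/cold paging idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's implementation: the class name `HierarchicalMemory`, the fixed hot-tier capacity, least-recently-used eviction, and naive word-overlap retrieval are all assumptions chosen for brevity (a real system would more likely use embedding search).

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    """Two-tier memory sketch: a small 'hot' working set plus unbounded 'cold' storage.

    Hypothetical illustration: capacity, LRU eviction and keyword-overlap
    retrieval are assumptions, not the design described in the paper.
    """
    hot_capacity: int = 3
    hot: OrderedDict = field(default_factory=OrderedDict)  # key -> text, oldest first
    cold: dict = field(default_factory=dict)                # key -> text

    def write(self, key: str, text: str) -> None:
        # New or refreshed items enter the hot tier as most recently used.
        self.hot[key] = text
        self.hot.move_to_end(key)
        self.cold.pop(key, None)
        self._evict()

    def _evict(self) -> None:
        # Page least recently used items out to cold storage.
        while len(self.hot) > self.hot_capacity:
            old_key, old_text = self.hot.popitem(last=False)
            self.cold[old_key] = old_text

    def recall(self, query: str) -> list[str]:
        """Retrieve cold items sharing a word with the query and page them back in."""
        words = set(query.lower().split())
        hits = [k for k, t in self.cold.items() if words & set(t.lower().split())]
        for k in hits:
            self.write(k, self.cold[k])
        return [self.hot[k] for k in hits if k in self.hot]

    def context(self) -> str:
        # Only the hot tier is rendered into the model's prompt.
        return "\n".join(self.hot.values())
```

Only `context()` reaches the prompt; older episodes live outside the model until `recall` retrieves them, which is the recall-to-retrieval shift described above.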

2. Externalized Expertise: Skills

A model shouldn't have to "invent" a workflow every time.

  • Transformation: From Improvised Generation to Structured Composition.
  • The Artifact: "Skill files" (like SKILL.md) encapsulate procedures, decision heuristics, and safety constraints. The agent simply loads the expertise required for the task.
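A skill registry in this spirit might look like the sketch below. The header format (a `---`-delimited block of `key: value` lines with `name` and `description`), the `SkillRegistry` class, and the keyword matching are assumptions for illustration; actual skill-file formats and selection logic may differ.

```python
from dataclasses import dataclass

# Hypothetical stopword list used by the naive matcher below.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "for", "from", "with"}


@dataclass
class Skill:
    name: str
    description: str
    body: str  # procedure, heuristics and constraints, loaded only when needed


def _keywords(text: str) -> set[str]:
    return {w for w in text.lower().split() if w not in STOPWORDS}


def parse_skill(text: str) -> Skill:
    """Parse a skill file with an assumed '---'-delimited 'key: value' header."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing header")
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return Skill(meta["name"], meta["description"], body)
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    raise ValueError("unterminated header")


class SkillRegistry:
    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def register(self, text: str) -> None:
        skill = parse_skill(text)
        self.skills[skill.name] = skill

    def select(self, task: str) -> list[Skill]:
        """Return skills whose description shares a keyword with the task."""
        words = _keywords(task)
        return [s for s in self.skills.values() if words & _keywords(s.description)]
```

Only the short descriptions are consulted for selection; a skill's full body enters the context only when it matches, which keeps unrelated expertise out of the prompt.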

3. Externalized Interaction: Protocols

Interaction with tools (APIs) or other agents is often the point of failure.

  • Transformation: From Ad-hoc Language to Structured Contracts.
  • The Solution: Standards like the Model Context Protocol (MCP) or Agent-to-Agent (A2A) protocols make communication typed and validated, and help keep it secure.
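To illustrate what a "structured contract" buys, the sketch below validates a tool call against a declared input schema before serializing it as a JSON-RPC 2.0 request. It is not the MCP SDK: the `get_weather` tool is hypothetical, and the checker supports only a tiny subset of JSON Schema (required fields and primitive types). The `tools/call` method name and the `inputSchema` field follow our reading of the MCP specification and should be checked against it.

```python
import json
from typing import Any

# Hypothetical tool declaration with a simplified JSON-Schema-style input schema.
TOOL = {
    "name": "get_weather",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
        "required": ["city"],
    },
}

_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Check required fields and primitive types; return a list of errors."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument '{name}'")
    for name, value in args.items():
        if name not in props:
            errors.append(f"unexpected argument '{name}'")
            continue
        declared = props[name]["type"]
        # bool is a subclass of int in Python, so reject it unless declared boolean.
        wrong_bool = isinstance(value, bool) and declared != "boolean"
        if not isinstance(value, _TYPES[declared]) or wrong_bool:
            errors.append(f"argument '{name}' should be {declared}")
    return errors


def make_call(tool: dict[str, Any], args: dict[str, Any], request_id: int) -> str:
    """Reject malformed calls before they leave the harness."""
    errors = validate(args, tool["inputSchema"])
    if errors:
        raise ValueError("; ".join(errors))
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": args},
    }
    return json.dumps(request)
```

The model still proposes the arguments, but the harness, not the model's phrasing, decides whether a call is well formed.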

Methodology: The Harness as a Cognitive Environment

The "Harness" is the runtime that hosts these modules. It isn't just "glue code"; it's a Cognitive Environment that implements:

  • Agent Loops: The "Perceive-Plan-Act" cycle.
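  • Sandboxing: Executing the model's actions in a safe, isolated environment so that mistakes cannot affect the host system.
  • Observability: Keeping a "black box recorder" that logs every decision so failures can be traced and debugged.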
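The three responsibilities can be combined in a toy loop. In this hypothetical sketch, `planner` stands in for the LLM call, the tool allowlist stands in for real isolation (containers, VMs), and `trace` is the recorder; none of these names come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Minimal perceive-plan-act loop with an allowlist 'sandbox' and a trace log.

    Hypothetical sketch: an allowlist is a stand-in for real isolation.
    """
    planner: Callable[[str], tuple[str, str]]  # observation -> (tool, argument)
    tools: dict[str, Callable[[str], str]]     # the only actions the agent may take
    max_steps: int = 5
    trace: list[dict] = field(default_factory=list)

    def run(self, task: str) -> str:
        observation = task                                  # perceive the task
        for step in range(self.max_steps):
            tool, arg = self.planner(observation)           # plan
            if tool == "finish":
                self.trace.append({"step": step, "tool": tool, "arg": arg})
                return arg
            if tool in self.tools:
                result = self.tools[tool](arg)              # act inside the allowlist
            else:
                result = f"denied: {tool}"                  # sandbox blocks unknown actions
            self.trace.append({"step": step, "tool": tool, "arg": arg, "result": result})
            observation = result                            # perceive the outcome
        return "stopped: step limit reached"
```

For example, with `tools={"upper": str.upper}`, a planner that requests an unlisted tool receives `"denied: ..."` as its next observation, and the trace records both the request and the refusal.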

Figure 2 (Harness Architecture): The community's focus is shifting from the 'Agent Core' (LLM) to the 'Harness' (infrastructure).


Deep Insight: The Cerebrum vs. The Cerebellum

In a fascinating extension to Embodied AI (Robotics), the authors suggest we are seeing a "Cerebrum–Cerebellum split."

  • The Cerebrum (LLM Agent): Handles high-level reasoning and task decomposition.
  • The Cerebellum (Vision-Language-Action, or VLA, models): Handles fast, reactive motor control (e.g., grasping an object). By externalizing motor control as a "Skill," the high-level brain is freed to focus on the goal, which the authors argue makes robots more robust.
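The division of labor can be illustrated with a toy one-dimensional gripper. In this hypothetical sketch, `cerebrum_plan` stands in for the LLM agent and emits a handful of skill calls, while `move_to_controller` stands in for a learned VLA policy; a simple proportional controller is used in its place. The point is the difference in rates: the planner is consulted once per subgoal, while the controller runs many fast correction ticks per call.

```python
from typing import Optional

Plan = list[tuple[str, Optional[float]]]


def cerebrum_plan(goal: str) -> Plan:
    # Stand-in for the LLM agent: decompose a goal into a few skill calls.
    if goal == "pick up cup":
        return [("move_to", 0.8), ("grasp", None), ("move_to", 0.2)]
    return []


def move_to_controller(position: float, target: float, gain: float = 0.5,
                       tol: float = 1e-3, max_ticks: int = 100) -> tuple[float, int]:
    # Stand-in for the cerebellum: a proportional controller that keeps
    # correcting toward the target without consulting the planner.
    ticks = 0
    while abs(target - position) > tol and ticks < max_ticks:
        position += gain * (target - position)
        ticks += 1
    return position, ticks


def run_robot(goal: str) -> tuple[float, bool, int]:
    position, gripper_closed, total_ticks = 0.0, False, 0
    for skill, arg in cerebrum_plan(goal):
        if skill == "move_to" and arg is not None:
            position, ticks = move_to_controller(position, arg)
            total_ticks += ticks
        elif skill == "grasp":
            gripper_closed = True
    return position, gripper_closed, total_ticks
```

Three planner-level steps expand into many controller ticks, so the planner never has to reason about individual motor corrections.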

Critical Analysis & Conclusion

The paper argues for a broader definition of "SOTA" (state of the art): a system's power is measured not only by its benchmark score on a static test but also by the quality of its externalization.

The Trade-off: Externalization adds latency and context overhead. If you externalize too much, the model spends its time reading manuals instead of doing the work. A central research question going forward will be "Partitioning": deciding which part of a task should stay in the model's weights (perhaps only 10%) and which part should move to the harness (perhaps 90%).
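One way to make the context-overhead side of this trade-off concrete is to give the harness a fixed token budget and load externalized material only while it pays for itself. The sketch below is a hypothetical greedy heuristic (not an optimal knapsack solution, and not a method from the paper); the relevance scores and token costs are assumed inputs.

```python
def load_within_budget(candidates: list[tuple[str, float, int]], budget: int) -> list[str]:
    """Greedily pick (name, relevance, token_cost) items by relevance per token.

    Hypothetical heuristic: relevance scores are assumed to be given, and
    greedy selection can miss the optimal subset.
    """
    if any(cost <= 0 for _, _, cost in candidates):
        raise ValueError("token costs must be positive")
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for name, _relevance, cost in ranked:
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen
```

Low-value, high-cost material (a long changelog, say) is simply left in the harness, which is one crude answer to the question of what stays outside the model's context.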

Final Takeaway: We are entering the era of "Distributed Agency." If you want to build a better AI assistant, stop tuning the model and start building a better environment for it to live in.
