A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

WisPaper

学术搜索

学术问答

价格

TrueCite

工作空间

Home

Blog

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

TACO: Eliminating Terminal Noise via Self-Evolving Context Compression

总结

问题

方法

结果

要点

摘要

TACO (Terminal Agent Compression) is a plug-and-play, self-evolving framework designed to compress redundant terminal observations for long-horizon agentic tasks. By automatically discovering and refining structured compression rules, it achieves state-of-the-art performance gains (1%-4% accuracy increase) on benchmarks like TerminalBench 1.0/2.0 and SWE-Bench Lite while significantly reducing token overhead.

TL;DR

Terminal agents often get "lost" in the noise of verbose logs and build traces. TACO (Terminal Agent Compression) is a new self-evolving framework that treats context management as a dynamic learning task. By evolving a pool of semantic compression rules, TACO reduces token waste by 10% and boosts agent accuracy by up to 4%, enabling even small models to solve complex, long-horizon software engineering tasks.

The Bottleneck: The Quadratic Cost of Verbosity

When an AI agent interacts with a terminal, it receives raw output—sometimes thousands of lines of apt-get installation progress or make logs.

The Context Saturation: These logs are 99% noise but consume the limited context window.
The Quadratic Growth: As the conversation history grows, re-sending these logs at every step leads to a quadratic explosion in token costs.
The Signal-to-Noise Problem: Critical error messages (the "Signal") are buried under megabytes of "Unpacking..." and "Setting up..." lines (the "Noise").

Prior methods used static truncation (cutting the middle) or fixed summaries, but terminal outputs are too heterogeneous for one-size-fits-all rules.

Methodology: Semantic Filtering through Self-Evolution

TACO introduces a plug-and-play adapter that sits between the terminal and the LLM agent. Its core innovation is a three-tier evolution process:

Rule Initialization: Instead of starting from scratch, TACO uses a Global Rule Pool of proven compression patterns (e.g., how to handle pip or git logs).
Intra-Task Evolution: If an agent hits a new type of output (e.g., a specific binary disassembly), TACO invokes a "shadow LLM" to generate a regex-based compression rule on the fly.
The Feedback Loop: If the agent complains ("Wait, I need to see the full output!"), TACO marks that rule as "over-aggressive" and refines it into a more conservative version.

Overall Architecture of TACO Figure 1: The TACO framework workflow, highlighting the Global Rule Pool and the online evolution mechanism.

Experimental Results: Doing More with Less

The researchers tested TACO on TerminalBench (1.0 & 2.0) and SWE-Bench Lite using backbones from 8B to 1T parameters.

Accuracy Boost: On TerminalBench 2.0, adding TACO consistently improved performance by 1-4%.
Efficiency: For massive models like DeepSeek-V3.2 and MiniMax-M2.5, TACO reduced the "Token per Step" significantly.
Empowering Small Models: Smaller models (Qwen-8B/14B) often fail early because their context overflows. TACO allowed these models to "stay in the game" longer, increasing their success rate by effectively cleaning their "working memory."

Performance Comparison Table 1: Performance gains across various closed-source and open-source models.

Visualizing the "Before & After"

A key example in the paper shows a 10,000-character apt-get log being compressed to just 73 characters. Crucially, the rule preserves the "Signal" (the final status of the installation) while discarding the repetitive "Unpacking..." lines.

Token Budget Comparison Figure 2: Accuracy vs. Token Budget. Under any fixed token budget, TACO-enabled agents outperform baseline agents.

Critical Insight: Convergence and Stability

One of the most elegant parts of TACO is the Convergence Metric. By tracking the "Retention" of rules in the Top-K of the Global Pool, the system can determine when it has learned enough "wisdom" to handle a specific domain. The paper shows that once the rule frontier stabilizes, agent performance becomes significantly less volatile.

Conclusion & Future Work

TACO demonstrates that we don't necessarily need infinitely large context windows; we need smarter context management. By treating the terminal output as something that can be symbolically optimized through experience, TACO paves the way for persistent, lifelong-learning agents that get cheaper and smarter the more they interact with their environment.

Future research could extend this "observational compression" to GUI-based agents or multi-modal logs, where the "noise" is even more prevalent than in a text-based terminal.

发现相似论文

试试这些示例

Search for recent papers targeting observational context compression or adaptive token pruning specifically for software engineering agents and autonomous DevOps agents.
Which studies first proposed the concept of "self-evolving" rule-based memory for LLM agents, and how does TACO's symbolic optimization compare to parametric reinforcement learning approaches?
Investigate the application of semantic observation filtering in other high-noise environments like Robotic Process Automation (RPA) or real-world embodied AI interaction logs.

TACO: Eliminating Terminal Noise via Self-Evolving Context Compression

1. TL;DR

2. The Bottleneck: The Quadratic Cost of Verbosity

3. Methodology: Semantic Filtering through Self-Evolution

4. Experimental Results: Doing More with Less

4.1. Visualizing the "Before & After"

5. Critical Insight: Convergence and Stability

6. Conclusion & Future Work