WisPaper
WisPaper
Scholar Search
Scholar QA
AI Feeds
Pricing
TrueCite
ClawKeeper: The "Antivirus" for the Autonomous Agent Era
Summary
Problem
Method
Results
Takeaways
Abstract

ClawKeeper is a comprehensive multi-layered safety framework designed for the OpenClaw autonomous agent ecosystem. It introduces a novel triple-defense architecture consisting of Skills, Plugins, and a decoupled "Watcher" agent, achieving a state-of-the-art Defense Success Rate (DSR) of 85-90% across seven major threat categories.

TL;DR

As autonomous agents like OpenClaw move from simple chatbots to system-level operators capable of executing shell commands and accessing local files, they introduce massive security risks. ClawKeeper is a new tri-layered framework that acts as the "antivirus" for these agentic operating systems. Its most innovative feature is the Watcher—an independent agent that sits outside the main system to monitor, intercept, and block malicious actions before they hit the hardware.

The Problem: When Your Agent Becomes a Double Agent

Existing security for agents is "fragmented." Most defenses are either simple prompts (easily ignored via jailbreaking) or internal plugins (easily disabled by a compromised agent). This creates a Safety-Utility Tradeoff: the more you restrict an agent to make it safe, the less "smart" it becomes at solving tasks. Furthermore, if an agent is told to "forget all previous instructions," it often discards its internal security guardrails as well.

Methodology: The Triple-Layer Defense

ClawKeeper solves this by layering protection from the instruction level down to an external supervisory level.

The Framework of ClawKeeper

1. Skill-based Protection (Instruction Level)

It injects high-level security "skills" into the agent's context. This includes Windows-specific constraints and "Red-Line" protocols for software like Telegram or Feishu to prevent sensitive data leakage.

2. Plugin-based Protection (Runtime Level)

This is a hard-coded enforcer within the OpenClaw runtime. It handles:

  • Hardening: Binding gateways to localhost only.
  • Configuration Protection: Using cryptographic hashes to ensure critical agent files aren't tampered with.
  • Behavioral Scanning: Analyzing logs retrospectively to find "latent" threats.

3. The Watcher (Regulatory Separation)

This is the "crown jewel" of the paper. The Watcher is a separate OpenClaw instance that communicates with the main agent via WebSocket.

  • Decoupled Intelligence: Because it's a separate process, the main agent cannot "kill" its supervisor.
  • Real-time Intervention: It monitors tool calls and command outputs. If it sees a rm -rf / or an SSH key exfiltration attempt, it pauses execution and asks the human for confirmation.

Watcher Architecture

Experiments: Superior Robustness and Self-Evolution

ClawKeeper was tested across seven threat categories (T1-T7), including Prompt Injection, Privilege Escalation, and Malicious Skill Installation. It achieved a 90% Success Rate where baselines like OpenGuardrails often struggled to hit 60%.

The most striking result is the Watcher's ability to learn. Unlike static code, the Watcher is an LLM agent itself. As it encounters more adversarial patterns, it updates its "security memory," essentially becoming more vigilant over time.

Performance Comparison Table: ClawKeeper consistently outperforms fragmented baselines across all seven OWASP-aligned threat categories.

Qualitative Evidence: Blocking the "Jailbreak"

In case studies, ClawKeeper effectively stopped agents from:

  1. Exfiltrating Credentials: When an agent was asked to read a file containing GitHub passwords, the Hardening module replaced the passwords with a policy notice.
  2. Dangerous Loops: The Watcher detected when an agent was stuck in a tool-call loop and forced human intervention.
  3. Obfuscated Commands: It decoded Base64 malicious payloads and blocked them before execution.

Critical Insight & Conclusion

ClawKeeper moves the industry away from "internal guardrails" toward External Oversight.

  • Takeaway: The "Watcher" paradigm is likely the future of AI safety. Just as we don't allow a bank teller to audit their own vault, we shouldn't expect a task-solving agent to be its own security guard.
  • Limitations: While powerful, the Watcher adds computational overhead (essentially running two LLM sessions). For local users with limited hardware, this might be a bottleneck.
  • Future Impact: This framework is not limited to OpenClaw. It can be ported to any agent system, providing a standardized "Safety Sidecar" for the broader AI ecosystem.

Find Similar Papers

Try Our Examples

  • Search for recent studies on decoupled auditor agents or "external monitors" for Large Language Model based autonomous systems similar to the Watcher paradigm.
  • Which paper first introduced the conflict between task utility and safety alignment in autonomous agents, and how does architectural separation compare to fine-tuning methods?
  • Explore the application of independent safety observers in multi-agent systems or embodied AI to prevent runaway execution or tool-use loops.
Contents
ClawKeeper: The "Antivirus" for the Autonomous Agent Era
1. TL;DR
2. The Problem: When Your Agent Becomes a Double Agent
3. Methodology: The Triple-Layer Defense
3.1. 1. Skill-based Protection (Instruction Level)
3.2. 2. Plugin-based Protection (Runtime Level)
3.3. 3. The Watcher (Regulatory Separation)
4. Experiments: Superior Robustness and Self-Evolution
5. Qualitative Evidence: Blocking the "Jailbreak"
6. Critical Insight & Conclusion