ClawKeeper is a multi-layered safety framework for the OpenClaw autonomous agent ecosystem. It introduces a three-layer defense architecture consisting of Skills, Plugins, and a decoupled "Watcher" agent, and reports a Defense Success Rate (DSR) of 85-90% across seven major threat categories.
TL;DR
As autonomous agents like OpenClaw move from simple chatbots to system-level operators that can execute shell commands and access local files, they introduce serious security risks. ClawKeeper is a new three-layer framework that acts as the "antivirus" for these agentic operating systems. Its most innovative feature is the Watcher: an independent agent that sits outside the main system to monitor, intercept, and block malicious actions before they reach the host.
The Problem: When Your Agent Becomes a Double Agent
Existing security for agents is "fragmented." Most defenses are either simple prompts (easily ignored via jailbreaking) or internal plugins (easily disabled by a compromised agent). This creates a Safety-Utility Tradeoff: the more you restrict an agent to make it safe, the less "smart" it becomes at solving tasks. Furthermore, if an agent is told to "forget all previous instructions," it often discards its internal security guardrails as well.
Methodology: The Triple-Layer Defense
ClawKeeper solves this by layering protection from the instruction level down to an external supervisory level.

1. Skill-based Protection (Instruction Level)
This layer injects high-level security "skills" into the agent's context. These include Windows-specific constraints and "Red-Line" protocols for software like Telegram or Feishu, intended to prevent sensitive data leakage.
2. Plugin-based Protection (Runtime Level)
This is a hard-coded enforcer within the OpenClaw runtime. It handles:
- Hardening: Binding gateways to localhost only.
- Configuration Protection: Using cryptographic hashes to ensure critical agent files aren't tampered with.
- Behavioral Scanning: Analyzing logs retrospectively to find "latent" threats.
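To make the Hardening and Configuration Protection items concrete, here is a minimal Python sketch. It is not ClawKeeper's actual plugin code: the function names (`snapshot`, `tampered`, `is_loopback_bind`) are hypothetical, and SHA-256 is assumed as the hash since the summary only says "cryptographic hashes".

```python
import hashlib
import ipaddress
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(paths: list[Path]) -> dict[str, str]:
    """Record a trusted baseline digest for each protected file."""
    return {str(p): sha256_of(p) for p in paths}


def tampered(baseline: dict[str, str]) -> list[str]:
    """List protected files that are missing or whose digest has changed."""
    changed = []
    for name, digest in baseline.items():
        p = Path(name)
        if not p.is_file() or sha256_of(p) != digest:
            changed.append(name)
    return changed


def is_loopback_bind(host: str) -> bool:
    """True if a gateway bind address is loopback only (hypothetical check)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal; treat as unsafe
```

In this sketch, the baseline would be taken once while the configuration is known to be good, then `tampered(baseline)` would be re-run before each agent session. A bind address such as `0.0.0.0` fails `is_loopback_bind`, because it exposes the gateway on every interface.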
3. The Watcher (Regulatory Separation)
This is the "crown jewel" of the paper. The Watcher is a separate OpenClaw instance that communicates with the main agent via WebSocket.
- Decoupled Intelligence: Because it's a separate process, the main agent cannot "kill" its supervisor.
- Real-time Intervention: It monitors tool calls and command outputs. If it sees an `rm -rf /` or an SSH key exfiltration attempt, it pauses execution and asks the human for confirmation.
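The intervention step can be illustrated with a small rule check in Python. This is only a sketch under stated assumptions: the real Watcher is described as an LLM agent that talks to the main agent over WebSocket, whereas the code below leaves out the transport and uses two hypothetical regex rules. The names `RISK_PATTERNS`, `Verdict`, and `review_tool_call` are invented for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative rules only; the Watcher in the paper is an LLM agent, not a regex list.
RISK_PATTERNS = {
    # rm with a recursive flag aimed at the filesystem root
    "root_wipe": re.compile(r"\brm\s+-[a-z]*r[a-z]*\s+/(?:\s|\*|$)", re.I),
    # reading a private SSH key (public .pub keys are ignored)
    "ssh_private_key_access": re.compile(r"\.ssh/id_[a-z0-9]+\b(?!\.pub)", re.I),
}


@dataclass(frozen=True)
class Verdict:
    action: str  # "allow" or "pause_for_human"
    reason: str


def review_tool_call(command: str) -> Verdict:
    """Pause for human confirmation when a command matches a risk rule."""
    for reason, pattern in RISK_PATTERNS.items():
        if pattern.search(command):
            return Verdict("pause_for_human", reason)
    return Verdict("allow", "no rule matched")
```

Under these rules, `review_tool_call("rm -rf /")` yields a `pause_for_human` verdict, while `rm -rf /tmp/build` is allowed. Returning a verdict instead of killing the process mirrors the "pause and ask the human" behavior described above.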

Experiments: Superior Robustness and Self-Evolution
ClawKeeper was tested across seven threat categories (T1-T7), including Prompt Injection, Privilege Escalation, and Malicious Skill Installation. It achieved a Defense Success Rate of up to 90%, where baselines like OpenGuardrails often struggled to reach 60%.
The most striking result is the Watcher's ability to learn. Unlike static code, the Watcher is an LLM agent itself. As it encounters more adversarial patterns, it updates its "security memory," essentially becoming more vigilant over time.
Table: ClawKeeper consistently outperforms fragmented baselines across all seven OWASP-aligned threat categories.
Qualitative Evidence: Blocking the "Jailbreak"
In case studies, ClawKeeper effectively stopped agents from:
- Exfiltrating Credentials: When an agent was asked to read a file containing GitHub passwords, the Hardening module replaced the passwords with a policy notice.
- Dangerous Loops: The Watcher detected when an agent was stuck in a tool-call loop and forced human intervention.
- Obfuscated Commands: It decoded Base64 malicious payloads and blocked them before execution.
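Two of these cases, obfuscated commands and dangerous loops, lend themselves to a short Python sketch. It is an assumption-laden illustration, not the paper's method: the token regex, the `window` and `max_repeats` thresholds, and the function names are all hypothetical.

```python
import base64
import binascii
import re
from collections import Counter

# Candidate Base64 tokens: at least 8 alphabet characters plus optional padding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{8,}={0,2}")


def decoded_payloads(command: str) -> list[str]:
    """Decode Base64-looking tokens in a command into text where possible."""
    out = []
    for token in B64_TOKEN.findall(command):
        try:
            raw = base64.b64decode(token, validate=True)
            out.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text; ignore this token
    return out


def hides_denied_command(command: str, denied: list[str]) -> bool:
    """True if any decoded payload contains a denied substring."""
    return any(d in payload for payload in decoded_payloads(command) for d in denied)


def is_looping(calls: list[str], window: int = 6, max_repeats: int = 3) -> bool:
    """True if one identical call appears max_repeats times in the last `window` calls."""
    recent = calls[-window:]
    return any(n >= max_repeats for n in Counter(recent).values())
```

A command that pipes an encoded payload into a shell would be checked by decoding first and then applying the same deny rules as for plain commands. The loop check only counts exact repeats, so a real supervisor would likely need a fuzzier notion of "the same call".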
Critical Insight & Conclusion
ClawKeeper moves the industry away from "internal guardrails" toward External Oversight.
- Takeaway: The "Watcher" paradigm is likely the future of AI safety. Just as we don't allow a bank teller to audit their own vault, we shouldn't expect a task-solving agent to be its own security guard.
- Limitations: While powerful, the Watcher adds computational overhead (essentially running two LLM sessions). For local users with limited hardware, this might be a bottleneck.
- Future Impact: This framework is not limited to OpenClaw. In principle, it can be ported to other agent systems, providing a standardized "Safety Sidecar" for the broader AI ecosystem.
