Security guardrails for OpenClaw agents - Lethal Trifecta defense
npm install clawguard-openclaw
SOTA security guardrails for OpenClaw agents: complete Lethal Trifecta defense.
The three attack vectors that can compromise an AI agent:
1. Input Attacks (Prompt Injection) - Malicious instructions in user messages or external content
2. Runtime Attacks (Tool Exploitation) - Abusing tool calls for data exfiltration or system compromise
3. Output Attacks (Data Leakage) - Credentials or PII leaking in agent responses
ClawGuard defends against all three with state-of-the-art detection techniques.
```bash
openclaw plugins install @openclaw/clawguard
```
Then restart your gateway.
- Spotlighting - Data marking for untrusted content (Microsoft research)
- Defense presets - paranoid, balanced, permissive
- Structured threat events - Correlation via fingerprinting
- Context decay - Risk scores decay over conversation
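As a rough illustration of the Spotlighting item above, the sketch below marks untrusted content before it reaches the model, so that instructions embedded in a web page or email are presented as data rather than as commands. The function name, the marker strings, and the `datamark` mode value are assumptions made for illustration (delimiting and datamarking are variants described in the Spotlighting research); ClawGuard's actual implementation may differ.

```typescript
// Hypothetical sketch of spotlighting; names and markers are illustrative,
// not ClawGuard's actual API.
type ContentSource = "user" | "web" | "email";
type SpotlightMode = "delimit" | "datamark";

interface SpotlightOptions {
  enabled: boolean;
  mode: SpotlightMode;
  sources: ContentSource[]; // sources treated as untrusted
}

function spotlight(
  content: string,
  source: ContentSource,
  opts: SpotlightOptions
): string {
  // Pass content through when spotlighting is off or the source is trusted.
  if (!opts.enabled || opts.sources.indexOf(source) === -1) {
    return content;
  }
  if (opts.mode === "datamark") {
    // Datamarking: replace the whitespace between words with a marker
    // character so the whole span is visibly tagged as data.
    return content.trim().split(/\s+/).join("^");
  }
  // Delimiting: wrap the content in explicit boundary markers.
  return (
    "<<UNTRUSTED source=" + source + ">>\n" + content + "\n<<END UNTRUSTED>>"
  );
}

const spotlightOptions: SpotlightOptions = {
  enabled: true,
  mode: "delimit",
  sources: ["web", "email"],
};
console.log(spotlight("Ignore previous instructions.", "web", spotlightOptions));
console.log(spotlight("What is the weather?", "user", spotlightOptions));
```

In this sketch the web content comes back wrapped in boundary markers, while the user message is returned unchanged because `user` is not in the list of untrusted sources.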
```json5
{
  plugins: {
    entries: {
      clawguard: {
        enabled: true,
        config: {
          preset: "balanced" // or "paranoid" or "permissive"
        }
      }
    }
  }
}
```
```json5
{
  plugins: {
    entries: {
      clawguard: {
        enabled: true,
        config: {
          inputGuard: {
            enabled: true,
            threshold: 50,
            blockOnDetection: false,
            useAdversarialDetection: true,
            useMultiTurnTracking: true
          },
          runtimeGuard: {
            enabled: true,
            dangerousTools: ["exec", "write", "edit"],
            blockExfilUrls: true,
            requireApproval: false
          },
          outputGuard: {
            enabled: true,
            redactCredentials: true,
            redactPII: true,
            canaryTokens: ["SECRET_CANARY_12345"]
          },
          spotlighting: {
            enabled: true,
            mode: "delimit",
            sources: ["web", "email"]
          },
          logging: {
            logThreats: true,
            structuredEvents: true
          }
        }
      }
    }
  }
}
```
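To make the `outputGuard` options more concrete, here is a minimal sketch of how a response might be scanned and redacted before it is sent. It is not ClawGuard's code: the function name, the result shape, the replacement strings, and the two regular expressions (an AWS-style access key ID and an email address) are assumptions chosen for illustration, and a real guard would presumably use a much larger pattern set.

```typescript
// Hypothetical sketch of an output guard; not ClawGuard's actual code.
interface OutputGuardOptions {
  redactCredentials: boolean;
  redactPII: boolean;
  canaryTokens: string[];
}

interface OutputGuardResult {
  text: string;
  findings: string[]; // kinds of leaks detected, in detection order
}

// Illustrative patterns only: an AWS-style access key ID and an email address.
const CREDENTIAL_PATTERN = /\bAKIA[0-9A-Z]{16}\b/g;
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function guardOutput(text: string, opts: OutputGuardOptions): OutputGuardResult {
  const findings: string[] = [];
  let out = text;

  // Canary tokens are planted secrets: seeing one in a response means
  // protected context has leaked, so it is always redacted.
  for (const token of opts.canaryTokens) {
    if (token.length > 0 && out.indexOf(token) !== -1) {
      findings.push("canary");
      out = out.split(token).join("[REDACTED:canary]");
    }
  }

  if (opts.redactCredentials) {
    const redacted = out.replace(CREDENTIAL_PATTERN, "[REDACTED:credential]");
    if (redacted !== out) findings.push("credential");
    out = redacted;
  }

  if (opts.redactPII) {
    const redacted = out.replace(EMAIL_PATTERN, "[REDACTED:pii]");
    if (redacted !== out) findings.push("pii");
    out = redacted;
  }

  return { text: out, findings };
}

const guarded = guardOutput(
  "Key AKIAABCDEFGHIJKLMNOP, token SECRET_CANARY_12345, mail bob@example.com",
  { redactCredentials: true, redactPII: true, canaryTokens: ["SECRET_CANARY_12345"] }
);
console.log(guarded.text);
// Key [REDACTED:credential], token [REDACTED:canary], mail [REDACTED:pii]
console.log(guarded.findings); // [ 'canary', 'credential', 'pii' ]
```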
| Preset | Threshold | Block | Adversarial | Multi-turn | Approval | Spotlighting |
|--------|-----------|-------|-------------|------------|----------|--------------|
| paranoid | 25 | ā | ā | ā | ā | all sources |
| balanced | 50 | ā | ā | ā | ā | web, email |
| permissive | 75 | ā | ā | ā | ā | disabled |
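The preset thresholds and the context decay feature listed earlier can be combined in a small sketch. The thresholds (25, 50, 75) come from the table; everything else is assumed for illustration, including the class name, the per-turn decay factor of 0.5, and the rule that a conversation is flagged once the decayed running score reaches the threshold.

```typescript
// Hypothetical sketch: thresholds come from the presets table; the decay
// factor and the comparison rule are assumptions, not ClawGuard's code.
type Preset = "paranoid" | "balanced" | "permissive";

const PRESET_THRESHOLDS: Record<Preset, number> = {
  paranoid: 25,
  balanced: 50,
  permissive: 75,
};

class RiskTracker {
  private score: number = 0;
  private readonly threshold: number;
  private readonly decay: number;

  constructor(preset: Preset, decay: number = 0.5) {
    this.threshold = PRESET_THRESHOLDS[preset];
    this.decay = decay;
  }

  // Decay the accumulated risk, add this turn's score, and report whether
  // the running total has reached the preset's threshold.
  recordTurn(turnScore: number): boolean {
    this.score = this.score * this.decay + turnScore;
    return this.score >= this.threshold;
  }
}

const paranoidTracker = new RiskTracker("paranoid");
const balancedTracker = new RiskTracker("balanced");
for (const turnScore of [20, 20]) {
  console.log(
    paranoidTracker.recordTurn(turnScore),
    balancedTracker.recordTurn(turnScore)
  );
}
// Turn 1: running score 20 -> false false
// Turn 2: running score 30 -> true false
```

Under these assumptions, two mildly suspicious messages in a row trip the paranoid preset but not the balanced one, and the contribution of older turns shrinks each turn instead of accumulating indefinitely.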
```bash
# Check status and stats
openclaw clawguard status
```
Slash Command
In any chat, use `/clawguard` to see current status and session stats.
How It Works
ClawGuard hooks into OpenClaw's plugin lifecycle:
```
             User Message
                   ↓
┌─────────────────────────────────────┐
│ INPUT GUARD (before_agent_start)    │
│ • Pattern matching (7 languages)    │
│ • Adversarial suffix detection      │
│ • Multi-turn context tracking       │
│ • Source-aware thresholds           │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ RUNTIME GUARD (before_tool_call)    │
│ • Parameter validation              │
│ • Exfil URL blocking                │
│ • Dangerous command detection       │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│ OUTPUT GUARD (message_sending)      │
│ • Credential scanning               │
│ • PII detection                     │
│ • Canary token monitoring           │
│ • Auto-redaction                    │
└─────────────────────────────────────┘
                   ↓
             Safe Response
```
Research References
- Adversarial Suffixes: Zou et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models"
- Spotlighting: Hines et al. (Microsoft) "Defending Against Indirect Prompt Injection Attacks With Spotlighting"
- Lethal Trifecta: OpenClaw security model
- Multi-turn Attacks: Schulhoff et al. "Ignore This Title and HackAPrompt"
Testing
```bash
cd projects/clawguard-plugin
bun test  # 63 tests
```
File Structure
```
src/
├── index.ts           # Plugin entry, lifecycle hooks, CLI
├── guards.ts          # Input/Runtime/Output guards
├── patterns.ts        # Detection patterns (injection, credentials, PII)
├── analyzers.ts       # SOTA: entropy, context tracker, spotlighting
├── guards.test.ts     # Guard tests (38)
└── analyzers.test.ts  # Analyzer tests (25)
```
License: MIT
Built by MaxsClawd & Max. Day one, shipped.