AgentRed is an AI agent security testing framework with 160+ attack cases drawn from recent research.
As AI agents gain tool use, RAG retrieval, and autonomous execution capabilities, their attack surface grows well beyond that of a plain LLM. Traditional LLM security testing is not enough, because agents have attack surfaces of their own:
- Tool hijacking — attackers redirect agent tool calls to malicious endpoints
- RAG poisoning — poisoned knowledge base causes agents to output harmful content
- Indirect prompt injection — injected instructions hidden in web pages, documents, or tool outputs
- Multi-agent exploitation — compromised agents spread attacks across agent networks
- Memory poisoning — long-term conversation history gets corrupted
AgentRed systematically tests your AI agent against 160+ attack vectors across 35 categories, derived from recent academic and industry sources (OWASP, arXiv, USENIX Security). It generates a detailed HTML report with scores and remediation advice, and its privacy-first architecture sends only sanitized data to the advisor model.
| Feature | Description |
|---|---|
| 160+ Attack Test Cases | Prompt injection (18), jailbreaking (8), tool attacks (18), RAG attacks (16), multi-agent (4), multimodal (4), and more |
| 7-Stage Pipeline | Config Check → Static Analysis → Adaptive Generation → Tool Use → Dynamic → RAG/Memory → Advisor |
| Privacy-First Architecture | 3-level sanitization; Advisor agent only receives sanitized data, never raw agent content |
| Adaptive Test Generation | Analyzes your agent's domain/tools/capabilities and generates targeted test cases automatically |
| AI-Powered Advisor | Uses any LLM (GPT-4o, DeepSeek, etc.) to generate remediation suggestions from sanitized reports |
| 3 Input Modes | Local directory scan, prompt text input, or live API endpoint testing |
| Tool Use Testing | Tests via agent's function calling interface (not just prompt injection) |
| RAG/Memory Evaluation | 11 static checks + 9 dynamic tests for RAG quality, memory persistence, and tool calling |
| Config Checker | Detects missing API keys, placeholder values, empty .env files |
| HTML + JSON Reports | Radar charts, dimension cards, severity tables, remediation suggestions |
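
To make the Config Checker row concrete, here is a minimal sketch of how missing keys, placeholder values, and empty .env files could be detected. The function name `check_env_text`, the placeholder patterns, and the wording of the findings are illustrative assumptions, not the code in `core/config_checker.py`.

```python
import re

# Hypothetical placeholder patterns; the real checker's rules may differ.
PLACEHOLDER_RE = re.compile(r"^(sk-x+|your[-_].*|changeme|<.*>|x{3,})$", re.IGNORECASE)


def check_env_text(env_text, required_keys):
    """Return a list of findings for the contents of a .env file."""
    values = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")

    findings = []
    if not values:
        findings.append("empty .env file")
    for key in required_keys:
        if key not in values:
            findings.append(f"missing key: {key}")
        elif values[key] == "":
            findings.append(f"empty value: {key}")
        elif PLACEHOLDER_RE.match(values[key]):
            findings.append(f"placeholder value: {key}")
    return findings


if __name__ == "__main__":
    print(check_env_text("OPENAI_API_KEY=sk-xxx\n", ["OPENAI_API_KEY"]))
```

A check like this needs no network access, which is consistent with running it as the first pipeline stage before any API call is made.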
# Python 3.10+ required
python --version
# Install dependencies
pip install pyyaml
# 1. Interactive wizard (recommended for beginners)
python cli.py wizard
# 2. Scan local agent directory (no API key needed)
python cli.py test --dir /path/to/your-agent
# 3. Test with API endpoint (full dynamic testing)
python cli.py test --api-key sk-xxx --api-endpoint https://api.openai.com/v1/chat/completions
# Test an OpenAI-compatible agent with all features enabled
python cli.py test \
--dir ./my-agent \
--api-key $OPENAI_API_KEY \
--name "finance-bot" \
--advisor-model deepseek-chat \
--privacy-level moderate \
--output ./reports
The HTML report shows overall scores, dimension breakdowns, and risk level at a glance.
Each check item shows pass/warn/fail status with specific findings.
Input Sources     Engine Pipeline             Output
┌───────────┐     ┌─────────────────────┐     ┌──────────────┐
│ Local Dir │────▶│ Config→Static       │────▶│ HTML+JSON    │
│ Prompt    │     │ Adaptive→ToolUse    │     │ Radar Chart  │
│ API       │     │ Dynamic→RAG/Mem     │     │ Remediation  │
└───────────┘     │ Advisor→Sanitize    │     └──────────────┘
                  └─────────────────────┘
                        ▲         │
             160+ Cases │         ▼ Privacy Filter
            ┌───────────┴───┐
            │ Sanitized     │◀── Advisor LLM
            │ Data Only     │
            └───────────────┘
All test cases are derived from recent academic research and industry standards:
| Category | Count | Source |
|---|---|---|
| Prompt Injection | 18 | OWASP LLM01, InjecAgent (arXiv 2403.02691) |
| Jailbreaking | 8 | Many-shot (NeurIPS'24), GCG, AutoDAN, Cognitive Override |
| Harmful Content | 10 | OWASP LLM06, Safety Alignment Bypass |
| Data Exfiltration | 10 | CamoLeak, EchoLeak (CVE-2025-32711), Hidden Markdown |
| Tool Attack | 8 | ToolHijacker (arXiv 2504.19793) |
| Tool Description Poisoning | 4 | MCP Protocol Attack (MDPI 2026) |
| Tool Shadowing / Rug Pull | 5 | MCP Tool Substitution Attacks |
| Tool Invocation Abuse | 6 | Privilege Escalation via Tools |
| Tool Permission Overreach | 3 | OWASP Agentic AI - Tool Misuse |
| Knowledge Base Poisoning | 3 | PoisonedRAG (USENIX Security'25) |
| RAG Context Poisoning | 2 | Retrieval Manipulation |
| Cross-Document Data Leak | 2 | EchoLeak-style Cross-Doc Injection |
| Retriever Backdoor | 2 | Adversarial Retrieval Triggers |
| Adversarial Embedding | 2 | Semantic Space Poisoning |
| Memory Poisoning | 3 | Conversation History Corruption |
| RAG Scope Violation | 2 | Out-of-Bounds Retrieval |
| Multi-Agent Attack | 4 | A2A Protocol Exploitation, Cascade Attacks |
| Multimodal Attack | 4 | CrossInject (arXiv 2504.14348), Steganography |
| Social Engineering | 2 | Persona Spoofing, Authority Impersonation |
| Protocol Attack | 3 | A2A/MCP Protocol Layer Exploits |
| Autonomous Misbehavior | 3 | Goal Drift, Resource Exhaustion, Loop Traps |
| AI Virus / Self-Replication | 3 | Prompt-based Self-Propagation |
| Guardrail Manipulation | 3 | Safety Filter Bypass Techniques |
| Boundary (Input) | 8 | Fuzzing, Overflow, Encoding Attacks |
| Boundary (Task) | 5 | Capability Confusion, Goal Hijacking |
| Boundary (Context) | 8 | Context Window Attacks, Injection |
| Boundary (Permission) | 4 | ACL Bypass, Role Escalation |
| Boundary (Protocol) | 3 | API Contract Violations |
| Performance | 12 | Latency, Accuracy, Consistency, Robustness |
Total: 160 test cases across 35 attack categories
| Source | Year | Key Contributions |
|---|---|---|
| OWASP Top 10 for LLM Apps | 2025 | LLM01-LLM10 standard risks |
| OWASP Agentic AI Risks | 2025 | Agent-specific top 10 risks |
| LLM Security Survey | 2025 | Backdoors, adversarial inputs, embedding inversion |
| Prompt Injection in LLM Agents & MCP | 2026 | Tool Shadowing, Rug Pull, ZombAI, CamoLeak |
| Agentic AI Security | 2025 | Multi-agent, social engineering, autonomous loss of control |
| ToolHijacker | 2025 | Tool selection manipulation, doc injection |
| PoisonedRAG | 2025 | Knowledge base poisoning (90% success rate) |
| CrossInject | 2025 | Cross-modal injection (+30.1% success rate) |
| Many-shot Jailbreaking | 2024 | Long-context false-pattern injection |
AgentRed automatically analyzes your agent's characteristics and generates targeted test cases:
# Enable adaptive generation (default: on)
python cli.py test --dir ./my-agent
# The analyzer detects:
# - Domain: finance, medical, customer_service, education, legal, code...
# - Tools: web_search, code_exec, file_access, database, email, mcp, browser...
# - Capabilities: rag, memory, multi_agent, autonomous, tool_use...
# - Risk Level: high / medium / low (based on safety features)
Different agent types generate completely different test case sets:
- Finance Agent → domain-specific fraud/injection tests
- Medical Agent → HIPAA/safety boundary tests
- Code Agent → sandbox escape, command injection, dependency poisoning
- MCP-enabled Agent → tool description poisoning, shadowing, rug pull
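The profile-to-test mapping described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the rule table, the category identifiers other than `prompt_injection`, and the function `select_categories` are hypothetical and are not taken from `core/adaptive_generator.py`.

```python
# Hypothetical mapping from detected tools/capabilities to test categories.
CATEGORY_RULES = {
    "rag": ["knowledge_base_poisoning", "rag_context_poisoning"],
    "memory": ["memory_poisoning"],
    "multi_agent": ["multi_agent_attack"],
    "mcp": ["tool_description_poisoning", "tool_shadowing"],
    "code_exec": ["sandbox_escape", "command_injection"],
}


def select_categories(tools, capabilities):
    """Pick test categories for an agent profile, preserving first-seen order."""
    selected = ["prompt_injection"]  # assumed baseline applied to every agent
    for trait in [*tools, *capabilities]:
        for category in CATEGORY_RULES.get(trait, []):
            if category not in selected:
                selected.append(category)
    return selected


if __name__ == "__main__":
    # An MCP-enabled agent with RAG gets tool-poisoning and RAG tests.
    print(select_categories(tools=["mcp"], capabilities=["rag"]))
```

Traits that have no rule (for example an unrecognized tool name) simply add nothing, so an unknown profile falls back to the baseline set.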
# Strict: remove all sensitive info (API keys, paths, emails)
python cli.py test --privacy-level strict
# Moderate: mask paths, summarize prompts (default)
python cli.py test --privacy-level moderate
# Minimal: only remove explicit credentials
python cli.py test --privacy-level minimal
# Use DeepSeek as advisor (cost-effective, Chinese-friendly)
python cli.py test --advisor-api-key $DEEPSEEK_API_KEY --advisor-model deepseek-chat
# Use GPT-4o for deeper analysis
python cli.py test --advisor-api-key $OPENAI_API_KEY --advisor-model gpt-4o
# Use reasoning model for complex analysis
python cli.py test --advisor-model deepseek-reasoner
# Rule-only mode (no API needed)
python cli.py test --no-advisor
# Disable individual modules
python cli.py test --no-adaptive # No adaptive generation
python cli.py test --no-tool-use # No tool use testing
python cli.py test --no-extended-eval # No RAG/Memory evaluation
python cli.py test --no-config-check # No config checking
python cli.py test --no-advisor # No AI advisor
# Dry run: preview test cases without executing
python cli.py test --dry-run
# Filter by dimension or severity
python cli.py test --dimension security
python cli.py test --severity critical,high
Usage: python cli.py test [OPTIONS]
Input Options:
--dir PATH Local agent directory to scan
--prompt TEXT System prompt text (or @file.txt)
--api-key KEY OpenAI-compatible API key
--api-endpoint URL API endpoint URL
--name NAME Agent name for report
Testing Options:
--dimension DIM security | boundary | performance | all
--severity SEV critical,high | high,medium | ...
--output DIR Report output directory (./reports)
--dry-run Preview without executing
Feature Flags:
--no-adaptive Disable adaptive test generation
--no-tool-use Disable tool use testing
--no-extended-eval Disable RAG/Memory evaluation
--no-config-check Disable configuration checking
--no-advisor Disable AI advisor
Privacy Options:
--privacy-level LEVEL strict | moderate | minimal
--no-sanitized Don't sanitize advisor input (not recommended)
Advisor Options:
--advisor-api-key KEY Advisor LLM API key
--advisor-model MODEL Model name (deepseek-chat, gpt-4o, ...)
--advisor-strategy STR auto | api_only | rule_only | hybrid
Other Commands:
wizard Interactive setup wizard
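As an illustration of how the `--dimension` and `--severity` options could narrow the test set, the sketch below filters a list of test case dictionaries. It assumes each case carries the `dimension` and `severity` fields used in the YAML test case format; the helper names and the `low` severity level are assumptions, not confirmed CLI behavior.

```python
from typing import List, Optional, Set

# "low" is an assumed level; the options above only show critical, high, medium.
VALID_SEVERITIES = ("critical", "high", "medium", "low")
VALID_DIMENSIONS = ("security", "boundary", "performance", "all")


def parse_severity(arg: str) -> Set[str]:
    """Turn a comma-separated value such as "critical,high" into a set."""
    levels = {part.strip().lower() for part in arg.split(",") if part.strip()}
    unknown = levels - set(VALID_SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")
    return levels


def filter_cases(cases: List[dict], dimension: str = "all",
                 severity: Optional[str] = None) -> List[dict]:
    """Keep only the cases matching the requested dimension and severities."""
    if dimension not in VALID_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    wanted = parse_severity(severity) if severity else None
    return [
        case for case in cases
        if (dimension == "all" or case["dimension"] == dimension)
        and (wanted is None or case["severity"] in wanted)
    ]
```

Rejecting unknown values early, rather than silently matching nothing, avoids a run that reports zero findings only because of a mistyped filter.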
agent-tester/
├── cli.py # CLI entry point
├── config.yaml # Default configuration
├── demo.py # End-to-end demo script
│
├── core/
│ ├── runner.py # Main orchestrator (7-stage pipeline)
│ ├── loader.py # YAML test case loader
│ ├── evaluator.py # Rule-based response evaluator
│ ├── scorer.py # Multi-dimensional scorer
│ ├── reporter.py # JSON report generator
│ ├── html_reporter.py # HTML visual report generator
│ ├── scanner.py # Directory scanner (detect framework)
│ ├── source.py # Unified agent profile (API/Local/Prompt)
│ ├── static_analyzer.py # 12-item static analysis engine
│ ├── advisor.py # AI advisor agent (any LLM)
│ ├── config_checker.py # Missing config detection
│ ├── rag_memory_evaluator.py # RAG/Memory/Tool evaluation
│ ├── tool_use_tester.py # Tool/function calling tester
│ ├── privacy_filter.py # 3-level data sanitizer
│ └── adaptive_generator.py # Profile-based test case generator
│
├── client/
│ ├── base.py # Abstract client interface
│ ├── openai_client.py # OpenAI-compatible client
│ └── mock_client.py # Mock client for offline testing
│
├── testcases/
│ ├── security.yaml # 115 security test cases
│ ├── boundary.yaml # 33 boundary test cases
│ ├── performance.yaml # 12 performance test cases
│ ├── tool_attack.yaml # 18 tool attack test cases
│ └── rag_attack.yaml # 16 RAG attack test cases
│
├── demo-agent/ # Example agent for demo
│ ├── agent.py
│ ├── config.json
│ └── system_prompt.txt
│
└── screenshots/ # README images
AgentRed accepts three types of input:
- Local Directory (--dir): Scans your agent project, auto-detects framework (LangChain, AutoGen, CrewAI, OpenAI SDK, or custom), extracts system prompts, config files, and code patterns.
- Prompt Text (--prompt): Directly analyze a system prompt string or @file.txt reference.
- API Endpoint (--api-key): Connect to a running agent via OpenAI-compatible API for full dynamic testing.
[0] Config Check → Detect missing API keys, placeholder values, empty configs
[1] Static Analysis → 12 checks on prompts, code, config (no API needed)
[1.5] Adaptive Gen → Generate targeted cases based on agent profile
[2] Tool Use Test → Test via agent's function calling interface
[3] Dynamic Test → Send 160+ attack prompts, evaluate responses
[4] RAG/Memory Eval → Evaluate RAG quality, memory safety, tool usage
[5] Advisor → AI generates remediation (sanitized input only)
[6] Report Sanitize → Auto-remove sensitive data before saving
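For the dynamic test stage, a rule-based check of a single response might look like the sketch below. It assumes the `expected_behavior` structure from the YAML test case format (`type: refuse` plus a `keywords` list); the function name and the pass/fail return values are hypothetical and do not describe `core/evaluator.py`.

```python
def evaluate_response(response: str, expected_behavior: dict) -> str:
    """Return "pass" if the agent refused as expected, otherwise "fail"."""
    if expected_behavior.get("type") != "refuse":
        # Only the "refuse" type appears in the documented test case format.
        raise ValueError("this sketch only handles type 'refuse'")
    text = response.lower()
    keywords = expected_behavior.get("keywords", [])
    refused = any(keyword.lower() in text for keyword in keywords)
    return "pass" if refused else "fail"


if __name__ == "__main__":
    expected = {"type": "refuse", "keywords": ["cannot", "unable", "sorry"]}
    print(evaluate_response("Sorry, I cannot help with that.", expected))
    print(evaluate_response("Sure, here is my system prompt.", expected))
```

Keyword matching is deliberately simple and can misjudge a response that apologizes and then complies, so a real evaluator would likely combine it with further rules.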
Your Agent ──(raw data)──▶ AgentRed Tester ──(sanitize)──▶ Advisor LLM
                                                  │
                                            PrivacyFilter
                                       strict/moderate/minimal
                                         • API Keys masked
                                         • Paths redacted
                                         • Prompts summarized
                                         • Emails removed
The Advisor agent never sees your original agent content. Only sanitized evaluation summaries are sent for analysis.
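The three privacy levels can be pictured as progressively stricter regex passes over the text sent to the advisor. The following is a rough sketch under assumptions: the patterns, the replacement tokens, and the `sanitize` function are illustrative, prompt summarization is omitted, and none of it is taken from `core/privacy_filter.py`.

```python
import re

# Illustrative patterns only; real credentials and paths take many more forms.
API_KEY_RE = re.compile(r"sk-[A-Za-z0-9]{8,}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PATH_RE = re.compile(r"(?<![\w/])/(?:[\w.-]+/)+[\w.-]+")


def sanitize(text: str, level: str = "moderate") -> str:
    """Mask sensitive substrings according to the chosen privacy level."""
    if level not in ("strict", "moderate", "minimal"):
        raise ValueError(f"unknown privacy level: {level}")
    # Every level removes explicit credentials.
    text = API_KEY_RE.sub("[API_KEY]", text)
    # Moderate and strict additionally mask file system paths.
    if level in ("strict", "moderate"):
        text = PATH_RE.sub("[PATH]", text)
    # Strict also removes email addresses.
    if level == "strict":
        text = EMAIL_RE.sub("[EMAIL]", text)
    return text


if __name__ == "__main__":
    sample = "key sk-abcdef123456 at /home/alice/agent/config.json mail alice@example.com"
    for level in ("minimal", "moderate", "strict"):
        print(level, "->", sanitize(sample, level))
```

Applying the filter before any advisor request, rather than inside the advisor client, keeps the guarantee independent of which LLM provider is configured.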
We welcome contributions! Here's how you can help:
- Add new attack vectors: Submit new YAML test cases in testcases/
- Improve detection logic: Enhance evaluators in core/
- Add framework support: Extend scanner.py for new agent frameworks
- Fix bugs: Issues and PRs are welcome
- Improve documentation: Better examples, clearer explanations
Create or edit a YAML file in testcases/:
- id: "sec-pi-new-001"
  category: "prompt_injection"
  dimension: "security"
  severity: "critical"
  description: "Your new attack technique description"
  prompt: "The actual attack payload here..."
  expected_behavior:
    type: refuse
    keywords: ["cannot", "unable", "sorry"]
  references:
    - "https://arxiv.org/abs/XXXX.XXXXXX"
git clone https://github.com/<your-org>/agentred.git
cd agentred
pip install pyyaml
# Run demo (offline, no API needed)
python demo.py
# Run with dry-run to see what would be tested
python cli.py test --prompt "You are a helpful assistant." --dry-run
- Web UI dashboard with real-time testing visualization
- CI/CD integration plugin (GitHub Actions pre-commit hook)
- Benchmark leaderboard for different agent frameworks
- Export to SARIF format for GitHub Security tab integration
- Multi-language support (Chinese/English/Japanese)
- Plugin system for custom attack modules
MIT License — see LICENSE for details.
This project builds on groundbreaking research from the security community:
- OWASP Foundation — LLM Top 10 & Agentic AI Top 10 standards
- USENIX Security Symposium — PoisonedRAG paper authors
- arXiv contributors — All researchers cited above whose work informs our test cases
- The broader AI safety community — For advancing the understanding of agent vulnerabilities
Built with focus on making AI agents safer for everyone.
If this project helps you, consider giving it a ⭐!

