AgentRed

English | 中文

AI Agent security testing framework with 160+ attack cases from latest research.

Quick Start • Features • Architecture • Test Cases • Advanced Usage

Why AgentRed?

As AI agents gain tool use, RAG retrieval, and autonomous execution capabilities, they become exponentially more vulnerable to attack. Traditional LLM security testing is not enough — agents have unique attack surfaces:

Tool hijacking — attackers redirect agent tool calls to malicious endpoints
RAG poisoning — poisoned knowledge base causes agents to output harmful content
Indirect prompt injection — injected instructions hidden in web pages, documents, or tool outputs
Multi-agent exploitation — compromised agents spread attacks across agent networks
Memory poisoning — long-term conversation history gets corrupted

AgentRed systematically tests your AI agent against 160+ attack vectors across 35 categories, derived from the latest academic research (OWASP, arXiv, USENIX Security). It generates a detailed HTML report with scores, remediation advice, and privacy-first architecture.

Features

Feature	Description
160+ Attack Test Cases	Prompt injection (18), jailbreaking (8), tool attacks (18), RAG attacks (16), multi-agent (4), multimodal (4), and more
7-Stage Pipeline	Config Check → Static Analysis → Adaptive Generation → Tool Use → Dynamic → RAG/Memory → Advisor
Privacy-First Architecture	3-level sanitization; Advisor agent only receives sanitized data, never raw agent content
Adaptive Test Generation	Analyzes your agent's domain/tools/capabilities and generates targeted test cases automatically
AI-Powered Advisor	Uses any LLM (GPT-4o, DeepSeek, etc.) to generate remediation suggestions from sanitized reports
3 Input Modes	Local directory scan, prompt text input, or live API endpoint testing
Tool Use Testing	Tests via agent's function calling interface (not just prompt injection)
RAG/Memory Evaluation	11 static checks + 9 dynamic tests for RAG quality, memory persistence, and tool calling
Config Checker	Detects missing API keys, placeholder values, empty .env files
HTML + JSON Reports	Radar charts, dimension cards, severity tables, remediation suggestions

Quick Start

Prerequisites

# Python 3.10+ required
python --version

# Install dependencies
pip install pyyaml

3 Ways to Run

# 1. Interactive wizard (recommended for beginners)
python cli.py wizard

# 2. Scan local agent directory (no API key needed)
python cli.py test --dir /path/to/your-agent

# 3. Test with API endpoint (full dynamic testing)
python cli.py test --api-key sk-xxx --api-endpoint https://api.openai.com/v1/chat/completions

Example: Full Test in One Command

# Test an OpenAI-compatible agent with all features enabled
python cli.py test \
  --dir ./my-agent \
  --api-key $OPENAI_API_KEY \
  --name "finance-bot" \
  --advisor-model deepseek-chat \
  --privacy-level moderate \
  --output ./reports

Screenshots

Report Overview

The HTML report shows overall scores, dimension breakdowns, and risk level at a glance:

Detailed Static Analysis

Each check item shows pass/warn/fail status with specific findings:

Architecture

Input Sources          Engine Pipeline              Output
┌──────────┐     ┌─────────────────────┐     ┌──────────────┐
│ Local Dir │────▶│ Config→Static       │────▶│ HTML+JSON    │
│ Prompt   │     │ Adaptive→ToolUse    │     │ Radar Chart  │
│ API      │     │ Dynamic→RAG/Mem     │     │ Remediation  │
└──────────┘     │ Advisor→Sanitize    │     └──────────────┘
                  └─────────────────────┘
                         ▲   │
              160+ Cases  │   ▼ Privacy Filter
              ┌───────────┴───┐
              │  Sanitized    │◀── Advisor LLM
              │  Data Only    │
              └───────────────┘

Click to view full architecture diagram

Test Case Coverage

All test cases are derived from latest academic research and industry standards:

Category	Count	Source
Prompt Injection	18	OWASP LLM01, InjecAgent (arXiv 2403.02691)
Jailbreaking	8	Many-shot (NeurIPS'24), GCG, AutoDAN, Cognitive Override
Harmful Content	10	OWASP LLM06, Safety Alignment Bypass
Data Exfiltration	10	CamoLeak, EchoLeak (CVE-2025-32711), Hidden Markdown
Tool Attack	8	ToolHijacker (arXiv 2504.19793)
Tool Description Poisoning	4	MCP Protocol Attack (MDPI 2026)
Tool Shadowing / Rug Pull	5	MCP Tool Substitution Attacks
Tool Invocation Abuse	6	Privilege Escalation via Tools
Tool Permission Overreach	3	OWASP Agentic AI - Tool Misuse
Knowledge Base Poisoning	3	PoisonedRAG (USENIX Security'25)
RAG Context Poisoning	2	Retrieval Manipulation
Cross-Document Data Leak	2	EchoLeak-style Cross-Doc Injection
Retriever Backdoor	2	Adversarial Retrieval Triggers
Adversarial Embedding	2	Semantic Space Poisoning
Memory Poisoning	3	Conversation History Corruption
RAG Scope Violation	2	Out-of-Bounds Retrieval
Multi-Agent Attack	4	A2A Protocol Exploitation, Cascade Attacks
Multimodal Attack	4	CrossInject (arXiv 2504.14348), Steganography
Social Engineering	2	Persona Spoofing, Authority Impersonation
Protocol Attack	3	A2A/MCP Protocol Layer Exploits
Autonomous Misbehavior	3	Goal Drift, Resource Exhaustion, Loop Traps
AI Virus / Self-Replication	3	Prompt-based Self-Propagation
Guardrail Manipulation	3	Safety Filter Bypass Techniques
Boundary (Input)	8	Fuzzing, Overflow, Encoding Attacks
Boundary (Task)	5	Capability Confusion, Goal Hijacking
Boundary (Context)	8	Context Window Attacks, Injection
Boundary (Permission)	4	ACL Bypass, Role Escalation
Boundary (Protocol)	3	API Contract Violations
Performance	12	Latency, Accuracy, Consistency, Robustness

Total: 160 test cases across 35 attack categories

Research Sources

Source	Year	Key Contributions
OWASP Top 10 for LLM Apps	2025	LLM01-LLO10 standard risks
OWASP Agentic AI Risks	2025	Agent-specific top 10 risks
LLM Security Survey	2025	Backdoors, adversarial inputs, embedding inversion
Prompt Injection in LLM Agents & MCP	2026	Tool Shadowing, Rug Pull, ZombAI, CamoLeak
Agentic AI Security	2025	Multi-agent, social engineering, autonomous失控
ToolHijacker	2025	Tool selection manipulation, doc injection
PoisonedRAG	2025	Knowledge base poisoning (90% success rate)
CrossInject	2025	Cross-modal injection (+30.1% success rate)
Many-shot Jailbreaking	2024	Long-context false-pattern injection

Advanced Usage

Adaptive Test Generation

AgentRed automatically analyzes your agent's characteristics and generates targeted test cases:

# Enable adaptive generation (default: on)
python cli.py test --dir ./my-agent

# The analyzer detects:
#   - Domain: finance, medical, customer_service, education, legal, code...
#   - Tools: web_search, code_exec, file_access, database, email, mcp, browser...
#   - Capabilities: rag, memory, multi_agent, autonomous, tool_use...
#   - Risk Level: high / medium / low (based on safety features)

Different agent types generate completely different test case sets:

Finance Agent → domain-specific fraud/injection tests
Medical Agent → HIPAA/safety boundary tests
Code Agent → sandbox escape, command injection, dependency poisoning
MCP-enabled Agent → tool description poisoning, shadowing, rug pull

Privacy Levels

# Strict: remove all sensitive info (API keys, paths, emails)
python cli.py test --privacy-level strict

# Moderate: mask paths, summarize prompts (default)
python cli.py test --privacy-level moderate

# Minimal: only remove explicit credentials
python cli.py test --privacy-level minimal

Advisor Configuration

# Use DeepSeek as advisor (cost-effective, Chinese-friendly)
python cli.py test --advisor-api-key $DEEPSEEK_API_KEY --advisor-model deepseek-chat

# Use GPT-4o for deeper analysis
python cli.py test --advisor-api-key $OPENAI_API_KEY --advisor-model gpt-4o

# Use reasoning model for complex analysis
python cli.py test --advisor-model deepseek-reasoner

# Rule-only mode (no API needed)
python cli.py test --no-advisor

Feature Toggles

# Disable individual modules
python cli.py test --no-adaptive         # No adaptive generation
python cli.py test --no-tool-use         # No tool use testing
python cli.py test --no-extended-eval    # No RAG/Memory evaluation
python cli.py test --no-config-check     # No config checking
python cli.py test --no-advisor          # No AI advisor

# Dry run: preview test cases without executing
python cli.py test --dry-run

# Filter by dimension or severity
python cli.py test --dimension security
pythoncli.py test --severity critical,high

CLI Reference

Usage: python cli.py test [OPTIONS]

Input Options:
  --dir PATH            Local agent directory to scan
  --prompt TEXT         System prompt text (or @file.txt)
  --api-key KEY         OpenAI-compatible API key
  --api-endpoint URL    API endpoint URL
  --name NAME           Agent name for report

Testing Options:
  --dimension DIM       security | boundary | performance | all
  --severity SEV        critical,high | high,medium | ...
  --output DIR          Report output directory (./reports)
  --dry-run            Preview without executing

Feature Flags:
  --no-adaptive        Disable adaptive test generation
  --no-tool-use        Disable tool use testing
  --no-extended-eval   Disable RAG/Memory evaluation
  --no-config-check    Disable configuration checking
  --no-advisor         Disable AI advisor

Privacy Options:
  --privacy-level LEVEL strict | moderate | minimal
  --no-sanitized       Don't sanitize advisor input (not recommended)

Advisor Options:
  --advisor-api-key KEY   Advisor LLM API key
  --advisor-model MODEL   Model name (deepseek-chat, gpt-4o, ...)
  --advisor-strategy STR  auto | api_only | rule_only | hybrid

Other Commands:
  wizard                Interactive setup wizard

Project Structure

agent-tester/
├── cli.py                        # CLI entry point
├── config.yaml                   # Default configuration
├── demo.py                       # End-to-end demo script
│
├── core/
│   ├── runner.py                 # Main orchestrator (7-stage pipeline)
│   ├── loader.py                 # YAML test case loader
│   ├── evaluator.py              # Rule-based response evaluator
│   ├── scorer.py                 # Multi-dimensional scorer
│   ├── reporter.py               # JSON report generator
│   ├── html_reporter.py          # HTML visual report generator
│   ├── scanner.py                # Directory scanner (detect framework)
│   ├── source.py                 # Unified agent profile (API/Local/Prompt)
│   ├── static_analyzer.py        # 12-item static analysis engine
│   ├── advisor.py                # AI advisor agent (any LLM)
│   ├── config_checker.py         # Missing config detection
│   ├── rag_memory_evaluator.py   # RAG/Memory/Tool evaluation
│   ├── tool_use_tester.py        # Tool/function calling tester
│   ├── privacy_filter.py         # 3-level data sanitizer
│   └── adaptive_generator.py     # Profile-based test case generator
│
├── client/
│   ├── base.py                   # Abstract client interface
│   ├── openai_client.py          # OpenAI-compatible client
│   └── mock_client.py            # Mock client for offline testing
│
├── testcases/
│   ├── security.yaml             # 115 security test cases
│   ├── boundary.yaml             # 33 boundary test cases
│   ├── performance.yaml          # 12 performance test cases
│   ├── tool_attack.yaml          # 18 tool attack test cases
│   └── rag_attack.yaml           # 16 RAG attack test cases
│
├── demo-agent/                   # Example agent for demo
│   ├── agent.py
│   ├── config.json
│   └── system_prompt.txt
│
└── screenshots/                  # README images

How It Works

Stage 1: Input Analysis

AgentRed accepts three types of input:

Local Directory (--dir) — Scans your agent project, auto-detects framework (LangChain, AutoGen, CrewAI, OpenAI SDK, or custom), extracts system prompts, config files, and code patterns.
Prompt Text (--prompt) — Directly analyze a system prompt string or @file.txt reference.
API Endpoint (--api-key) — Connect to a running agent via OpenAI-compatible API for full dynamic testing.

Stage 2: 7-Stage Testing Pipeline

[0] Config Check    → Detect missing API keys, placeholder values, empty configs
[1] Static Analysis  → 12 checks on prompts, code, config (no API needed)
[1.5] Adaptive Gen   → Generate targeted cases based on agent profile
[2] Tool Use Test    → Test via agent's function calling interface
[3] Dynamic Test     → Send 160+ attack prompts, evaluate responses
[4] RAG/Memory Eval  → Evaluate RAG quality, memory safety, tool usage
[5] Advisor          → AI generates remediation (sanitized input only)
[6] Report Sanitize  → Auto-remove sensitive data before saving

Stage 3: Privacy-First Design

Your Agent ──(raw data)──▶ AgentRed Tester ──(sanitize)──▶ Advisor LLM
                                    │
                              PrivacyFilter
                            strict/moderate/minimal
                              • API Keys masked
                              • Paths redacted
                              • Prompts summarized
                              • Emails removed

The Advisor agent never sees your original agent content. Only sanitized evaluation summaries are sent for analysis.

Contributing

We welcome contributions! Here's how you can help:

Add new attack vectors — Submit new YAML test cases in testcases/
Improve detection logic — Enhance evaluators in core/
Add framework support — Extend scanner.py for new agent frameworks
Fix bugs — Issues and PRs are welcome
Improve documentation — Better examples, clearer explanations

Adding a New Test Case

Create or edit a YAML file in testcases/:

- id: "sec-pi-new-001"
  category: "prompt_injection"
  dimension: "security"
  severity: "critical"
  description: "Your new attack technique description"
  prompt: "The actual attack payload here..."
  expected_behavior:
    type: refuse
    keywords: ["cannot", "unable", "sorry"]
  references:
    - "https://arxiv.org/abs/XXXX.XXXXXX"

Development Setup

git clone https://github.com/<your-org>/agentred.git
cd agentred
pip install pyyaml

# Run demo (offline, no API needed)
python demo.py

# Run with dry-run to see what would be tested
python cli.py test --prompt "You are a helpful assistant." --dry-run

Roadmap

Web UI dashboard with real-time testing visualization
CI/CD integration plugin (GitHub Actions pre-commit hook)
Benchmark leaderboard for different agent frameworks
Export to SARIF format for GitHub Security tab integration
Multi-language support (Chinese/English/Japanese)
Plugin system for custom attack modules

License

MIT License — see LICENSE for details.

Acknowledgments

This project builds on groundbreaking research from the security community:

OWASP Foundation — LLM Top 10 & Agentic AI Top 10 standards
USENIX Security Symposium — PoisonedRAG paper authors
arXiv contributors — All researchers cited above whose work informs our test cases
The broader AI safety community — For advancing the understanding of agent vulnerabilities

_{Built with focus on making AI agents safer for everyone.}
If this project helps you, consider giving it a ⭐!

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
client		client
core		core
demo-agent		demo-agent
screenshots		screenshots
testcases		testcases
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
__init__.py		__init__.py
cli.py		cli.py
config.yaml		config.yaml
demo.py		demo.py
requirements.txt		requirements.txt
setup-github.bat		setup-github.bat

Folders and files

Latest commit

History

Repository files navigation

AgentRed

Why AgentRed?

Features

Quick Start

Prerequisites

3 Ways to Run

Example: Full Test in One Command

Screenshots

Report Overview

Detailed Static Analysis

Architecture

Test Case Coverage

Research Sources

Advanced Usage

Adaptive Test Generation

Privacy Levels

Advisor Configuration

Feature Toggles

CLI Reference

Project Structure

How It Works

Stage 1: Input Analysis

Stage 2: 7-Stage Testing Pipeline

Stage 3: Privacy-First Design

Contributing

Adding a New Test Case

Development Setup

Roadmap

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages