FrogShield is an educational framework that demonstrates defensive strategies against prompt injection attacks in Large Language Models (LLMs).
- Purpose
- Components
- Configuration
- Logging
- How to Run
- Current Status & Limitations
- Future Development Ideas
- Contributors
The goal of FrogShield is to illustrate three key layers of defense against prompt injection:
- Input Validation: Analyzing incoming prompts against known malicious patterns (loaded from
config_data/patterns.txt) and using placeholder functions (frogshield/utils/text_analysis.py) to check for suspicious syntax or context manipulation. - Model Hardening (Conceptual): The
frogshield/model_hardener.pymodule provides methods for generating adversarial examples and testing model boundaries conceptually. It does not perform actual model training. - Real-time Monitoring: Analyzing the LLM's output for suspicious keywords (loaded from
config_data/) or refusal messages, and monitoring basic behavioral patterns (like response length) for anomalies using placeholder logic (frogshield/realtime_monitor.py).
The project includes the following key files and directories:
README.md: This file.LICENSE: The MIT license file.pyproject.toml: Project build configuration and core library dependencies (e.g.,PyYAML).requirements.txt: Lists additional dependencies needed for specific demos (e.g.,ollama).config.yaml: Main configuration file for tuning FrogShield components and specifying list file paths.config_data/: Directory containing external data files:patterns.txt: Default file containing known injection patterns.refusal_keywords.txt: Keywords indicating appropriate LLM refusal.compliance_keywords.txt: Keywords indicating potential sensitive data leaks or compliance.sample_prompts.txt: Sample prompts used bydemo_mock.py.boundary_refusal_keywords.txt: Refusal keywords used bydemo_ollama.pyboundary tests.
frogshield/: Directory containing the core defense library modules.__init__.py: Makesfrogshieldimportable and defines the public API (InputValidator,RealtimeMonitor,ModelHardener).input_validator.py: Contains theInputValidatorclass for checking user input.model_hardener.py: Contains theModelHardenerclass for conceptual hardening tasks.realtime_monitor.py: Contains theRealtimeMonitorclass for checking LLM output.utils/: Sub-directory for shared utilities.__init__.py: Package initializer.config_loader.py: Utility for loadingconfig.yamland list files.text_analysis.py: Placeholder functions for syntax/context analysis (used byInputValidator).
tests/: Contains unit tests using Python'sunittestframework.- Includes tests for validation, monitoring, and hardening modules.
demo_mock.py: Script demonstratingFrogShieldwith a simple, built-in mock LLM.demo_ollama.py: Script demonstratingFrogShieldwith a local LLM run via Ollama.run_demo.sh: Shell script to run thedemo_ollama.pysteps sequentially and interactively..gitignore: Standard Python gitignore file.
FrogShield's behavior and external data sources are configured primarily through config.yaml.
Most configurable parameters for InputValidator, RealtimeMonitor, and the underlying TextAnalysis utilities are defined in config.yaml located in the project root. The components load these settings automatically if not overridden during instantiation.
context_window(int): Number of past conversation turns to consider for context analysis.
sensitivity_threshold(float, 0.0-1.0): Base sensitivity for detecting behavioral anomalies (e.g., response length deviations).initial_avg_length(int): Starting guess for average response length (used initially).behavior_monitoring_factor(float): Multiplier applied tosensitivity_thresholdto adjust the acceptable deviation range for length checks.
syntax_non_alnum_threshold(float): Max allowed ratio of non-alphanumeric/non-space characters in a prompt.syntax_max_word_length(int): Max allowed length for a single "word".
- Purpose: Defines the paths (relative to the project root) for external data files used by components and demos.
- Keys:
patterns: Path to injection patterns file (used byInputValidator).refusal_keywords: Path to refusal keywords file (used byRealtimeMonitor).compliance_keywords: Path to compliance/sensitive keywords file (used byRealtimeMonitor).sample_prompts: Path to sample prompts file (used bydemo_mock.py).boundary_refusal_keywords: Path to boundary test refusal keywords (used bydemo_ollama.py).
External lists like injection patterns and keywords are stored as plain text files (one item per line, # comments ignored) within the config_data/ directory. The specific file used for each list type is determined by the paths set in the ListFiles section of config.yaml.
To customize these lists:
- Edit the files within
config_data/directly. - Modify
config.yamlto point to different files (ensure they are placed correctly relative to the project root). - Pass the content directly: When initializing components like
InputValidatororRealtimeMonitorprogrammatically, you can pass the list/set content directly via arguments (e.g.,patterns=[...],refusal_keywords={...}), bypassing the file loading mechanism.
- The
frogshieldlibrary modules use Python's standardloggingmodule. - The demo scripts configure basic console logging to show
INFOlevel messages by default. - To see more detailed
DEBUGmessages (e.g., specific pattern matches, analysis steps), use the--debugflag when running the demo scripts (e.g.,python demo_ollama.py --prompt "Hello" --debug).
-
Clone the Repository:
git clone https://github.com/blakeben/FrogShield.git # Or your fork cd FrogShield
-
Set up Python Environment: (Recommended)
python3 -m venv venv source venv/bin/activate # On Windows use venv\Scripts\activate
-
Install Dependencies: Install the core library (editable mode) and demo dependencies.
# Installs frogshield + core deps (PyYAML) pip install -e . # Installs demo deps (ollama) pip install -r requirements.txt
-
Verify Configuration: Ensure
config.yamland the necessary files withinconfig_data/exist. -
Prepare Ollama (if using
demo_ollama.py):- Ensure Ollama is installed and the server is running (
ollama servein a separate terminal). - Pull the desired model (the demo defaults to
llama3):ollama pull llama3
- Ensure Ollama is installed and the server is running (
-
Run Demos:
- Mock Demo: Runs through predefined prompts loaded from
config_data/sample_prompts.txt.python demo_mock.py
- Ollama Demo (Interactive): Uses
run_demo.shfor a guided walkthrough.chmod +x run_demo.sh ./run_demo.sh
- Ollama Demo (Single Prompt): Run a specific prompt through the framework.
python demo_ollama.py --prompt "Your prompt here" - Ollama Demo (Boundary Test): Run the predefined boundary test suite using keywords from
config_data/boundary_refusal_keywords.txt.python demo_ollama.py --test-boundaries
- Specify Model/Debug: Use
--model <model_name>or--debugwithdemo_ollama.py.
- Mock Demo: Runs through predefined prompts loaded from
-
Run Unit Tests:
python -m unittest discover -v
- Basic Functionality: Core components (
InputValidator,RealtimeMonitor,ModelHardener) are implemented with foundational logic. - Centralized Config: List data (patterns, keywords) is externalized to
config_data/and managed viaconfig.yaml. - Placeholder Analysis: Syntax/context analysis (
text_analysis.py) and behavioral monitoring use simplified, placeholder heuristics. - Conceptual Hardening:
ModelHardenerdemonstrates boundary testing and example generation concepts but lacks integration with actual model training. - Packaging: Basic packaging (
pyproject.toml) allows local editable installation. - Testing: Unit tests cover basic functionality and currently pass.
- Not Production Ready: This framework requires significant development for real-world use.
- Implement robust pattern matching (e.g., regex, semantic similarity).
- Develop advanced syntax and context analysis using NLP techniques.
- Integrate with additional LLM APIs and providers.
- Refine baseline modeling and anomaly detection (e.g., statistical methods, sequence analysis).
- Explore actual model fine-tuning/hardening techniques.
- Implement more sophisticated adaptive response strategies.
- Expand unit and integration test coverage.
- Author: Ben Blake
<ben.blake@tcu.edu> - Contributor: Tanner Hendrix
<t.hendrix@tcu.edu>