Open Source Reliability Harness: Make your agents follow rules. One line of code to enforce, trace, and improve.
Updated May 23, 2026 - Python
Review and moderation, your way. Online safety dashboard, queues, routing and automatic enforcement rules, and integrations.
A JavaScript-based content safety system designed to detect and filter sensitive media in real-time, ensuring platform compliance and user protection.
🛡️ Programmable Guardrails for LLM Applications in Java. A framework-agnostic toolkit for input/output validation, PII masking, and jailbreak detection. The Java alternative to NVIDIA NeMo Guardrails.
An intelligent task management assistant built with .NET, Next.js, Microsoft Agent Framework, AG-UI protocol, and Azure OpenAI, demonstrating Clean Architecture and autonomous AI agent capabilities
Step-by-step tutorial that teaches you how to use Azure AI Content Safety, the prebuilt AI service that filters content sent to users to safeguard them from risky or undesirable outcomes
🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and improved model defenses against jailbreak attempts.
Real-time NSFW & harmful content detection as a service
Transform uncertainty into absolute confidence.
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
Arabic Content Moderator — scan text for toxicity, hate speech, spam. Dialect-aware. Fully offline.
Open skill system for humane AI — 9 reusable specs + MCP runtime
AI application firewall for LLM-powered apps — multi-layered detection (heuristic, ML classifier, semantic, LLM-judge) against prompt injection, jailbreaks, and data leakage - inferwall.com
Technical presentations with hands-on demos
轻卫: an agent-based Chinese content safety detection system
Production-Grade LLM Alignment Engine (TruthProbe + ADT)
A Chrome extension that uses Claude AI to protect users under 18 from inappropriate content by analyzing webpage content in real-time.
Free AI safety stack + frontier adversarial red teaming. Policy engine, content scanner, behavioral monitor, MCP gateway. 350+ vulnerabilities found across NVIDIA, Microsoft, Meta, Google. MIT licensed.
Pre-Publish Security Gate - Scan and redact sensitive information before sharing
Content moderation (text and image) in a social network demo