Advisory MCP gate that probes coding-agent tool calls via a Llama-3.1-8B SAE inspector. Observation only; fails closed by escalating; sandbox required.
mcp llama ai-safety interpretability sparse-autoencoder mechanistic-interpretability coding-agent safety-gate goodfire advisory-gate
Updated May 25, 2026 - Python