A Mechanistic Interpretability Toolkit for Cross-Layer Transcoder Training and Attribution-Graph Visualization
Open-source AI cognition layer — circuit-level topology engine producing verifiable FIRE events, bus validation receipts, and falsifiable cognition records in real time. AGPL-3.0.
AI Safety research platform for studying personality drift in AI systems using mechanistic interpretability and clinical assessment tools. Complete simulation framework with neural circuit analysis, statistical drift detection, and intervention protocols.
Universal probing and interpretability tool for MLX language models on Apple Silicon
Framework for evaluating and steering generative image systems using geometry-first metrics, structural stress testing, and constraint-based analysis. Designed to expose compositional collapse, spatial priors, and model failure modes without accessing training data or model internals.
OKI TRACE: Local LLM observability. See step-by-step, layer-by-layer what your AI thinks. Logit Lens & Attention for HuggingFace models.
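The core trick behind a logit lens is to project each layer's residual stream through the model's final LayerNorm and unembedding matrix, revealing what the model would predict if it stopped at that layer. Below is a minimal sketch assuming a HuggingFace GPT-2 checkpoint; it illustrates the general technique only and is not OKI TRACE's actual code or API.

```python
# Minimal logit-lens sketch (assumed setup: HuggingFace GPT-2 checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Project every layer's residual stream through the final LayerNorm and the
# unembedding matrix to see which token each layer "currently" predicts.
ln_f, unembed = model.transformer.ln_f, model.lm_head
for layer, hidden in enumerate(out.hidden_states):
    logits = unembed(ln_f(hidden[:, -1]))   # last position only
    top_token = tok.decode(logits.argmax(-1))
    print(f"layer {layer:2d}: {top_token!r}")
```

Running this on GPT-2 typically shows the prediction converging toward a plausible completion in the later layers, which is the step-by-step, layer-by-layer view such tools expose.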
WPE/TME: Text-native languages for encoding semantic structure and temporal relationships, built on a geometric calculus with formal semantics and aimed at AI reasoning.
Combines Conformal Geometric Algebra (CGA) with efficient sequence modeling by introducing a recurrent rotor mechanism and a novel bit-masked hardware kernel that addresses the computational bottleneck of Clifford products.
I Asked It to Forget, but It Didn't — A Case of Miscommunication Between AI and Humans
Do LLMs think like brains? We test GPT-2, BERT, Mistral, DeepSeek & Qwen+SAE against EEG data. Sparse features yield a 4.3× alignment jump. Working paper included.
An interactive, serverless WebAssembly dashboard demonstrating the statistical fragility of AI interpretability tools. Built for the alphaXiv Hackathon to simulate the multiple comparisons problem.
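For context, the multiple comparisons problem that dashboard simulates is easy to reproduce: probe enough random features against any target and some will appear significant purely by chance. The sketch below is a hypothetical NumPy/SciPy illustration, not code from the dashboard.

```python
# Hypothetical illustration of the multiple comparisons problem:
# correlate many pure-noise "features" with an unrelated "behavior" signal
# and count how many pass p < 0.05 with and without correction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_features, alpha = 200, 1000, 0.05

X = rng.normal(size=(n_samples, n_features))   # pure-noise "activations"
y = rng.normal(size=n_samples)                 # unrelated target signal

pvals = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(n_features)])
print(f"uncorrected hits at p<{alpha}:  {(pvals < alpha).sum()}")            # ~50 by chance
print(f"Bonferroni-corrected hits:    {(pvals < alpha / n_features).sum()}")  # ~0
```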
A NeuroAI project using a Bernoulli-inspired fluid-flow analogy to explore how information moves through neural networks. Signal strength in the network is treated as the "pressure" from Bernoulli's equation, the speed of information propagation as the fluid's flow speed, and the activation level as the opening and closing of valves.
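As a rough illustration of that mapping on a toy MLP (the definitions below are hypothetical stand-ins, not the project's actual formulation), one can log per-layer "pressure", "flow", and "valve" statistics:

```python
# Toy rendering of the fluid analogy on a random MLP. All three quantities are
# hypothetical stand-ins: "pressure" = mean activation magnitude,
# "flow" = how much the signal changes between layers,
# "valve opening" = fraction of units active after ReLU.
import numpy as np

rng = np.random.default_rng(1)
widths = [64, 64, 64, 64]
prev = rng.normal(size=widths[0])

for layer, w in enumerate(widths[1:], start=1):
    W = rng.normal(scale=1.0 / np.sqrt(len(prev)), size=(w, len(prev)))
    act = np.maximum(W @ prev, 0.0)          # ReLU acts as the "valves"
    pressure = np.mean(np.abs(act))          # signal strength
    flow = np.linalg.norm(act - prev) / len(prev)
    valves_open = np.mean(act > 0)
    print(f"layer {layer}: pressure={pressure:.3f}  flow={flow:.3f}  "
          f"valves open={valves_open:.0%}")
    prev = act
```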
📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments.
Human Retention Layer for AI Work — hand-solvable math shadow models + session reasoning distillation. Cross-platform agent skills for Claude Code, Copilot, Cursor, Windsurf, Cline, Codex CLI, Gemini CLI, and 10+ more.
Toy 6. An interactive phase-space instrument mapping Ψ = S/D — the ratio of capability to modeling depth that determines whether a system is in the viable, transitional, or failure-mode-dominant regime. Includes the Inner Crossing animation. Companion simulation for The Inner Crossing — Series 2, Part 3.
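A minimal sketch of reading Ψ = S/D as a regime indicator follows; the thresholds, and which end of the ratio corresponds to the viable versus failure-mode-dominant regime, are assumptions made only for illustration and are not taken from the toy.

```python
# Hypothetical regime classifier for Psi = S / D (capability over modeling
# depth). Thresholds and orientation are illustrative assumptions only.
def regime(capability: float, modeling_depth: float,
           lo: float = 0.5, hi: float = 2.0) -> str:
    psi = capability / modeling_depth
    if psi < lo:
        return f"Psi={psi:.2f}: viable (modeling depth keeps pace with capability)"
    if psi < hi:
        return f"Psi={psi:.2f}: transitional"
    return f"Psi={psi:.2f}: failure-mode-dominant (capability outruns modeling depth)"

for s, d in [(1.0, 4.0), (3.0, 2.5), (5.0, 1.0)]:
    print(regime(s, d))
```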
Toy 7. An elimination-filter landscape applying two structural constraints simultaneously to map which objective classes can persist under sustained optimization pressure — and which cannot. Includes a four-stage scenario engine and open-question frontier. Companion simulation for The Shape of What Does Not End — Series 2, Part 4.
🌐 Explore WPE and TME, text-native languages designed for structural and temporal reasoning, enhancing clarity in semantic calculus.