A Mechanistic Interpretability Toolkit for Cross-Layer Transcoder Training and Attribution-Graph Visualization
Open-source AI cognition layer — circuit-level topology engine producing verifiable FIRE events, bus validation receipts, and falsifiable cognition records in real time. AGPL-3.0.
AI Safety research platform for studying personality drift in AI systems using mechanistic interpretability and clinical assessment tools. Complete simulation framework with neural circuit analysis, statistical drift detection, and intervention protocols.
Universal probing and interpretability tool for MLX language models on Apple Silicon
Framework for evaluating and steering generative image systems using geometry-first metrics, structural stress testing, and constraint-based analysis. Designed to expose compositional collapse, spatial priors, and model failure modes without accessing training data or model internals.
OKI TRACE: Local LLM observability. See step-by-step, layer-by-layer what your AI thinks. Logit Lens & Attention for HuggingFace models.
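The core trick behind a logit lens is to project each layer's residual stream through the model's final LayerNorm and unembedding matrix, revealing what the model would predict if it stopped at that layer. Below is a minimal sketch assuming a HuggingFace GPT-2 checkpoint; it illustrates the general technique only and is not OKI TRACE's actual code or API.

```python
# Minimal logit-lens sketch (assumed setup: HuggingFace GPT-2 checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Project every layer's residual stream through the final LayerNorm and the
# unembedding matrix to see which token each layer "currently" predicts.
ln_f, unembed = model.transformer.ln_f, model.lm_head
for layer, hidden in enumerate(out.hidden_states):
    logits = unembed(ln_f(hidden[:, -1]))   # last position only
    top_token = tok.decode(logits.argmax(-1))
    print(f"layer {layer:2d}: {top_token!r}")
```

Running this on GPT-2 typically shows the prediction converging toward a plausible completion in the later layers, which is the step-by-step, layer-by-layer view such tools expose.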
WPE/TME: Text-native languages for encoding semantic structure and temporal relationships, built on a geometric calculus with formal semantics and aimed at AI reasoning.
Combines Conformal Geometric Algebra (CGA) with efficient sequence modeling by introducing a recurrent rotor mechanism and a novel bit-masked hardware kernel that addresses the computational bottleneck of Clifford products.
I Asked It to Forget, but It Didn't — A Case of Miscommunication Between AI and Humans
Do LLMs think like brains? We test GPT-2, BERT, Mistral, DeepSeek & Qwen+SAE against EEG data. Sparse features yield a 4.3× alignment jump. Working paper included.
An interactive, serverless WebAssembly dashboard demonstrating the statistical fragility of AI interpretability tools. Built for the alphaXiv Hackathon to simulate the multiple comparisons problem.
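For context, the multiple comparisons problem that dashboard simulates is easy to reproduce: probe enough random features against any target and some will appear significant purely by chance. The sketch below is a hypothetical NumPy/SciPy illustration, not code from the dashboard.

```python
# Hypothetical illustration of the multiple comparisons problem:
# correlate many pure-noise "features" with an unrelated "behavior" signal
# and count how many pass p < 0.05 with and without correction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_features, alpha = 200, 1000, 0.05

X = rng.normal(size=(n_samples, n_features))   # pure-noise "activations"
y = rng.normal(size=n_samples)                 # unrelated target signal

pvals = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(n_features)])
print(f"uncorrected hits at p<{alpha}:  {(pvals < alpha).sum()}")            # ~50 by chance
print(f"Bonferroni-corrected hits:    {(pvals < alpha / n_features).sum()}")  # ~0
```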
A NeuroAI project using a Bernoulli-inspired fluid-flow analogy to explore how information moves through neural networks. Signal strength in the network is treated as the "pressure" from Bernoulli's equation, the speed of information propagation as the fluid's flow speed, and the activation level as the opening and closing of valves.
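As a rough illustration of that mapping on a toy MLP (the definitions below are hypothetical stand-ins, not the project's actual formulation), one can log per-layer "pressure", "flow", and "valve" statistics:

```python
# Toy rendering of the fluid analogy on a random MLP. All three quantities are
# hypothetical stand-ins: "pressure" = mean activation magnitude,
# "flow" = how much the signal changes between layers,
# "valve opening" = fraction of units active after ReLU.
import numpy as np

rng = np.random.default_rng(1)
widths = [64, 64, 64, 64]
prev = rng.normal(size=widths[0])

for layer, w in enumerate(widths[1:], start=1):
    W = rng.normal(scale=1.0 / np.sqrt(len(prev)), size=(w, len(prev)))
    act = np.maximum(W @ prev, 0.0)          # ReLU acts as the "valves"
    pressure = np.mean(np.abs(act))          # signal strength
    flow = np.linalg.norm(act - prev) / len(prev)
    valves_open = np.mean(act > 0)
    print(f"layer {layer}: pressure={pressure:.3f}  flow={flow:.3f}  "
          f"valves open={valves_open:.0%}")
    prev = act
```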
📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments.
Human Retention Layer for AI Work — hand-solvable math shadow models + session reasoning distillation. Cross-platform agent skills for Claude Code, Copilot, Cursor, Windsurf, Cline, Codex CLI, Gemini CLI, and 10+ more.
Toy 6. An interactive phase-space instrument mapping Ψ = S/D — the ratio of capability to modeling depth that determines whether a system is in the viable, transitional, or failure-mode-dominant regime. Includes the Inner Crossing animation. Companion simulation for The Inner Crossing — Series 2, Part 3.
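A minimal sketch of reading Ψ = S/D as a regime indicator follows; the thresholds, and which end of the ratio corresponds to the viable versus failure-mode-dominant regime, are assumptions made only for illustration and are not taken from the toy.

```python
# Hypothetical regime classifier for Psi = S / D (capability over modeling
# depth). Thresholds and orientation are illustrative assumptions only.
def regime(capability: float, modeling_depth: float,
           lo: float = 0.5, hi: float = 2.0) -> str:
    psi = capability / modeling_depth
    if psi < lo:
        return f"Psi={psi:.2f}: viable (modeling depth keeps pace with capability)"
    if psi < hi:
        return f"Psi={psi:.2f}: transitional"
    return f"Psi={psi:.2f}: failure-mode-dominant (capability outruns modeling depth)"

for s, d in [(1.0, 4.0), (3.0, 2.5), (5.0, 1.0)]:
    print(regime(s, d))
```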
Toy 7. An elimination-filter landscape applying two structural constraints simultaneously to map which objective classes can persist under sustained optimization pressure — and which cannot. Includes a four-stage scenario engine and open-question frontier. Companion simulation for The Shape of What Does Not End — Series 2, Part 4.
🌐 Explore WPE and TME, text-native languages designed for structural and temporal reasoning, enhancing clarity in semantic calculus.