Capability-security kernel for autonomous agents — seccomp/SELinux for agentic AI. Formal, auditable, language-agnostic, cryptographically verifiable.
Updated May 16, 2026 - Python
On the infantile expectation of controlling what we cannot comprehend: a philosophical critique of the ASI control paradigm, developed through a four-AI adversarial debate. An extension of the Coherence Basin Hypothesis.
Toy 7. An elimination-filter landscape applying two structural constraints simultaneously to map which objective classes can persist under sustained optimization pressure — and which cannot. Includes a four-stage scenario engine and open-question frontier. Companion simulation for The Shape of What Does Not End — Series 2, Part 4.
A structural account of why honesty may be the path of least resistance for superintelligence. Research hypothesis with formal proof, experimental design, and four-AI collaborative analysis.
Rigorous framework for evaluating AI alignment properties — sycophancy, corrigibility, deception, goal stability, and power-seeking — with statistical confidence intervals.
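The evaluation framework above reports alignment metrics with statistical confidence intervals. As a generic illustration only (not that repository's actual API), a percentile bootstrap over per-trial pass/fail scores is one common way to obtain such an interval; the function and variable names here are hypothetical:

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of eval scores.

    Resamples the score list with replacement, computes the mean of each
    resample, and reads the (alpha/2, 1 - alpha/2) percentiles off the
    sorted resample means.
    """
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Example: binary pass/fail outcomes from a hypothetical sycophancy probe
scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
lo, hi = bootstrap_ci(scores)
print(f"mean={statistics.fmean(scores):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

With only ten trials the interval is wide, which is exactly the point of reporting it: a headline pass rate alone overstates how much the evaluation has measured.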
Structural stability architecture for self-modifying optimisation systems. Defines structural, dynamic, and perceptual control constraints that preserve coherence and stability before value alignment.