VeriSimiser

Table of Contents

What Is This?
Octad: Eight Concerns
Tier 1: True Piggybacks
Tier 2: Augmentation Layer
The Manifest
Architecture
Relationship to VeriSimDB
Building and Running
Integration with TypedQLiser
Status
License

What Is This?

VeriSimiser augments existing databases with capabilities from VeriSimDB's octad model — specifically the capabilities that work as genuine piggybacks without requiring you to replace your database.

Honest framing: this is not a pure bolt-on like the language -isers. Language -isers generate a separate wrapper alongside your code — one-way dependency, your code untouched. Database augmentation is fundamentally different because it interacts with shared mutable state. VeriSimiser is therefore split into two tiers:

Tier 1 (true piggyback) — capabilities that sit alongside or in front of your database, never touching its storage engine. Safe bolt-ons.
Tier 2 (augmentation layer) — capabilities that require additional storage alongside your database. Honest about being "VeriSimDB with your database as one backend" rather than pretending to be invisible.

Octad: Eight Concerns

Each entity in a VeriSimiser-augmented database is enriched along eight concerns — properties you want to know or enforce about the data, independent of how it is stored. The concerns octad is canonical per docs/decisions/0004-octad-ontology.adoc (ADR-0004). The earlier "Eight Modalities" framing (graph, vector, tensor, …) is reframed as a set of Tier 2 overlay representations, not the top-level identity; see Tier 2 below.

Concern	Question it answers	Sidecar / storage	Code module	Tier
Data	The original rows in your database.	your DB	n/a	Always on
Metadata	Schema, types, and runtime introspection.	your DB	`manifest::`	Always on
Provenance	Who did what, when, and how? — SHA-256 hash-chained audit trail.	`verisimdb_provenance_log` (SQLite sidecar)	`abi::`, `tier1::provenance`	Tier 1 ✓
Lineage	Where did this come from? — DAG of data derivations.	`verisimdb_lineage_edges` (acyclic, per ADR-0005)	`tier1::lineage`	Tier 1 ✓
Constraints	Does this still hold? — invariant enforcement across concerns.	`verisimdb_constraints` + check rules	`tier1::drift` (observed symptoms)	Tier 1 ✓
AccessControl	Who is allowed to see / change this? — row/column policies.	`verisimdb_access_policies` (per ADR-0007)	`tier1::access`	Tier 1 ✓
Temporal	What did this look like at time T? — version history + point-in-time.	`verisimdb_temporal_versions`	`tier1::temporal`	Tier 1 ✓
Simulation	What if we changed this? — sandbox branches that do not touch main data.	`verisimdb_simulation_branches` (snapshot isolation, per ADR-0006)	`tier2::simulation`	Tier 2

The canonical Rust enumeration of these is abi::OctadDimension. The CLI’s verisimiser octad subcommand prints this set with descriptions.
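As a rough illustration of how the table maps onto a Rust type, the sketch below mirrors the eight concerns and their tiers. The names `Concern` and `Tier` are hypothetical and chosen for this example only; the real `abi::OctadDimension` may use different variant names or carry additional data.

```rust
/// Hypothetical mirror of the eight concerns in the table above.
/// Not the actual `abi::OctadDimension` definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Concern {
    Data,
    Metadata,
    Provenance,
    Lineage,
    Constraints,
    AccessControl,
    Temporal,
    Simulation,
}

/// Where a concern lives, following the "Tier" column of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    AlwaysOn,
    Tier1,
    Tier2,
}

impl Concern {
    /// All eight concerns, in table order.
    pub const ALL: [Concern; 8] = [
        Concern::Data,
        Concern::Metadata,
        Concern::Provenance,
        Concern::Lineage,
        Concern::Constraints,
        Concern::AccessControl,
        Concern::Temporal,
        Concern::Simulation,
    ];

    /// Tier assignment as listed in the table.
    pub fn tier(self) -> Tier {
        match self {
            Concern::Data | Concern::Metadata => Tier::AlwaysOn,
            Concern::Simulation => Tier::Tier2,
            _ => Tier::Tier1,
        }
    }
}
```

Filtering `Concern::ALL` by `tier()` gives two always-on concerns, five Tier 1 concerns, and one Tier 2 concern, matching the table.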

Tier 1: True Piggybacks

These work like PostGIS — they add capability without replacing anything. Tier 1 sits beside your primary database and either intercepts its event stream (write-path observers) or filters its query results (read-path observers).

Provenance (write-path observer)

SHA-256 hash-chain verified origin tracking for every piece of data.

Every write is intercepted and a provenance record created.
SHA-256 hash chain links records in order, with the chain hash computed over length-prefixed, canonically-encoded field bytes prefixed by the verisim-prov-v1\0 domain tag (collision-resistant, see src/abi/mod.rs::compute_hash).
Query results can include provenance metadata.
Full audit trail: who created, when, from what source, what transformations.

This is a write-path observer — it records what happened, it doesn’t change what happened. The provenance chain is stored in a separate sidecar (SQLite, file, or VeriSimDB instance), never in your primary database.

See docs/theory/provenance-threat-model.adoc for the four-adversary threat model and the protections each adversary class encounters.
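To show why the length prefix matters, here is a minimal sketch of how the bytes fed to SHA-256 might be assembled. It is not a copy of `compute_hash`: the 8-byte big-endian prefix, the field order, and the function names are assumptions made for this example, and the hashing step itself is left out so the sketch needs only the standard library.

```rust
/// Domain tag from the text above; the trailing NUL is part of the tag.
const DOMAIN_TAG: &[u8] = b"verisim-prov-v1\0";

/// Append one field as an 8-byte big-endian length followed by its bytes.
/// The prefix width and endianness are assumptions for this sketch.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_be_bytes());
    buf.extend_from_slice(field);
}

/// Build the byte string that would be hashed for one record:
/// the domain tag, then the previous chain hash, then each field,
/// with the hash and every field length-prefixed.
pub fn chain_preimage(prev_hash: &[u8], fields: &[&[u8]]) -> Vec<u8> {
    let mut buf = DOMAIN_TAG.to_vec();
    push_field(&mut buf, prev_hash);
    for field in fields {
        push_field(&mut buf, field);
    }
    buf
}
```

Without the prefix, the field lists `["ab", "c"]` and `["a", "bc"]` would concatenate to the same bytes and hash identically. With it, the two preimages differ, so the two records cannot share a chain hash unless SHA-256 itself collides.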

Lineage (DAG of derivations)

Lineage answers "where did this come from?" by recording a directed acyclic graph of entity-to-entity edges. ADR-0005 fixes acyclicity as a binding invariant: every lineage write checks for cycles via a recursive CTE before insert.
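The acyclicity check can be pictured with an in-memory stand-in. The sketch below is not the SQL that VeriSimiser runs; it replaces the recursive CTE with a depth-first search over a `HashMap`, and `LineageGraph` and `try_add_edge` are hypothetical names. The rule is the same either way: adding `from -> to` creates a cycle exactly when `from` is already reachable from `to`.

```rust
use std::collections::{HashMap, HashSet};

/// In-memory stand-in for the lineage edge table: source -> derived entities.
#[derive(Default)]
pub struct LineageGraph {
    edges: HashMap<String, Vec<String>>,
}

impl LineageGraph {
    /// True if `target` can be reached from `start` by following edges.
    fn reaches(&self, start: &str, target: &str) -> bool {
        let mut stack = vec![start];
        let mut seen = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == target {
                return true;
            }
            // Only expand each node once.
            if seen.insert(node) {
                if let Some(children) = self.edges.get(node) {
                    stack.extend(children.iter().map(String::as_str));
                }
            }
        }
        false
    }

    /// Insert `from -> to` only if the graph stays acyclic.
    pub fn try_add_edge(&mut self, from: &str, to: &str) -> Result<(), String> {
        if self.reaches(to, from) {
            return Err(format!("edge {from} -> {to} would create a cycle"));
        }
        self.edges
            .entry(from.to_string())
            .or_default()
            .push(to.to_string());
        Ok(())
    }
}
```

A self-edge is rejected by the same test, since every node trivially reaches itself.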

Temporal versioning (point-in-time queries)

Automatic version history for entities, stored as (entity_id, table_name, version, valid_from, valid_to, snapshot, operation).

Point-in-time queries: "what did this entity look like at time T?"
Diff queries: "what changed between T1 and T2?"
Rollback capability: restore any entity to a previous state.
Retention policies: verisimiser gc auto-prunes history older than the temporal-days value set under [retention] in the manifest (V-L2-P1).

Piggybacks onto write events (triggers, CDC, or application-level hooks) and stores version history in a separate sidecar.
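A point-in-time query over that version history can be sketched as follows. The struct keeps only the fields the lookup needs, timestamps are plain integers, and validity is treated as the half-open interval [valid_from, valid_to) with `None` marking the current version. These conventions are assumptions for the example, not the sidecar's actual schema.

```rust
/// One row of version history, reduced to the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Version {
    pub version: u32,
    pub valid_from: i64,
    /// `None` means the version is still current.
    pub valid_to: Option<i64>,
    pub snapshot: String,
}

/// Return the version that was live at time `t`, treating validity as
/// the half-open interval [valid_from, valid_to).
pub fn as_of(history: &[Version], t: i64) -> Option<&Version> {
    history
        .iter()
        .find(|v| v.valid_from <= t && v.valid_to.map_or(true, |end| t < end))
}
```

With a half-open interval, a timestamp that falls exactly on a version boundary resolves to the newer version, so no instant maps to two versions.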

AccessControl (row / column policies)

Prefix-typed principals (user:, group:, role:) and an AccessPredicate AST with deny-wins resolution. Policies are evaluated against the same row that flows through the read-path. See ADR-0007 for the model.
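Deny-wins resolution is easiest to see in a small example. The sketch below leaves out the AccessPredicate AST entirely and matches policies on principal alone; `Policy`, `Effect`, `Decision`, and `decide` are hypothetical names, and denying when no policy matches is an assumption rather than something ADR-0007 is quoted as saying here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effect {
    Allow,
    Deny,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allowed,
    Denied,
}

/// Binds a prefix-typed principal such as "user:alice" to an effect.
pub struct Policy {
    pub principal: String,
    pub effect: Effect,
}

/// Check that a principal carries one of the three prefixes and a non-empty name.
pub fn is_well_formed(principal: &str) -> bool {
    ["user:", "group:", "role:"]
        .iter()
        .any(|p| principal.starts_with(*p) && principal.len() > p.len())
}

/// Deny-wins: any matching Deny beats every matching Allow.
/// With no matching policy this sketch denies by default (an assumption).
pub fn decide(policies: &[Policy], principals: &[&str]) -> Decision {
    let mut allowed = false;
    for policy in policies
        .iter()
        .filter(|p| principals.contains(&p.principal.as_str()))
    {
        match policy.effect {
            Effect::Deny => return Decision::Denied,
            Effect::Allow => allowed = true,
        }
    }
    if allowed {
        Decision::Allowed
    } else {
        Decision::Denied
    }
}
```

A caller passes every principal the request holds (the user plus its groups and roles), so a deny on any one of them overrides an allow granted through another.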

Drift detection: symptoms of Constraints violation across modalities

Drift detection is what the Constraints concern observes when the same entity is represented in more than one form and the forms disagree. Once a Tier 2 overlay (graph, vector, tensor, …) is enabled alongside the primary Data, the representations can fall out of sync — at which point the Constraints concern reports the symptom.

ADR-0003 fixes each drift category as a triple of (input, distance function, default threshold). All categories produce a score in [0, 1], so an aggregator that takes a weighted sum with non-negative weights totalling 1 also stays in range. The eight categories:

Category	What it compares	Default threshold
Temporal	`MAX(version)` per modality for the same entity	`0.10`
Structural	Schema fingerprint (sorted `(field, type)` pairs)	`0.05`
Semantic	Free-text labels via sentence embeddings (cosine)	`0.20`
Statistical	Distributions over the same field (1-Wasserstein)	`0.15`
Referential	Edge sets (FK / cross-reference Jaccard)	`0.10`
Provenance	Tip hashes + longest common prefix of chains	`0.05`
Spatial	Coordinates via haversine, normalised	`0.0001`
Embedding	Stored vector vs freshly recomputed vector (cosine)	`0.10`

Run verisimiser drift --threshold 0.1 to scan every entity in verisimdb_temporal_versions and report those whose score sits at or above the threshold. The Temporal category ships today (V-L1-E2, #49); the remaining seven follow under V-L1-E* using the same report shape.

This is a read-path augmentation — it observes query results, it doesn’t modify them. Safe to add, safe to remove, no data dependency.
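For a concrete feel of one category, the sketch below computes a Jaccard distance over edge sets (the comparison listed for Referential drift), combines several scores with a weighted mean, and applies the "at or above the threshold" test. The function names, the `(u64, u64)` edge representation, and the handling of empty inputs are assumptions for this example; the Referential detector has not shipped yet and may work differently.

```rust
use std::collections::HashSet;

/// Jaccard distance between two edge sets: 1 - |A ∩ B| / |A ∪ B|, always in [0, 1].
/// Two empty sets are treated as identical (distance 0), an assumed convention.
pub fn jaccard_distance(a: &HashSet<(u64, u64)>, b: &HashSet<(u64, u64)>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    let inter = a.intersection(b).count();
    1.0 - inter as f64 / union as f64
}

/// Weighted mean of (score, weight) pairs. Dividing by the weight total keeps
/// the result in [0, 1] when scores are in [0, 1] and weights are non-negative.
pub fn aggregate(scores_and_weights: &[(f64, f64)]) -> f64 {
    let total: f64 = scores_and_weights.iter().map(|(_, w)| w).sum();
    if total <= 0.0 {
        return 0.0;
    }
    scores_and_weights.iter().map(|(s, w)| s * w).sum::<f64>() / total
}

/// Matches the "at or above the threshold" wording of the drift command.
pub fn is_drifted(score: f64, threshold: f64) -> bool {
    score >= threshold
}
```

Two edge sets that share one edge out of three distinct edges score 2/3, well above the 0.10 default listed for Referential drift.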

Tier 2: Augmentation Layer

Tier 2 capabilities require additional storage alongside your database. They are honest about being "VeriSimDB modalities with your database as the primary store." This is still valuable — it’s how you get octad capabilities incrementally — but it’s not a bolt-on.

Tier 2 modalities are overlay representations — alternative shapes the same entity is projected into for a specific query workload. A user enables vector because they want similarity search; they enable spatial because they want geofencing. Enabling an overlay is independent of which concerns are active.

Graph overlay — RDF triples and property graph edges. Stored in a separate graph index.
Vector overlay — embeddings for similarity search. Stored in an HNSW index alongside your database.
Tensor overlay — multi-dimensional numeric data. Stored in an ndarray-backed sidecar.
Semantic overlay — type annotations and proof blobs. Stored in a CBOR sidecar.
Document overlay — full-text search. Stored in a Tantivy sidecar.
Spatial overlay — geospatial coordinates. Stored in an R-tree sidecar.

Each Tier 2 overlay has its own storage and can be enabled independently via [tier2] in the manifest. Your primary database remains the source of truth for its native data. When an overlay diverges from the primary Data, the Constraints concern reports the symptom via the relevant drift category (see above).

The Manifest

[verisimiser]
name = "my-augmented-db"

[database]
target-db = "postgresql"
connection-string = "postgres://localhost/mydb"

# Tier 1: concerns layered onto your DB without altering its storage
[tier1]
provenance = true             # SHA-256 hash-chain audit trail
lineage = true                # acyclic derivation DAG (ADR-0005)
constraints = true            # invariant enforcement / drift reports
access-control = true         # row/column policies (ADR-0007)
temporal-versioning = true    # automatic version history
drift-detection = true        # cross-modal observer (Constraints symptom)

[tier1.provenance]
sidecar = "sqlite"            # sqlite | file | verisim
sidecar-path = ".verisimiser/provenance.db"

[tier1.temporal]
sidecar = "sqlite"

[retention]
temporal-days = 90            # purged by `verisimiser gc`

# Tier 2: overlay representations (additional storage alongside your DB)
[tier2]
graph = false
vector = false
tensor = false
semantic = false
document = false
spatial = false

[tier2.vector]
# model = "sentence-transformers/all-MiniLM-L6-v2"
# dimensions = 384

The verisimiser octad subcommand prints the active concerns from your manifest; verisimiser doctor checks that sidecars, thresholds, and retention bounds are configured consistently.

Architecture

                    Your Application
                          │
                          ├──── writes ────► Your Database (Data, Metadata)
                          │                       │
                          │                  VeriSimiser intercepts
                          │                       │
              ┌───────────┴───────────────────────┼──────────────────────┐
              │ Tier 1 sidecars (concerns)        │                      │
              │                                   │                      │
              │  ┌──────────────┐  ┌────────────┐ │  ┌──────────────────┐│
              │  │ provenance   │  │ lineage    │ │  │ temporal         ││
              │  │ log          │  │ DAG        │ │  │ versions         ││
              │  │ (Provenance) │  │ (Lineage)  │ │  │ (Temporal)       ││
              │  └──────────────┘  └────────────┘ │  └──────────────────┘│
              │                                   │                      │
              │  ┌────────────────┐  ┌──────────────────────────┐        │
              │  │ access         │  │ drift index / constraint │        │
              │  │ policies       │  │ check results            │        │
              │  │ (AccessControl)│  │ (Constraints)            │        │
              │  └────────────────┘  └──────────────────────────┘        │
              └───────────────────────────────────┼──────────────────────┘
                                                  │
              ┌───── optional, per-overlay ───────┘
              │ Tier 2 overlays (modalities)
              │
              │  ┌───────┐ ┌────────┐ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌─────────┐
              │  │ graph │ │ vector │ │ tensor  │ │ semantic │ │ docs    │ │ spatial │
              │  │ index │ │ HNSW   │ │ ndarray │ │ CBOR     │ │ Tantivy │ │ R-tree  │
              │  └───────┘ └────────┘ └─────────┘ └──────────┘ └─────────┘ └─────────┘
              │
              └───── simulation branches (Simulation concern; per ADR-0006)
                     snapshot-isolated, never touch main Data

Interception methods (configurable per database):

PostgreSQL — logical replication / pg_notify / triggers.
SQLite — sqlite3_update_hook / WAL monitoring.
MongoDB — change streams.
Application-level — middleware / ORM hooks.
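Whichever interception method is used, the sidecars end up receiving a stream of write events. The sketch below shows one way that fan-out could look in Rust. `WriteEvent`, `WriteObserver`, `AuditLog`, and `dispatch` are hypothetical names invented for this example and do not describe the crate's actual interfaces.

```rust
/// Kind of write seen on the primary database.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Insert,
    Update,
    Delete,
}

/// A write event as an interception method might deliver it (illustrative fields).
#[derive(Debug, Clone)]
pub struct WriteEvent {
    pub table: String,
    pub entity_id: String,
    pub operation: Operation,
}

/// A sidecar that records writes. The event arrives by shared reference,
/// so an observer can read it but cannot alter what was written.
pub trait WriteObserver {
    fn on_write(&mut self, event: &WriteEvent);
}

/// Toy observer that keeps one log line per event.
#[derive(Default)]
pub struct AuditLog {
    pub entries: Vec<String>,
}

impl WriteObserver for AuditLog {
    fn on_write(&mut self, event: &WriteEvent) {
        self.entries.push(format!(
            "{:?} {}.{}",
            event.operation, event.table, event.entity_id
        ));
    }
}

/// Deliver one event to every registered observer, in order.
pub fn dispatch(observers: &mut [&mut dyn WriteObserver], event: &WriteEvent) {
    for observer in observers.iter_mut() {
        observer.on_write(event);
    }
}
```

Passing the event as `&WriteEvent` encodes the "records what happened, doesn't change what happened" rule in the type signature: observers mutate only their own sidecar state.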

Relationship to VeriSimDB

VeriSimiser is NOT a replacement for VeriSimDB. It is a gateway drug.

VeriSimiser Tier 1 gives you five concerns (Provenance, Lineage, Constraints, AccessControl, Temporal) on your existing database, on top of Data and Metadata, which are always on. Zero commitment.
VeriSimiser Tier 2 gives you the modality overlays (graph, vector, tensor, semantic, document, spatial) and the Simulation concern as sidecars. Incremental adoption.
Full VeriSimDB gives you the complete octad with native cross-modal querying, VCL, and built-in drift normalisation. Full commitment.

The migration path is Tier 1 → Tier 2 → full VeriSimDB (if you want it). Most users will be happy at Tier 1 or Tier 2.

Building and Running

Per docs/decisions/0009-build-path.adoc, two build paths are canonical:

cargo build for development. MSRV pinned at Rust 1.85 (rust-version in Cargo.toml). The Justfile wraps the common recipes.
Containerfile for ops. Produces a single OCI image suitable for deployment, CI, and reproducible release builds.

flake.nix, guix.scm, .guix-channel, and .devcontainer/ remain in the tree as experimental paths — kept, not maintained.

Pre-built release binaries for linux-x86_64, linux-aarch64, macos-arm64, and windows-x86_64 are published by the release workflow (V-L3-L1, #58) with .sha256 companions.

Integration with TypedQLiser

VeriSimiser works alongside TypedQLiser:

TypedQLiser type-checks your queries (compile-time, no runtime cost).
VeriSimiser augments your database with concerns capabilities (runtime).
Together: formally verified queries against an augmented database.

Status

Pre-alpha. Architecture defined, tier system designed, the eight concerns canonical per ADR-0004. Tier 1 is the priority implementation; the Temporal drift detector (V-L1-E2 / #49) is the first ADR-0003 category shipped end-to-end.

Part of the -iser family. #3 priority (after TypedQLiser and Chapeliser).

License

SPDX-License-Identifier: PMPL-1.0-or-later
