VeriSimiser augments existing databases with capabilities from VeriSimDB's octad model — specifically the capabilities that work as genuine piggybacks without requiring you to replace your database.
Honest framing: this is not a pure bolt-on like the language -isers. Language -isers generate a separate wrapper alongside your code — one-way dependency, your code untouched. Database augmentation is fundamentally different because it interacts with shared mutable state. VeriSimiser is therefore split into two tiers:
- Tier 1 (true piggyback): capabilities that sit alongside or in front of your database, never touching its storage engine. Safe bolt-ons.
- Tier 2 (augmentation layer): capabilities that require additional storage alongside your database. Honest about being "VeriSimDB with your database as one backend" rather than pretending to be invisible.
Each entity in a VeriSimiser-augmented database is enriched along
eight concerns — properties you want to know or enforce about the
data, independent of how it is stored. The concerns octad is canonical
per docs/decisions/0004-octad-ontology.adoc (ADR-0004). The earlier
"Eight Modalities" framing (graph, vector, tensor, …) is reframed as a
set of Tier 2 overlay representations, not the top-level identity;
see Tier 2 below.
| Concern | Question it answers | Sidecar / storage | Code module | Tier |
|---|---|---|---|---|
| Data | The original rows in your database. | your DB | n/a | Always on |
| Metadata | Schema, types, and runtime introspection. | your DB | | Always on |
| Provenance | Who did what, when, and how? SHA-256 hash-chained audit trail. | | | Tier 1 ✓ |
| Lineage | Where did this come from? DAG of data derivations. | | | Tier 1 ✓ |
| Constraints | Does this still hold? Invariant enforcement across concerns. | | | Tier 1 ✓ |
| AccessControl | Who is allowed to see / change this? Row/column policies. | | | Tier 1 ✓ |
| Temporal | What did this look like at time T? Version history + point-in-time. | | | Tier 1 ✓ |
| Simulation | What if we changed this? Sandbox branches that do not touch main data. | | | Tier 2 |
The canonical Rust enumeration of these is abi::OctadDimension. The
CLI’s verisimiser octad subcommand prints this set with descriptions.
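As a rough illustration of the shape such an enumeration might take, the sketch below lists the eight concerns and maps each one to its tier as given in the concerns table. The names `Concern`, `Tier`, `ALL`, and `tier()` are hypothetical and chosen for this example; they are not the actual definition of abi::OctadDimension.

```rust
/// Hypothetical mirror of the eight concerns; not the real `abi::OctadDimension`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Concern {
    Data,
    Metadata,
    Provenance,
    Lineage,
    Constraints,
    AccessControl,
    Temporal,
    Simulation,
}

/// Where a concern lives, following the Tier column of the concerns table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    AlwaysOn,
    Tier1,
    Tier2,
}

impl Concern {
    /// All eight concerns, in table order.
    pub const ALL: [Concern; 8] = [
        Concern::Data,
        Concern::Metadata,
        Concern::Provenance,
        Concern::Lineage,
        Concern::Constraints,
        Concern::AccessControl,
        Concern::Temporal,
        Concern::Simulation,
    ];

    /// Tier assignment as stated in the table.
    pub fn tier(self) -> Tier {
        match self {
            Concern::Data | Concern::Metadata => Tier::AlwaysOn,
            Concern::Simulation => Tier::Tier2,
            _ => Tier::Tier1,
        }
    }
}
```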
Tier 1 capabilities work like PostGIS: they add capability without replacing anything. Tier 1 sits beside your primary database and either intercepts its event stream (write-path observers) or filters its query results (read-path observers).
SHA-256 hash-chain verified origin tracking for every piece of data.
- Every write is intercepted and a provenance record is created.
- A SHA-256 hash chain links records in order. The chain hash is computed over length-prefixed, canonically encoded field bytes prefixed by the verisim-prov-v1\0 domain tag (collision-resistant; see src/abi/mod.rs::compute_hash).
- Query results can include provenance metadata.
- Full audit trail: who created, when, from what source, what transformations.
This is a write-path observer — it records what happened, it doesn’t change what happened. The provenance chain is stored in a separate sidecar (SQLite, file, or VeriSimDB instance), never in your primary database.
See docs/theory/provenance-threat-model.adoc for the four-adversary
threat model and the protections each adversary class encounters.
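The length-prefixed, domain-tagged encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of compute_hash: the 8-byte big-endian length prefix, the field order, and the helper names `push_field`, `preimage`, and `chain_hash` are all hypothetical. The hash function is passed in as a parameter so the sketch stays dependency-free; a real build would supply SHA-256.

```rust
/// Domain tag named in the text; the trailing NUL byte is part of the tag.
const DOMAIN_TAG: &[u8] = b"verisim-prov-v1\0";

/// Append one field as an 8-byte big-endian length followed by its bytes.
/// (Prefix width and byte order are assumptions made for this sketch.)
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_be_bytes());
    buf.extend_from_slice(field);
}

/// Build the bytes that would be hashed for one provenance record:
/// domain tag, then the previous chain hash, then each field.
fn preimage(prev_hash: &[u8], fields: &[&[u8]]) -> Vec<u8> {
    let mut buf = Vec::from(DOMAIN_TAG);
    push_field(&mut buf, prev_hash);
    for f in fields {
        push_field(&mut buf, f);
    }
    buf
}

/// Link a new record onto the chain using a caller-supplied hash function.
fn chain_hash<H>(hash: H, prev_hash: &[u8], fields: &[&[u8]]) -> Vec<u8>
where
    H: Fn(&[u8]) -> Vec<u8>,
{
    hash(&preimage(prev_hash, fields))
}
```

The length prefix is what keeps field boundaries unambiguous: without it, the field lists ("ab", "c") and ("a", "bc") would concatenate to the same bytes and therefore hash identically.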
The Lineage concern answers "Where did this come from?" with a directed acyclic graph of entity-to-entity edges. ADR-0005 fixes acyclicity as a binding invariant: every lineage write checks for cycles via a recursive CTE before insert.
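The check-before-insert idea can be illustrated in memory. The source describes a recursive CTE run against the sidecar; the sketch below swaps that for a depth-first search over an adjacency map, which asks the same question: can the target already reach the source? The `Lineage` type and its methods are hypothetical names for this example.

```rust
use std::collections::{HashMap, HashSet};

/// In-memory lineage graph: an edge points from a source entity to an
/// entity derived from it.
#[derive(Default)]
struct Lineage {
    edges: HashMap<String, Vec<String>>,
}

impl Lineage {
    /// True if `to` can already reach `from`, i.e. adding `from -> to`
    /// would close a cycle. A self-loop counts as a cycle.
    fn would_cycle(&self, from: &str, to: &str) -> bool {
        let mut stack = vec![to];
        let mut seen: HashSet<&str> = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == from {
                return true;
            }
            if seen.insert(node) {
                if let Some(next) = self.edges.get(node) {
                    stack.extend(next.iter().map(String::as_str));
                }
            }
        }
        false
    }

    /// Insert the edge only if the graph stays acyclic.
    fn add_edge(&mut self, from: &str, to: &str) -> Result<(), String> {
        if self.would_cycle(from, to) {
            return Err(format!("edge {from} -> {to} would create a cycle"));
        }
        self.edges
            .entry(from.to_string())
            .or_default()
            .push(to.to_string());
        Ok(())
    }
}
```

Rejecting the write before it lands means the stored graph never needs repair, at the cost of one reachability query per insert.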
Automatic version history for entities, stored as
(entity_id, table_name, version, valid_from, valid_to, snapshot, operation).
- Point-in-time queries: "what did this entity look like at time T?"
- Diff queries: "what changed between T1 and T2?"
- Rollback capability: restore any entity to a previous state.
- Retention policies: verisimiser gc auto-prunes history older than [retention.temporal-days] from the manifest (V-L2-P1).
Piggybacks onto write events (triggers, CDC, or application-level hooks) and stores version history in a separate sidecar.
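A point-in-time lookup over such version rows might look like the sketch below. It keeps a subset of the columns from the tuple above and makes two assumptions that the text does not state: timestamps are plain integers, and validity is the half-open interval [valid_from, valid_to) with an open end for the current version. `Version` and `as_of` are hypothetical names.

```rust
/// One row of version history (a subset of the columns listed above).
#[derive(Debug, Clone, PartialEq)]
struct Version {
    entity_id: String,
    version: u32,
    valid_from: u64,
    /// `None` means this version is still current.
    valid_to: Option<u64>,
    snapshot: String,
}

/// Return the version of `entity_id` that was valid at time `t`, if any.
/// Validity is treated as the half-open interval [valid_from, valid_to).
fn as_of<'a>(history: &'a [Version], entity_id: &str, t: u64) -> Option<&'a Version> {
    history.iter().find(|v| {
        v.entity_id == entity_id
            && v.valid_from <= t
            && v.valid_to.map_or(true, |end| t < end)
    })
}
```

With half-open intervals, a timestamp that falls exactly on a version boundary resolves to the newer version, so no instant matches two rows.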
Access control uses prefix-typed principals (user:, group:, role:) and an
AccessPredicate AST with deny-wins resolution: when an allow policy and a
deny policy both match, the deny takes effect. Policies are evaluated
against the same row that flows through the read path. See ADR-0007 for
the model.
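Deny-wins resolution can be sketched with a flattened policy list. This is only an illustration: the real AccessPredicate AST is defined in ADR-0007 and is not reproduced here, and the `Policy`, `Effect`, and `decide` names, as well as the deny-by-default behaviour when nothing matches, are assumptions of this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    Allow,
    Deny,
}

/// A flattened stand-in for a policy; the real AccessPredicate AST is richer.
struct Policy {
    /// Prefix-typed principal, e.g. "user:alice" or "role:analyst".
    principal: String,
    effect: Effect,
}

/// Deny-wins: any matching Deny overrides every matching Allow.
/// If no policy matches, this sketch denies (an assumption).
fn decide(policies: &[Policy], principals: &[&str]) -> Effect {
    let mut allowed = false;
    for p in policies {
        if principals.contains(&p.principal.as_str()) {
            match p.effect {
                Effect::Deny => return Effect::Deny,
                Effect::Allow => allowed = true,
            }
        }
    }
    if allowed {
        Effect::Allow
    } else {
        Effect::Deny
    }
}
```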
Drift detection is what the Constraints concern observes when the same entity is represented in more than one form and the forms disagree. Once a Tier 2 overlay (graph, vector, tensor, …) is enabled alongside the primary Data, the representations can fall out of sync — at which point the Constraints concern reports the symptom.
ADR-0003 fixes each drift category as a triple of (input, distance
function, default threshold). All categories produce a score in [0, 1]
so an aggregator can take a weighted sum and stay in range. The eight
categories:
| Category | What it compares | Default threshold |
|---|---|---|
| Temporal | | |
| Structural | Schema fingerprint (sorted | |
| Semantic | Free-text labels via sentence embeddings (cosine) | |
| Statistical | Distributions over the same field (1-Wasserstein) | |
| Referential | Edge sets (FK / cross-reference Jaccard) | |
| Provenance | Tip hashes + longest common prefix of chains | |
| Spatial | Coordinates via haversine, normalised | |
| Embedding | Stored vector vs freshly recomputed vector (cosine) | |
Run verisimiser drift --threshold 0.1 to scan every entity in
verisimdb_temporal_versions and report those whose score sits at or
above the threshold. The Temporal category ships today (V-L1-E2, #49);
the remaining seven follow under V-L1-E* using the same report shape.
This is a read-path augmentation — it observes query results, it doesn’t modify them. Safe to add, safe to remove, no data dependency.
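To see why per-category scores in [0, 1] let an aggregator stay in range, consider the sketch below. It is an assumption-laden illustration rather than the ADR-0003 aggregator: it normalises by the weight total (a weighted mean), and the function names `aggregate` and `drifted` are hypothetical. The threshold comparison is "at or above", matching the drift subcommand description.

```rust
/// Weighted mean of per-category drift scores, given as (score, weight) pairs.
/// Dividing by the weight total keeps the result in [0, 1] as long as every
/// score is in [0, 1] and every weight is non-negative.
fn aggregate(scores_and_weights: &[(f64, f64)]) -> f64 {
    let total: f64 = scores_and_weights.iter().map(|(_, w)| w).sum();
    if total <= 0.0 {
        return 0.0;
    }
    let weighted: f64 = scores_and_weights
        .iter()
        .map(|(s, w)| s.clamp(0.0, 1.0) * w)
        .sum();
    weighted / total
}

/// Entities are reported when their score sits at or above the threshold.
fn drifted<'a>(entities: &[(&'a str, f64)], threshold: f64) -> Vec<&'a str> {
    entities
        .iter()
        .filter(|(_, score)| *score >= threshold)
        .map(|(id, _)| *id)
        .collect()
}
```

For example, scores 0.2 and 0.8 with weights 1 and 3 aggregate to (0.2 + 2.4) / 4 = 0.65.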
Tier 2 capabilities require additional storage alongside your database. They are honest about being "VeriSimDB modalities with your database as the primary store." This is still valuable — it’s how you get octad capabilities incrementally — but it’s not a bolt-on.
Tier 2 modalities are overlay representations — alternative shapes
the same entity is projected into for a specific query workload. A user
enables vector because they want similarity search; they enable
spatial because they want geofencing. Enabling an overlay is
independent of which concerns are active.
- Graph overlay: RDF triples and property graph edges. Stored in a separate graph index.
- Vector overlay: embeddings for similarity search. Stored in an HNSW index alongside your database.
- Tensor overlay: multi-dimensional numeric data. Stored in an ndarray-backed sidecar.
- Semantic overlay: type annotations and proof blobs. Stored in a CBOR sidecar.
- Document overlay: full-text search. Stored in a Tantivy sidecar.
- Spatial overlay: geospatial coordinates. Stored in an R-tree sidecar.
Each Tier 2 overlay has its own storage and can be enabled
independently via [tier2] in the manifest. Your primary database
remains the source of truth for its native data. When an overlay
diverges from the primary Data, the Constraints concern reports the
symptom via the relevant drift category (see above).
[verisimiser]
name = "my-augmented-db"
[database]
target-db = "postgresql"
connection-string = "postgres://localhost/mydb"
# Tier 1: concerns layered onto your DB without altering its storage
[tier1]
provenance = true # SHA-256 hash-chain audit trail
lineage = true # acyclic derivation DAG (ADR-0005)
constraints = true # invariant enforcement / drift reports
access-control = true # row/column policies (ADR-0007)
temporal-versioning = true # automatic version history
drift-detection = true # cross-modal observer (Constraints symptom)
[tier1.provenance]
sidecar = "sqlite" # sqlite | file | verisim
sidecar-path = ".verisimiser/provenance.db"
[tier1.temporal]
sidecar = "sqlite"
[retention]
temporal-days = 90 # purged by `verisimiser gc`
# Tier 2: overlay representations (additional storage alongside your DB)
[tier2]
graph = false
vector = false
tensor = false
semantic = false
document = false
spatial = false
[tier2.vector]
# model = "sentence-transformers/all-MiniLM-L6-v2"
# dimensions = 384
The verisimiser octad subcommand prints the active concerns from your
manifest; verisimiser doctor checks that sidecars, thresholds, and
retention bounds are configured consistently.
Your Application
│
├──── writes ────► Your Database (Data, Metadata)
│ │
│ VeriSimiser intercepts
│ │
┌───────────┴──────────────────────┼──────────────────────┐
│ Tier 1 sidecars (concerns) │ │
│ │ │
│ ┌──────────────┐ ┌────────────┐ │ ┌─────────────────┐│
│ │ provenance │ │ lineage │ │ │ temporal ││
│ │ log │ │ DAG │ │ │ versions ││
│ │ (Provenance) │ │ (Lineage) │ │ │ (Temporal) ││
│ └──────────────┘ └────────────┘ │ └─────────────────┘│
│ │ │
│ ┌────────────────┐ ┌──────────────────────────┐ │
│ │ access │ │ drift index / constraint │ │
│ │ policies │ │ check results │ │
│ │ (AccessControl)│ │ (Constraints) │ │
│ └────────────────┘ └──────────────────────────┘ │
└───────────────────────────────────┼──────────────────────┘
│
┌───── optional, per-overlay ───────┘
│ Tier 2 overlays (modalities)
│
│ ┌───────┐ ┌────────┐ ┌────────┐ ┌──────────┐ ┌────────┐ ┌────────┐
│ │ graph │ │ vector │ │ tensor │ │ semantic │ │ docs │ │ spatial│
│ │ index │ │ HNSW │ │ ndarry │ │ CBOR │ │ Tantivy│ │ R-tree │
│ └───────┘ └────────┘ └────────┘ └──────────┘ └────────┘ └────────┘
│
└───── simulation branches (Simulation concern; per ADR-0006)
snapshot-isolated, never touch main Data
Interception methods (configurable per database):
- PostgreSQL: logical replication / pg_notify / triggers.
- SQLite: sqlite3_update_hook / WAL monitoring.
- MongoDB: change streams.
- Application-level: middleware / ORM hooks.
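Whichever interception method is used, the write-path side reduces to the same pattern: a captured change is handed to observers that record it without altering it. The sketch below shows that pattern in isolation. `WriteEvent`, `WriteObserver`, `AuditLog`, and `dispatch` are hypothetical names for this example and do not describe VeriSimiser's actual interfaces.

```rust
/// A change captured from the primary database by whichever hook is configured.
#[derive(Debug, Clone, PartialEq)]
struct WriteEvent {
    table: String,
    entity_id: String,
    operation: String,
}

/// Hypothetical observer interface: the event arrives by shared reference,
/// so an observer can record it but cannot alter it.
trait WriteObserver {
    fn on_write(&mut self, event: &WriteEvent);
}

/// Example observer that appends one line per event to an in-memory log.
#[derive(Default)]
struct AuditLog {
    lines: Vec<String>,
}

impl WriteObserver for AuditLog {
    fn on_write(&mut self, event: &WriteEvent) {
        self.lines
            .push(format!("{} {}:{}", event.operation, event.table, event.entity_id));
    }
}

/// Fan one event out to every registered observer.
fn dispatch(observers: &mut [&mut dyn WriteObserver], event: &WriteEvent) {
    for obs in observers.iter_mut() {
        obs.on_write(event);
    }
}
```

Passing the event as `&WriteEvent` encodes "records what happened, does not change what happened" in the type signature: observers own their sidecar state but only borrow the event.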
VeriSimiser is NOT a replacement for VeriSimDB. It is a gateway drug.
- VeriSimiser Tier 1 gives you five concerns on your existing database (Provenance, Lineage, Constraints, AccessControl, Temporal), plus Data and Metadata, which are always on. Zero commitment.
- VeriSimiser Tier 2 gives you the modality overlays (graph, vector, tensor, semantic, document, spatial) and the Simulation concern as sidecars. Incremental adoption.
- Full VeriSimDB gives you the complete octad with native cross-modal querying, VCL, and built-in drift normalisation. Full commitment.
The migration path is Tier 1 → Tier 2 → full VeriSimDB (if you want it). Most users will be happy at Tier 1 or Tier 2.
Per docs/decisions/0009-build-path.adoc, two build paths are
canonical:
- cargo build for development. MSRV pinned at Rust 1.85 (rust-version in Cargo.toml). The Justfile wraps the common recipes.
- Containerfile for ops. Produces a single OCI image suitable for deployment, CI, and reproducible release builds.
flake.nix, guix.scm, .guix-channel, and .devcontainer/ remain
in the tree as experimental paths — kept, not maintained.
Pre-built release binaries for linux-x86_64, linux-aarch64,
macos-arm64, and windows-x86_64 are published by the release
workflow (V-L3-L1, #58) with .sha256 companions.
VeriSimiser works alongside TypedQLiser:
- TypedQLiser type-checks your queries (compile-time, no runtime cost).
- VeriSimiser augments your database with the octad concerns (runtime).
- Together: formally verified queries against an augmented database.
Pre-alpha. Architecture defined, tier system designed, the eight concerns canonical per ADR-0004. Tier 1 is the priority implementation; the Temporal drift detector (V-L1-E2 / #49) is the first ADR-0003 category shipped end-to-end.
Part of the -iser family. #3 priority (after TypedQLiser and Chapeliser).