hyperpolymath/verisimiser


VeriSimiser

What Is This?

VeriSimiser augments existing databases with capabilities from VeriSimDB's octad model — specifically the capabilities that work as genuine piggybacks without requiring you to replace your database.

Honest framing: this is not a pure bolt-on like the language -isers. Language -isers generate a separate wrapper alongside your code — one-way dependency, your code untouched. Database augmentation is fundamentally different because it interacts with shared mutable state. VeriSimiser is therefore split into two tiers:

  • Tier 1 (true piggyback) — capabilities that sit alongside or in front of your database, never touching its storage engine. Safe bolt-ons.

  • Tier 2 (augmentation layer) — capabilities that require additional storage alongside your database. Honest about being "VeriSimDB with your database as one backend" rather than pretending to be invisible.

Octad: Eight Concerns

Each entity in a VeriSimiser-augmented database is enriched along eight concerns — properties you want to know or enforce about the data, independent of how it is stored. The concerns octad is canonical per docs/decisions/0004-octad-ontology.adoc (ADR-0004). The earlier "Eight Modalities" framing (graph, vector, tensor, …) is reframed as a set of Tier 2 overlay representations, not the top-level identity; see Tier 2 below.

| Concern | Question it answers | Sidecar / storage | Code module | Tier |
|---|---|---|---|---|
| Data | The original rows in your database. | your DB | n/a | Always on |
| Metadata | Schema, types, and runtime introspection. | your DB | manifest:: | Always on |
| Provenance | Who did what, when, and how? — SHA-256 hash-chained audit trail. | verisimdb_provenance_log (SQLite sidecar) | abi::, tier1::provenance | Tier 1 ✓ |
| Lineage | Where did this come from? — DAG of data derivations. | verisimdb_lineage_edges (acyclic, per ADR-0005) | tier1::lineage | Tier 1 ✓ |
| Constraints | Does this still hold? — invariant enforcement across concerns. | verisimdb_constraints + check rules | tier1::drift (observed symptoms) | Tier 1 ✓ |
| AccessControl | Who is allowed to see / change this? — row/column policies. | verisimdb_access_policies (per ADR-0007) | tier1::access | Tier 1 ✓ |
| Temporal | What did this look like at time T? — version history + point-in-time. | verisimdb_temporal_versions | tier1::temporal | Tier 1 ✓ |
| Simulation | What if we changed this? — sandbox branches that do not touch main data. | verisimdb_simulation_branches (snapshot isolation, per ADR-0006) | tier2::simulation | Tier 2 |

The canonical Rust enumeration of these is abi::OctadDimension. The CLI’s verisimiser octad subcommand prints this set with descriptions.
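To make the tier split in the table concrete, here is a minimal sketch of how the eight concerns might be modelled. The names `Concern`, `Tier`, `ALL`, and `tier()` are hypothetical and chosen for illustration; the actual definition is abi::OctadDimension and may look quite different.

```rust
/// Hypothetical mirror of the eight concerns listed in the table.
/// This is NOT the real abi::OctadDimension, only an illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Concern {
    Data,
    Metadata,
    Provenance,
    Lineage,
    Constraints,
    AccessControl,
    Temporal,
    Simulation,
}

/// Where a concern lives, following the "Tier" column of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    AlwaysOn,
    Tier1,
    Tier2,
}

impl Concern {
    /// All eight concerns, in table order.
    pub const ALL: [Concern; 8] = [
        Concern::Data,
        Concern::Metadata,
        Concern::Provenance,
        Concern::Lineage,
        Concern::Constraints,
        Concern::AccessControl,
        Concern::Temporal,
        Concern::Simulation,
    ];

    /// Tier assignment as stated in the table.
    pub fn tier(self) -> Tier {
        match self {
            Concern::Data | Concern::Metadata => Tier::AlwaysOn,
            Concern::Simulation => Tier::Tier2,
            _ => Tier::Tier1,
        }
    }
}
```

Read this way, two concerns are always on, five are Tier 1 sidecars, and only Simulation needs Tier 2.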

Tier 1: True Piggybacks

These work like PostGIS — they add capability without replacing anything. Tier 1 sits beside your primary database and either intercepts its event stream (write-path observers) or filters its query results (read-path observers).

Provenance (write-path observer)

SHA-256 hash-chain verified origin tracking for every piece of data.

  • Every write is intercepted and a provenance record created.

  • A SHA-256 hash chain links records in order. Each chain hash is computed over the verisim-prov-v1\0 domain tag followed by length-prefixed, canonically encoded field bytes, which keeps field boundaries unambiguous (collision-resistant, see src/abi/mod.rs::compute_hash).

  • Query results can include provenance metadata.

  • Full audit trail: who created, when, from what source, what transformations.

This is a write-path observer — it records what happened, it doesn’t change what happened. The provenance chain is stored in a separate sidecar (SQLite, file, or VeriSimDB instance), never in your primary database.

See docs/theory/provenance-threat-model.adoc for the four-adversary threat model and the protections each adversary class encounters.
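The domain-tagged, length-prefixed encoding described above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/abi/mod.rs::compute_hash: the 8-byte big-endian length prefix, the choice to feed the previous hash in as the first field, and the names `preimage` and `chain_hash` are all hypothetical. The digest is passed in as a closure so the sketch stays dependency-free; in practice it would be SHA-256.

```rust
/// Domain tag named in the README; everything else here is illustrative.
const DOMAIN_TAG: &[u8] = b"verisim-prov-v1\0";

/// Build a hash preimage: the domain tag, then each field encoded as
/// (8-byte big-endian length, raw bytes). The prefix width and byte
/// order are assumptions made for this sketch.
pub fn preimage(fields: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::from(DOMAIN_TAG);
    for field in fields {
        out.extend_from_slice(&(field.len() as u64).to_be_bytes());
        out.extend_from_slice(field);
    }
    out
}

/// Link one record onto the chain by hashing the previous hash together
/// with the record's fields. `digest` stands in for SHA-256.
pub fn chain_hash<D>(prev_hash: &[u8], record_fields: &[&[u8]], digest: D) -> Vec<u8>
where
    D: Fn(&[u8]) -> Vec<u8>,
{
    let mut fields: Vec<&[u8]> = Vec::with_capacity(record_fields.len() + 1);
    fields.push(prev_hash);
    fields.extend_from_slice(record_fields);
    digest(&preimage(&fields))
}
```

The length prefix is what stops the field lists ("ab", "c") and ("a", "bc") from producing the same bytes, which plain concatenation would allow.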

Lineage (DAG of derivations)

Records the answer to "where did this come from?" as a directed acyclic graph of entity-to-entity edges. ADR-0005 fixes acyclicity as a binding invariant: every lineage write checks for cycles with a recursive CTE before the edge is inserted.
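The check-before-insert rule can be illustrated in memory. The sketch below swaps the recursive CTE for a depth-first reachability search over a hash map; `LineageGraph`, `reaches`, and `add_edge` are hypothetical names, and the real check runs in SQL against verisimdb_lineage_edges.

```rust
use std::collections::{HashMap, HashSet};

/// In-memory stand-in for the lineage edge table (illustrative only).
#[derive(Default)]
pub struct LineageGraph {
    /// source entity -> entities derived from it
    children: HashMap<String, Vec<String>>,
}

impl LineageGraph {
    /// True if `target` is reachable from `from` by following edges.
    fn reaches(&self, from: &str, target: &str) -> bool {
        let mut stack = vec![from];
        let mut seen = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == target {
                return true;
            }
            if seen.insert(node) {
                if let Some(next) = self.children.get(node) {
                    stack.extend(next.iter().map(String::as_str));
                }
            }
        }
        false
    }

    /// Insert `source -> derived` only if that keeps the graph acyclic.
    /// The edge closes a cycle exactly when `source` is already
    /// reachable from `derived` (this also rejects self-edges).
    pub fn add_edge(&mut self, source: &str, derived: &str) -> Result<(), String> {
        if self.reaches(derived, source) {
            return Err(format!("edge {source} -> {derived} would create a cycle"));
        }
        self.children
            .entry(source.to_string())
            .or_default()
            .push(derived.to_string());
        Ok(())
    }
}
```

Rejecting the write before it lands means the stored graph never needs repair afterwards.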

Temporal versioning (point-in-time queries)

Automatic version history for entities, stored as (entity_id, table_name, version, valid_from, valid_to, snapshot, operation).

  • Point-in-time queries: "what did this entity look like at time T?"

  • Diff queries: "what changed between T1 and T2?"

  • Rollback capability: restore any entity to a previous state.

  • Retention policies: verisimiser gc auto-prunes history older than [retention.temporal-days] from the manifest (V-L2-P1).

Piggybacks onto write events (triggers, CDC, or application-level hooks) and stores version history in a separate sidecar.
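A point-in-time lookup over the version tuple listed above might look like the following sketch. The field types (integer timestamps, a string snapshot), the half-open interval `[valid_from, valid_to)`, the use of `None` for the current version, and the names `VersionRow` and `as_of` are assumptions for illustration.

```rust
/// One row of version history, following the tuple named in the README.
/// Field types are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct VersionRow {
    pub entity_id: String,
    pub table_name: String,
    pub version: u64,
    pub valid_from: i64,       // e.g. Unix seconds
    pub valid_to: Option<i64>, // None = still current
    pub snapshot: String,      // serialised entity state
    pub operation: String,     // e.g. "INSERT", "UPDATE", "DELETE"
}

/// Return the version of `entity_id` that was valid at time `t`,
/// treating each row as valid on the half-open interval
/// [valid_from, valid_to).
pub fn as_of<'a>(rows: &'a [VersionRow], entity_id: &str, t: i64) -> Option<&'a VersionRow> {
    rows.iter()
        .filter(|r| r.entity_id == entity_id)
        .find(|r| r.valid_from <= t && r.valid_to.map_or(true, |end| t < end))
}
```

A diff between T1 and T2 then reduces to comparing the snapshots returned by two `as_of` calls.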

AccessControl (row / column policies)

Prefix-typed principals (user:, group:, role:) and an AccessPredicate AST with deny-wins resolution. Policies are evaluated against the same row that flows through the read-path. See ADR-0007 for the model.
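Deny-wins resolution over a small predicate AST can be sketched as below. This is not the ADR-0007 model itself: the `Predicate` variants, the `Policy` shape, the default-deny behaviour when nothing matches, and the function name `is_allowed` are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// A row as column name -> value (simplified to strings).
pub type Row = HashMap<String, String>;

/// Tiny hypothetical predicate AST evaluated against a row.
pub enum Predicate {
    Always,
    ColumnEq(String, String),
    Not(Box<Predicate>),
    And(Box<Predicate>, Box<Predicate>),
}

impl Predicate {
    pub fn eval(&self, row: &Row) -> bool {
        match self {
            Predicate::Always => true,
            Predicate::ColumnEq(col, want) => row.get(col) == Some(want),
            Predicate::Not(p) => !p.eval(row),
            Predicate::And(a, b) => a.eval(row) && b.eval(row),
        }
    }
}

#[derive(Clone, Copy, PartialEq)]
pub enum Effect {
    Allow,
    Deny,
}

pub struct Policy {
    /// Prefix-typed principal, e.g. "user:alice", "group:ops", "role:auditor".
    pub principal: String,
    pub effect: Effect,
    pub when: Predicate,
}

/// Deny wins: any matching Deny rejects, otherwise at least one matching
/// Allow is required. Default-deny on no match is an assumption.
pub fn is_allowed(policies: &[Policy], principals: &[&str], row: &Row) -> bool {
    let mut allowed = false;
    for p in policies {
        if principals.contains(&p.principal.as_str()) && p.when.eval(row) {
            match p.effect {
                Effect::Deny => return false,
                Effect::Allow => allowed = true,
            }
        }
    }
    allowed
}
```

Because a single matching Deny short-circuits the loop, policy order does not affect the outcome.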

Drift detection: symptoms of Constraints violation across modalities

Drift detection is what the Constraints concern observes when the same entity is represented in more than one form and the forms disagree. Once a Tier 2 overlay (graph, vector, tensor, …) is enabled alongside the primary Data, the representations can fall out of sync — at which point the Constraints concern reports the symptom.

ADR-0003 fixes each drift category as a triple of (input, distance function, default threshold). All categories produce a score in [0, 1] so an aggregator can take a weighted sum and stay in range. The eight categories:

| Category | What it compares | Default threshold |
|---|---|---|
| Temporal | MAX(version) per modality for the same entity | 0.10 |
| Structural | Schema fingerprint (sorted (field, type) pairs) | 0.05 |
| Semantic | Free-text labels via sentence embeddings (cosine) | 0.20 |
| Statistical | Distributions over the same field (1-Wasserstein) | 0.15 |
| Referential | Edge sets (FK / cross-reference Jaccard) | 0.10 |
| Provenance | Tip hashes + longest common prefix of chains | 0.05 |
| Spatial | Coordinates via haversine, normalised | 0.0001 |
| Embedding | Stored vector vs freshly recomputed vector (cosine) | 0.10 |
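As one concrete reading of a category's distance function, the Referential comparison can be sketched as a Jaccard distance over two edge sets. Taking the distance as one minus the Jaccard similarity, and treating two empty sets as identical, are assumptions here; ADR-0003 holds the actual definition, and `jaccard_distance` is a hypothetical name.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B|.
/// Always lands in [0, 1]; two empty sets are treated as distance 0.
pub fn jaccard_distance<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    let intersection = a.intersection(b).count();
    1.0 - intersection as f64 / union as f64
}
```

With edge sets {1, 2, 3} and {2, 3, 4} the distance is 0.5, well above the 0.10 default threshold listed for Referential.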

Run verisimiser drift --threshold 0.1 to scan every entity in verisimdb_temporal_versions and report those whose score sits at or above the threshold. The Temporal category ships today (V-L1-E2, #49); the remaining seven follow under V-L1-E* using the same report shape.

This is a read-path augmentation — it observes query results, it doesn’t modify them. Safe to add, safe to remove, no data dependency.
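The aggregation and threshold test described above can be sketched as a normalised weighted sum. Dividing by the total weight is an assumption added so the result stays in [0, 1] for any non-negative weights; `CategoryScore`, `aggregate`, and `is_drifted` are hypothetical names, not the report shape the CLI emits.

```rust
/// One category's contribution to an entity's drift score.
pub struct CategoryScore {
    pub name: &'static str,
    pub score: f64,  // expected in [0, 1]
    pub weight: f64, // non-negative
}

/// Weighted mean of per-category scores. Scores are clamped to [0, 1]
/// and the sum is divided by the total weight, so the result stays in
/// [0, 1]. Returns 0.0 when no weight is present.
pub fn aggregate(scores: &[CategoryScore]) -> f64 {
    let total: f64 = scores.iter().map(|c| c.weight).sum();
    if total <= 0.0 {
        return 0.0;
    }
    let weighted: f64 = scores
        .iter()
        .map(|c| c.weight * c.score.clamp(0.0, 1.0))
        .sum();
    weighted / total
}

/// An entity is reported when its score sits at or above the threshold.
pub fn is_drifted(score: f64, threshold: f64) -> bool {
    score >= threshold
}
```

The comparison is inclusive, matching the "at or above the threshold" wording for verisimiser drift.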

Tier 2: Augmentation Layer

Tier 2 capabilities require additional storage alongside your database. They are honest about being "VeriSimDB modalities with your database as the primary store." This is still valuable — it’s how you get octad capabilities incrementally — but it’s not a bolt-on.

Tier 2 modalities are overlay representations — alternative shapes the same entity is projected into for a specific query workload. A user enables vector because they want similarity search; they enable spatial because they want geofencing. Enabling an overlay is independent of which concerns are active.

  • Graph overlay — RDF triples and property graph edges. Stored in a separate graph index.

  • Vector overlay — embeddings for similarity search. Stored in an HNSW index alongside your database.

  • Tensor overlay — multi-dimensional numeric data. Stored in an ndarray-backed sidecar.

  • Semantic overlay — type annotations and proof blobs. Stored in a CBOR sidecar.

  • Document overlay — full-text search. Stored in a Tantivy sidecar.

  • Spatial overlay — geospatial coordinates. Stored in an R-tree sidecar.

Each Tier 2 overlay has its own storage and can be enabled independently via [tier2] in the manifest. Your primary database remains the source of truth for its native data. When an overlay diverges from the primary Data, the Constraints concern reports the symptom via the relevant drift category (see above).

The Manifest

[verisimiser]
name = "my-augmented-db"

[database]
target-db = "postgresql"
connection-string = "postgres://localhost/mydb"

# Tier 1: concerns layered onto your DB without altering its storage
[tier1]
provenance = true             # SHA-256 hash-chain audit trail
lineage = true                # acyclic derivation DAG (ADR-0005)
constraints = true            # invariant enforcement / drift reports
access-control = true         # row/column policies (ADR-0007)
temporal-versioning = true    # automatic version history
drift-detection = true        # cross-modal observer (Constraints symptom)

[tier1.provenance]
sidecar = "sqlite"            # sqlite | file | verisim
sidecar-path = ".verisimiser/provenance.db"

[tier1.temporal]
sidecar = "sqlite"

[retention]
temporal-days = 90            # purged by `verisimiser gc`

# Tier 2: overlay representations (additional storage alongside your DB)
[tier2]
graph = false
vector = false
tensor = false
semantic = false
document = false
spatial = false

[tier2.vector]
# model = "sentence-transformers/all-MiniLM-L6-v2"
# dimensions = 384

The verisimiser octad subcommand prints the active concerns from your manifest; verisimiser doctor checks that sidecars, thresholds, and retention bounds are configured consistently.
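The kind of consistency check that verisimiser doctor performs might resemble the sketch below. The specific rules are assumptions: the sidecar kinds come from the manifest comment (sqlite | file | verisim), the [0, 1] threshold range from the drift section, and the positive retention bound is a guess. `Sidecar`, `Settings`, and `check` are hypothetical names, and the sketch takes already-parsed values rather than reading the manifest file.

```rust
use std::str::FromStr;

/// Sidecar kinds listed in the manifest comment.
#[derive(Debug, PartialEq)]
pub enum Sidecar {
    Sqlite,
    File,
    Verisim,
}

impl FromStr for Sidecar {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "sqlite" => Ok(Sidecar::Sqlite),
            "file" => Ok(Sidecar::File),
            "verisim" => Ok(Sidecar::Verisim),
            other => Err(format!("unknown sidecar kind: {other}")),
        }
    }
}

/// Already-parsed manifest values relevant to the checks (hypothetical).
pub struct Settings<'a> {
    pub provenance_sidecar: &'a str,
    pub drift_threshold: f64,
    pub temporal_days: i64,
}

/// Collect every problem instead of stopping at the first one.
pub fn check(s: &Settings<'_>) -> Vec<String> {
    let mut problems = Vec::new();
    if let Err(e) = s.provenance_sidecar.parse::<Sidecar>() {
        problems.push(e);
    }
    if !(0.0..=1.0).contains(&s.drift_threshold) {
        problems.push(format!("drift threshold {} is outside [0, 1]", s.drift_threshold));
    }
    if s.temporal_days <= 0 {
        problems.push(format!("temporal-days must be positive, got {}", s.temporal_days));
    }
    problems
}
```

Returning the full list of problems lets one run surface every misconfiguration at once.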

Architecture

                    Your Application
                          │
                          ├──── writes ────► Your Database (Data, Metadata)
                          │                       │
                          │                  VeriSimiser intercepts
                          │                       │
              ┌───────────┴──────────────────────┼──────────────────────┐
              │ Tier 1 sidecars (concerns)       │                      │
              │                                   │                      │
              │  ┌──────────────┐  ┌────────────┐ │  ┌─────────────────┐│
              │  │ provenance   │  │ lineage    │ │  │ temporal         ││
              │  │ log          │  │ DAG        │ │  │ versions         ││
              │  │ (Provenance) │  │ (Lineage)  │ │  │ (Temporal)       ││
              │  └──────────────┘  └────────────┘ │  └─────────────────┘│
              │                                   │                      │
              │  ┌────────────────┐  ┌──────────────────────────┐       │
              │  │ access         │  │ drift index / constraint │       │
              │  │ policies       │  │ check results            │       │
              │  │ (AccessControl)│  │ (Constraints)            │       │
              │  └────────────────┘  └──────────────────────────┘       │
              └───────────────────────────────────┼──────────────────────┘
                                                  │
              ┌───── optional, per-overlay ───────┘
              │ Tier 2 overlays (modalities)
              │
              │  ┌───────┐ ┌────────┐ ┌────────┐ ┌──────────┐ ┌────────┐ ┌────────┐
              │  │ graph │ │ vector │ │ tensor │ │ semantic │ │ docs   │ │ spatial│
              │  │ index │ │ HNSW   │ │ ndarray│ │ CBOR     │ │ Tantivy│ │ R-tree │
              │  └───────┘ └────────┘ └────────┘ └──────────┘ └────────┘ └────────┘
              │
              └───── simulation branches (Simulation concern; per ADR-0006)
                     snapshot-isolated, never touch main Data

Interception methods (configurable per database):

  • PostgreSQL — logical replication / pg_notify / triggers.

  • SQLite — sqlite3_update_hook / WAL monitoring.

  • MongoDB — change streams.

  • Application-level — middleware / ORM hooks.
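Whichever interception method is used, the sidecars only need a stream of write events. The sketch below shows one way an application-level hook might fan events out to observers. `WriteEvent`, `WriteObserver`, `EventLog`, and `dispatch` are hypothetical names, not VeriSimiser's API. Observers receive the event by shared reference, which reflects the "records what happened, doesn't change what happened" rule.

```rust
/// A write seen on the primary database (illustrative shape).
#[derive(Debug, Clone, PartialEq)]
pub enum WriteEvent {
    Insert { table: String, key: String },
    Update { table: String, key: String },
    Delete { table: String, key: String },
}

/// A write-path observer: it may record the event but cannot alter it.
pub trait WriteObserver {
    fn on_write(&mut self, event: &WriteEvent);
}

/// Minimal observer that keeps every event it has seen.
#[derive(Default)]
pub struct EventLog {
    pub seen: Vec<WriteEvent>,
}

impl WriteObserver for EventLog {
    fn on_write(&mut self, event: &WriteEvent) {
        self.seen.push(event.clone());
    }
}

/// Deliver one event to every registered observer, in order.
pub fn dispatch(event: &WriteEvent, observers: &mut [&mut dyn WriteObserver]) {
    for observer in observers.iter_mut() {
        observer.on_write(event);
    }
}
```

A provenance log, a temporal version store, and a lineage recorder could each implement the same trait and be registered side by side.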

Relationship to VeriSimDB

VeriSimiser is NOT a replacement for VeriSimDB. It is a gateway drug.

  • VeriSimiser Tier 1 gives you the five Tier 1 concerns (Provenance, Lineage, Constraints, AccessControl, Temporal) on your existing database, alongside Data and Metadata, which are always on. Zero commitment.

  • VeriSimiser Tier 2 gives you the modality overlays (graph, vector, tensor, semantic, document, spatial) and the Simulation concern as sidecars. Incremental adoption.

  • Full VeriSimDB gives you the complete octad with native cross-modal querying, VCL, and built-in drift normalisation. Full commitment.

The migration path is Tier 1 → Tier 2 → full VeriSimDB (if you want it). Most users will be happy at Tier 1 or Tier 2.

Building and Running

Per docs/decisions/0009-build-path.adoc, two build paths are canonical:

  • cargo build for development. MSRV pinned at Rust 1.85 (rust-version in Cargo.toml). The Justfile wraps the common recipes.

  • Containerfile for ops. Produces a single OCI image suitable for deployment, CI, and reproducible release builds.

flake.nix, guix.scm, .guix-channel, and .devcontainer/ remain in the tree as experimental paths — kept, not maintained.

Pre-built release binaries for linux-x86_64, linux-aarch64, macos-arm64, and windows-x86_64 are published by the release workflow (V-L3-L1, #58) with .sha256 companions.

Integration with TypedQLiser

VeriSimiser works alongside TypedQLiser:

  • TypedQLiser type-checks your queries (compile-time, no runtime cost).

  • VeriSimiser augments your database with octad concern capabilities (runtime).

  • Together: formally verified queries against an augmented database.

Status

Pre-alpha. Architecture defined, tier system designed, the eight concerns canonical per ADR-0004. Tier 1 is the priority implementation; the Temporal drift detector (V-L1-E2 / #49) is the first ADR-0003 category shipped end-to-end.

Part of the -iser family. #3 priority (after TypedQLiser and Chapeliser).

License

SPDX-License-Identifier: PMPL-1.0-or-later
