StephenAbbott/opencheck

OpenCheck

Customer due diligence risk checks powered by the Legal Entity Identifier (LEI), open data and open standards - including the Beneficial Ownership Data Standard (BODS).

Try the demo at https://opencheck.world/

What is OpenCheck?

You paste in a Legal Entity Identifier. OpenCheck queries GLEIF first, derives every cross-source identifier it can (UK Companies House number, Norwegian organisation number, Irish company registration number, Finnish Y-tunnus, Latvian registration number, Lithuanian entity code, Estonian registry code, Czech IČO, Polish KRS number, Austrian Firmenbuchnummer, Slovak IČO, French SIREN, Dutch KvK number, Swedish organisation number, Swiss UID, Canadian corporation number, Belgian enterprise number, Danish CVR number, Croatian MBS, Australian ACN/ABN, OpenCorporates ID, Wikidata Q-ID, and more), and uses those bridges to fan out across 29 national and international corporate data sources.
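Before any lookup it is worth rejecting malformed input. The sketch below checks a pasted LEI against the ISO 17442 format (18 alphanumeric characters followed by two check digits computed with ISO 7064 MOD 97-10). The function names are illustrative and are not taken from the OpenCheck codebase, which may validate input differently.

```python
import re

# ISO 17442: 18 alphanumeric characters followed by 2 numeric check digits.
LEI_PATTERN = re.compile(r"[A-Z0-9]{18}[0-9]{2}")


def _to_digits(text: str) -> str:
    # ISO 7064 MOD 97-10 keeps 0-9 as they are and maps A-Z to 10-35.
    return "".join(str(int(ch, 36)) for ch in text)


def lei_check_digits(base18: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix."""
    remainder = int(_to_digits(base18) + "00") % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(candidate: str) -> bool:
    """Return True if the string is well-formed and its checksum holds."""
    lei = candidate.strip().upper()
    if not LEI_PATTERN.fullmatch(lei):
        return False
    # A valid LEI, read as one large integer, leaves remainder 1 mod 97.
    return int(_to_digits(lei)) % 97 == 1
```

A checksum pass only shows the string is well-formed; whether the LEI is actually registered still has to come from GLEIF.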

Everything maps into BODS v0.4. Cross-source links and risk signals are computed deterministically, and the whole bundle is one click away from a downloadable export (JSON / JSONL / XML / ZIP).

The risk-signal layer mirrors the conditions for "complex corporate structures" in the EU AMLA draft regulatory technical standards (RTS) on customer due diligence: trust/arrangement, non-EU jurisdiction, nominee, ≥3 ownership layers, plus the composite threshold rule and an advisory mirror of the subjective obfuscation condition.
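As a rough illustration of what "computed deterministically" can mean for these conditions, here is a minimal sketch that evaluates an ownership chain with pure functions. Everything in it is an assumption made for the example: the `Layer` type, the signal names, and in particular the composite rule (three or more layers combined with at least one other condition), which may not match either OpenCheck's signal codes or the final RTS wording.

```python
from dataclasses import dataclass

# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
})


@dataclass(frozen=True)
class Layer:
    """One ownership layer between the customer and its beneficial owner (hypothetical shape)."""
    jurisdiction: str
    is_trust_or_arrangement: bool = False
    has_nominee: bool = False


def complex_structure_signals(layers: list[Layer]) -> dict[str, bool]:
    """Evaluate each condition from the chain alone: same input, same output."""
    signals = {
        "trust_or_arrangement": any(l.is_trust_or_arrangement for l in layers),
        "non_eu_jurisdiction": any(l.jurisdiction not in EU_MEMBER_STATES for l in layers),
        "nominee": any(l.has_nominee for l in layers),
        "three_or_more_layers": len(layers) >= 3,
    }
    # ASSUMPTION: the composite rule fires when the layer threshold is met
    # together with at least one of the other conditions.
    others = ("trust_or_arrangement", "non_eu_jurisdiction", "nominee")
    signals["composite"] = signals["three_or_more_layers"] and any(
        signals[name] for name in others
    )
    return signals
```

Keeping the evaluation free of network calls and randomness means the same chain always produces the same signals, which makes the results easy to test and audit.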

Status

Latest: Phase 53 — AI summaries: a grounded, source-cited narrative of each entity

An on-demand, plain-English summary of what OpenCheck found about an entity, written for a customer-due-diligence / financial-crime audience — where every statement is grounded in OpenCheck's own data. The model only rephrases a pre-built evidence packet (it never retrieves or infers), and a citation validator drops any claim it can't tie to a source, so "no unprovable information" is enforced in code, not just in the prompt.

  1. Evidence packet, not raw data. build_evidence_packet() distils a lookup result into atomic, already-evidenced facts (each carrying its source, BODS statement ids and a confidence derived from source authority), structured risk items, sources consulted, and gaps. This packet is the only thing the model sees.
  2. Cited claims + mechanical validator. Claude (claude-sonnet-4-6, structured output, low temperature) returns one executive paragraph plus per-claim citations; validate_narrative() withholds anything ungrounded. Absence is evidence: clean results and gaps are themselves citable, so the model has no reason to fabricate a citation.
  3. GET /narrative. Reuses the cached lookup pipeline (so the summary can't diverge from the page), runs off the event loop, validates, and returns the packet for UI linking. Flag- and key-gated.
  4. On-demand UI. A summary panel at the top of the result page with per-claim citation chips; clicking a chip scrolls to and flashes the source card and highlights the cited BODS node.
  5. Offline eval harness. A versioned prompt, six synthetic golden packets, and a machine-checkable rubric for iterating wording before any UI ships — scripts/eval_narrative.py.
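The grounding check described in steps 1 and 2 can be pictured as a set-membership test over citation ids. The sketch below is not the project's validate_narrative(): the `Fact` and `Claim` types, their fields and the `split_grounded` helper are hypothetical and only illustrate how withholding uncited claims can be enforced mechanically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One atomic, already-evidenced entry in the evidence packet (hypothetical shape)."""
    fact_id: str
    source: str
    text: str


@dataclass(frozen=True)
class Claim:
    """One sentence from the model plus the fact ids it cites (hypothetical shape)."""
    text: str
    citations: tuple[str, ...]


def split_grounded(claims: list[Claim], packet: list[Fact]) -> tuple[list[Claim], list[Claim]]:
    """Return (kept, withheld): a claim is kept only if it cites at least
    one fact and every cited id exists in the packet."""
    known_ids = {fact.fact_id for fact in packet}
    kept: list[Claim] = []
    withheld: list[Claim] = []
    for claim in claims:
        grounded = bool(claim.citations) and all(c in known_ids for c in claim.citations)
        (kept if grounded else withheld).append(claim)
    return kept, withheld
```

Because gaps and clean results are themselves entries in the packet, a statement such as "no sanctions match was found" can cite a fact id like any other claim and pass the same check.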

Previous: Phase 52 — GEM GEOT project-level ESG data

Full development history

Quick start

The backend ships with cache-first dispatch: in stub mode (no API keys, no OPENCHECK_ALLOW_LIVE) every adapter returns deterministic placeholder data. Live mode is opt-in per source via env vars.
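A minimal sketch of cache-first dispatch with an opt-in live mode follows. Only the OPENCHECK_ALLOW_LIVE variable name comes from this README; the accepted values, the per-source key variable, the in-memory cache and the stub payload are assumptions chosen for illustration.

```python
import os
from typing import Callable, Mapping

# ASSUMPTION: which strings count as "enabled" is not stated in the README.
TRUTHY = {"1", "true", "yes"}


def live_allowed(source_key_var: str, env: Mapping[str, str] = os.environ) -> bool:
    """Live mode needs both the global opt-in and the source's own API key."""
    opted_in = env.get("OPENCHECK_ALLOW_LIVE", "").lower() in TRUTHY
    return opted_in and bool(env.get(source_key_var))


def dispatch(
    identifier: str,
    cache: dict[str, dict],
    fetch_live: Callable[[str], dict],
    source_key_var: str,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Serve from cache first, then live if permitted, else a deterministic stub."""
    if identifier in cache:
        return cache[identifier]
    if live_allowed(source_key_var, env):
        result = fetch_live(identifier)
        cache[identifier] = result
        return result
    # Stub mode: the same placeholder for the same input, and nothing is cached.
    return {"id": identifier, "stub": True}
```

Passing `env` explicitly, instead of always reading `os.environ`, keeps the function easy to exercise in tests without touching the real environment.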

Docker

cp .env.example .env
docker compose up --build

Local (without Docker)

Backend:

cd backend
uv sync
uv run uvicorn opencheck.app:app --reload --port 8000

Frontend:

cd frontend
npm install
npm run dev

The first frontend build copies bundled images for @openownership/bods-dagre into public/bods-dagre-images/. If they're missing, run npm run build once.

Documentation

Page | Contents
How it works | Step-by-step lookup flow, per-adapter detail, Open Ownership BODS bundles, API surface, project structure
Sources | Full adapter table: 26 active sources plus inactive bulk-only adapters, license, entry point, description
Risk signals | All 12 signal codes: source-derived, AMLA CDD RTS, FATF jurisdiction, cross-source name match, ICIJ Offshore Leaks
Configuration | Environment variables, Render deployment, running the test suite
Development history | All 53 phases

Licensing

OpenCheck's own code is MIT-licensed. Data retrieved from third-party sources is licensed under each source's own terms — see ATTRIBUTIONS.md. Downloaded exports include a LICENSES.md listing every source that contributed data, with re-use guidance for the most-restrictive licence in the bundle.

The frontend also uses the Beneficial Ownership Visualisation System design tokens and @openownership/bods-dagre, both © Open Ownership and re-used under CC BY 4.0 / Apache 2.0 respectively.

Roadmap

  • Live opentender.eu integration — the adapter is wired but live_available=False for now.
  • A "complex offshore" demo subject that fires every AMLA chip simultaneously.
  • BODS RDF / SPARQL backbone via Oxigraph — load the assembled BODS bundle into a triple store, expose /sparql for the published Open Ownership red-flag queries.

Open issues and discussion live in the GitHub repo.

Related projects
