Skip to content

chore(openmemory-js): security & test hardening (P1+P2 close-out)#172

Open
machj8968-lab wants to merge 23 commits intoCaviraOSS:mainfrom
machj8968-lab:feat/openmemory-js-hardening-2026-04
Open

chore(openmemory-js): security & test hardening (P1+P2 close-out)#172
machj8968-lab wants to merge 23 commits intoCaviraOSS:mainfrom
machj8968-lab:feat/openmemory-js-hardening-2026-04

Conversation

@machj8968-lab
Copy link
Copy Markdown

Summary

Closes the P1 and key P2 findings from a security audit of packages/openmemory-js.
22 commits across 4 phases, all backend / SDK changes — no UI scope.

Phases (in order on the branch):

  • Worktree A — HTTP / auth: fail-closed auth, req.tenant derived from API key (SHA-256), zod-shaped runtime validator on every route, GitHub/Notion webhook HMAC verification, error responses no longer swallowed as 200s
  • Worktree B — storage: PG TLS verify-full default, assertSafeIdentifier on env-driven DB names, vector table consolidated as openmemory_vectors, SQLite foreign_keys=ON, DB init failure throws DbInitError instead of process.exit(1)
  • Worktree C — tests: vitest harness replaces ad-hoc tsx scripts, ports omnibus + multilingual_dedup, fixes the broken tests/test_multilingual_dedup.ts/ folder structure, fixes migrate script path
  • Phase 1-4 plan follow-ups: prettier --write across the package, verify.test.ts un-skipped via classifier snapshot, per-tenant filtering pushed from JS post-fetch into SQL WHERE (options-bag), server.js ported to typed server.ts, tsconfig allowJs:false, webhook regression guard test

Test results

Test Files  5 passed (5)
     Tests  27 passed (27)
  Duration  3.2s
  • tests/omnibus.test.ts — 3 phases, synthetic embeddings
  • tests/multilingual_dedup.test.ts — 5 cases
  • tests/verify.test.ts — classifier snapshot + dim check
  • tests/temporal_per_tenant.test.ts — Alice/Bob isolation, idempotent migration, orphan quarantine (7 cases)
  • tests/webhook.test.ts — GitHub + Notion HMAC, valid/forged/missing-secret/missing-rawBody (10 cases)

tsc --noEmit -p tsconfig.json → 0 errors. prettier --check → all clean.

Breaking changes (downstream callers)

  • OM_API_KEY required. Use OM_DEV_ALLOW_NO_AUTH=true for local dev only.
  • body.user_id / query.user_id / path-slug user_id no longer trusted — mismatch with authenticated tenant returns 403 tenant_mismatch.
  • New required env vars for webhooks: OM_GITHUB_WEBHOOK_SECRET, OM_NOTION_WEBHOOK_SECRET. Unset → 503.
  • PG TLS verify-full default in production. Self-signed certs need explicit OM_PG_SSL=require or a CA via OM_PG_SSL_CA.
  • Vector table default name openmemory_vectors (was vectors on SQLite). Existing installations: keep working with one-time WARN; set OM_VECTOR_TABLE=vectors to pin the legacy name.
  • Invalid JSON → 400 (was: req.body = null and the request continued).
  • /memory/query errors → 500 (was: silently empty result).
  • /users/summaries/regenerate-all and /api/temporal/decay are now per-tenant by default; global runs require OM_ADMIN_REGENERATE_ALL=true / OM_ADMIN_DECAY=true.

Audit-driven findings closed

This PR closes the Codex audit findings rated:

  • [P1] Auth fail-open (auth.ts:89,91) → fixed in this PR
  • [P1] Multi-tenant isolation broken (caller-supplied user_id) → fixed in this PR (auth-derived tenant + per-tenant SQL filter)
  • [P2] Webhook ingestion no signature verification → fixed
  • [P2] PG TLS rejectUnauthorized: falsefixed
  • [P2] DB identifiers interpolated raw → fixed
  • [P2] Input validation absent → fixed (runtime validator on every route)
  • [P2] Errors hidden (memory/query empty result, JSON parse → null) → fixed
  • [P2] Schema/backend drift (sqlite vectors vs PG openmemory_vectors) → fixed (canonical name)
  • [P2] SQLite foreign_keys=OFFfixed
  • [P2] Test infrastructure (no npm test, broken tests/test_multilingual_dedup.ts/ folder) → fixed
  • [P2] process.exit(1) on DB init → fixed (DbInitError)
  • [P2] temporal decay SQLite-only SELECT changes()fixed (backend-portable)

Out of scope (deferred)

  • src/server/index.ts graceful shutdown chain beyond unhandledRejection (current handler covers DbInitError; broader chain is its own PR)
  • timeline.ts 3 helpers (get_subject_timeline, get_predicate_timeline, get_volatile_facts) still use JS post-filter — same options-bag treatment can extend in a follow-up
  • Replacing the custom server.ts framework with Express — breaks the "preserve patterns" principle for OSS contributors; deferred
  • Per-tenant counters for get_active_facts_count / get_total_facts_count — endpoints currently emit scope: "global" annotation
  • @types/wsserver.ts uses local typed shape declarations to avoid touching package.json deps in this PR

Test plan

  • npx tsc --noEmit -p tsconfig.json → 0 errors
  • npx prettier --check "src/**/*.ts" "tests/**/*.ts" → all clean
  • npm test → 27 passed
  • (post-merge) integration test against a real Postgres + valid OM_PG_SSL=verify-full
  • (post-merge) GitHub webhook end-to-end with real OM_GITHUB_WEBHOOK_SECRET

Related

🤖 Generated with Claude Code

AI and others added 23 commits April 27, 2026 17:53
- Add vitest + @vitest/coverage-v8 devDependencies
- Add npm scripts: test, test:watch, test:coverage, typecheck
- Fix broken migrate script path: src/migrate.ts -> src/core/migrate.ts
- Add vitest.config.ts: sequential single-fork run, 60s timeout,
  forces OM_EMBEDDINGS=synthetic + sqlite backends so the suite needs
  no API keys or external services

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- multilingual_dedup: move out of folder-named-like-a-file,
  rewrite as vitest describe/it preserving original assertions
- omnibus: convert phases 1-3 to it() blocks; keep mocked Date.now
  for the evolutionary-stability phase, preserve sleeps and cleanup
- verify: quarantine via describe.skip with an explanatory TODO.
  Original used q.conn.run (not in current db.ts API) and asserted
  hard-coded sector classifications that depend on classifier
  heuristics — re-baselining is out of scope for this infra pass.
  Cleanup-API and embedding-key issues are pre-fixed in the .skip
  body so a future implementer only needs to revisit the assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces two small helpers that the storage layer uses to harden
env-driven configuration:

- identifiers.ts: assertSafeIdentifier() rejects any value that
  isn't [A-Za-z_][A-Za-z0-9_]{0,62}, preventing injection through
  OM_PG_DB / OM_PG_SCHEMA / OM_PG_TABLE / OM_VECTOR_TABLE which are
  interpolated into raw CREATE / DELETE SQL. Also exports
  DbInitError so library callers can catch init failures instead of
  the package killing the host process, and DEFAULT_VECTOR_TABLE so
  the SQLite and Postgres backends agree on a canonical name.

- pg_ssl.ts: resolvePgSsl() implements the documented OM_PG_SSL
  matrix — verify-full (default in production, system trust store
  or OM_PG_SSL_CA), require (TLS without cert verification, WARN
  logged), disable (no TLS), unset (defaults to verify-full in
  production / disable in dev).

These are used by the follow-up commits in db.ts, migrate.ts and
vector/postgres.ts.
Storage-layer fixes from the Codex audit:

- PG TLS: replace OM_PG_SSL=require → rejectUnauthorized:false with
  resolvePgSsl(); default is verify-full in production. The legacy
  require mode now logs a WARN.

- Identifier injection: validate OM_PG_DB / OM_PG_SCHEMA / OM_PG_TABLE
  / OM_VECTOR_TABLE through assertSafeIdentifier before they are
  interpolated into CREATE DATABASE, CREATE TABLE, DELETE FROM, etc.
  CREATE DATABASE now uses the validated, double-quoted name.

- Schema drift: SQLite vector table default is now openmemory_vectors
  (matching Postgres). Pre-1.4 installs that still have a bare
  `vectors` table get a one-time WARN with a manual rename hint —
  no auto-rename to avoid data risk. The clear_all path now uses the
  same validated name as CREATE TABLE instead of re-reading env.

- SQLite FKs: PRAGMA foreign_keys=ON so temporal_edges →
  temporal_facts FKs are actually enforced.

- Init failure: stop calling process.exit(1). Capture the failure
  as DbInitError; wait_ready() rejects with it, propagating through
  the run/get/all wrappers so library callers can catch instead of
  crashing the host process.

- log_maint_op now validates OM_PG_SCHEMA on each call too.
…elper

- Route the migration pool through resolvePgSsl() so we get the
  same verify-full / require / disable matrix as db.ts. The legacy
  inline OM_PG_SSL handling (which silently disabled cert
  verification on `require`) is gone.

- Validate OM_PG_DB / OM_PG_SCHEMA / OM_PG_TABLE / OM_VECTOR_TABLE
  before interpolating them. Schema validation is centralized in
  pgSchema().

- SQLite vector table is no longer hardcoded as `vectors`. The 1.2
  migration's SQL is now generated from the resolved table name —
  OM_VECTOR_TABLE if set, otherwise the legacy `vectors` table when
  it exists on disk, otherwise the canonical openmemory_vectors.
  This stops the migration from blowing up on installs that already
  use the new default.
…tion

- Default tableName is now DEFAULT_VECTOR_TABLE (openmemory_vectors)
  instead of the legacy `vectors`, matching db.ts and migrate.ts.

- Validate the table identifier in the constructor. db.ts already
  passes a pre-validated, schema-qualified, quoted form
  (e.g. "public"."openmemory_vectors") for the Postgres metadata
  backend; we detect that case by the leading double quote and
  trust it. Bare identifiers (e.g. SQLite VectorStore wrapping
  openmemory_vectors) are routed through assertSafeIdentifier.
apply_confidence_decay was using SQLite's connection-scoped
`changes()` to count rows touched by the UPDATE. That call doesn't
exist in Postgres, so on the PG backend the function silently
reported 0 every run.

Branch on env.metadata_backend:

- SQLite: keep the existing UPDATE + `SELECT changes()`.
- Postgres: rewrite the same UPDATE with `RETURNING 1` and count
  the rows of the result set. Also swap MAX(scalar, scalar) for
  GREATEST, since Postgres' MAX is aggregate-only.

No SQL placeholder change is required: the package's run_async
already converts `?` to `$N` for the PG path.
…logs

Codex audit fixes for the HTTP layer:

- middleware/auth: drop fail-open behaviour. Without OM_API_KEY, every
  protected endpoint now returns 503. Production / OM_REQUIRE_AUTH=true
  refuse to admit anonymous traffic at all. Successful auth attaches a
  stable req.tenant (SHA-256 prefix of the API key); the raw key never
  leaves this module. OM_DEV_ALLOW_NO_AUTH=true is required to opt into
  the legacy "no key" mode for local dev.
- middleware/tenant: helpers require_tenant() / reject_tenant_mismatch()
  for routes to derive identity from auth and reject caller-supplied
  user_id values that disagree.
- middleware/validate: tiny stdlib-only schema validator (no new deps)
  that returns parsed data or 400 with field-level errors.
- middleware/webhook: HMAC-SHA256 verification for GitHub
  (x-hub-signature-256) and Notion (x-notion-signature) webhooks with
  constant-time compare; fail-closed on missing secret.
- server.js: capture rawBody (Buffer) for HMAC verification and respond
  400 invalid_json on parse failure instead of silently nulling req.body.
- server/index.ts: telemetry errors now log with stack trace rather
  than being swallowed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…errors

Tenant isolation: every memory/users/vercel/ide handler now scopes all
reads and writes to req.tenant. Caller-supplied user_id (in body, query,
or path) is no longer trusted — if it disagrees with the authenticated
tenant we 403 with tenant_mismatch. Path slugs on /users/:user_id/...
are also checked against the tenant.

Validation: every route entry runs through the new validate middleware
with explicit schemas (length/type/range bounds). Pagination params are
parsed defensively and rejected when out of range.

Error handling: /memory/query no longer swallows backend errors and
returns an empty match list; it now logs and 500s. /memory/all,
/memory/:id, /memory/:id (delete), and the users/vercel/ide handlers
log on failure. /users/summaries/regenerate-all no longer fans out
across every tenant by default — opt-in via OM_ADMIN_REGENERATE_ALL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Webhooks (sources.ts):
- /sources/webhook/github now requires OM_GITHUB_WEBHOOK_SECRET and
  verifies x-hub-signature-256 against the raw body with constant-time
  compare. Missing secret -> 503 webhook_not_configured. Bad signature
  -> 401 invalid_signature.
- /sources/webhook/notion adopts the same scheme keyed by
  OM_NOTION_WEBHOOK_SECRET. Notion's public spec does not document a
  signature scheme; we require an explicit shared HMAC secret rather
  than accept anonymous payloads.
- /sources/:source/ingest now derives user_id from req.tenant and 403s
  on body.user_id mismatch.

Temporal (temporal.ts):
- All handlers now require_tenant() and validate inputs (date strings,
  numbers, enums) with explicit 400s instead of silent NaN coercion.
- create/get/update/invalidate scope by tenant via the user_id
  parameter that the temporal_graph store/query layer already
  supports. update/invalidate verify ownership via a tenant-scoped
  query before mutating.
- /api/temporal/decay (global maintenance) now requires
  OM_ADMIN_DECAY=true to run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hardens packages/openmemory-js DB layer per Codex audit:
- PG TLS verify-full default; require/disable opt-in
- Strict identifier validation (assertSafeIdentifier) on env-driven names
- SQLite vector table renamed to openmemory_vectors (canonical)
- SQLite PRAGMA foreign_keys=ON
- DB init failure throws DbInitError instead of process.exit(1)
- Backend-portable rows-affected helper for temporal_graph

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces ad-hoc tsx scripts with vitest harness per Codex audit:
- Add vitest + @vitest/coverage-v8, npm scripts: test/test:watch/test:coverage/typecheck
- vitest.config.ts: serial fork, OM_EMBEDDINGS=synthetic, OM_VECTOR_BACKEND=sqlite
- Port test_omnibus.ts -> tests/omnibus.test.ts (3 phases)
- Port test_multilingual_dedup nested-folder -> tests/multilingual_dedup.test.ts
- verify.test.ts quarantined as describe.skip with TODO (sector heuristic coupling)
- Fix package.json migrate path: src/migrate.ts -> src/core/migrate.ts

Result: 8 passed | 1 skipped, tsc 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Codex P1 findings on HTTP layer:
- Auth fail-closed: OM_API_KEY now required (OM_DEV_ALLOW_NO_AUTH for dev)
- Tenant identity derived from API key (req.tenant), no body.user_id trust
- New middleware: tenant.ts, validate.ts (zod-shape stdlib validator), webhook.ts
- HMAC verification for GitHub/Notion webhooks (OM_*_WEBHOOK_SECRET)
- /memory/query no longer swallows backend errors as empty result
- Invalid JSON now 400 instead of req.body=null
- Telemetry errors logged with stack
- Admin gates on regenerate-all + decay (OM_ADMIN_*)

Breaking: clients must send valid OM_API_KEY; body/query/path user_id mismatched
to authenticated tenant returns 403 tenant_mismatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After Worktree B replaced process.exit(1) on DB init failure with a thrown
DbInitError, the server entry point needs to catch it at the boundary.
Otherwise it surfaces as an unhandled rejection. Add a process-level
handler that logs the message and exits 1 only for DbInitError; other
rejections continue to propagate to existing safety nets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings packages/openmemory-js source and test files to .prettierrc
compliance in one shot, so subsequent feature diffs stay clean. No
logic changes.
Replaces the original tsx-only smoke script's hard-coded sector
expectations with a vitest snapshot of current classifier output.
Drift in classifier behaviour now surfaces as a snapshot diff that
maintainers acknowledge with 'vitest -u', rather than silent passes
or a permanently-skipped spec.
Switches get_facts_by_subject, search_facts, query_facts_in_range,
find_conflicting_facts, get_related_facts to options-bag signatures
with user_id as a required field. Adds get_fact_by_id_for_user for
direct authenticated lookups. Filtering moves from JS post-fetch
into the SQL WHERE clause, which closes the audit caveat that
legacy facts could leak across tenants when post-filtering. Adds
LEGACY_ORPHAN_TENANT constant for the upcoming migration.
routes/temporal.ts now passes req.tenant via options-bag to the
helper signatures introduced in the previous commit, deleting the
f.user_id === tenant post-filters. migrate.ts marks any pre-existing
temporal_facts with NULL user_id as LEGACY_ORPHAN_TENANT so they
cannot be served to any real tenant.
Locks valid/invalid/missing-secret/missing-rawBody paths before the
src/server/server.js -> server.ts port. The rawBody case is the
explicit regression guard: if the typed framework drops raw-body
capture, this test fails loudly instead of silently fail-opening
the webhook.

Adapted from the plan's Express-style template to the actual
middleware shape: verify_{github,notion}_signature are pure
(raw_body, header_value, secret) -> { ok, reason? } functions, not
(req,res,next) middleware.
Replaces the 228-line untyped HTTP framework at src/server/server.js
with a typed src/server/server.ts. Tightens tsconfig.json by setting
allowJs:false so future regressions cannot reintroduce untyped
modules. Behaviour preserved: same req/res augmentations (rawBody,
status, json, send, set, hostname, ip, path, params, query), same
middleware contract, same routing (incl. ALL/options/head/ws), same
JSON body-parsing 413/400 behaviour, same 404 fallthrough, same
WebSocket upgrade routing, same serverStatic implementation.

The webhook HMAC regression guard added in the previous commit
covers the load-bearing rawBody capture path.

The `ws` package does not ship its own .d.ts and adding @types/ws is
out of scope for this phase, so the WebSocketServer constructor is
loaded via require() with a locally-declared shape — same as the
original JS file did.
Cleans up the three files Phase 2 and Phase 3 introduced or modified
that had not been reformatted: query.ts (options-bag rewrites),
temporal_per_tenant.test.ts (new), and verify.test.ts (snapshot port).
Final state: prettier --check passes across all src and tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

Resolves the 9-file conflict between the security hardening branch and
the project-level memory isolation feature from PR CaviraOSS#171. Both isolation
dimensions are orthogonal: every fact filters by user_id (required) AND
project_id (optional, with system_global / NULL fallback per CaviraOSS#171).
temporal_graph helpers now use options-bag signatures carrying both
fields; insert_fact preserves the positional overload for backward-compat.
All 27 prior tests + project isolation tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant