Skip to content

feat: ClickHouse cluster (ON CLUSTER) support#168

Open
KeKs0r wants to merge 2 commits into
mainfrom
cluster-support
Open

feat: ClickHouse cluster (ON CLUSTER) support#168
KeKs0r wants to merge 2 commits into
mainfrom
cluster-support

Conversation

@KeKs0r

@KeKs0r KeKs0r commented Jun 29, 2026

Copy link
Copy Markdown
Member

Summary

  • Add a clickhouse.cluster config field. When set, chkit emits ON CLUSTER <name> on every generated DDL statement and stores its migration journal in a replicated engine — for self-managed multi-node clusters. Table engines are passed through unchanged (no rewrite). Leave cluster unset for single-node, ClickHouse Cloud, or ObsessionDB, where replication is automatic and ON CLUSTER is unnecessary.
  • core: applyOnClusterToPlan() stamps the clause onto the finished plan in one post-pass, so the planner/renderers stay cluster-agnostic; new config field with injection-safe validation.
  • cli: the journal becomes ReplicatedReplacingMergeTree on a no-{shard} path created ON CLUSTER, so migration history is consistent across nodes and a load-balanced endpoint never re-applies migrations. Threaded at generate/migrate/status/check.
  • Docs (configuration/overview.md) + changeset (patch). New local docker cluster under test/cluster/ with cluster:up/down/verify + test:cluster scripts; the cluster e2e lives outside src/ so it's excluded from the default CI run.

Test plan

  • bun verify green — typecheck + lint + full live ObsessionDB e2e + build, 0 regressions
  • hermetic golden + validation: bun test packages/core/src/on-cluster.test.ts
  • live cluster e2e (2-replica docker cluster): bun run cluster:up && bun run test:cluster
  • manually verified on the cluster: CREATE database/table/view/materialized-view, ALTER, and DROP+recreate fan out to both replicas; journal replicated and consistent cross-endpoint; idempotent re-runs; no REPLICA_ALREADY_EXISTS

Set `clickhouse.cluster` to run all generated DDL ON CLUSTER and store the
migration journal in a replicated engine — for self-managed multi-node clusters.

- core: applyOnClusterToPlan() stamps ON CLUSTER onto the finished plan as a
  single post-pass; new `cluster` config field with injection-safe validation.
  Table engines are passed through unchanged (no rewrite).
- cli: journal becomes ReplicatedReplacingMergeTree on a no-{shard} path created
  ON CLUSTER, threaded at generate/migrate/status/check.
- tests: hermetic golden + validation (on-cluster.test.ts) and a live cluster
  e2e (packages/cli/test/cluster.e2e.test.ts) against a local docker cluster
  (test/cluster/, bun run cluster:up / test:cluster), excluded from default CI.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a5442d10c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/core/src/on-cluster.ts Outdated
'ALTER TABLE',
'DROP TABLE IF EXISTS',
'DROP VIEW IF EXISTS',
'RENAME TABLE IF EXISTS',

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Place ON CLUSTER after the RENAME target

For explicit table renames under cluster mode, this anchor is passed through the generic injection path, producing RENAME TABLE IF EXISTS db.a ON CLUSTER 'c' TO db.b;. ClickHouse's RENAME syntax places [ON CLUSTER cluster] after the complete name TO new_name list (docs: https://clickhouse.com/docs/sql-reference/statements/rename), so any migration generated by --rename-table/renamedFrom with clickhouse.cluster will fail to apply instead of renaming the table.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 4554f90. RENAME TABLE is now special-cased to place ON CLUSTER after the full name TO new_name list (at the end). Confirmed live: the old placement threw SYNTAX_ERROR (62); the new form applies and replicates. Added a golden unit-test assertion plus a rename case in the cluster e2e.

// no data rewrite.
await db.command(
`ALTER TABLE ${journalTable} ADD COLUMN IF NOT EXISTS migration_completed Bool DEFAULT true`,
`ALTER TABLE ${journalTable}${onCluster} ADD COLUMN IF NOT EXISTS migration_completed Bool DEFAULT true`,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject existing local journals in cluster mode

When clickhouse.cluster is added to a project that already has the old local _chkit_migrations table, this upgrade path only runs ALTER ... ON CLUSTER and never creates or converts the table to the new replicated engine. That leaves the journal non-replicated (or the distributed ALTER fails on replicas where the old table is absent), so status/migrate through another cluster node can miss applied migrations and replay them; please detect a pre-existing non-ReplicatedReplacingMergeTree journal and fail with migration instructions or migrate it before marking the store bootstrapped.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 4554f90. In cluster mode, ensureTable now checks the existing journal's engine first and fails fast with migration guidance (drop ON CLUSTER, or set CHKIT_JOURNAL_TABLE) if it isn't a replicated engine — before any ON CLUSTER ALTERs run. Added a cluster e2e case covering a pre-existing non-replicated journal.

…ed journal

Addresses PR #168 review (Codex):
- P1: RENAME TABLE places ON CLUSTER after the full `name TO new_name` list
  (at the end), not after the source name — the previous placement produced a
  SYNTAX_ERROR. Fixed in the injector + unit test; verified live.
- P2: when clickhouse.cluster is set and a pre-existing non-replicated
  _chkit_migrations journal is found, fail fast with migration guidance instead
  of leaving history non-replicated.
- Added cluster e2e coverage for both (rename placement + journal guard).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant