Skip to content

Antalya 26.1 - Add ALTER TABLE MODIFY SETTING support for Hybrid watermarks #1659

Open
mkmkme wants to merge 2 commits intoantalya-26.1from
mkmkme/hybrid/alter-watermark
Open

Antalya 26.1 - Add ALTER TABLE MODIFY SETTING support for Hybrid watermarks #1659
mkmkme wants to merge 2 commits intoantalya-26.1from
mkmkme/hybrid/alter-watermark

Conversation

@mkmkme
Copy link
Copy Markdown
Collaborator

@mkmkme mkmkme commented Apr 17, 2026

Introduce hybridParam('name', 'type') pseudo-function for Hybrid engine
predicates, allowing segment boundaries to be changed at runtime via
ALTER TABLE ... MODIFY SETTING without recreating the table.

Key design decisions:

  • hybridParam() takes exactly two arguments (name, type); all values
    must be provided via ENGINE SETTINGS, keeping the API surface minimal
    and eliminating default-value complexity.
  • Watermark names must start with 'hybrid_watermark_' and exactly match
    a declared hybridParam() in the predicates. Typos in both CREATE SETTINGS and ALTER are rejected.
  • Values are validated against the declared type at CREATE and ALTER
    time, so invalid values never reach the runtime substitution path.
  • Only hybrid_watermark_* settings are allowed on Hybrid tables;
    regular DistributedSettings and RESET SETTING are rejected.
  • Runtime watermark state uses MultiVersionstd::map for lock-free
    reads and deterministic serialization order in SHOW CREATE TABLE.
  • In StorageDistributed::alter(), the watermark snapshot is published
    before in-memory metadata to prevent concurrent readers from
    observing new metadata with stale watermark values.
  • Predicate validation at CREATE time substitutes the effective
    SETTINGS value (not the type default) so value-sensitive expressions
    are checked against realistic data.

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added support for moving Hybrid table watermarks

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

mkmkme added 2 commits April 17, 2026 13:40
Introduce hybridParam('name', 'type') pseudo-function for Hybrid engine
predicates, allowing segment boundaries to be changed at runtime via
ALTER TABLE ... MODIFY SETTING without recreating the table.

Key design decisions:
- hybridParam() takes exactly two arguments (name, type); all values
  must be provided via ENGINE SETTINGS, keeping the API surface minimal
  and eliminating default-value complexity.
- Watermark names must start with 'hybrid_watermark_' and exactly match
  a declared hybridParam() in the predicates. Typos in both CREATE
  SETTINGS and ALTER are rejected.
- Values are validated against the declared type at CREATE and ALTER
  time, so invalid values never reach the runtime substitution path.
- Only hybrid_watermark_* settings are allowed on Hybrid tables;
  regular DistributedSettings and RESET SETTING are rejected.
- Runtime watermark state uses MultiVersion<std::map> for lock-free
  reads and deterministic serialization order in SHOW CREATE TABLE.
- In StorageDistributed::alter(), the watermark snapshot is published
  before in-memory metadata to prevent concurrent readers from
  observing new metadata with stale watermark values.
- Predicate validation at CREATE time substitutes the effective
  SETTINGS value (not the type default) so value-sensitive expressions
  are checked against realistic data.

Tests:
- Stateless tests covering CREATE, ALTER, DETACH/ATTACH persistence,
  multi-watermark, type conflict, invalid values, typo rejection,
  RESET SETTING rejection, and DistributedSettings rejection.
- Integration tests covering single-node, cross-node, and cluster()
  table function flows, each self-contained with own setup/teardown.

Documentation updated with hybridParam() syntax, ALTER examples,
multi-watermark usage, and restriction notes.

Made-with: Cursor
@mkmkme mkmkme added antalya port-antalya PRs to be ported to all new Antalya releases hybrid antalya-26.1 labels Apr 17, 2026
@github-actions
Copy link
Copy Markdown

Workflow [PR], commit [5c36b26]

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 17, 2026

AI audit note: This review comment was generated by AI (gpt-5.4).

Audit update for PR #1659 (ALTER TABLE MODIFY SETTING support for Hybrid watermarks):

Confirmed defects:

Medium: ALTER MODIFY SETTING skips full predicate revalidation
Impact: a watermark change can be accepted and persisted even though the resulting Hybrid predicate becomes invalid for query rewrite/execution, leaving the table unreadable until the watermark is changed again.
Anchor: src/Storages/StorageDistributed.cpp / StorageDistributed::checkAlterIsPossible() vs Hybrid CREATE validation path
Trigger: change a watermark used in a value-sensitive predicate expression to a same-typed value that is semantically invalid for that expression.
Why defect: CREATE substitutes effective watermark values and revalidates every predicate with ExpressionAnalyzer, but ALTER only checks that the new setting parses as the declared type and never reruns predicate validation, so ALTER can admit states that CREATE would reject.
Fix direction (short): build the post-ALTER effective watermark map and reuse the same substituted-predicate validation used during CREATE.
Regression test direction (short): add a Hybrid predicate where only some same-typed watermark values are valid, and assert both CREATE and ALTER reject the bad value.

Example:

CREATE TABLE t
ENGINE = Hybrid(
    remote('localhost:9000', currentDatabase(), 'hot'),
        dateDiff(hybridParam('hybrid_watermark_unit', 'String'), ts, now()) < 30,
    remote('localhost:9000', currentDatabase(), 'cold'),
        dateDiff(hybridParam('hybrid_watermark_unit', 'String'), ts, now()) >= 30
)
SETTINGS hybrid_watermark_unit = 'day'
AS hot;

This CREATE should pass because 'day' is a valid String and also a semantically valid first argument for dateDiff().

But this ALTER is currently accepted too:

ALTER TABLE t MODIFY SETTING hybrid_watermark_unit = 'banana';

ALTER only validates that 'banana' parses as the declared type String. It does not revalidate the full predicate with substituted effective values. The next read then rewrites the predicate to:

dateDiff('banana', ts, now()) < 30

which is semantically invalid for dateDiff(). So CREATE and ALTER are not validation-equivalent for hybridParam() used in nontrivial expressions.

Coverage summary:

Scope reviewed: last commit in this PR, focusing on StorageDistributed Hybrid create/alter/read paths plus the added stateless/integration tests and docs
Categories failed: alter-time validation parity with create-time predicate validation
Categories passed: watermark name/type checks, missing-setting rejection, metadata persistence across detach/attach, multi-watermark update merge, concurrent read snapshot publication
Assumptions/limits: static audit only; I did not run the new tests or a live repro against ClickHouse

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 17, 2026

The above comment is a very good point by /audit-review. I think we should fix it. TBH I didn't think of such a use case when implementing the feature.

FWIW the previous /audit-report run before the PR went smoothly without any discovered defects.

{
auto * arg_list = func->arguments->as<ASTExpressionList>();
const auto & param_name = arg_list->children[0]->as<ASTLiteral>()->value.safeGet<String>();
const auto & type_name = arg_list->children[1]->as<ASTLiteral>()->value.safeGet<String>();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does make sense to check children size before to avoid out of bound error?

if (auto * func = node->as<ASTFunction>(); func && func->name == "hybridParam")
{
auto * arg_list = func->arguments ? func->arguments->as<ASTExpressionList>() : nullptr;
if (arg_list && arg_list->children.size() >= 2)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw error when size != 2?

if (name_lit && type_lit)
result.emplace(name_lit->value.safeGet<String>(), type_lit->value.safeGet<String>());
}
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that method collects only type for first hybridParam in predicate and exits here? Technically condition can have several parameters.

std::optional<NameDependencies> name_deps{};
for (const auto & command : commands)
{
if (command.type == AlterCommand::Type::MODIFY_SETTING)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I think now better to use switch-case instead of multiple ifs here.

@alex-zaitsev alex-zaitsev mentioned this pull request Apr 19, 2026
32 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.1 hybrid port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants