Add LaunchDarkly flag consistency check#36953
Conversation
Add a nightly CI check that verifies every synchronized Materialize system parameter (system variables, dyncfgs and feature flags returned by `SystemVars::iter_synced()`) has a corresponding flag in LaunchDarkly, and warns about LaunchDarkly flags that are no longer synchronized parameters. The set of Materialize parameters is collected by booting `materialized` and running `SHOW ALL` as `mz_system`, filtering out per-session variables and the `enable_launchdarkly` kill switch. To avoid warning about flags a deployed older version still relies on, the stale-flag check also considers the last published release (best effort). Throwaway `ci-test` flags are ignored. A parameter missing from LaunchDarkly fails the build; a stale flag only warns.
In addition to detecting parameters missing from LaunchDarkly and stale LaunchDarkly flags, also warn when a flag exists in both but the value LaunchDarkly serves by default diverges from Materialize's compiled-in default. To compare defaults, the synchronized parameters are now collected with their values from `SHOW ALL`, and the LaunchDarkly flags are fetched with full per-environment targeting so the served default (off variation, or fallthrough when targeting is on) can be determined. Comparison is conservative: ambiguous cases (e.g. a value with a unit vs a raw number, or a percentage rollout) are skipped rather than warned on. Pagination now follows the reported total count.
Many synchronized parameters legitimately have no LaunchDarkly flag (they ride their compiled-in default), so run the check with --no-fail for now: the missing-flag condition only warns until the backlog is triaged. Check default divergence against multiple environments (production and staging by default, configurable via LAUNCHDARKLY_ENVIRONMENTS). Flag existence is project-level so the missing/stale checks are environment-independent; only the served default can differ per environment, so divergence is reported per environment.
An empty diverging-defaults section was indistinguishable from "nothing could be compared". Print how many flags are present in both Materialize and LaunchDarkly, how many diverged, and how many were not comparable (no determinable LaunchDarkly default, or a representation mismatch such as a byte size shown as '1GB' versus the raw number).
mzcompose applies a large set of CI test overrides via MZ_SYSTEM_PARAMETER_DEFAULT by default, so SHOW ALL was reporting those test values rather than Materialize's compiled-in defaults, corrupting the default-divergence comparison. Boot the instance with an empty system_parameter_defaults (and no LaunchDarkly SDK key, so it does not sync from LaunchDarkly) so SHOW ALL reflects the value environmentd falls back to when a flag is absent. Parameter names are unaffected, so the missing/stale results do not change.
The list-flags endpoint only returns flag summaries, not per-environment targeting, so every served default came back as None and all in-both flags were reported "not comparable". Fetch each in-both flag individually (which returns full per-environment targeting) to determine the served default. Also report the set of environment keys actually seen and warn when a configured environment (e.g. production/staging) is not found, and fix the doubled "v" when building the previous-release image tag (MzVersion already includes the prefix).
Two fixes informed by a nightly run:
- The previous-release binary halted with "cannot read data with version
26.29.0-dev.0" because it booted on the volumes the newer current build had
just initialized. Destroy volumes before each collection so every boot starts
from a clean slate.
- The default-divergence check reported dozens of false positives from duration
formatting ("1 min" vs "60s", "30 d" vs "30d", "10 s" vs the raw-milliseconds
"10000"). Parse durations and compare by value, so only genuine differences
(e.g. "1 h" vs "0s", booleans, numbers, lists) are reported.
Self-managed deployments have no LaunchDarkly and run on the compiled-in default, so a production LaunchDarkly value that differs from the compiled-in default means cloud and self-managed behave differently (e.g. a feature enabled in cloud but still off by default in self-managed). Surface that as a warning, excluding parameters that deliberately tune cloud-only infrastructure (INTENTIONAL_LD_OVERRIDES: replica expiration, cloud quotas and replica sizes). Separately warn when a flag's served default differs between LaunchDarkly environments (e.g. production vs staging), which usually only happens during a staged rollout. Durations are compared by value via a shared values_equivalent helper so representation differences are not reported.
|
Nightly run in https://buildkite.com/materialize/nightly/builds/16741 |
Byte sizes are not normalized; a unit-bearing value like "1GB" is compared textually against the raw byte count and reported as differing. Document this limitation accurately instead of claiming such values return None.
Add a parse_bytes helper (1024-based B/kB/MB/GB/TB, matching src/repr/src/bytes.rs) mirroring the duration parser, and use it in values_equivalent so a unit-bearing byte size like "1GB" compares equal to the raw byte count LaunchDarkly stores (1073741824). Without this, byte-size parameters such as max_result_size would be reported as cloud-vs-self-managed divergences even when identical.
| # Warn-only for now: many synchronized parameters legitimately | ||
| # have no LaunchDarkly flag. Drop --no-fail once the backlog of | ||
| # missing flags has been triaged. | ||
| args: [--no-fail] |
There was a problem hiding this comment.
What good is adding it to Nightly if it can't fail? I would instead add a list of known unsynchronized parameters and fail for each difference other than that.
There was a problem hiding this comment.
Removed the check and added a list of known failures so it's not immediately red. I'll file an issue to fix the known failures.
| # `SessionVars::iter`. | ||
| # Matched case-insensitively, so the historical capitalization of e.g. | ||
| # `DateStyle` does not matter. | ||
| SESSION_VARIABLES = { |
There was a problem hiding this comment.
Isn't there a way to list all session variables dynamically? Keeping it in sync manually will definitely fail at some point. There should at least be a linter, but I prefer dynamic.
There was a problem hiding this comment.
Yup, same here. I'll file a follow-up issue to detect session variables, at the moment there is none other than looking at Rust code, or Rust code looking at Python code.
Per review, switch from warn-only to hard failure with curated known-exceptions allowlists, so only a *new* discrepancy turns the build red: - KNOWN_MISSING_FROM_LD (seeded with the current 231 unsynchronized params) - KNOWN_STALE_LD_FLAGS (seeded with the current 35 stale flags) - INTENTIONAL_LD_OVERRIDES (cloud-only tuning; cloud-vs-self-managed) - KNOWN_CROSS_ENV_DIVERGENCES (deliberate staged rollouts) The check now fails when any missing/stale/cloud-vs-self-managed/cross-env discrepancy falls outside its allowlist, and prints allowlist entries that no longer apply so they can be pruned. Drop --no-fail from the nightly step (kept as a local-only override). Also inline the internal SQL port (6877) per review.
Seed INTENTIONAL_LD_OVERRIDES with the 25 cloud-vs-default divergences (cloud- only tuning plus a TODO group of feature flags that default on in code but off in prod cloud, to be triaged) and KNOWN_CROSS_ENV_DIVERGENCES with the 38 prod-vs-staging divergences. Also report allowlisted divergence entries that no longer apply so they can be pruned as rollouts complete.
Motivation
Materialize synchronizes system parameters (system variables, dyncfgs, and feature flags — everything
SystemVars::iter_synced()returns) from LaunchDarkly at runtime. If a synchronized parameter has no corresponding LaunchDarkly flag,client.variationsilently falls back to the compiled-in default and the parameter can never be controlled in cloud. Conversely, a deployment without LaunchDarkly runs on the compiled-in default, so a LaunchDarkly value that diverges from that default means cloud behaves differently from the out-of-the-box default. Nothing checks for either today.Description
A new nightly CI step (
test/launchdarkly-flag-consistency) that compares Materialize's synchronized parameters against the LaunchDarkly project. It fails on four kinds of discrepancy, each gated by a curated known-exceptions allowlist so the build is green on today's baseline and only a new discrepancy turns it red:KNOWN_MISSING_FROM_LD.KNOWN_STALE_LD_FLAGS. The last release is considered to avoid flagging a flag a deployed older version still needs.INTENTIONAL_LD_OVERRIDES.KNOWN_CROSS_ENV_DIVERGENCES.The allowlists capture the accepted baseline and are expected to shrink over time; the check also prints any allowlist entry that is no longer a discrepancy so it can be pruned (e.g. a cross-environment rollout self-flags once it reaches production). The current baseline seeds: 231 missing, 35 stale, 35 cloud-vs-default overrides, 38 cross-environment. The cloud-vs-default group includes a commented TODO subset of feature flags that default on in code but are served off in production cloud — allowlisted for now and to be triaged (reconcile the compiled-in default, or flip production).
How it works:
system_parameter_defaultsand no LaunchDarkly SDK key, soSHOW ALLasmz_systemreports the true compiled-in defaults rather than mzcompose's CI overrides or LaunchDarkly-synced values. Volumes are destroyed between boots so the older binary doesn't trip over newer persist/catalog state.values_equivalenthelper that normalizes representation differences: durations (1 min==60s,10 s== raw-milliseconds10000), byte sizes (1GB==1073741824), and boolean spellings (on==true), so formatting differences aren't reported.src/sql/src/session/vars.rs), theenable_launchdarklykill switch, andci-test-tagged throwaway flags.Configuration via env vars:
LAUNCHDARKLY_PROJECT(defaultdefault) andLAUNCHDARKLY_ENVIRONMENTS(defaultproduction,staging; the first is the cloud baseline compared against the compiled-in default).Pipeline integration (
ci/nightly/pipeline.template.yml): newlaunchdarkly-flag-consistencystep onlinux-aarch64-small(hasLAUNCHDARKLY_API_TOKEN), depends onbuild-aarch64. It is blocking — a discrepancy outside the allowlists fails the step. (--no-failremains as a local-only override for downgrading failures to warnings.)Verification
Exercised by the nightly pipeline (requires
LAUNCHDARKLY_API_TOKENand LaunchDarkly API access). Trial runs against the realdefaultproject confirmed it surfaces genuine drift and that the allowlists make the baseline green. Thevalues_equivalentduration/byte/boolean normalization is unit-tested locally.https://claude.ai/code/session_01Qm3eZHzYhDYoMe4okpebri