Skip to content

ci: unblock release pipeline — runner label, license fatal, webpack stamp word-split#64

Open
ConstanzeTU wants to merge 5 commits into
mainfrom
entlein/ci-release-fixes
Open

ci: unblock release pipeline — runner label, license fatal, webpack stamp word-split#64
ConstanzeTU wants to merge 5 commits into
mainfrom
entlein/ci-release-fixes

Conversation

@ConstanzeTU

Copy link
Copy Markdown

Summary

Three independent CI fixes that were proven out on the v0.0.10 release tag. Without these on `main`, the next `release/cloud/v0.0.11-pre-v0.0` tag will hit the same silent failures that took us an afternoon to chase last time.

What's in

Commit What Why
`4a00e90e3` Runner label `oracle-16cpu-64gb-x86-64` → `oracle-vm-16cpu-64gb-x86-64` across 5 release/mirror workflows Live self-hosted runners use the `-vm-` variant; old label means the job queues forever (1h+ on v0.0.10 before being retried). `perf_clickhouse`, `perf_soc_attack`, `build_and_test` all already use the correct label.
`5959f7a30` `tools/licenses/BUILD.bazel`: `disallow_missing = False` on `go_licenses` + `deps_licenses` `manual_licenses.json` has 37 hand-curated entries and drifts behind `go.sum` every time `main` pulls new transitive deps. v0.0.10-pre-v0.0 had 38 unmatched modules and got fatal'd at the release-only stamp gate. Missing modules still recorded in `*_licenses_missing.json` so the gap is visible; curating the backlog is a follow-up.
`95c7a2ae3` `bazel/ui.bzl`: whitelist stamp-import sed to `STABLE_BUILD_TAG` + `BUILD_TIMESTAMP`; use absolute yarn path Root cause of v0.0.10's "yarn fails silently" pattern. The webpack action eval'd `workspace_status_command` output via `$(sed ... 'export \1=\2' ...)` to import stamp vars. `FORMATTED_DATE` has a space-separated value (`2026 Jun 18 22 06 02 Thu`); `$(...)` word-splits before quotes take effect, so bash sees `export FORMATTED_DATE=2026 Jun 18 ...` and tries to `export 18`, `export 22`, ... — "not a valid identifier" on line 1, action aborts before `cp` / `tar` / `yarn` even run. The file's own comment called it out: "Hopefully, no special characters/spaces/quotes in the results ...". Filtering to the two vars `webpack.config.js`'s `EnvironmentPlugin` actually reads — both space-free — sidesteps it.

Test plan

  • Built proof: `release/cloud/v0.0.10-pre-v0.0` (with these patches snapped together) completed successfully — workflow run 27792348598 — all 15 `cloud-*_server_image` images pushed to `ghcr.io/k8sstormcenter`, and the cloud-proxy's `bundle/bundle-oss.json` correctly contains the scripts.
  • Re-tag a no-op test release from main after merge (e.g. `release/cloud/v0.0.11-pre-v0.0`) to confirm the fixes hold from main directly.

Non-goals

These commits are deliberately CI-only — no schema/code changes. The dx_evidence_graph viz scaffold sitting in PR #62 stays separate so it can iterate on CodeRabbit feedback without holding the CI fixes hostage. Once this lands, PR #62 rebases onto a fixed main.

Type of change

/kind bug

entlein added 3 commits June 19, 2026 07:40
Five release/mirror workflows reference oracle-16cpu-64gb-x86-64 but
the currently-online self-hosted runners use
oracle-vm-16cpu-64gb-x86-64. Confirmed by perf_clickhouse,
perf_soc_attack, and build_and_test which all run cleanly on the
-vm- label. release/cloud/v0.0.10-pre-v0.0 queued for 1h+ under the
legacy label before being retried after the fix.

Patches: cloud_release.yaml, vizier_release.yaml, operator_release.yaml,
cli_release.yaml, mirror_deps.yaml.
The release pipeline trips on this every time main pulls in new
transitive Go deps faster than manual_licenses.json is curated.
manual_licenses.json has 37 entries; v0.0.10-pre-v0.0 hit 38 newly
missing modules, blocking the release on an unrelated thing.

Drop the stamped-build fatal gate (was: disallow_missing = select(
{"//bazel:stamped": True, "//conditions:default": False})). Missing
licenses are still recorded in {go,deps}_licenses_missing.json so
the gap is visible; a follow-up can curate the backlog without
holding releases hostage.
ROOT CAUSE of release/cloud/v0.0.10-pre-v0.0 failure:

The webpack actions eval workspace_status_command output via
  \$(sed -E "s/^([A-Za-z_]+)\\s*(.*)/export \\1=\\2/g" "stable-status.txt")
to import stamp vars into the action env. workspace_status_command
emits FORMATTED_DATE whose value is space-separated
("2026 Jun 18 22 06 02 Thu"). \$(...) command substitution
word-splits the sed output BEFORE single quotes can protect the
value, so bash sees `export FORMATTED_DATE=2026 Jun 18 ...` and
tries to also `export 18`, `export 22`, `export 02`, ... — all
failing with "not a valid identifier" on line 1. The action aborts
before any cp / tar / yarn runs.

Every prior "yarn failed silently" symptom was this. The file's own
comment called it out:
  "Hopefully, no special characters/spaces/quotes in the results ..."

Filter the sed with -n + /p to emit only the two vars
webpack.config.js' EnvironmentPlugin actually reads
(STABLE_BUILD_TAG = version string, BUILD_TIMESTAMP = unix
timestamp). Both space-free, so no quoting gymnastics needed.

Also belt-and-braces: invoke yarn by absolute path
(/opt/px_dev/tools/node/bin/yarn). bazel's
--incompatible_strict_action_env strips host PATH from actions
even with use_default_shell_env=True, so a bare `yarn` doesn't
reliably resolve. The dev image's chef recipe
(tools/chef/cookbooks/px_dev/recipes/nodejs.rb:32) installs node
there, verified by `which yarn` inside the dev image. Keep the
PATH export so children (webpack → node) can find each other.

Replaces the capture-into-\$output + unquoted-echo pattern with a
direct `yarn build_prod` so any future failure surfaces yarn's
real stderr instead of an empty string.

Verified: release/cloud/v0.0.10-pre-v0.0 with all of these patches
shipped all 15 cloud-* images to ghcr.io/k8sstormcenter and the
bundle layer correctly contains the new dx_evidence_graph script.
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 074658e8-b4bc-4438-bc27-4be46e919bca

📥 Commits

Reviewing files that changed from the base of the PR and between bd9cc66 and 24df7f9.

📒 Files selected for processing (4)
  • bazel/ui.bzl
  • private/cockpit/cloud_ingress.yaml
  • terraform/kubernetes/auth0/auth0_import.tf
  • tools/licenses/BUILD.bazel
💤 Files with no reviewable changes (3)
  • terraform/kubernetes/auth0/auth0_import.tf
  • tools/licenses/BUILD.bazel
  • bazel/ui.bzl

📝 Walkthrough

Walkthrough

Five GitHub Actions workflows have their runner label renamed from oracle-16cpu-64gb-x86-64 to oracle-vm-16cpu-64gb-x86-64. In bazel/ui.bzl, webpack and license build actions are updated for --incompatible_strict_action_env compatibility via shell PATH reordering, use_default_shell_env=True, a whitelisted stamp variable export, and a simplified yarn build invocation. In tools/licenses/BUILD.bazel, both fetch_licenses targets set disallow_missing=False unconditionally. Kubernetes and Terraform configuration files receive formatting updates.

Changes

GitHub Actions Runner Label Rename

Layer / File(s) Summary
Runner label update
.github/workflows/cli_release.yaml, .github/workflows/cloud_release.yaml, .github/workflows/mirror_deps.yaml, .github/workflows/operator_release.yaml, .github/workflows/vizier_release.yaml
runs-on changed from oracle-16cpu-64gb-x86-64 to oracle-vm-16cpu-64gb-x86-64 in the build-release and sync_deps jobs across all five workflows.

Bazel UI Build Strict-Env and License Fixes

Layer / File(s) Summary
Shell PATH reordering and webpack deps strict-env wiring
bazel/ui.bzl
ui_shared_cmds_start adds set -x, reorders PATH to place /opt/px_dev/tools/node/bin first, and adds hash -r. The webpack deps run_shell action sets use_default_shell_env = True for strict action env compatibility.
Stamp variable whitelist and yarn build invocation simplification
bazel/ui.bzl
Stamp env export replaced with a whitelist of STABLE_BUILD_TAG and BUILD_TIMESTAMP only. The yarn build_prod output-capture wrapper replaced with a direct invocation that streams output. Webpack bundle run_shell gains use_default_shell_env = True.
License generation strict-env wiring and disallow_missing config
bazel/ui.bzl, tools/licenses/BUILD.bazel
License run_shell action adds use_default_shell_env = True for strict action env compatibility. Both go_licenses and deps_licenses fetch_licenses targets set disallow_missing = False unconditionally, replacing the prior select()-based logic, with a comment explaining missing licenses surface in go_licenses_missing.json.

Kubernetes and Terraform Configuration Updates

Layer / File(s) Summary
Kubernetes ingress and Terraform import configuration formatting
private/cockpit/cloud_ingress.yaml, terraform/kubernetes/auth0/auth0_import.tf
Kubernetes external-dns.alpha.kubernetes.io/hostname annotation reformatted from inline string to folded YAML scalar while preserving the same hostname values. Terraform auth0 import block adjusted at file end.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the three main changes: runner label update, license fatal flag change, and webpack stamp variable word-split fix.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the three CI fixes with clear explanations of why each change was needed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch entlein/ci-release-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bazel/ui.bzl`:
- Line 46: The yarn install command is redirecting its output to build.log
instead of streaming directly, which prevents error messages from appearing in
CI logs when the installation fails. Remove the `&> build.log` redirect from the
yarn install command so its output streams directly to the console, consistent
with how the yarn build_prod command on line 108 is handled. This ensures CI
logs capture actual error messages if yarn install fails.
- Around line 83-93: Add quotes around the captured value in both sed
replacement patterns to ensure the exported environment variables are properly
quoted. In both sed command patterns (one for ctx.info_file.path and one for
ctx.version_file.path), modify the replacement string from export \\1=\\2 to
export \\1="\\2" to defensively quote the value portion against potential spaces
or special characters in future format changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b12b9dd9-d2ad-429e-8323-63f787285ca5

📥 Commits

Reviewing files that changed from the base of the PR and between 051422a and 95c7a2a.

📒 Files selected for processing (7)
  • .github/workflows/cli_release.yaml
  • .github/workflows/cloud_release.yaml
  • .github/workflows/mirror_deps.yaml
  • .github/workflows/operator_release.yaml
  • .github/workflows/vizier_release.yaml
  • bazel/ui.bzl
  • tools/licenses/BUILD.bazel

Comment thread bazel/ui.bzl Outdated
Comment thread bazel/ui.bzl
Reviewer caught a dumb belt-and-braces I added when chasing the wrong
diagnosis earlier today. The real fix for "yarn: command not found"
was a) the export PATH=/opt/px_dev/tools/node/bin:... + use_default_
shell_env=True at the action level (which IS the right fix and stays),
and b) the whitelist sed for STABLE_BUILD_TAG / BUILD_TIMESTAMP that
unblocked everything. The /opt/px_dev/tools/node/bin/yarn absolute-
path commands were redundant once those landed, and they hardcode an
install path nobody outside this dev image has.

Revert the four absolute-path invocations (yarn install / yarn
build_prod / yarn license_check / yarn pnpify) back to bare `yarn`.
The export PATH at the top of ui_shared_cmds_start makes them resolve
fine on the dev image and on any host that has yarn on PATH.

Refresh the stale comment on the yarn build_prod line so it explains
the PATH export instead of pretending we still need the absolute path.

@entlein entlein left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHY are you using whitelist again, how many times a day must i tell you to use the word "allowlist"

Comment thread bazel/ui.bzl

cmd = ui_shared_cmds_start + cp_cmds + [
'pushd "$TMPPATH/src/ui" &> /dev/null',
"yarn install --immutable &> build.log",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

Comment thread bazel/ui.bzl Outdated
'tar -xzf "$BASE_PATH/{}"'.format(ctx.file.deps.path),
"yarn license_check --excludePrivatePackages --production --json --out $LIC_TMPPATH/checker.json",
'yarn pnpify node ./tools/licenses/yarn_license_extractor.js --input=$LIC_TMPPATH/checker.json --output="$BASE_PATH/{}"'.format(out.path),
"/opt/px_dev/tools/node/bin/yarn license_check --excludePrivatePackages --production --json --out $LIC_TMPPATH/checker.json",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

"//bazel:stamped": True,
"//conditions:default": False,
}),
disallow_missing = False,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Drop the verbose explanatory comments added in this branch (set -x,
use_default_shell_env, stamp allowlist sed, license drift, yarn PATH).
Renamed the one remaining 'whitelist' wording to 'allowlist' per
styleguide/inclusive_naming_guide.md before stripping.

Lint fixes:
- private/cockpit/cloud_ingress.yaml: wrap long external-dns hostname
  annotation onto two lines (line-length).
- terraform/kubernetes/auth0/auth0_import.tf: drop trailing newline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants