
feat(client): self-upgrade CronJob (closes #69)#89

Open
saadqbal wants to merge 2 commits into develop from feat/auto-upgrade-cronjob

Conversation

Contributor

@saadqbal saadqbal commented Apr 29, 2026

Summary

  • Closes #69 (Helm chart auto-update — deployed clients stay frozen on installed version). Ships a <release>-auto-upgrade CronJob that polls https://tracebloc.github.io/client daily and runs helm upgrade --reuse-values when a newer chart version is published, so deployed clients no longer freeze on the version they first installed.
  • Implements option B (auto-upgrade) from the issue — picked over notify-and-approve / hybrid because the issue's whole point is "customers don't run upgrade manually".
  • Bumps chart 1.2.3 → 1.3.0.

What's in the chart now

| Resource | Name | Notes |
| --- | --- | --- |
| CronJob | `<release>-auto-upgrade` | `concurrencyPolicy: Forbid`, default schedule `23 2 * * *` (UTC), `backoffLimit: 2` |
| ConfigMap | `<release>-auto-upgrade` | Holds the upgrade shell script |
| ServiceAccount | `<release>-auto-upgrade` | In the release namespace |
| ClusterRoleBinding | `<release>-auto-upgrade` | Bound to the built-in `cluster-admin` ClusterRole |

The Pod satisfies PSA restricted: runAsNonRoot: true, runAsUser/Group: 1000, RuntimeDefault seccomp, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities.drop: [ALL]. HOME and HELM_*_HOME are redirected into a /tmp emptyDir so helm can write its caches.

The version comparison uses sort -V rather than a lexical string compare, so 1.10.0 correctly ranks above 1.9.0.
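The comparison can be sketched in plain shell (variable names here are illustrative, not the chart script's actual ones):

```shell
# sort -V orders version strings numerically per component, so 1.10.0
# sorts after 1.9.0; a plain string sort would misorder them.
current="1.9.0"
latest="1.10.0"
# The newer of the two versions is whichever sorts last under -V.
newest=$(printf '%s\n%s\n' "$current" "$latest" | sort -V | tail -n1)
if [ "$newest" != "$current" ]; then
  echo "upgrade available: $current -> $latest"
fi
# -> upgrade available: 1.9.0 -> 1.10.0
```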

Trade-offs surfaced for review

  1. Default enabled: true. The issue is "customers freeze on the version they installed". Default-off recreates the bug for everyone who installs 1.3.0 and never re-runs helm. Operator can opt out via autoUpgrade.enabled: false, or pause via autoUpgrade.suspend: true.
  2. cluster-admin not a narrow custom role. The chart already templates cluster-scoped resources (PriorityClass, StorageClass, ClusterRole/Binding, optionally Namespace). A curated role would silently break the day a future chart version adds a new resource kind on already-deployed clients. Trust boundary is documented in values.yaml next to the autoUpgrade: block.
  3. Backend reporting deferred. The issue mentions reporting back to the tracebloc backend so the workspace UI knows what version each customer runs. No endpoint exists yet — will land as a follow-up once the contract is defined. Not blocking the upgrade loop.
  4. Image pinned to alpine/helm:3.16.4. No jq dependency — the script parses helm's YAML output with awk only. latest is rejected by the schema so behaviour can't drift out from under us.
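The awk-only field extraction can be sketched as follows (the sample output layout is an assumption based on `helm search repo -o yaml`; the PR only states that awk is used instead of jq):

```shell
# Hypothetical sample of what `helm search repo tracebloc/client -o yaml`
# might emit; only the `version:` field matters here.
sample='- app_version: 1.3.0
  description: tracebloc client
  name: tracebloc/client
  version: 1.3.0'

# Print the value of the first top-level-looking `version:` field and stop.
latest=$(printf '%s\n' "$sample" | awk '$1 == "version:" { print $2; exit }')
echo "$latest"   # -> 1.3.0
```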

Test plan

  • helm lint client clean (default + AKS + bare-metal value files)
  • helm template stg client … renders 4 new docs by default; renders 0 with autoUpgrade.enabled=false
  • helm unittest client — 12 suites, 116 tests pass (11 of those new in tests/auto_upgrade_test.yaml)
  • Reviewer to verify on a live cluster (out of scope for this PR but the natural next step):
    • helm install 1.3.0 fresh; observe <release>-auto-upgrade CronJob created
    • Manually trigger the Job (kubectl create job … --from=cronjob/<release>-auto-upgrade); confirm it logs already at latest; nothing to do
    • Publish a 1.3.1 to gh-pages; trigger the Job again; observe helm upgrade complete: 1.3.0 -> 1.3.1
    • Verify reused values: helm get values <release> still shows clientId / clientPassword (not stripped by --reuse-values)
    • Verify on a bare-metal cluster (no CSI) since the script's helm upgrade --wait --timeout 10m interacts with PVC bind timing

Follow-ups (separate tickets)

  • Backend reporting endpoint + auth wiring (chart side: optional autoUpgrade.reportUrl).
  • Optional per-customer override on what major-version bumps trigger auto-apply vs notify (option C / hybrid). Not needed for v1.

🤖 Generated with Claude Code


Note

High Risk
Introduces a default-on CronJob that can mutate cluster-scoped resources and is bound to cluster-admin, so a compromised pod or chart repo could lead to full cluster takeover.

Overview
Adds an auto-upgrade mechanism to the client Helm chart: when autoUpgrade.enabled (default true) it installs a <release>-auto-upgrade ConfigMap+CronJob that periodically checks the published chart repo and runs helm upgrade --reset-then-reuse-values to move the release to the latest chart version.

This also introduces the supporting ServiceAccount + ClusterRoleBinding (bound to built-in cluster-admin), new autoUpgrade values and schema validation (including rejecting image.tag: latest), updates install notes/migration docs, adds helm-unittest coverage, and bumps the chart/app version to 1.3.0.

Reviewed by Cursor Bugbot for commit b402aca.

Ship a chart-side CronJob that polls the published Helm repo daily and runs
`helm upgrade --reuse-values` when a newer chart version is available, so
deployed clients no longer freeze on the version they first installed and
miss security/stability fixes.

- New templates: auto-upgrade-cronjob.yaml (ConfigMap + CronJob),
  auto-upgrade-rbac.yaml (ServiceAccount + ClusterRoleBinding to the
  built-in cluster-admin ClusterRole).
- New values: autoUpgrade.{enabled, schedule, repoUrl, repoName, chartName,
  timeout, suspend, successfulJobsHistoryLimit, failedJobsHistoryLimit,
  startingDeadlineSeconds, image, resources}; default ON.
- Pod satisfies PSA restricted (runAsNonRoot, dropped caps, RO root,
  RuntimeDefault seccomp); HOME/HELM_*_HOME redirected to a tmp emptyDir.
- Version compare uses sort -V so 1.10 > 1.9.
- Bumps chart 1.2.3 -> 1.3.0; MIGRATION.md documents how to opt out.

Cluster-admin (rather than a curated narrow role) keeps the upgrader
robust: the chart already templates cluster-scoped resources
(PriorityClass, StorageClass, ClusterRole/Binding, optionally Namespace),
so a narrower role would silently break the day a future chart adds a new
resource kind. Operators who want tighter posture can disable the feature
and run `helm upgrade` manually.

Backend reporting from the issue is intentionally deferred — no endpoint
exists yet; will land as a follow-up once the contract is defined.
@saadqbal saadqbal self-assigned this Apr 29, 2026

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit fb5210b.

Comment thread on client/templates/auto-upgrade-cronjob.yaml (Outdated)
Caught during the #69 verification on tb-client-dev-templates: a dry-run
1.1.0 -> 1.3.0 helm upgrade with --reuse-values fails with

  template: client/templates/auto-upgrade-rbac.yaml:1:14:
  executing "..." at <.Values.autoUpgrade.enabled>:
  nil pointer evaluating interface {}.enabled

because --reuse-values reuses the previous release's COMPUTED values, not
just user overrides — so any top-level key added to values.yaml in the
upgraded chart (autoUpgrade in 1.3.0, anything similar in future bumps) is
absent from the merged values when rendering and the new templates blow up.

--reset-then-reuse-values (helm 3.14+, available in our pinned alpine/helm
3.16.4 image) resets to the new chart's defaults, then layers the customer's
user-supplied values on top — operator overrides like clientId,
dockerRegistry creds, or autoUpgrade.enabled=false are preserved while new
defaults flow through.

- Switch the in-chart upgrade script to --reset-then-reuse-values.
- Update the unit test to assert the corrected flag.
- MIGRATION.md: tell operators to use the same flag for the manual 1.x ->
  1.3.0 jump (subsequent chart bumps will go through the CronJob, which
  now uses the right flag itself).
@saadqbal
Contributor Author

Dev-cluster verification — tb-client-dev-templates (EKS, eu-central-1)

Verified end-to-end against the existing tracebloc release in tracebloc-templates (was on client-1.1.0).

One bug caught + fixed during verification

A dry-run of the manual 1.1.0 → 1.3.0 jump with plain --reuse-values failed:

template: client/templates/auto-upgrade-rbac.yaml:1:14:
executing "..." at <.Values.autoUpgrade.enabled>:
nil pointer evaluating interface {}.enabled

Root cause: --reuse-values reuses the previous release's computed values, so any new top-level key added in the upgraded chart (here, the whole autoUpgrade block) is missing from the merged values and templates that reference it blow up. Fix in commit b402aca:

  • In-chart upgrade script switched to --reset-then-reuse-values (helm 3.14+, available in our pinned alpine/helm:3.16.4)
  • MIGRATION.md tells operators to use the same flag for the manual 1.x → 1.3.0 jump
  • Subsequent chart bumps go through the CronJob, which now uses the right flag itself

Verification results (with the fix)

helm upgrade tracebloc ./client -n tracebloc-templates --reset-then-reuse-values --wait:

  • ✅ Release moved 1.1.0 → 1.3.0, status deployed (revision 8)
  • ✅ Pre-existing mysql-client and tracebloc-jobs-manager pods stayed Running, no restart (non-disruptive upgrade)
  • ✅ PVCs unchanged (client-pvc, client-logs-pvc, mysql-pvc all bound)
  • ✅ NOTES.txt prints the new Auto-upgrade: ON … line

Auto-upgrade resources:

cronjob.batch/tracebloc-auto-upgrade   23 2 * * *   <none>   False   0   <none>   2m26s
serviceaccount/tracebloc-auto-upgrade  0   2m38s
configmap/tracebloc-auto-upgrade       1   2m34s
clusterrolebinding/tracebloc-auto-upgrade   ClusterRole/cluster-admin   2m32s

Manually fired the Job (kubectl create job --from=cronjob/tracebloc-auto-upgrade …); Pod ran under PSA restricted (non-root, RO root, dropped caps, RuntimeDefault seccomp) and exited Succeeded with these logs:

[auto-upgrade] release=tracebloc namespace=tracebloc-templates repo=https://tracebloc.github.io/client
[auto-upgrade] current=1.3.0 latest=1.2.3
[auto-upgrade] deployed version is ahead of repo (current=1.3.0 > latest=1.2.3); skipping

That run exercises every code path the CronJob will execute against the public repo (helm repo add/update, search-repo YAML parse, helm-list YAML parse, semver compare via sort -V). The only branch-specific line for the "newer found" case is the helm upgrade --reset-then-reuse-values --version $LATEST invocation, which was just exercised manually for the bootstrap upgrade itself.
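The skip/upgrade branching those logs describe can be reconstructed as a small shell sketch (the function name and messages are illustrative, not the chart's actual script):

```shell
# decide CURRENT LATEST — print which branch the upgrade loop would take.
decide() {
  current=$1; latest=$2
  if [ "$current" = "$latest" ]; then
    echo "already at latest; nothing to do"
  elif [ "$(printf '%s\n%s\n' "$current" "$latest" | sort -V | tail -n1)" = "$current" ]; then
    # Deployed release is newer than anything published (the bootstrap case).
    echo "deployed version is ahead of repo (current=$current > latest=$latest); skipping"
  else
    echo "upgrade: $current -> $latest"
  fi
}

decide 1.3.0 1.2.3
# -> deployed version is ahead of repo (current=1.3.0 > latest=1.2.3); skipping
```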

Ready for review.
