feat(client): self-upgrade CronJob (closes #69) #89
Ship a chart-side CronJob that polls the published Helm repo daily and runs
`helm upgrade --reuse-values` when a newer chart version is available, so
deployed clients no longer freeze on the version they first installed and
miss security/stability fixes.
- New templates: auto-upgrade-cronjob.yaml (ConfigMap + CronJob),
auto-upgrade-rbac.yaml (ServiceAccount + ClusterRoleBinding to the
built-in cluster-admin ClusterRole).
- New values: autoUpgrade.{enabled, schedule, repoUrl, repoName, chartName,
timeout, suspend, successfulJobsHistoryLimit, failedJobsHistoryLimit,
startingDeadlineSeconds, image, resources}; default ON.
- Pod satisfies PSA restricted (runAsNonRoot, dropped caps, RO root,
RuntimeDefault seccomp); HOME/HELM_*_HOME redirected to a tmp emptyDir.
- Version compare uses sort -V so 1.10 > 1.9.
- Bumps chart 1.2.3 -> 1.3.0; MIGRATION.md documents how to opt out.
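The `sort -V` comparison from the list above can be sketched as follows. This is a minimal illustration rather than the chart's actual script: the function name `is_newer` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BusyBox).

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when $2 is strictly newer than $1.
is_newer() {
  current="$1"
  candidate="$2"
  # Equal versions are never an upgrade.
  [ "$current" = "$candidate" ] && return 1
  # sort -V compares version components numerically; the last line is the highest.
  highest=$(printf '%s\n%s\n' "$current" "$candidate" | sort -V | tail -n 1)
  [ "$highest" = "$candidate" ]
}

# A plain string compare would rank 1.9.0 above 1.10.0; sort -V does not.
if is_newer "1.9.0" "1.10.0"; then
  echo "upgrade available"
fi
```

Taking the highest of the two sorted lines, instead of comparing strings directly, is what keeps `1.10.0` ahead of `1.9.0`.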
Cluster-admin (rather than a curated narrow role) keeps the upgrader
robust: the chart already templates cluster-scoped resources
(PriorityClass, StorageClass, ClusterRole/Binding, optionally Namespace),
so a narrower role would silently break the day a future chart adds a new
resource kind. Operators who want tighter posture can disable the feature
and run `helm upgrade` manually.
Backend reporting from the issue is intentionally deferred — no endpoint
exists yet; will land as a follow-up once the contract is defined.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit fb5210b.
Caught during the #69 verification on tb-client-dev-templates: a dry-run 1.1.0 -> 1.3.0 `helm upgrade` with `--reuse-values` fails with

    template: client/templates/auto-upgrade-rbac.yaml:1:14: executing "..." at <.Values.autoUpgrade.enabled>: nil pointer evaluating interface {}.enabled

because `--reuse-values` reuses the previous release's COMPUTED values, not just user overrides. Any top-level key added to values.yaml in the upgraded chart (autoUpgrade in 1.3.0, anything similar in future bumps) is therefore absent from the merged values when rendering, and the new templates blow up.

`--reset-then-reuse-values` (helm 3.14+, available in our pinned alpine/helm 3.16.4 image) resets to the new chart's defaults, then layers the customer's user-supplied values on top: operator overrides like clientId, dockerRegistry creds, or autoUpgrade.enabled=false are preserved while new defaults flow through.

- Switch the in-chart upgrade script to `--reset-then-reuse-values`.
- Update the unit test to assert the corrected flag.
- MIGRATION.md: tell operators to use the same flag for the manual 1.x -> 1.3.0 jump (subsequent chart bumps will go through the CronJob, which now uses the right flag itself).
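For the manual 1.x -> 1.3.0 jump, an operator's own helm binary may be older than the pinned image. A guard for the helm 3.14+ requirement could look roughly like this. It is a sketch under assumptions: `supports_reset_then_reuse` is a hypothetical helper that is not part of the chart, and the version string is hard-coded here where a real run would read it from helm.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given helm version is >= 3.14.0,
# the minimum this PR cites for --reset-then-reuse-values.
supports_reset_then_reuse() {
  version="${1#v}"   # accept "v3.16.4" as well as "3.16.4"
  # If 3.14.0 sorts first (or equal), the given version is new enough.
  lowest=$(printf '%s\n%s\n' "3.14.0" "$version" | sort -V | head -n 1)
  [ "$lowest" = "3.14.0" ]
}

# In a real run the version could come from: helm version --template '{{.Version}}'
if supports_reset_then_reuse "v3.16.4"; then
  echo "use --reset-then-reuse-values"
else
  echo "helm too old for --reset-then-reuse-values" >&2
fi
```

The sketch fails loudly instead of falling back to `--reuse-values`, since that fallback would reproduce the nil-pointer failure described above.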
Dev-cluster verification —

Summary
- `<release>-auto-upgrade` CronJob that polls https://tracebloc.github.io/client daily and runs `helm upgrade --reuse-values` when a newer chart version is published, so deployed clients no longer freeze on the version they first installed.

What's in the chart now
- CronJob `<release>-auto-upgrade`: `concurrencyPolicy: Forbid`, default schedule `23 2 * * *` (UTC), `backoffLimit: 2`
- ConfigMap `<release>-auto-upgrade`
- ServiceAccount `<release>-auto-upgrade`
- ClusterRoleBinding `<release>-auto-upgrade`: `cluster-admin`

The Pod satisfies PSA `restricted`: `runAsNonRoot: true`, `runAsUser/Group: 1000`, `RuntimeDefault` seccomp, `allowPrivilegeEscalation: false`, `readOnlyRootFilesystem: true`, `capabilities.drop: [ALL]`. `HOME` and `HELM_*_HOME` are redirected into a `/tmp` emptyDir so helm can write its caches.

The version compare uses `sort -V`, not string compare: `1.10.0 > 1.9.0`.

Trade-offs surfaced for review
- `enabled: true`. The issue is "customers freeze on the version they installed". Default-off recreates the bug for everyone who installs 1.3.0 and never re-runs helm. Operator can opt out via `autoUpgrade.enabled: false`, or pause via `autoUpgrade.suspend: true`.
- `cluster-admin`, not a narrow custom role. The chart already templates cluster-scoped resources (`PriorityClass`, `StorageClass`, `ClusterRole/Binding`, optionally `Namespace`). A curated role would silently break the day a future chart version adds a new resource kind on already-deployed clients. Trust boundary is documented in `values.yaml` next to the `autoUpgrade:` block.
- `alpine/helm:3.16.4`. No `jq` dep; the script parses helm's YAML output with awk only. `latest` is rejected by the schema so behaviour can't drift from under us.

Test plan
- `helm lint client` clean (default + AKS + bare-metal value files)
- `helm template stg client …` renders 4 new docs by default; renders 0 with `autoUpgrade.enabled=false`
- `helm unittest client`: 12 suites, 116 tests pass (11 of those new in `tests/auto_upgrade_test.yaml`)
- `helm install` 1.3.0 fresh; observe `<release>-auto-upgrade` CronJob created
- […] (`kubectl create job … --from=cronjob/<release>-auto-upgrade`); confirm it logs `already at latest; nothing to do`
- […] `gh-pages`; trigger the Job again; observe `helm upgrade complete: 1.3.0 -> 1.3.1`
- `helm get values <release>` still shows `clientId`/`clientPassword` (not stripped by `--reuse-values`)
- […] `helm upgrade --wait --timeout 10m` interacts with PVC bind timing

Follow-ups (separate tickets)
- […] `autoUpgrade.reportUrl`).

🤖 Generated with Claude Code
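The awk-only parsing mentioned under the trade-offs could look roughly like the sketch below. It is illustrative only: `latest_version` is a hypothetical name, and the sample input assumes the script reads YAML shaped like `helm search repo <repo>/<chart> -o yaml` output, which the PR text does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first line whose key is exactly
# "version:", reading YAML from stdin. Keys such as app_version do not match.
latest_version() {
  awk '$1 == "version:" { gsub(/"/, "", $2); print $2; exit }'
}

# Sample input (values are illustrative, not taken from the real repo).
latest_version <<'EOF'
- app_version: 1.3.0
  description: example chart
  name: example/client
  version: 1.3.0
EOF
```

Matching on the whole first field, not on a substring, keeps `app_version:` from being mistaken for the chart version, and stripping double quotes covers quoted values without needing `jq`.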
Note
High Risk
Introduces a default-on CronJob that can mutate cluster-scoped resources and is bound to `cluster-admin`, so a compromised pod or chart repo could lead to full cluster takeover.

Overview
Adds an auto-upgrade mechanism to the `client` Helm chart: when `autoUpgrade.enabled` (default true) it installs a `<release>-auto-upgrade` `ConfigMap`+`CronJob` that periodically checks the published chart repo and runs `helm upgrade --reset-then-reuse-values` to move the release to the latest chart version.

This also introduces the supporting `ServiceAccount`+`ClusterRoleBinding` (bound to built-in `cluster-admin`), new `autoUpgrade` values and schema validation (including rejecting `image.tag: latest`), updates install notes/migration docs, adds helm-unittest coverage, and bumps the chart/app version to `1.3.0`.

Reviewed by Cursor Bugbot for commit b402aca.