
proposal: self-managed AppProject + centralised app version pinning #5

Open
MWest2020 wants to merge 1 commit into main from proposal/platform-self-managed-and-centralized-versions

Summary

An OpenSpec proposal bundling three coupled platform improvements. Nothing is implemented yet; this PR is for review of the design and tasks before any code lands.

  • Self-managing AppProject: a bootstrap Application in Argo CD's built-in default project manages the AppProject from Git, eliminating the manual kubectl apply drift that was the root cause of the 6 May 2026 fan-out incident (the nc-* deny window committed in 939db3d sat unapplied for 2 days).
  • Centralised app version pinning: top-level appVersions.* in common.yaml / env/*.yaml sets the platform default, while the existing tenant.apps.versions.* remains the tenant-level override. The asymmetry is intentional and CI-enforced (D9): a top-level appVersions: in a tenant file fails validation.
  • Wave-gated rollout: a nextcloud.platform/wave label on Applications enables argocd app sync -l nextcloud.platform/wave=N for serialized fleet-wide bumps. The phased promotion protocol is documented in docs/ROLLOUTS.md. There is no auto-promotion.
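The two-key lookup can be modelled in a few lines of Python. This is a minimal sketch, not the proposed Helm template: it assumes the common, env and tenant layers are passed as successive Helm values files (Helm deep-merges these, with later files winning), and the `calendar` app and its versions are hypothetical. `dig` mimics Sprig's `dig`, returning a default instead of failing when an intermediate key is missing.

```python
from collections.abc import Mapping
from typing import Any, Optional


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Recursively merge value maps; keys in `override` win, like later Helm values files."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def dig(values: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested keys and return `default` if any level is missing (mimics Sprig's dig)."""
    current = values
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


def resolve_version(app: str, *layers: Mapping[str, Any]) -> Optional[str]:
    """Tenant override (tenant.apps.versions.<app>) wins over the platform default (appVersions.<app>)."""
    values: dict = {}
    for layer in layers:  # lowest precedence first: common, env, tenant
        values = deep_merge(values, layer)
    platform_default = dig(values, "appVersions", app)
    return dig(values, "tenant", "apps", "versions", app, default=platform_default)


# Hypothetical values for a hypothetical "calendar" app.
common = {"appVersions": {"calendar": "4.7.0"}}
env = {"appVersions": {"calendar": "4.7.1"}}
tenant = {"tenant": {"apps": {"versions": {"calendar": "4.6.9"}}}}

print(resolve_version("calendar", common, env))          # 4.7.1 (env default wins over common)
print(resolve_version("calendar", common, env, tenant))  # 4.6.9 (tenant override wins)
```

Using the platform value as the default of the tenant lookup means a tenant file with no tenant.apps.versions block simply inherits the platform pin, instead of failing to render the way a plain .Values.tenant.apps.versions.<app> reference does when tenant.apps is absent.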

Artifacts

All under openspec/changes/platform-self-managed-and-centralized-versions/:

  • proposal.md — why + capabilities + impact
  • design.md — 9 decisions with alternatives, risks table, 3-commit migration plan, pre-flight rollback drill on canary-accept
  • specs/argo-self-managed-appproject/spec.md
  • specs/app-version-pinning/spec.md
  • specs/wave-gated-rollout/spec.md
  • tasks.md — implementation checklist, 3 commits + final verification

Key decisions worth attention in review

  • D1 / D2: the bootstrap Application lives in default, not nextcloud-platform, so it does not depend on the AppProject it manages. It sets prune: false and preserveResourcesOnDeletion: true to prevent a catastrophic prune of the AppProject.
  • D3: split the current nextcloud-platform.yaml into separate single-document files for the AppProject and the Namespace before introducing the bootstrap.
  • D5: a two-key model, asymmetric on purpose. appVersions.* holds platform / env defaults; tenant.apps.versions.* holds tenant overrides. Neither key is deprecated. The Helm template uses Sprig's dig to resolve the nested tenant key safely when intermediate maps are missing.
  • D9: the validation script errors on a top-level appVersions: in any tenant file, making the asymmetry a contract rather than a convention.
  • D7: no auto-promotion; waves sync only when an operator triggers them.
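The D9 contract boils down to one structural check per tenant file. A minimal Python sketch, assuming the tenant files have already been parsed into dicts (the YAML loading step is omitted to keep it dependency-free) and using hypothetical file names; the real validation script may be structured differently.

```python
from collections.abc import Mapping
from typing import Any

FORBIDDEN_TENANT_KEY = "appVersions"  # platform / env layers only (D5, D9)


def validate_tenant_file(path: str, values: Any) -> list[str]:
    """Return D9 violations for one parsed tenant file; an empty list means it passes."""
    if values is None:  # an empty YAML document parses to None
        return []
    if not isinstance(values, Mapping):
        return [f"{path}: top level must be a mapping"]
    if FORBIDDEN_TENANT_KEY in values:
        return [
            f"{path}: top-level '{FORBIDDEN_TENANT_KEY}:' is not allowed in a tenant file; "
            "use tenant.apps.versions.* for tenant overrides"
        ]
    return []


def validate_all(tenant_files: Mapping[str, Any]) -> int:
    """Print every violation and return a CI exit code: 1 if any tenant file fails, else 0."""
    errors = [e for path, values in tenant_files.items() for e in validate_tenant_file(path, values)]
    for error in errors:
        print(error)
    return 1 if errors else 0


# Hypothetical tenant files, already parsed from YAML.
tenant_files = {
    "tenants/acme.yaml": {"tenant": {"apps": {"versions": {"calendar": "4.6.9"}}}},
    "tenants/globex.yaml": {"appVersions": {"calendar": "4.7.1"}},
}
exit_code = validate_all(tenant_files)  # reports tenants/globex.yaml and returns 1
```

Only the top level is checked, matching the wording of D9; the nested tenant.apps.versions.* key passes untouched.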

Implementation order (when approved)

Three commits, each verifiable independently:

  1. Bootstrap Application + Namespace split + incident doc (low risk, isolated)
  2. Centralised pinning + Helm template + wave label (touches values and the ApplicationSet; a pre-flight rollback drill on canary-accept runs against a feature branch before merge to main)
  3. Docs + audit-only scripts/audit-tenant-pins.sh + memory update
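Commit 3 only names scripts/audit-tenant-pins.sh and marks it audit-only; the proposal does not say what it reports. As a hypothetical illustration, written in Python rather than shell, one read-only report lists every tenant.apps.versions.* pin next to the platform default it overrides, which is what an operator needs to spot stale tenant pins after a fleet bump. The platform defaults are assumed to be already resolved from the common and env layers.

```python
from collections.abc import Mapping
from typing import Any, NamedTuple, Optional


class Pin(NamedTuple):
    tenant: str
    app: str
    pinned: str
    platform_default: Optional[str]  # None if the platform sets no default for the app


def tenant_overrides(values: Any) -> Mapping[str, str]:
    """Return tenant.apps.versions from one parsed tenant file, or {} if absent."""
    node = values
    for key in ("tenant", "apps", "versions"):
        if not isinstance(node, Mapping):
            return {}
        node = node.get(key)
    return node if isinstance(node, Mapping) else {}


def audit_tenant_pins(platform: Mapping[str, str], tenants: Mapping[str, Any]) -> list[Pin]:
    """Read-only report: every tenant pin next to the platform default it overrides."""
    return [
        Pin(name, app, version, platform.get(app))
        for name, values in sorted(tenants.items())
        for app, version in sorted(tenant_overrides(values).items())
    ]


# Hypothetical effective platform defaults and tenant files.
platform = {"calendar": "4.7.1", "deck": "1.12.0"}
tenants = {
    "acme": {"tenant": {"apps": {"versions": {"calendar": "4.6.9"}}}},
    "globex": {"tenant": {"name": "globex"}},
}
for pin in audit_tenant_pins(platform, tenants):
    status = "matches" if pin.pinned == pin.platform_default else "differs from"
    print(f"{pin.tenant}: {pin.app} pinned at {pin.pinned}, {status} platform default {pin.platform_default}")
# acme: calendar pinned at 4.6.9, differs from platform default 4.7.1
```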

Out of scope

  • Replacing the install hook in common.yaml
  • canary-prod stateless-HA workstream (separate proposal)
  • Postgres → MariaDB migrations
  • Auto-promotion between waves
  • Setting capacity thresholds

Test plan

  • Review of design D1–D9, in particular the two-key asymmetry (D5) and the validation contract (D9)
  • Review of the 3-commit migration plan, especially the pre-flight rollback drill scoped to canary-accept
  • Sanity-check the nextcloud.platform/wave label format and confirm it does not collide with labels used by other tooling
  • Confirm the incident doc plan for docs/incidents/2026-05-06-common-yaml-fanout.md is acceptable in scope and sensitivity
  • Confirm timing for landing Commit 1 (it touches the platform, so per CLAUDE.md it lands in the Mon–Thu 17:00–07:00 Amsterdam window)
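For the label-format check, the key and value must follow Kubernetes label syntax: an optional DNS-1123 subdomain prefix of at most 253 characters, a name segment of at most 63 characters that starts and ends with an alphanumeric character (with -, _ and . allowed in between), and a value of at most 63 characters under the same rule, or empty. The kubernetes.io and k8s.io prefixes are reserved for Kubernetes core components. A minimal Python sketch of those rules follows; it covers syntax only and cannot detect a collision with another tool's labels, which still needs a search of the repo and the cluster.

```python
import re

# Patterns mirror the Kubernetes label syntax rules.
DNS_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
DNS_SUBDOMAIN = re.compile(rf"{DNS_LABEL}(\.{DNS_LABEL})*")
NAME_OR_VALUE = re.compile(r"[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?")
RESERVED_PREFIXES = ("kubernetes.io", "k8s.io")


def label_errors(key: str, value: str) -> list[str]:
    """Return syntax problems with a label key/value pair (an empty list means valid)."""
    errors: list[str] = []
    prefix, slash, name = key.rpartition("/")
    if slash:
        if len(prefix) > 253 or not DNS_SUBDOMAIN.fullmatch(prefix):
            errors.append(f"prefix {prefix!r} is not a DNS-1123 subdomain")
        if any(prefix == r or prefix.endswith("." + r) for r in RESERVED_PREFIXES):
            errors.append(f"prefix {prefix!r} is reserved for Kubernetes components")
    if len(name) > 63 or not NAME_OR_VALUE.fullmatch(name):
        errors.append(f"name {name!r} is not a valid label name")
    if value and (len(value) > 63 or not NAME_OR_VALUE.fullmatch(value)):
        errors.append(f"value {value!r} is not a valid label value")
    return errors


print(label_errors("nextcloud.platform/wave", "2"))    # []
print(label_errors("nextcloud.platform/wave", "2a-"))  # one error: values must end alphanumeric
```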

…n pinning

Bundle three coupled platform improvements as one OpenSpec change for review:

1. Self-managing AppProject via a bootstrap Application in the built-in
   `default` project — eliminates the manual `kubectl apply` drift that
   caused the 6 May 2026 fan-out incident (deny window committed in
   939db3d sat unapplied for 2 days).

2. Centralised app version pinning using a two-key model: top-level
   `appVersions.*` in common/env layers as the platform default, and the
   existing nested `tenant.apps.versions.*` for tenant-level overrides.
   Asymmetry is intentional and CI-enforced (D9): a tenant file with a
   top-level `appVersions:` block fails validation.

3. Wave-gated batch rollout: add `nextcloud.platform/wave` label so
   operators can serialize fleet-wide bumps via
   `argocd app sync -l nextcloud.platform/wave=N`. The phased promotion
   protocol (canary → accept → prod-per-wave) is documented in
   `docs/ROLLOUTS.md`; there is no auto-promotion.
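The operator-gated wave loop can be sketched as below. This is a hypothetical dry-run helper, not part of the proposal: it assumes wave label values are integers, groups Applications by `nextcloud.platform/wave`, and emits the `argocd app sync -l` command for a wave only after an explicit confirmation, matching the no-auto-promotion rule. The Application names are made up.

```python
from collections import defaultdict
from collections.abc import Callable, Mapping

WAVE_LABEL = "nextcloud.platform/wave"


def plan_waves(app_labels: Mapping[str, Mapping[str, str]]) -> dict[int, list[str]]:
    """Group Application names by wave, lowest wave first; unlabelled apps are left out."""
    waves: defaultdict[int, list[str]] = defaultdict(list)
    for app, labels in app_labels.items():
        if WAVE_LABEL in labels:
            waves[int(labels[WAVE_LABEL])].append(app)
    return {wave: sorted(apps) for wave, apps in sorted(waves.items())}


def sync_commands(plan: Mapping[int, list[str]], confirm: Callable[[int], bool]) -> list[str]:
    """Build one sync command per wave, stopping at the first wave the operator declines."""
    commands: list[str] = []
    for wave in plan:
        if not confirm(wave):  # no auto-promotion: each wave needs an explicit go-ahead
            break
        commands.append(f"argocd app sync -l {WAVE_LABEL}={wave}")
    return commands


# Hypothetical Applications and their labels.
apps = {
    "tenant-a-prod": {WAVE_LABEL: "1"},
    "tenant-b-prod": {WAVE_LABEL: "2"},
    "tenant-c-prod": {WAVE_LABEL: "1"},
    "platform-tools": {},
}
plan = plan_waves(apps)  # {1: ['tenant-a-prod', 'tenant-c-prod'], 2: ['tenant-b-prod']}
print(sync_commands(plan, confirm=lambda wave: wave == 1))
# ['argocd app sync -l nextcloud.platform/wave=1']
```

Emitting commands rather than running them keeps the sketch a dry run; an operator would execute each line only after checking that the previous wave is healthy.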

Artifacts:
  - proposal.md (why)
  - design.md (9 decisions, risks table, 3-commit migration plan,
    pre-flight rollback drill on canary-accept)
  - specs/{argo-self-managed-appproject,app-version-pinning,
    wave-gated-rollout}/spec.md (testable requirements)
  - tasks.md (implementation checklist)

Not implemented yet — proposal is for team review. Implementation runs
via /opsx:apply once approved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MWest2020 requested a review from rjzondervan on May 6, 2026 at 19:16