From b9666fec86ec5ac9e0b1ce3a8a14a692c0811832 Mon Sep 17 00:00:00 2001
From: Joshua Gilman
Date: Sat, 18 Apr 2026 20:44:00 -0700
Subject: [PATCH] docs: add platform RGD delivery design

---
 docs/docs/designs/gitops-multi-cluster.md  |  66 ++++++--
 docs/docs/designs/index.md                 |   1 +
 docs/docs/designs/kro-consumption-model.md |  43 ++++-
 docs/docs/designs/platform-rgd-delivery.md | 183 +++++++++++++++++++++
 4 files changed, 273 insertions(+), 20 deletions(-)
 create mode 100644 docs/docs/designs/platform-rgd-delivery.md

diff --git a/docs/docs/designs/gitops-multi-cluster.md b/docs/docs/designs/gitops-multi-cluster.md
index 78b1200..949fc83 100644
--- a/docs/docs/designs/gitops-multi-cluster.md
+++ b/docs/docs/designs/gitops-multi-cluster.md
@@ -194,6 +194,7 @@ gitops/
 │   │   │   └── teamb.yaml
 │   │   └── applicationsets/
 │   │       ├── platform.yaml
+│   │       ├── clusters-platform.yaml
 │   │       ├── clusters-nonprod.yaml
 │   │       ├── clusters-prod.yaml
 │   │       ├── teams-nonprod.yaml
@@ -211,17 +212,32 @@ gitops/
 │   │   ├── teama-appa3/
 │   │   ├── teamb-appb1/
 │   │   └── teamb-appb2/
-│   └── rgds/
-│       ├── appdeployment.yaml
-│       └── teamnamespace.yaml
 ├── clusters/
+│   ├── platform/
+│   │   ├── platform/
+│   │   │   ├── kro.yaml
+│   │   │   ├── rgds-platform.yaml
+│   │   │   ├── rgds-apps.yaml
+│   │   │   └── platform.yaml
+│   │   ├── policies/
+│   │   └── shared/
 │   ├── nonprod/
+│   │   ├── platform/
+│   │   │   ├── kro.yaml
+│   │   │   ├── rgds-platform.yaml
+│   │   │   ├── rgds-apps.yaml
+│   │   │   └── platform.yaml
 │   │   ├── capsule/
 │   │   │   ├── teama.yaml
 │   │   │   └── teamb.yaml
 │   │   ├── policies/
 │   │   └── shared/
 │   └── prod/
+│       ├── platform/
+│       │   ├── kro.yaml
+│       │   ├── rgds-platform.yaml
+│       │   ├── rgds-apps.yaml
+│       │   └── platform.yaml
 │       ├── capsule/
 │       │   ├── teama.yaml
 │       │   └── teamb.yaml
@@ -243,8 +259,11 @@ gitops/

 The ownership model is:

-- `platform/`: platform-cluster state and shared APIs
-- `clusters/`: workload-cluster shared state
+- `platform/`: platform-cluster control-plane state
+- `clusters/*/platform/`: cluster-local `kro` bootstrap, released RGD bundle
+  installation, and cluster-local `Platform` instances
+- `clusters/*/capsule`, `clusters/*/policies`, and `clusters/*/shared`:
+  workload-cluster shared state
 - `teams/`: team-owned application instances

 ## Argo CD Model
@@ -253,10 +272,14 @@ One `Argo CD` instance runs on the platform cluster.

 It syncs:

-- `platform/argocd`, `platform/capi`, `platform/kargo`, and `platform/rgds` to
-  the platform cluster
-- `clusters/nonprod` to the `nonprod` cluster
-- `clusters/prod` to the `prod` cluster
+- `platform/argocd`, `platform/capi`, and `platform/kargo` to the platform
+  cluster
+- `clusters/platform/platform` to the platform cluster
+- `clusters/nonprod/platform`, `clusters/nonprod/capsule`,
+  `clusters/nonprod/policies`, and `clusters/nonprod/shared` to the `nonprod`
+  cluster
+- `clusters/prod/platform`, `clusters/prod/capsule`,
+  `clusters/prod/policies`, and `clusters/prod/shared` to the `prod` cluster
 - `teams/*/*/envs/dev`, `teams/*/*/envs/staging`, and `teams/*/*/ephemeral/*`
   to the `nonprod` cluster
 - `teams/*/*/envs/prod` to the `prod` cluster
@@ -265,8 +288,14 @@ The intended Argo shape is:

 - one `AppProject` per team
 - `ApplicationSet` for platform-owned fleet generation
+- one admin-owned bootstrap `Application` per cluster for
+  `clusters//platform/`
 - `Application` resources kept in the `argocd` namespace

+Within each `clusters//platform/` directory, sync waves should order
+objects so `kro` installs first, the released RGD bundles install second, and
+the cluster-local `Platform` instance is created last.
+
 Do not rely on application CRs scattered across arbitrary namespaces as the
 default control model. The central `argocd` namespace is simpler unless a later
 team self-service requirement makes that extra complexity worth it.
@@ -277,11 +306,17 @@ team self-service requirement makes that extra complexity worth it.

 The intended pattern is:

-- shared `ResourceGraphDefinition`s live under `platform/rgds/`
-- environment-specific custom resources live under `teams/`
+- shared RGD source and release lifecycle live in the `platform` repo
+- cluster-local RGD bundle installation and cluster-local platform instances
+  live under `clusters//platform/`
+- environment-specific application custom resources live under `teams/`
 - Argo CD syncs the YAML
+- versioned RGD bundles are installed from OCI artifacts
 - `kro` expands the custom resources into the Kubernetes objects they own

+The platform-side release, CUE authoring, and OCI publication model is defined
+in [Platform RGD Delivery Model](./platform-rgd-delivery.md).
+
 An environment-specific application resource should be narrow and explicit.
 For example:

@@ -365,9 +400,12 @@ The intended policy is:
 ## Worked Example: TeamA / AppA1

 1. `CAPI` creates the `nonprod` and `prod` workload clusters.
-2. `Argo CD` syncs shared platform state to the platform cluster.
-3. `Argo CD` syncs Capsule tenants `teama` and `teamb` to `nonprod` and `prod`.
-4. `Argo CD` syncs the shared `kro` `ResourceGraphDefinition`s.
+2. `Argo CD` syncs shared control-plane state to the platform cluster.
+3. `Argo CD` syncs `clusters/platform/platform/`,
+   `clusters/nonprod/platform/`, and `clusters/prod/platform/`, installing
+   `kro`, the selected released `platform-rgds` and `apps-rgds` bundles, and
+   the cluster-local `Platform` instances.
+4. `Argo CD` syncs Capsule tenants `teama` and `teamb` to `nonprod` and `prod`.
 5. `teams/teama/appa1/envs/dev/app.yaml` defines an `AppDeployment` with
    namespace `teama-appa1-dev`.
 6. CI publishes a new image for `AppA1`.
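The sync-wave ordering that this change asks for inside each cluster's `platform/` directory can be sketched in Python. This is an illustration under stated assumptions, not part of the design. The wave numbers (`0` for `kro.yaml`, `1` for both RGD bundle `Application`s, `2` for `platform.yaml`) and the object names are hypothetical. The sketch borrows three things from Argo CD: the `argocd.argoproj.io/sync-wave` annotation, the rule that lower waves sync first, and the default of wave `0` for unannotated objects. It deliberately ignores Argo CD's additional ordering by sync phase and resource kind.

```python
"""Sketch of Argo CD sync-wave ordering for one cluster's platform/ directory.

Hypothetical: the wave numbers and object names below. Borrowed from Argo CD:
the argocd.argoproj.io/sync-wave annotation, lower waves syncing first, and
unannotated objects defaulting to wave 0.
"""

from dataclasses import dataclass, field

SYNC_WAVE = "argocd.argoproj.io/sync-wave"


@dataclass
class Manifest:
    kind: str
    name: str
    annotations: dict = field(default_factory=dict)

    @property
    def wave(self) -> int:
        # Annotation values are strings; a missing annotation means wave 0.
        return int(self.annotations.get(SYNC_WAVE, "0"))


def sync_order(manifests):
    # Simplification: real Argo CD also sorts by sync phase and by kind
    # within a wave. Here only wave, then name, decides the order.
    return sorted(manifests, key=lambda m: (m.wave, m.name))


# Hypothetical objects, one per file in clusters/nonprod/platform/.
BOOTSTRAP = [
    Manifest("Platform", "nonprod", {SYNC_WAVE: "2"}),           # platform.yaml
    Manifest("Application", "rgds-apps", {SYNC_WAVE: "1"}),      # rgds-apps.yaml
    Manifest("Application", "kro", {SYNC_WAVE: "0"}),            # kro.yaml
    Manifest("Application", "rgds-platform", {SYNC_WAVE: "1"}),  # rgds-platform.yaml
]

if __name__ == "__main__":
    for m in sync_order(BOOTSTRAP):
        print(m.wave, m.kind, m.name)
```

Two Argo CD behaviors are worth verifying against this ordering. Since Argo CD 1.8, a child `Application` in an earlier wave does not hold back later waves on its health unless a custom health check for `argoproj.io/Application` is configured. Also, because `kro` creates the `Platform` CRD only after the RGD is reconciled, the `Platform` object may need the `SkipDryRunOnMissingResource=true` sync option so the initial dry run does not fail.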
diff --git a/docs/docs/designs/index.md b/docs/docs/designs/index.md
index 4738057..277f872 100644
--- a/docs/docs/designs/index.md
+++ b/docs/docs/designs/index.md
@@ -19,6 +19,7 @@ Current designs:

 - [Multi-Cluster GitOps Model](./gitops-multi-cluster.md)
 - [kro Consumption Model](./kro-consumption-model.md)
+- [Platform RGD Delivery Model](./platform-rgd-delivery.md)
 - [App RGD Design](./app-rgd.md)

 Once a design is implemented and considered durable, its steady-state shape
diff --git a/docs/docs/designs/kro-consumption-model.md b/docs/docs/designs/kro-consumption-model.md
index 291bb80..1c6a04e 100644
--- a/docs/docs/designs/kro-consumption-model.md
+++ b/docs/docs/designs/kro-consumption-model.md
@@ -10,8 +10,8 @@ description: Proposed design for how GilmanLab publishes, consumes, and promotes
 Proposed.

 This document captures the intended role of `kro` in the lab and the current
-working model for how `kro` resources flow from developer-owned source
-repositories into environment-specific desired state in the `gitops` repo.
+working model for how shared platform-owned APIs and developer-owned release
+intent flow into environment-specific desired state in the `gitops` repo.

 This is an initial design draft. It intentionally does not define concrete
 `ResourceGraphDefinition` schemas yet. The next design pass should define the
@@ -91,7 +91,12 @@ Examples include future APIs such as:
 - team namespace bootstrap APIs
 - platform capability APIs

-This document does not define those schemas yet.
+The source of truth for those shared APIs lives in the `platform` repo, not the
+`gitops` repo. The platform team authors and releases those APIs there, then
+publishes versioned bundle artifacts that `gitops` can install per cluster.
+
+The concrete platform-side release and delivery model is described in
+[Platform RGD Delivery Model](./platform-rgd-delivery.md).

 ### Developer-owned application release intent

@@ -131,6 +136,9 @@ materialized into the `gitops` repo under environment-specific paths on
 This means the environment-specific folders in the `gitops` repo are still the
 final desired state that Argo CD reconciles.

+For platform-side APIs, the `gitops` repo also holds cluster-local installation
+of released RGD bundles and cluster-local instances such as `Platform`.
+
 ## Relationship to the GitOps Model

 This design builds on the multi-cluster GitOps model described in
@@ -138,8 +146,11 @@ This design builds on the multi-cluster GitOps model described in

 The important clarification is:

-- `kro` API definitions are platform-owned
+- `kro` API definitions are platform-owned and released from the `platform`
+  repo
 - developer-authored API instances are release inputs
+- `gitops` selects released API bundle versions per cluster and carries
+  cluster-local platform instances
 - Kargo materializes environment-specific outputs into the `gitops` repo
 - Argo CD reconciles those outputs from `main`

@@ -159,6 +170,24 @@ gitops/
 This document does not settle the exact `gitops` folder layout beyond that
 principle.

+The intended platform-side cluster bootstrap shape in `gitops` is:
+
+```text
+gitops/
+└── clusters/
+    └── /
+        └── platform/
+            ├── kro.yaml
+            ├── rgds-platform.yaml
+            ├── rgds-apps.yaml
+            └── platform.yaml
+```
+
+This cluster-local layout installs `kro`, installs selected released RGD
+bundles, and declares the cluster-local `Platform` instance. The detailed
+release, build, and OCI publication model is described in
+[Platform RGD Delivery Model](./platform-rgd-delivery.md).
+
 ## Promotion Model

 `Kargo` should treat a release as more than an image update.
@@ -306,10 +335,12 @@ The current working model is:
 3. CI produces an image and a corresponding Git commit.
 4. Kargo bundles those artifacts into Freight.
 5. Kargo promotes Freight between environments.
-6. During promotion, Kargo combines release input with environment-specific
+6. The `gitops` repo installs released shared API bundles per cluster and
+   carries cluster-local platform instances such as `Platform`.
+7. During promotion, Kargo combines release input with environment-specific
    inputs and writes the final resource instances into the environment-specific
    area of the `gitops` repo on `main`.
-7. Argo CD reconciles those final resource instances from the `gitops` repo.
+8. Argo CD reconciles those final resource instances from the `gitops` repo.

 This is the design baseline for the next pass.

diff --git a/docs/docs/designs/platform-rgd-delivery.md b/docs/docs/designs/platform-rgd-delivery.md
new file mode 100644
index 0000000..1ac9cd2
--- /dev/null
+++ b/docs/docs/designs/platform-rgd-delivery.md
@@ -0,0 +1,183 @@
+---
+title: Platform RGD Delivery Model
+description: Proposed design for authoring, releasing, publishing, and consuming platform-side kro APIs.
+---
+
+# Platform RGD Delivery Model
+
+## Status
+
+Proposed.
+
+This document defines the platform-side delivery model for shared `kro`
+`ResourceGraphDefinition`s. It complements the broader
+[kro Consumption Model](./kro-consumption-model.md) by making the platform-owned
+API lifecycle concrete without turning the `gitops` repo into the source of
+truth for reusable API definitions.
+
+## Purpose
+
+The primary purpose of this design is to keep platform API ownership, release
+lifecycle, and cluster consumption clearly separated.
+
+The intended split is:
+
+- the `platform` repo owns authoring, validation, release notes, and
+  publication for shared platform APIs
+- the `gitops` repo owns cluster-local installation of released API bundles and
+  cluster-local instances of those APIs
+
+This lets the lab version and promote platform APIs deliberately, while keeping
+cluster bootstrap in Git simple enough to reason about at a glance.
+
+## Goals
+
+- Keep reusable platform API source out of the `gitops` repo.
+- Give `platform-rgds` and `apps-rgds` independent release trains.
+- Publish released RGD bundles as OCI artifacts that Argo CD can install
+  declaratively.
+- Keep the cluster-local bootstrap surface small and explicit.
+- Use CUE for build-time authoring and validation without exposing CUE as an
+  operator-facing runtime interface.
+
+## Non-Goals
+
+- This document does not define the exact `Platform` schema.
+- This document does not define the full CI workflow YAML for release or
+  publication.
+- This document does not define every future platform capability block.
+
+## Design Summary
+
+The intended model is:
+
+- `platform` owns the source for shared `kro` APIs
+- `release-please` orchestrates release PRs, tags, and changelog updates
+- `platform-rgds` and `apps-rgds` are released independently
+- publish workflows render final YAML artifacts from CUE and push them to OCI
+  registries with ORAS
+- `gitops` installs those released OCI artifacts through Argo CD
+- `gitops` also holds the cluster-local `Platform` custom resource that carries
+  cluster-specific inputs
+
+## Ownership and Repository Boundaries
+
+### Platform Repo
+
+The `platform` repo is the source of truth for shared RGD definitions.
+
+It owns:
+
+- CUE authoring input for `platform-rgds` and `apps-rgds`
+- validation of rendered RGD artifacts before publication
+- release configuration and changelog management
+- OCI publication of rendered artifacts
+
+It does not own cluster-local desired state.
+
+### GitOps Repo
+
+The `gitops` repo is the source of truth for which released API bundles a
+cluster installs and which cluster-local custom resources should exist there.
+
+It owns:
+
+- Argo CD `Application` resources that install `kro` and released RGD bundles
+- the cluster-local `Platform` custom resource
+- the ordering and composition of those objects during cluster bootstrap
+
+It does not own raw shared RGD source.
+
+## Release Model
+
+The `platform` repo should manage `platform-rgds` and `apps-rgds` as separate
+release trains in the same repository.
+
+The intended flow is:
+
+1. `release-please` manages release PRs, version bumps, tags, and changelog
+   updates.
+2. `platform-rgds` and `apps-rgds` each advance only when their own changes
+   require a release.
+3. A publish workflow runs after a release is created, renders the final YAML
+   artifacts, and pushes them as OCI artifacts via ORAS.
+4. Cluster operators choose which released version to install by updating the
+   corresponding Argo CD `Application` in `gitops`.
+
+This keeps API release history explicit and lets lower environments validate new
+bundle versions before higher environments adopt them.
+
+## Authoring and Build Model
+
+The intended authoring model for `platform-rgds` is:
+
+- one public `Platform` RGD
+- CUE as the build-time authoring language
+- a root package that defines the final public shape
+- CUE subpackages for logically ordered platform capability blocks
+
+The subpackages are intended to correspond to stable blocks of cluster
+configuration, such as:
+
+- core platform defaults
+- secrets integration
+- networking integration
+- bare-metal integration such as `tinkerbell`
+
+These subpackages are an authoring and validation boundary, not a separate
+operator-facing API surface. The published product remains the rendered RGD
+YAML artifact.
+
+CI may import CRDs or equivalent schemas into CUE so the rendered artifact can
+be validated structurally before publication. Cluster-side `kro` validation is
+still responsible for the final semantic checks when the RGD is created.
+
+## Cluster Consumption Model
+
+The intended cluster-local bootstrap surface in `gitops` is:
+
+```text
+clusters//platform/
+├── kro.yaml
+├── rgds-platform.yaml
+├── rgds-apps.yaml
+└── platform.yaml
+```
+
+Each file has one job:
+
+- `kro.yaml`: install `kro` itself
+- `rgds-platform.yaml`: install the selected released `platform-rgds` OCI
+  artifact
+- `rgds-apps.yaml`: install the selected released `apps-rgds` OCI artifact
+- `platform.yaml`: instantiate the cluster-local `Platform` custom resource
+
+An admin-owned Argo CD bootstrap `Application` should point at
+`clusters//platform/` and use sync waves so the order is explicit:
+
+1. install `kro`
+2. install the released RGD bundles
+3. create the cluster-local `Platform` instance
+
+This keeps the cluster bootstrap surface intentionally small and makes the
+chosen bundle versions obvious in Git.
+
+## Relationship to Other Designs
+
+This design builds on:
+
+- [Multi-Cluster GitOps Model](./gitops-multi-cluster.md) for cluster topology,
+  tenancy, and application flow
+- [kro Consumption Model](./kro-consumption-model.md) for the ownership split
+  between platform-owned APIs, developer release intent, and GitOps
+  materialization
+
+It does not change the application-side model where developers author `App`
+instances alongside application source code and Kargo materializes final
+environment-specific resources into `gitops`.
+
+## Next Step
+
+The next implementation-oriented design pass should define the initial
+`Platform` schema and the first concrete capability block, starting with the
+platform-side `tinkerbell` inputs.
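The Python sketch below shows one way `rgds-platform.yaml` could pin a released bundle. It builds an Argo CD `Application` that installs a single released bundle version. Several details are assumptions, not part of this design:

- the registry path and the `0.3.0` version
- the `platform` project name and the destination cluster name
- the wave number
- use of Argo CD's native OCI sources (an `oci://` `repoURL`, assumed to need Argo CD 3.1 or later), which may also constrain the media types used when pushing with ORAS

The output is JSON, which is also valid YAML.

```python
"""Sketch: render a hypothetical rgds-platform.yaml Argo CD Application.

Assumptions (not defined by this design): registry path, version tag,
project and destination names, wave numbering, and Argo CD native OCI
sources (oci:// repoURL).
"""

import json

SYNC_WAVE = "argocd.argoproj.io/sync-wave"


def rgd_bundle_application(
    cluster: str,
    bundle: str,
    version: str,
    registry: str,
    wave: int = 1,
) -> dict:
    """Build an Application that installs one released RGD bundle on one cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": f"{cluster}-{bundle}",
            "namespace": "argocd",
            # Hypothetical numbering: kro is wave 0, bundles 1, Platform 2.
            "annotations": {SYNC_WAVE: str(wave)},
        },
        "spec": {
            "project": "platform",  # hypothetical admin-owned AppProject
            "source": {
                "repoURL": f"oci://{registry}/{bundle}",
                # The pinned release; changing this string adopts a new version.
                "targetRevision": version,
                "path": ".",
            },
            # Destination by registered cluster name. No namespace is set,
            # assuming the bundle contains only cluster-scoped RGDs.
            "destination": {"name": cluster},
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    app = rgd_bundle_application(
        cluster="nonprod",
        bundle="platform-rgds",
        version="0.3.0",  # hypothetical released version
        registry="registry.example.com/gilmanlab",  # hypothetical registry
    )
    print(json.dumps(app, indent=2))
```

Pinning `targetRevision` per cluster is what lets `nonprod` adopt a new bundle version before `prod`, matching the release model above. An `apps-rgds` variant would differ only in the `bundle` argument.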