diff --git a/README.md b/README.md index 957352e..1ce5007 100644 --- a/README.md +++ b/README.md @@ -31,6 +31,8 @@ moon run docs:start - [`docs/docs/index.md`](docs/docs/index.md): docs landing page - [`docs/docs/architecture.md`](docs/docs/architecture.md): architecture overview +- [`docs/docs/designs/`](docs/docs/designs): proposed design documents that are + not yet part of the settled architecture baseline - [`docs/docs/hardware.md`](docs/docs/hardware.md): hardware inventory - [`docs/docs/network-device-backups.md`](docs/docs/network-device-backups.md): RouterOS backup design for the future platform cluster diff --git a/docs/docs/architecture.md b/docs/docs/architecture.md index fbeb964..b6d1a31 100644 --- a/docs/docs/architecture.md +++ b/docs/docs/architecture.md @@ -325,6 +325,11 @@ This means: - downstream clusters are expected to manage their own services independently - whether a downstream cluster uses `Argo CD` or some other delivery model is left to that cluster's own design +The intended future multi-cluster delivery model is being tracked separately in +[Multi-Cluster GitOps Model](./designs/gitops-multi-cluster.md). Until that +design is implemented, this architecture overview remains intentionally +conservative about downstream-cluster GitOps behavior. + ## Why This Layout This layout separates concerns in a way that matches the intended operating model: diff --git a/docs/docs/designs/app-rgd.md b/docs/docs/designs/app-rgd.md new file mode 100644 index 0000000..e83056c --- /dev/null +++ b/docs/docs/designs/app-rgd.md @@ -0,0 +1,249 @@ +--- +title: App RGD Design +description: Proposed first-pass design for the application-facing kro API. +--- + +# App RGD Design + +## Status + +Proposed. + +This document captures the intended shape of the first application-facing `kro` +API, referred to here as `App`. + +This is not the final schema. 
The goal of this pass is to make the API shape +clear enough that the next pass can define the actual `ResourceGraphDefinition` +without reopening the high-level model. + +## Summary + +`App` should be the primary developer-facing API for deployable application +workloads. + +The first pass should focus on four concerns: + +- containers +- secrets +- configs +- volumes + +The current design direction is: + +- developers author an `App` instance alongside application source code +- the public API stays small and application-centric +- secrets, configs, and volumes are defined at the point where they are used or + mounted +- secret store selection defaults from the target environment and is not a + normal developer-facing input +- Kargo materializes the final environment-specific `App` instance into the + `gitops` repo on `main` +- Argo CD reconciles that final `App` instance + +## Illustrative Example + +The example below is illustrative only. It exists to make the intended shape +concrete before the actual schema is written. 
+ +```yaml +apiVersion: apps.platform.gilman.io/v1alpha1 +kind: App +metadata: + name: orders-api +spec: + team: teama + app: orders-api + + containers: + - name: api + image: + repository: ghcr.io/gilmanlab/teama/orders-api + digest: sha256:2222222222222222222222222222222222222222222222222222222222222222 + ports: + - name: http + port: 8080 + env: + - name: MAX_PRICE + value: "20" + - name: LOG_LEVEL + value: info + - name: DB_USERNAME + secret: + remoteKey: kv/orders-api + property: username + - name: DB_PASSWORD + secret: + remoteKey: kv/orders-api + property: password + - name: THIRD_PARTY_API_KEY + secret: + remoteKey: kv/shared/third-party + property: apiKey + mounts: + - path: /var/lib/orders + volume: + persistent: + size: 10Gi + - path: /etc/orders + volume: + config: + files: + application.yaml: | + http: + port: 8080 + logging: + format: json + features.yaml: | + enableDiscountsV2: true + - path: /var/run/secrets/orders + volume: + secret: + files: + db-username: + remoteKey: kv/orders-api + property: username + db-password: + remoteKey: kv/orders-api + property: password + third-party-api-key: + remoteKey: kv/shared/third-party + property: apiKey + + - name: worker + image: + repository: ghcr.io/gilmanlab/teama/orders-worker + digest: sha256:4444444444444444444444444444444444444444444444444444444444444444 + env: + - name: LOG_LEVEL + value: info + - name: DB_USERNAME + secret: + remoteKey: kv/orders-api + property: username + - name: DB_PASSWORD + secret: + remoteKey: kv/orders-api + property: password + mounts: + - path: /var/lib/orders + volume: + persistent: + size: 10Gi + + - name: metrics-proxy + image: + repository: ghcr.io/gilmanlab/platform/metrics-proxy + digest: sha256:3333333333333333333333333333333333333333333333333333333333333333 + ports: + - name: metrics + port: 9090 +``` + +This example shows the intended first-pass shape: + +- containers are the main unit of declaration +- config values are attached where they are consumed +- secret 
references are attached where they are consumed +- volume definitions are attached where they are mounted +- the API describes External Secrets-backed needs without exposing handwritten + Kubernetes `Secret` objects + +It also intentionally leaves some things open: + +- the exact inline shape for config files versus simple env vars +- the exact secret reference shape +- how much container surface is exposed in v1 +- how environment-specific values such as `THIRD_PARTY_URL` are merged during + promotion/materialization + +## Design Rules + +### Containers + +- `App` should support more than one container. +- The single-main-container case should still be the easiest path. +- The public API should model deployable application containers, not raw Pod + templates. +- Security and runtime defaults should be platform-owned wherever practical. + +### Secrets + +- Secrets must align with `ExternalSecret` plus `SecretStore` or + `ClusterSecretStore`. +- Plaintext secret values are out of scope. +- Hand-authored Kubernetes `Secret` manifests are not the primary path. +- Secret store selection should default from the target environment rather than + being a normal developer-facing field. + +### Configs + +- Non-secret config should be distinct from secrets. +- Release-coupled config lives with the developer-authored `App` instance. +- Environment-specific config is added during promotion/materialization. +- The public API should not force developers to author raw `ConfigMap` + manifests. + +### Volumes + +- The public API should use the term `volume`, not `PersistentVolumeClaim`. +- Volume definitions should default to point-of-use declaration at the mount + site. +- True shared-lifecycle named volumes may exist later, but they should not + shape the v1 API around a less common reuse case. + +## Why This Shape + +This design optimizes for local reasoning. 
+ +The expected common case is: + +- one container needs one config value +- one container needs one secret value +- one mount path needs one volume definition + +That means the public API should default to point-of-use declarations instead of +top-level registries and references. + +For the same reason, the API should not expose more of Kubernetes than it needs +to. The goal is a small, opinionated application API, not a renamed PodSpec. + +## Relationship to GitOps and Promotion + +The current working lifecycle is: + +1. A developer authors an `App` instance alongside application source code. +2. CI produces an image and the corresponding Git commit. +3. Kargo bundles the image and commit into Freight. +4. Kargo promotes that Freight into an environment. +5. During promotion, Kargo combines the source `App` instance with any + environment-specific inputs or overrides. +6. Kargo writes the final `App` instance into the destination environment path + in the `gitops` repo on `main`. +7. Argo CD reconciles that final `App` instance. + +This means the `App` API is developer-facing, but the final reconciled instance +is still environment-specific GitOps state. + +## Out of Scope for This Pass + +This document does not define: + +- the final `App` schema +- the exact `ExternalSecret` generation model +- the exact promotion-time composition mechanism +- policy and governance rules that are better enforced by `Kyverno`, + `Capsule`, or similar systems + +## Open Questions + +- What is the smallest useful first schema for the common backend API case? +- How much container surface should v1 expose for commands, args, probes, + resources, and ports? +- What is the minimum honest secret-reference shape for the External Secrets + model? +- Should config files and simple env values use one unified shape or distinct + ones? + +## Next Step + +The next pass should define the actual first schema draft for `App`. 
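As a reference point for the Secrets rules above, the sketch below shows one `ExternalSecret` that the `DB_USERNAME` and `DB_PASSWORD` env references in the illustrative example could plausibly expand to. It is not the generation model, which this document leaves open. The resource name, the namespace, the `refreshInterval`, and the `ClusterSecretStore` name `dev-default` are all assumptions, and the `apiVersion` depends on the installed External Secrets Operator release.

```yaml
apiVersion: external-secrets.io/v1beta1   # may be v1 on newer operator releases
kind: ExternalSecret
metadata:
  name: orders-api-api-env                # hypothetical generated name
  namespace: teama-orders-api-dev         # hypothetical; follows team-app-env
spec:
  refreshInterval: 1h                     # assumed platform default
  secretStoreRef:
    kind: ClusterSecretStore
    name: dev-default                     # hypothetical; chosen by the environment
  target:
    name: orders-api-api-env              # Kubernetes Secret the operator creates
  data:
    - secretKey: DB_USERNAME
      remoteRef:
        key: kv/orders-api                # from remoteKey in the App instance
        property: username
    - secretKey: DB_PASSWORD
      remoteRef:
        key: kv/orders-api
        property: password
```

In this sketch the developer-facing `remoteKey` and `property` map onto `remoteRef.key` and `remoteRef.property`, while the store reference comes from the target environment rather than from the `App` instance.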
diff --git a/docs/docs/designs/gitops-multi-cluster.md b/docs/docs/designs/gitops-multi-cluster.md new file mode 100644 index 0000000..78b1200 --- /dev/null +++ b/docs/docs/designs/gitops-multi-cluster.md @@ -0,0 +1,412 @@ +--- +title: Multi-Cluster GitOps Model +description: Proposed GitOps design for the platform, nonprod, and prod clusters using CAPI, Argo CD, Kargo, kro, and Capsule. +--- + +# Multi-Cluster GitOps Model + +## Status + +Proposed. + +This document describes the intended GitOps model for the lab once the platform +cluster, workload clusters, and application delivery flows are built out. It is +more specific than the architecture overview, but it is still a design rather +than a description of live state. + +Until this design is implemented, the architecture overview remains the source +of truth for the current baseline. + +## Scope + +This design covers four related concerns: + +- platform cluster responsibilities +- downstream cluster creation and management +- long-lived and ephemeral environments +- team and application isolation + +It uses the current planning baseline: + +- `1` platform cluster on the `UM760` +- `1` nonprod cluster +- `1` prod cluster +- long-lived environments: `dev`, `staging`, `prod` +- ephemeral environments for pull requests, load tests, and similar short-lived + work + +The example team layout used throughout this document is: + +- `TeamA` + - `AppA1` + - `AppA2` + - `AppA3` +- `TeamB` + - `AppB1` + - `AppB2` + +## Goals + +- Keep one management cluster responsible for cluster lifecycle and GitOps + control. +- Keep application delivery Git-native: promotion means changing Git, not + mutating clusters directly. +- Reuse `kro` APIs for application delivery instead of Helm or Kustomize + overlays. +- Keep downstream workload clusters strongly isolated from each other. +- Give each team a stable governance boundary without collapsing all of that + team's applications into a single namespace. 
+ +## Non-Goals + +- This document does not define the exact `kro` `ResourceGraphDefinition` schema + for every platform API. +- This document does not define CI pipelines, image signing, or registry + hardening in detail. +- This document does not define every shared service that should run in + `nonprod` or `prod`. +- This document does not treat the current design as implemented reality. + +## Design Summary + +The intended control-plane model is: + +- the platform cluster runs `Argo CD`, `CAPI`, and `Kargo` +- `CAPI` creates the `nonprod` and `prod` workload clusters +- `Argo CD` runs only on the platform cluster and syncs plain YAML to all three + clusters +- `kro` provides the reusable application and platform APIs +- `Capsule` provides the team governance layer in each workload cluster +- `Kargo` promotes applications by editing environment-specific YAML in Git + +The high-level split is: + +- cluster boundary: `platform`, `nonprod`, `prod` +- team boundary: Capsule tenant per team per workload cluster +- workload boundary: namespace per `team-app-env` + +## Cluster Roles + +### Platform Cluster + +The platform cluster is the only management cluster. + +It owns: + +- `Argo CD` +- `CAPI` +- `Kargo` +- shared `kro` APIs +- platform-only controllers and operational services + +It does not host general application workloads by default. 
+ +### Nonprod Cluster + +The `nonprod` cluster hosts: + +- `dev` environments +- `staging` environments +- ephemeral environments such as `pr-123` or `loadtest-001` +- any nonprod shared services that belong in the workload plane instead of the + platform plane + +### Prod Cluster + +The `prod` cluster hosts: + +- `prod` application environments +- prod shared services and policy + +## Namespace and Team Model + +Namespaces use the template: + +```text +team-app-env +``` + +Examples: + +- `teama-appa1-dev` +- `teama-appa1-staging` +- `teama-appa1-prod` +- `teama-appa1-pr-123` + +This keeps each application instance isolated at the namespace boundary. + +Do not use one namespace per team for all of that team's applications. That +would couple unrelated apps at the secrets, RBAC, quota, and blast-radius +layers. + +### Capsule + +Each workload cluster runs Capsule. + +Capsule tenants are cluster-local, so the intended shape is: + +- `nonprod`: + - `Tenant/teama` + - `Tenant/teamb` +- `prod`: + - `Tenant/teama` + - `Tenant/teamb` + +This means there is one logical team boundary across the lab, implemented as +one tenant object per team per workload cluster. + +`dev` and `staging` do not currently need separate team governance layers. +They share the same team-level Capsule tenant in `nonprod`, while remaining +separate per-app namespaces. + +## Environment Model + +Environments are not modeled as Helm values files or Kustomize overlays. + +Instead, each environment is a concrete instance of the same `kro` API. +Reuse lives in shared `ResourceGraphDefinition`s, while environment-specific +differences live in environment-specific custom resources. + +For example, `AppA1` should have separate resources for: + +- `teams/teama/appa1/envs/dev/app.yaml` +- `teams/teama/appa1/envs/staging/app.yaml` +- `teams/teama/appa1/envs/prod/app.yaml` + +Each file is small because the heavy lifting lives in the shared `kro` API. 
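To make "small" concrete, a staging instance might look like the following sketch. It reuses the field names from the `AppDeployment` example in the kro Model section of this document. The staging hostname is an assumption extrapolated from the `dev` example, and the digest is left elided.

```yaml
# teams/teama/appa1/envs/staging/app.yaml (illustrative, not the final schema)
apiVersion: apps.platform.gilman.io/v1alpha1
kind: AppDeployment
metadata:
  name: appa1
spec:
  team: teama
  app: appa1
  env: staging
  namespace: teama-appa1-staging
  image:
    repository: ghcr.io/gilmanlab/teama/appa1
    digest: sha256:...                        # promoted value; elided here
  routing:
    host: appa1.staging.apps.lab.gilman.io    # assumed hostname pattern
```

Only `env`, `namespace`, the image digest, and the routing host would be expected to differ from the `dev` and `prod` instances.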
+ +Ephemeral environments follow the same pattern under `ephemeral/`, for example: + +- `teams/teama/appa1/ephemeral/pr-123/app.yaml` + +## GitOps Repository Layout + +The intended `gitops` repository shape is: + +```text +gitops/ +├── platform/ +│ ├── argocd/ +│ │ ├── bootstrap.yaml +│ │ ├── projects/ +│ │ │ ├── platform.yaml +│ │ │ ├── teama.yaml +│ │ │ └── teamb.yaml +│ │ └── applicationsets/ +│ │ ├── platform.yaml +│ │ ├── clusters-nonprod.yaml +│ │ ├── clusters-prod.yaml +│ │ ├── teams-nonprod.yaml +│ │ └── teams-prod.yaml +│ ├── capi/ +│ │ ├── providers/ +│ │ ├── clusterclasses/ +│ │ └── clusters/ +│ │ ├── nonprod/ +│ │ └── prod/ +│ ├── kargo/ +│ │ └── projects/ +│ │ ├── teama-appa1/ +│ │ ├── teama-appa2/ +│ │ ├── teama-appa3/ +│ │ ├── teamb-appb1/ +│ │ └── teamb-appb2/ +│ └── rgds/ +│ ├── appdeployment.yaml +│ └── teamnamespace.yaml +├── clusters/ +│ ├── nonprod/ +│ │ ├── capsule/ +│ │ │ ├── teama.yaml +│ │ │ └── teamb.yaml +│ │ ├── policies/ +│ │ └── shared/ +│ └── prod/ +│ ├── capsule/ +│ │ ├── teama.yaml +│ │ └── teamb.yaml +│ ├── policies/ +│ └── shared/ +└── teams/ + ├── teama/ + │ ├── appa1/ + │ │ ├── envs/dev/app.yaml + │ │ ├── envs/staging/app.yaml + │ │ ├── envs/prod/app.yaml + │ │ └── ephemeral/pr-123/app.yaml + │ ├── appa2/ + │ └── appa3/ + └── teamb/ + ├── appb1/ + └── appb2/ +``` + +The ownership model is: + +- `platform/`: platform-cluster state and shared APIs +- `clusters/`: workload-cluster shared state +- `teams/`: team-owned application instances + +## Argo CD Model + +One `Argo CD` instance runs on the platform cluster. 
+ +It syncs: + +- `platform/argocd`, `platform/capi`, `platform/kargo`, and `platform/rgds` to + the platform cluster +- `clusters/nonprod` to the `nonprod` cluster +- `clusters/prod` to the `prod` cluster +- `teams/*/*/envs/dev`, `teams/*/*/envs/staging`, and + `teams/*/*/ephemeral/*` to the `nonprod` cluster +- `teams/*/*/envs/prod` to the `prod` cluster + +The intended Argo shape is: + +- one `AppProject` per team +- `ApplicationSet` for platform-owned fleet generation +- `Application` resources kept in the `argocd` namespace + +Do not rely on application CRs scattered across arbitrary namespaces as the +default control model. The central `argocd` namespace is simpler unless a later +team self-service requirement makes that extra complexity worth it. + +## kro Model + +`kro` is the abstraction layer for reusable platform and application APIs. + +The intended pattern is: + +- shared `ResourceGraphDefinition`s live under `platform/rgds/` +- environment-specific custom resources live under `teams/` +- Argo CD syncs the YAML +- `kro` expands the custom resources into the Kubernetes objects they own + +An environment-specific application resource should be narrow and explicit. For +example: + +```yaml +apiVersion: apps.platform.gilman.io/v1alpha1 +kind: AppDeployment +metadata: + name: appa1 +spec: + team: teama + app: appa1 + env: dev + namespace: teama-appa1-dev + image: + repository: ghcr.io/gilmanlab/teama/appa1 + digest: sha256:... + routing: + host: appa1.dev.apps.lab.gilman.io +``` + +The shared `AppDeployment` API should stamp stable labels such as: + +- `glab.gilman.io/team` +- `glab.gilman.io/app` +- `glab.gilman.io/env` + +## CAPI Model + +`CAPI` owns workload-cluster lifecycle. 
+ +The intended CAPI responsibilities are: + +- install and manage cluster API providers +- define reusable cluster classes +- create and scale `nonprod` and `prod` +- keep workload-cluster creation separate from application promotion + +This keeps: + +- cluster lifecycle in `CAPI` +- desired-state reconciliation in `Argo CD` +- application promotion in `Kargo` + +## Kargo Model + +`Kargo` runs on the platform cluster. + +There should be one Kargo project per application pipeline: + +- `teama-appa1` +- `teama-appa2` +- `teama-appa3` +- `teamb-appb1` +- `teamb-appb2` + +The intended durable stages are: + +- `dev` +- `staging` +- `prod` + +Ephemeral environments are intentionally outside the long-lived promotion graph. +They should be created and destroyed by automation that adds or removes the +corresponding YAML from `teams/.../ephemeral/...`. + +Promotion means editing the environment-specific resource in Git. A narrow field +such as `spec.image.digest` is the preferred promotion target. + +For `AppA1`, the promotion targets are: + +- `teams/teama/appa1/envs/dev/app.yaml` +- `teams/teama/appa1/envs/staging/app.yaml` +- `teams/teama/appa1/envs/prod/app.yaml` + +The intended policy is: + +- `dev`: automatic promotion is acceptable +- `staging`: automatic promotion is acceptable +- `prod`: promotion should require an explicit approval step + +## Worked Example: TeamA / AppA1 + +1. `CAPI` creates the `nonprod` and `prod` workload clusters. +2. `Argo CD` syncs shared platform state to the platform cluster. +3. `Argo CD` syncs Capsule tenants `teama` and `teamb` to `nonprod` and `prod`. +4. `Argo CD` syncs the shared `kro` `ResourceGraphDefinition`s. +5. `teams/teama/appa1/envs/dev/app.yaml` defines an `AppDeployment` with + namespace `teama-appa1-dev`. +6. CI publishes a new image for `AppA1`. +7. `Kargo` detects the new artifact and updates + `teams/teama/appa1/envs/dev/app.yaml`. +8. `Argo CD` syncs that file to `nonprod`. +9. 
`kro` expands the `AppDeployment` into the namespace and workload resources + for `teama-appa1-dev`. +10. After validation, `Kargo` updates + `teams/teama/appa1/envs/staging/app.yaml`. +11. `Argo CD` syncs the staging instance to `nonprod` namespace + `teama-appa1-staging`. +12. After approval, `Kargo` updates + `teams/teama/appa1/envs/prod/app.yaml`. +13. `Argo CD` syncs the prod instance to the `prod` cluster namespace + `teama-appa1-prod`. + +## Open Questions + +- Which `kro` APIs should exist first beyond `AppDeployment` and + team-namespace bootstrap? +- Should `Argo CD` generate one application per environment directory, one per + app, or one per team/app/cluster boundary? +- Which shared services belong under `clusters/nonprod/shared` and + `clusters/prod/shared` versus platform-wide control-plane management? +- What is the exact cluster-registration flow from `CAPI` outputs into Argo CD + destinations? + +## Migration into Architecture Docs + +This document should be folded into the architecture overview after the +following are true: + +- the `gitops` repo structure exists in a stable form +- the platform cluster is actually running `Argo CD`, `CAPI`, and `Kargo` +- the `nonprod` and `prod` clusters exist under `CAPI` +- at least one real application has exercised the `dev` -> `staging` -> `prod` + flow + +At that point, the architecture overview should be updated so it describes the +steady-state GitOps model directly, and this design document can either be +trimmed or kept as historical design context. diff --git a/docs/docs/designs/index.md b/docs/docs/designs/index.md new file mode 100644 index 0000000..4738057 --- /dev/null +++ b/docs/docs/designs/index.md @@ -0,0 +1,25 @@ +--- +title: Design Documents +description: Proposed designs that are not yet part of the settled GilmanLab architecture baseline. 
+slug: /designs/ +--- + +# Design Documents + +This section holds proposed designs that are specific enough to guide +implementation, but are not yet part of the settled architecture baseline. + +Use these documents when: + +- a design is clear enough to review formally +- the target implementation does not exist yet +- the architecture overview should stay conservative until the design is proven + +Current designs: + +- [Multi-Cluster GitOps Model](./gitops-multi-cluster.md) +- [kro Consumption Model](./kro-consumption-model.md) +- [App RGD Design](./app-rgd.md) + +Once a design is implemented and considered durable, its steady-state shape +should be folded back into the architecture overview and any relevant runbooks. diff --git a/docs/docs/designs/kro-consumption-model.md b/docs/docs/designs/kro-consumption-model.md new file mode 100644 index 0000000..291bb80 --- /dev/null +++ b/docs/docs/designs/kro-consumption-model.md @@ -0,0 +1,334 @@ +--- +title: kro Consumption Model +description: Proposed design for how GilmanLab publishes, consumes, and promotes kro-based application and platform APIs. +--- + +# kro Consumption Model + +## Status + +Proposed. + +This document captures the intended role of `kro` in the lab and the current +working model for how `kro` resources flow from developer-owned source +repositories into environment-specific desired state in the `gitops` repo. + +This is an initial design draft. It intentionally does not define concrete +`ResourceGraphDefinition` schemas yet. The next design pass should define the +first public APIs in detail. + +## Purpose + +The primary purpose of `kro` in this lab is two-fold: + +- give software engineers a common abstraction for deploying applications +- give platform engineers a common abstraction for maintaining platform + capabilities + +In practice, these are closely related. Platform engineers and software +engineers are both describing and operating software systems through the same +API layer. 
The difference is usually ownership and blast radius, not the +fundamental shape of the task. + +`kro` is therefore the public API layer for both application delivery and +platform delivery. + +## Design Principles + +The `kro` API layer should be standardized and opinionated. + +The goal is not to recreate the full input surface of the underlying Kubernetes +objects. The goal is to publish a smaller, clearer API that is safer and easier +to reason about. + +The design principles are: + +- prefer a minimal public interface over a mechanically complete one +- hardcode platform-controlled values when that improves security, + consistency, or operational correctness +- prefer defaults that "do the right thing" and reduce cognitive load +- keep required inputs minimal +- constrain mutable inputs wherever reasonable +- expose escape hatches only when there is a real need, not as a default design + posture + +The expected result is that a developer should be able to use a platform API +without needing to understand the full Kubernetes object model underneath it. + +## Terminology + +Public `kro` APIs should prefer common software engineering terms over +specialized Kubernetes terms where doing so improves clarity. + +Examples: + +- prefer `volume` over `PersistentVolumeClaim` in a public API +- prefer application-level language like `service`, `route`, `database`, or + `worker` when those terms are understandable without Kubernetes-specific + context + +This is not a strict ban on Kubernetes terminology. Some Kubernetes concepts, +such as `Deployment`, are already self-explanatory to many users. The important +point is that public APIs should be shaped for the intended consumers, not as +thin wrappers around upstream object names. 
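As a sketch of what this terminology choice hides from the developer, a point-of-use declaration such as `volume.persistent.size: 10Gi` (the shape used in the App RGD Design example) could plausibly expand to a `PersistentVolumeClaim` like the one below. The claim name, namespace, and access mode are assumptions, and the storage class is deliberately omitted so that the cluster default would apply.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-api-var-lib-orders    # hypothetical generated name
  namespace: teama-orders-api-dev    # hypothetical; follows team-app-env
spec:
  accessModes:
    - ReadWriteOnce                  # assumed platform default
  resources:
    requests:
      storage: 10Gi                  # the only value taken from the public API
```

The developer supplies a size and a mount path; everything else in the claim is a platform-owned default in this sketch.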
+ +## API Ownership Model + +There are three important ownership layers: + +- platform-owned API definitions +- developer-owned application release intent +- environment-owned deployment materialization + +### Platform-owned API definitions + +The platform layer owns the `ResourceGraphDefinition`s and any shared conventions +that define what public APIs exist and how they behave. + +Examples include future APIs such as: + +- application delivery APIs +- team namespace bootstrap APIs +- platform capability APIs + +This document does not define those schemas yet. + +### Developer-owned application release intent + +Developers should define instances of platform APIs alongside the application +source code that they belong to. + +For example: + +```text +project/ +├── src/ +├── Dockerfile +└── deployment.yaml +``` + +In this model, `deployment.yaml` is not the `RGD`. It is an instance of a +platform-owned API. + +This file should be treated as part of the application's release intent and +versioned alongside the application source and image build inputs. + +Important: `App` is only one example of a future public API. This model does +not imply that developers will only ever instantiate a single kind of `kro` +resource. + +### Environment-owned deployment materialization + +Argo CD still needs a Git source of truth to reconcile from. + +In this design, that source of truth remains the `gitops` repo on its mainline +branch. Argo CD does not read directly from developer application repositories. + +Therefore, the final environment-specific `kro` resource instances must be +materialized into the `gitops` repo under environment-specific paths on +`main`/`master`. + +This means the environment-specific folders in the `gitops` repo are still the +final desired state that Argo CD reconciles. + +## Relationship to the GitOps Model + +This design builds on the multi-cluster GitOps model described in +[Multi-Cluster GitOps Model](./gitops-multi-cluster.md). 
+ +The important clarification is: + +- `kro` API definitions are platform-owned +- developer-authored API instances are release inputs +- Kargo materializes environment-specific outputs into the `gitops` repo +- Argo CD reconciles those outputs from `main` + +Environment-specific paths remain the expected layout, for example: + +```text +gitops/ +└── envs/ + ├── dev/ + │ └── orders-api/ + ├── staging/ + │ └── orders-api/ + └── prod/ + └── orders-api/ +``` + +This document does not settle the exact `gitops` folder layout beyond that +principle. + +## Promotion Model + +`Kargo` should treat a release as more than an image update. + +The intended model is: + +- CI produces a new image +- CI also produces a Git commit containing the developer-owned application + release intent +- Kargo bundles the image revision and Git commit into one piece of Freight +- Kargo promotes that Freight through environments +- during promotion, Kargo materializes the final environment-specific desired + state into the `gitops` repo +- Argo CD reconciles that materialized desired state from `main` + +This is different from treating promotion as nothing more than a digest bump. +Behavior-defining configuration that belongs to the application release should +travel with the release artifact. + +## Environment-specific Inputs + +Environment-specific concerns still exist and must influence the final deployed +resource instances. + +A realistic example is an application input such as `THIRD_PARTY_URL`: + +- in `dev`, the application may need to call + `sandbox.api.thirdparty.com` +- in `prod`, the application may need to call `api.thirdparty.com` + +This kind of input is not necessarily part of the application release itself. +It may instead be a property of the target environment. 
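One way to picture such an input is as a small per-environment file kept in the `gitops` repo. The file names, the `env` mapping shape, and the `https://` scheme below are purely hypothetical; this document does not define how environment-specific inputs are authored.

```yaml
# Hypothetical file: envs/dev/orders-api/inputs.yaml
env:
  THIRD_PARTY_URL: https://sandbox.api.thirdparty.com
---
# Hypothetical file: envs/prod/orders-api/inputs.yaml
env:
  THIRD_PARTY_URL: https://api.thirdparty.com
```

Under this sketch the developer-authored instance would carry no value for `THIRD_PARTY_URL`; the value would be bound only when a release reaches each environment.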
+ +The current design direction is: + +- developer-owned release inputs live with the application source +- environment-specific inputs live in the `gitops` repo +- Kargo combines the two during promotion and writes the resulting `kro` + resource instance into the destination environment folder on `main` + +This lets release-coupled configuration move through the promotion pipeline +while still letting environment-specific concerns influence the final deployed +resource. + +## Promotion-time Composition + +The current working assumption is that any composition of: + +- developer-authored release input +- environment-specific overrides or bindings +- final Argo-reconciled output + +happens during promotion, inside Kargo. + +In other words: + +- developers do not manually write environment-specific final manifests into the + `gitops` repo +- Argo CD should reconcile the final materialized YAML +- Kargo is the place where environment-specific shaping occurs before that YAML + lands in Git + +One pragmatic option is to use Kustomize at promotion time only. + +If used, the important boundary is: + +- Kustomize is a promotion-time composition tool +- Kustomize is not the public API +- Kustomize is not the developer-facing abstraction +- Argo CD should still reconcile the final materialized YAML written by Kargo + +This keeps `kro` as the abstraction layer while allowing a familiar merge and +patch mechanism to help materialize final environment-specific manifests. + +The exact composition mechanism remains an open question. Kustomize is one +candidate, not a final commitment in this document. + +## Composition Patterns + +Cross-resource relationships need explicit design. + +The preferred default is to embed related configuration where lifecycle and +ownership naturally belong together. 
+ +For example, if an application needs a small amount of secret material or a +simple runtime capability that is specific to that application instance, the +public API should prefer embedding that relationship rather than forcing the +developer to construct multiple loosely related peer resources. + +However, embedding will not work in every case. + +Some capabilities have independent lifecycle or sharing boundaries. A future +`Database` capability is an example: + +- if an application declares that it needs a database, how does the application + receive connection details? +- if the database is managed independently, what is the contract between the + application-facing resource and the database-facing resource? +- if multiple consumers share a capability, where should the shared contract + live? + +`kro` does not remove the need to design those contracts carefully. This is an +area that still needs explicit design guidance for the lab. + +## Guardrails and Policy + +Environment substitution and policy guardrails are related but distinct +problems. + +### Environment substitution + +This is about supplying environment-sensitive values during promotion or +materialization. + +Examples: + +- environment-specific endpoints +- environment-specific hostnames +- environment-local service references + +This concern belongs close to the promotion/materialization flow and therefore +belongs close to Kargo and the `gitops` repo. + +### Policy guardrails + +This is about constraining what a final deployed resource is allowed to do. + +Examples: + +- forbidding privileged or root workloads in prod +- enforcing tenancy boundaries +- constraining namespaces, quotas, or network access + +This concern is not necessarily a `kro` concern. It may be better handled by +policy and governance layers such as `Kyverno` or `Capsule`. + +This design intentionally does not force those responsibilities into `kro`. + +## Current Design Direction + +The current working model is: + +1. 
The platform team publishes shared `kro` APIs. +2. Developers instantiate those APIs alongside the application source code. +3. CI produces an image and a corresponding Git commit. +4. Kargo bundles those artifacts into Freight. +5. Kargo promotes Freight between environments. +6. During promotion, Kargo combines release input with environment-specific + inputs and writes the final resource instances into the environment-specific + area of the `gitops` repo on `main`. +7. Argo CD reconciles those final resource instances from the `gitops` repo. + +This is the design baseline for the next pass. + +## Open Questions + +- What are the first public `kro` APIs the platform should publish? +- Which relationships should be embedded by default versus modeled as explicit + peer resources? +- What is the standard contract for peer-style relationships such as an + application consuming a separately managed database? +- Should Kustomize be the default promotion-time composition mechanism, or only + one optional implementation technique? +- What exact files live in the environment-specific `gitops` folders during + promotion and after promotion? +- How should environment-specific inputs be authored, owned, and reviewed in the + `gitops` repo? + +## Next Step + +The next design pass should define one or more concrete `ResourceGraphDefinition` +schemas that embody these principles, starting with a minimal application-facing +API and its expected lifecycle. diff --git a/docs/docs/index.md b/docs/docs/index.md index 2921df0..b77794f 100644 --- a/docs/docs/index.md +++ b/docs/docs/index.md @@ -11,6 +11,7 @@ This site is the primary documentation surface for the GilmanLab homelab. Start with: - [Architecture overview](./architecture.md) +- [Design documents](./designs/) - [Hardware reference](./hardware.md) - [Network device backups](./network-device-backups.md) - [RouterOS ACME certificates](./routeros-acme.md)