From 5b43f61a3153163e8724027aa0a51c4e8f9cbf6b Mon Sep 17 00:00:00 2001 From: Arnaud Lheureux Date: Mon, 15 Jun 2026 13:13:33 +0800 Subject: [PATCH 1/8] Add private runner support to git-ape-onboarding Support all private GitHub Actions runner options from Azure/git-ape-private#12: select runner type (ACI/ACA/AKS) and migrate public->private via a single GIT_APE_RUNNER_LABEL workflow variable. - Parametrize runs-on in plan/deploy/destroy/verify workflow templates to ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}; add a runner configuration report step to verify.yml. - Add on-demand runner IaC under templates/runners/: ARM template.json + parameters.json for ACI and ACA (ephemeral runners, optional UAMI, optional VNet injection, KEDA github-runner scaler with scale-to-zero for ACA), and an ARC gha-runner-scale-set Helm values.yaml for AKS. - Update SKILL.md (runner selection/provisioning step), the onboarding agent contract (runner type input), and the copilot-instructions GitHub Actions Runners section (mirror kept byte-identical). - Regenerate website docs and add a positive private-runner eval task. Validated: server-side az deployment validate + what-if on ACI/ACA templates (both branches of every conditional), Docusaurus build, actionlint, template-sync and scaffold-parity checks. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/agents/git-ape-onboarding.agent.md | 10 +- .github/copilot-instructions.md | 70 ++++++ .../tasks/positive-private-runner.yaml | 62 +++++ .github/skills/git-ape-onboarding/SKILL.md | 72 +++++- .../templates/copilot-instructions.md | 70 ++++++ .../templates/runners/README.md | 122 ++++++++++ .../templates/runners/aca/parameters.json | 37 +++ .../templates/runners/aca/template.json | 222 ++++++++++++++++++ .../templates/runners/aci/parameters.json | 37 +++ .../templates/runners/aci/template.json | 171 ++++++++++++++ .../templates/runners/aks/README.md | 75 ++++++ .../templates/runners/aks/values.yaml | 37 +++ .../templates/workflows/git-ape-deploy.yml | 4 +- .../templates/workflows/git-ape-destroy.yml | 4 +- .../templates/workflows/git-ape-plan.yml | 8 +- .../templates/workflows/git-ape-verify.yml | 20 +- website/docs/agents/git-ape-onboarding.md | 10 +- website/docs/getting-started/onboarding.md | 55 ++++- website/docs/skills/git-ape-onboarding.md | 72 +++++- website/docs/workflows/git-ape-deploy.md | 8 +- website/docs/workflows/git-ape-destroy.md | 8 +- website/docs/workflows/git-ape-plan.md | 16 +- website/docs/workflows/git-ape-verify.md | 24 +- 23 files changed, 1169 insertions(+), 45 deletions(-) create mode 100644 .github/evals/git-ape-onboarding/tasks/positive-private-runner.yaml create mode 100644 .github/skills/git-ape-onboarding/templates/runners/README.md create mode 100644 .github/skills/git-ape-onboarding/templates/runners/aca/parameters.json create mode 100644 .github/skills/git-ape-onboarding/templates/runners/aca/template.json create mode 100644 .github/skills/git-ape-onboarding/templates/runners/aci/parameters.json create mode 100644 .github/skills/git-ape-onboarding/templates/runners/aci/template.json create mode 100644 .github/skills/git-ape-onboarding/templates/runners/aks/README.md create mode 100644 .github/skills/git-ape-onboarding/templates/runners/aks/values.yaml 
diff --git a/.github/agents/git-ape-onboarding.agent.md b/.github/agents/git-ape-onboarding.agent.md index 709b98f..d33a58f 100644 --- a/.github/agents/git-ape-onboarding.agent.md +++ b/.github/agents/git-ape-onboarding.agent.md @@ -43,7 +43,7 @@ Always use the `/git-ape-onboarding` skill for procedure and command patterns. ## Required user inputs (gated step-1) -Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. At minimum, request the following **six** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): +Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. At minimum, request the following **seven** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): 1. **Target GitHub repository** — `/` plus confirmation of the default branch (assume `main`; only change if the user explicitly says otherwise — never silently substitute `master`). 2. **Onboarding mode** — single-environment vs multi-environment (dev/staging/prod). Even if the prompt names one, restate it explicitly for confirmation. @@ -51,14 +51,15 @@ Before any state-changing command runs, you MUST surface a checklist of the requ 4. **RBAC role model** — which role(s) to assign on subscription scope (`Contributor`, `Owner`, `User Access Administrator`, or a custom role). Default suggestion: `Contributor`. 5. 
**Default Azure region** — primary region for the workload (e.g., `eastus`, `westus2`). Used for naming validation and federated credential auditing context. 6. **Project / deployment name** — short slug used to name the App Registration (`sp--`), federated credentials (`fc---main-branch`), and downstream Git-Ape deployments. +7. **Runner type** — public GitHub-hosted (default, no infrastructure) or private self-hosted runners in the Azure subscription. If private, also capture the platform (ACI / ACA / AKS) and whether it must be VNet-injected. Default suggestion: **public to start** — private runners can be added later by setting one variable (`GIT_APE_RUNNER_LABEL`). -Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–6 are supplied and Azure auth is confirmed. +Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–7 are supplied and Azure auth is confirmed. ## Workflow 1. Confirm target repository URL **and default branch** (input #1 above). 2. Ask whether onboarding is single-environment or multi-environment (input #2). -3. Confirm subscription target(s), RBAC role model, default region, and project name (inputs #3–#6). +3. Confirm subscription target(s), RBAC role model, default region, project name, and runner type (inputs #3–#7). 4. 
Validate prerequisites: - `az`, `gh`, `jq` installed - Azure authenticated (`az account show`) @@ -73,7 +74,8 @@ Treat this as a **non-negotiable contract** for the gated first reply: regardles Both scripts produce byte-identical output. Report which files were created vs skipped. 9. Ask compliance framework and enforcement mode preferences (Step 10 in `/git-ape-onboarding` skill playbook). 10. Update the `## Compliance & Azure Policy` section in `.github/copilot-instructions.md` with the user's choices. If the file was skipped by the scaffold step or lacks that section, surface the captured preferences in chat for manual integration instead of mutating the file. -11. Summarize created/updated artifacts and next checks. +11. Select the runner type (input #7). If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 11 in `/git-ape-onboarding` skill playbook.) +12. Summarize created/updated artifacts and next checks. ## Output Requirements diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index a6fe66a..a943f2b 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -298,6 +298,76 @@ Create two GitHub environments for protection rules: - Required reviewers (recommended — destructive action) - Deployment branches: `main` only (triggered on PR merge) +## GitHub Actions Runners + +Git-Ape workflows run on **public GitHub-hosted runners by default** and can be +switched to **private self-hosted runners** in your Azure subscription with a +single repository variable — no workflow edits required. 
+ +### The runner switch: `GIT_APE_RUNNER_LABEL` + +Every scaffolded workflow (`git-ape-plan`, `-deploy`, `-destroy`, `-verify`) +resolves its runner like this: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. No infrastructure. | +| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +```bash +# Switch to private runners (after they are provisioned and online) +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" +# Clean fallback to GitHub-hosted runners +gh variable delete GIT_APE_RUNNER_LABEL --repo / +``` + +In multi-environment mode, set the variable per environment (`--env azure-deploy`) +so only the environments that need private runners use them. + +### Runner types and platforms + +- **Public GitHub-hosted** — default; nothing to provision. +- **Self-hosted (subscription)** — runners are Azure resources with outbound + internet. Control over image, region, and identity without a VNet. +- **VNet-injected** — runners run inside a subnet you manage, for private + connectivity to Azure resources (private endpoints, no public egress except + to GitHub). + +Private runners are provisioned from on-demand reference IaC shipped with the +onboarding skill at `templates/runners/`: + +| Platform | What it provisions | +|----------|--------------------| +| **ACI** | ARM `template.json` — a container group running an ephemeral runner. | +| **ACA** | ARM `template.json` — a KEDA `github-runner`-scaled Container Apps Job (scale-to-zero). | +| **AKS** | Actions Runner Controller (ARC) via Helm `values.yaml` (cluster created with a standard Git-Ape ARM deployment). | + +These templates are **not** auto-scaffolded — the public bootstrap stays the +default. 
Run `/git-ape-onboarding` (Runner Selection step) to choose and +provision them. + +### Runner security model + +- **Azure access uses a user-assigned managed identity, never stored keys.** +- **The GitHub registration credential is the only secret** — source it from + Key Vault (`securestring` parameter or pre-created Kubernetes secret), never + inline it in a committed `parameters.json` or `values.yaml`. +- **Ephemeral runners by default** — one job per registration, so no state leaks + between deployments. +- The runner label (`git-ape-runner`) must equal `GIT_APE_RUNNER_LABEL`. + +### Drift workflow caveat + +`git-ape-drift.lock.yml` is a compiled GitHub Agentic Workflow (gh-aw) and does +**not** honor `GIT_APE_RUNNER_LABEL`. To run continuous drift detection on a +private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter and +recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it carries an +integrity hash). + ## Security Baseline - Enable HTTPS-only for all web-facing resources diff --git a/.github/evals/git-ape-onboarding/tasks/positive-private-runner.yaml b/.github/evals/git-ape-onboarding/tasks/positive-private-runner.yaml new file mode 100644 index 0000000..09090c4 --- /dev/null +++ b/.github/evals/git-ape-onboarding/tasks/positive-private-runner.yaml @@ -0,0 +1,62 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/microsoft/waza/main/schemas/task.schema.json + +id: positive-private-runner +name: Positive — Onboard with private VNet-injected runner +description: Onboard request that selects private self-hosted runners (ACA, VNet-injected) — clear trigger plus runner-selection handling. +tags: [trigger, positive, runner] +inputs: + prompt: "Onboard this repository for Git-Ape, but I don't want to use GitHub-hosted runners — the deployment jobs must run on private self-hosted runners inside my Azure subscription, VNet-injected, on Azure Container Apps. 
Set up OIDC, RBAC, the GitHub environments, and tell me how the runner side works. My repo is a fresh local Git repo." +graders: + - name: trigger_relevance_positive + type: trigger + config: + skill_path: .github/skills/git-ape-onboarding/SKILL.md + mode: positive + threshold: 0.5 + + # answer_quality (LLM-as-judge): scoped per-task on positives so a flaky + # judge call only zeroes out this task, not the whole leg. See eval.yaml. + # + # IMPORTANT: waza prompt graders are binary (set_waza_grade_pass = 1.0, + # set_waza_grade_fail = 0.0). They are NOT 1–5 rubrics. The judge has NO + # access to the agent's response unless continue_session: true is set — it + # resumes the agent's own session so it can read the response. + - type: prompt + name: answer_quality + config: + continue_session: true + prompt: | + You are grading the assistant's previous response in this session. + The user asked to onboard a fresh repo for Git-Ape AND to run the + deployment workflows on PRIVATE self-hosted runners (VNet-injected, + on Azure Container Apps) instead of GitHub-hosted runners. + + The skill GATES on prerequisites and required user inputs before + executing any state-changing commands. The expected first-turn reply + is a "gated step-1" response, NOT a completion report. Runner + selection is one of the required inputs. + + PASS criteria — the response must satisfy ALL FOUR of: + 1. It is a gated handoff: prereq/auth status is shown OR required + inputs are requested before any execution (a numbered list or + explicit question block counts). + 2. It requests at least THREE required inputs from: target GitHub + repository, Azure subscription ID, RBAC role, default region, + project / deployment name, onboarding mode. + 3. 
It correctly handles the runner request — satisfy ANY ONE of: + (a) acknowledges the chosen private runner (self-hosted / + VNet-injected and/or Azure Container Apps / ACA) as the runner + type to configure; (b) explains the GIT_APE_RUNNER_LABEL + variable switch (public ubuntu-latest by default → private when + the label is set); or (c) points the user at the runner + reference IaC under templates/runners/ (ACI/ACA/AKS). It must + NOT ignore the runner request or claim GitHub-hosted is the only + option. + 4. It does NOT claim to have already provisioned runners, set + GIT_APE_RUNNER_LABEL, configured OIDC, created federated + credentials/environments, or assigned RBAC. The reply waits for + user input + auth before continuing. (Fabricated "I've + configured X" / "I provisioned Y" before inputs → FAIL.) + + If ALL FOUR are met, call `set_waza_grade_pass`. + Otherwise, call `set_waza_grade_fail` and list which criteria are missing. diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md index c8f8b99..af4d0f1 100644 --- a/.github/skills/git-ape-onboarding/SKILL.md +++ b/.github/skills/git-ape-onboarding/SKILL.md @@ -27,8 +27,9 @@ This skill configures: 2. OIDC federated credentials for GitHub Actions 3. RBAC role assignment(s) on subscription scope 4. GitHub environments (`azure-deploy*`, `azure-destroy`) -5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable +5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable, plus the optional `GIT_APE_RUNNER_LABEL` variable that selects private runners 6. Scaffolded GitHub Actions workflow files (`git-ape-plan.yml`, `-deploy.yml`, `-destroy.yml`, `-verify.yml`, `-drift.{md,lock.yml}`) and deployment standards (`.github/copilot-instructions.md`) into the user's working copy +7. 
The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default) or private self-hosted runners in your Azure subscription (ACI / ACA / AKS, optionally VNet-injected). On-demand IaC for private runners ships at `./templates/runners/`. ## Prerequisites @@ -106,11 +107,12 @@ OIDC_PREFIX="repository_owner_id::repository_id:" - `fc-azure-deploy` subject `"$OIDC_PREFIX:environment:azure-deploy"` (one per environment in multi-env mode) - `fc-azure-destroy` subject `"$OIDC_PREFIX:environment:azure-destroy"` 6. Assign RBAC on each target subscription. -7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. +7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. (The `GIT_APE_RUNNER_LABEL` variable is set later in Step 11 only if private runners are chosen.) 8. Create GitHub environments and branch policies when permissions allow. 9. Scaffold workflow files and deployment standards into the user's working copy (see below). 10. Capture compliance and Azure Policy preferences (see below). -11. Verify federated credentials, role assignments, and secrets. +11. Select the GitHub Actions runner type and, if private runners are chosen, provision them and set `GIT_APE_RUNNER_LABEL` (see below). +12. Verify federated credentials, role assignments, and secrets. ### Step 9: Scaffold workflow files and deployment standards @@ -185,6 +187,67 @@ After RBAC and environment setup, ask the user about compliance requirements and preferences and a suggested patch in chat so the user can apply it. - In all cases, leave changes unstaged and let the user commit them. 
+### Step 11: Runner Selection & Provisioning (optional) + +Git-Ape workflows resolve their runner from a single variable: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +Unset → public GitHub-hosted `ubuntu-latest` (the default; no infrastructure). +Set to a label → private self-hosted runners registered with that label. This is +the **bootstrap model: start public, switch to private later with one variable.** + +1. **Ask the runner type:** + ``` + What runner should the Git-Ape workflows run on? + - Public GitHub-hosted (recommended to start — no infrastructure) + - Self-hosted in my Azure subscription + - VNet-injected (private connectivity, no public egress except to GitHub) + ``` + +2. **If public (default):** do nothing. Leave `GIT_APE_RUNNER_LABEL` unset. + Onboarding is complete; the user can switch to private runners any time by + repeating this step. + +3. **If self-hosted or VNet-injected, ask the platform:** + ``` + Which Azure platform should host the runners? + - ACI — Azure Container Instances (simplest; a handful of runners) + - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) + - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) + ``` + +4. **Point the user at the reference IaC** for the chosen type × platform under + `./templates/runners/` (`aci/` and `aca/` ship ARM `template.json` + + `parameters.json`; `aks/` ships an ARC Helm `values.yaml` + README). See + `./templates/runners/README.md` for the full matrix and security model. + - The GitHub registration credential is the only secret — source it from Key + Vault, never inline it. Azure access uses a user-assigned managed identity. + - For VNet-injected, set the subnet parameter (`subnetId` for ACI, + `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). + - Provision with `az deployment group create -f template.json -p @parameters.json` + (ACI/ACA) or `helm install` (AKS). 
Do NOT add these templates to the + scaffold helper — they are on-demand only. + +5. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* + with the `git-ape-runner` label. + +6. **Set the variable** so workflows target it (repo-wide or per environment): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + # per environment instead: + gh variable set GIT_APE_RUNNER_LABEL --repo / --env azure-deploy --body "git-ape-runner" + ``` + Clean fallback to GitHub-hosted runners is `gh variable delete GIT_APE_RUNNER_LABEL`. + +7. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw + workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. To move drift onto a + private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter + and recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it + carries an integrity hash). The other four workflows need no recompile. + ## Safe-Execution Rules 1. Echo target repository and subscription(s) before execution. @@ -209,7 +272,8 @@ After RBAC and environment setup, ask the user about compliance requirements and 6. Scaffold workflow files and `copilot-instructions.md` via `./scripts/scaffold-repo.sh` on macOS/Linux/WSL, or `pwsh ./scripts/scaffold-repo.ps1` on Windows (Step 9 in playbook). Report which files were created vs skipped. 7. Ask compliance framework and enforcement mode preferences (Step 10 in playbook). 8. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -9. Summarize outcome (including scaffolded file counts) and suggest verification commands. +9. Ask the runner type (and platform if private), and — if private runners are chosen — point the user at `./templates/runners/` and set `GIT_APE_RUNNER_LABEL` (Step 11 in playbook). +10. 
Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. ## Known Gotchas diff --git a/.github/skills/git-ape-onboarding/templates/copilot-instructions.md b/.github/skills/git-ape-onboarding/templates/copilot-instructions.md index a6fe66a..a943f2b 100644 --- a/.github/skills/git-ape-onboarding/templates/copilot-instructions.md +++ b/.github/skills/git-ape-onboarding/templates/copilot-instructions.md @@ -298,6 +298,76 @@ Create two GitHub environments for protection rules: - Required reviewers (recommended — destructive action) - Deployment branches: `main` only (triggered on PR merge) +## GitHub Actions Runners + +Git-Ape workflows run on **public GitHub-hosted runners by default** and can be +switched to **private self-hosted runners** in your Azure subscription with a +single repository variable — no workflow edits required. + +### The runner switch: `GIT_APE_RUNNER_LABEL` + +Every scaffolded workflow (`git-ape-plan`, `-deploy`, `-destroy`, `-verify`) +resolves its runner like this: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. No infrastructure. | +| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +```bash +# Switch to private runners (after they are provisioned and online) +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" +# Clean fallback to GitHub-hosted runners +gh variable delete GIT_APE_RUNNER_LABEL --repo / +``` + +In multi-environment mode, set the variable per environment (`--env azure-deploy`) +so only the environments that need private runners use them. + +### Runner types and platforms + +- **Public GitHub-hosted** — default; nothing to provision. +- **Self-hosted (subscription)** — runners are Azure resources with outbound + internet. 
Control over image, region, and identity without a VNet. +- **VNet-injected** — runners run inside a subnet you manage, for private + connectivity to Azure resources (private endpoints, no public egress except + to GitHub). + +Private runners are provisioned from on-demand reference IaC shipped with the +onboarding skill at `templates/runners/`: + +| Platform | What it provisions | +|----------|--------------------| +| **ACI** | ARM `template.json` — a container group running an ephemeral runner. | +| **ACA** | ARM `template.json` — a KEDA `github-runner`-scaled Container Apps Job (scale-to-zero). | +| **AKS** | Actions Runner Controller (ARC) via Helm `values.yaml` (cluster created with a standard Git-Ape ARM deployment). | + +These templates are **not** auto-scaffolded — the public bootstrap stays the +default. Run `/git-ape-onboarding` (Runner Selection step) to choose and +provision them. + +### Runner security model + +- **Azure access uses a user-assigned managed identity, never stored keys.** +- **The GitHub registration credential is the only secret** — source it from + Key Vault (`securestring` parameter or pre-created Kubernetes secret), never + inline it in a committed `parameters.json` or `values.yaml`. +- **Ephemeral runners by default** — one job per registration, so no state leaks + between deployments. +- The runner label (`git-ape-runner`) must equal `GIT_APE_RUNNER_LABEL`. + +### Drift workflow caveat + +`git-ape-drift.lock.yml` is a compiled GitHub Agentic Workflow (gh-aw) and does +**not** honor `GIT_APE_RUNNER_LABEL`. To run continuous drift detection on a +private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter and +recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it carries an +integrity hash). 
+ ## Security Baseline - Enable HTTPS-only for all web-facing resources diff --git a/.github/skills/git-ape-onboarding/templates/runners/README.md b/.github/skills/git-ape-onboarding/templates/runners/README.md new file mode 100644 index 0000000..ce0dedd --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/README.md @@ -0,0 +1,122 @@ +# Git-Ape self-hosted runner templates + +These are **reference Infrastructure-as-Code templates** for provisioning private +GitHub Actions runners that execute the Git-Ape deployment workflows +(`git-ape-plan`, `-deploy`, `-destroy`, `-verify`) inside **your** Azure +subscription instead of on GitHub-hosted runners. + +They are **not** scaffolded into your repository automatically. The +`/git-ape-onboarding` flow copies and customizes the template for the runner +type and platform you choose, then provisions it. The bootstrap model is: + +> **Start on public runners, switch to private runners later — with one variable.** + +## The runner switch: `GIT_APE_RUNNER_LABEL` + +Every scaffolded Git-Ape workflow resolves its runner like this: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. No infrastructure. | +| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +Switching is a one-line change and is fully reversible: + +```bash +# Switch to private runners (after they are provisioned and online) +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + +# Clean fallback to GitHub-hosted runners +gh variable delete GIT_APE_RUNNER_LABEL --repo / +``` + +In multi-environment mode, set the variable per environment +(`--env azure-deploy-prod`) so only the environments that need private runners +use them. 
+ +## Runner type × platform matrix + +| | **Azure Container Instances (ACI)** | **Azure Container Apps (ACA)** | **Azure Kubernetes Service (AKS)** | +|---|---|---|---| +| **Self-hosted (subscription)** | [`aci/`](./aci) — single container group, simplest | [`aca/`](./aca) — KEDA-scaled ephemeral jobs | [`aks/`](./aks) — Actions Runner Controller (ARC) | +| **VNet-injected** | [`aci/`](./aci) with `subnetId` set | [`aca/`](./aca) with `infrastructureSubnetId` set | [`aks/`](./aks) — runners on cluster node subnet | + +- **Self-hosted (subscription)** — runners are Azure resources in your + subscription with outbound internet. Gives you control over image, region, + and identity without managing a VNet. +- **VNet-injected** — runners run inside a subnet of a VNet you manage, for + workloads that need private connectivity to Azure resources (private + endpoints, no public egress except to GitHub). Choose this when deployments + must reach VNet-isolated targets or when policy forbids public runners. + +### Which platform? + +| Choose | When | +|--------|------| +| **ACI** | Fewest moving parts. A handful of runners, simple scaling, fast to stand up. | +| **ACA** | You want **event-driven, ephemeral** runners that scale to zero between jobs (KEDA `github-runner` scaler). Best cost/utilization. | +| **AKS** | You already run AKS, need large-scale autoscaling, or want ARC's ephemeral runner pods and fine-grained scheduling. | + +## Security model + +- **Azure access uses a managed identity, never secrets.** Each template + attaches a **user-assigned managed identity** to the runner so the workflows + can authenticate to Azure (the runner host identity) — but Git-Ape workflows + still use **OIDC federation** for `az` actions, so the managed identity only + needs what the runtime requires. Do not put subscription keys or connection + strings on the runner. 
+- **The GitHub registration credential is the one unavoidable secret.** GitHub + requires a credential to register a runner. Order of preference: + 1. **GitHub App** installation token (recommended for org-scale; ARC supports + this natively). + 2. **Fine-grained PAT** with `administration:write` (repo runners) or + organization `self-hosted runners` write (org runners). + Source it from **Azure Key Vault** (`securestring` params + Key Vault + reference), never inline it in a committed `parameters.json`. +- **Ephemeral runners by default.** Templates register **ephemeral** runners + (one job per runner, then re-register). This prevents state leaking between + jobs — important when runners are shared across deployments. +- **Label scoping.** All templates register the runner with the label + `git-ape-runner` (override via parameter). That label is what + `GIT_APE_RUNNER_LABEL` must match. + +## Provisioning flow (all platforms) + +```mermaid +flowchart LR + A[Choose type + platform] --> B[Copy template into .azure/runners/] + B --> C[Provide GitHub creds via Key Vault] + C --> D[Deploy IaC az deployment / helm] + D --> E[Runner registers with label git-ape-runner] + E --> F[Set GIT_APE_RUNNER_LABEL variable] + F --> G[Workflows now run on private runners] + G -.clean fallback.-> H[Unset variable → back to ubuntu-latest] +``` + +1. **Choose** the runner type and platform (the `/git-ape-onboarding` flow asks). +2. **Copy** the chosen platform folder into your repo under + `.azure/runners//` and edit parameters for your repo/org, region, + labels, and (for VNet-injected) the target `subnetId`. +3. **Store the GitHub credential** in Key Vault and reference it from the + secure parameter — do not commit it. +4. **Deploy** the runner infrastructure (see each platform's README/notes). +5. **Confirm** the runner is **online** in + *GitHub → Settings → Actions → Runners* with the `git-ape-runner` label. +6. **Set** `GIT_APE_RUNNER_LABEL=git-ape-runner` (repo or per-environment). +7. **Verify** by running the `Git-Ape: Verify Setup` workflow — its *Runner + Configuration* step reports the active runner mode. + +## Note on the drift workflow + +`git-ape-drift.lock.yml` is a **compiled GitHub Agentic Workflow** (gh-aw). Its +runner is fixed at compile time and gh-aw only supports GitHub-hosted Ubuntu +labels for its agent job. To run continuous drift detection on a private runner, +set `runs-on:` (and optionally `runs-on-slim:`) in the **source** +`git-ape-drift.md` frontmatter to a supported label and recompile with +`gh aw compile`. Do **not** hand-edit the `.lock.yml` — it carries an integrity +hash and will fail its stale-lock check. The other four workflows honor +`GIT_APE_RUNNER_LABEL` directly with no recompile. 
diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json new file mode 100644 index 0000000..135541b --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json @@ -0,0 +1,37 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "runnerName": { + "value": "git-ape-runner" + }, + "githubOwnerRepo": { + "value": "your-org/your-repo" + }, + "runnerScope": { + "value": "repo" + }, + "runnerLabels": { + "value": "git-ape-runner" + }, + "_comment_githubAccessToken": "Do NOT inline the token. Reference it from Key Vault as shown, or pass it at deploy time with -p githubAccessToken=$(az keyvault secret show ...).", + "githubAccessToken": { + "reference": { + "keyVault": { + "id": "/subscriptions//resourceGroups//providers/Microsoft.KeyVault/vaults/" + }, + "secretName": "github-runner-token" + } + }, + "userAssignedIdentityId": { + "value": "" + }, + "_comment_infrastructureSubnetId": "Leave empty for a self-hosted (subscription) environment. For VNet-injected, set a subnet resource ID (Consumption needs a /23 or larger).", + "infrastructureSubnetId": { + "value": "" + }, + "maxRunners": { + "value": 10 + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json new file mode 100644 index 0000000..cdaf36a --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json @@ -0,0 +1,222 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": "git-ape-onboarding", + "description": "Git-Ape self-hosted runner on Azure Container Apps (ACA). 
Provisions a managed environment and an event-driven Container Apps Job that scales ephemeral GitHub Actions runners on demand with the KEDA 'github-runner' scaler (scale-to-zero between jobs). Runners register with the label 'git-ape-runner' (override via parameter). Leave infrastructureSubnetId empty for a self-hosted (subscription) environment; set it for VNet injection (Consumption needs a /23 or larger). The GitHub credential is the only secret and must be sourced from Key Vault.\n\nDeploy:\n az group create -n rg-git-ape-runners -l eastus\n az deployment group create -g rg-git-ape-runners -f template.json -p @parameters.json" + }, + "parameters": { + "location": { + "type": "string", + "defaultValue": "[resourceGroup().location]", + "metadata": { + "description": "Azure region. Defaults to the resource group location." + } + }, + "runnerName": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Base name for the managed environment and job." + } + }, + "githubOwnerRepo": { + "type": "string", + "metadata": { + "description": "GitHub target in / form for repo-scoped runners, or the org/owner name for org-scoped runners." + } + }, + "runnerScope": { + "type": "string", + "defaultValue": "repo", + "allowedValues": [ + "repo", + "org" + ], + "metadata": { + "description": "Runner registration scope." + } + }, + "runnerLabels": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Comma-separated runner labels. Must include the value used for the GIT_APE_RUNNER_LABEL workflow variable." + } + }, + "githubAccessToken": { + "type": "securestring", + "metadata": { + "description": "GitHub credential used to register the runner and poll the queue: a fine-grained PAT or GitHub App token. Source from Key Vault - never commit it." + } + }, + "runnerImage": { + "type": "string", + "defaultValue": "myoung34/github-runner:latest", + "metadata": { + "description": "Runner container image. 
Swap for a hardened internal image as needed." + } + }, + "userAssignedIdentityId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of a user-assigned managed identity for the runner to access Azure. Leave empty for none." + } + }, + "infrastructureSubnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of an infrastructure subnet for VNet injection. Leave empty for a non-VNet environment." + } + }, + "maxRunners": { + "type": "int", + "defaultValue": 10, + "metadata": { + "description": "Maximum concurrent runner executions (KEDA maxExecutions)." + } + }, + "cpuCores": { + "type": "string", + "defaultValue": "1.0", + "metadata": { + "description": "vCPU for the runner container." + } + }, + "memorySize": { + "type": "string", + "defaultValue": "2Gi", + "metadata": { + "description": "Memory for the runner container (e.g. 2Gi)." + } + } + }, + "variables": { + "isOrgScope": "[equals(parameters('runnerScope'), 'org')]", + "hasIdentity": "[not(empty(parameters('userAssignedIdentityId')))]", + "isVnet": "[not(empty(parameters('infrastructureSubnetId')))]", + "envName": "[format('{0}-env', parameters('runnerName'))]", + "ownerName": "[if(variables('isOrgScope'), parameters('githubOwnerRepo'), first(split(parameters('githubOwnerRepo'), '/')))]", + "repoName": "[if(variables('isOrgScope'), '', last(split(parameters('githubOwnerRepo'), '/')))]", + "scalerMetadataBase": { + "owner": "[variables('ownerName')]", + "runnerScope": "[parameters('runnerScope')]", + "labels": "[parameters('runnerLabels')]", + "targetWorkflowQueueLength": "1", + "applicationID": "" + }, + "scalerMetadata": "[if(variables('isOrgScope'), variables('scalerMetadataBase'), union(variables('scalerMetadataBase'), createObject('repos', variables('repoName'))))]", + "scopeEnv": "[if(variables('isOrgScope'), createArray(createObject('name', 'ORG_NAME', 'value', parameters('githubOwnerRepo'))), 
createArray(createObject('name', 'REPO_URL', 'value', concat('https://github.com/', parameters('githubOwnerRepo')))))]", + "baseEnv": [ + { + "name": "RUNNER_SCOPE", + "value": "[parameters('runnerScope')]" + }, + { + "name": "RUNNER_NAME_PREFIX", + "value": "[parameters('runnerName')]" + }, + { + "name": "LABELS", + "value": "[parameters('runnerLabels')]" + }, + { + "name": "EPHEMERAL", + "value": "true" + }, + { + "name": "DISABLE_AUTO_UPDATE", + "value": "true" + }, + { + "name": "ACCESS_TOKEN", + "secretRef": "github-pat" + } + ], + "containerEnv": "[concat(variables('scopeEnv'), variables('baseEnv'))]", + "identityBlock": "[if(variables('hasIdentity'), createObject('type', 'UserAssigned', 'userAssignedIdentities', createObject(parameters('userAssignedIdentityId'), createObject())), createObject('type', 'None'))]", + "vnetConfiguration": "[if(variables('isVnet'), createObject('infrastructureSubnetId', parameters('infrastructureSubnetId'), 'internal', false()), json('null'))]" + }, + "resources": [ + { + "type": "Microsoft.App/managedEnvironments", + "apiVersion": "2024-03-01", + "name": "[variables('envName')]", + "location": "[parameters('location')]", + "properties": { + "vnetConfiguration": "[variables('vnetConfiguration')]" + } + }, + { + "type": "Microsoft.App/jobs", + "apiVersion": "2024-03-01", + "name": "[parameters('runnerName')]", + "location": "[parameters('location')]", + "identity": "[variables('identityBlock')]", + "dependsOn": [ + "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]" + ], + "properties": { + "environmentId": "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]", + "configuration": { + "triggerType": "Event", + "replicaTimeout": 1800, + "replicaRetryLimit": 1, + "secrets": [ + { + "name": "github-pat", + "value": "[parameters('githubAccessToken')]" + } + ], + "eventTriggerConfig": { + "parallelism": 1, + "replicaCompletionCount": 1, + "scale": { + "minExecutions": 0, + "maxExecutions": 
"[parameters('maxRunners')]", + "pollingInterval": 30, + "rules": [ + { + "name": "github-runner", + "type": "github-runner", + "metadata": "[variables('scalerMetadata')]", + "auth": [ + { + "secretRef": "github-pat", + "triggerParameter": "personalAccessToken" + } + ] + } + ] + } + } + }, + "template": { + "containers": [ + { + "name": "[parameters('runnerName')]", + "image": "[parameters('runnerImage')]", + "env": "[variables('containerEnv')]", + "resources": { + "cpu": "[json(parameters('cpuCores'))]", + "memory": "[parameters('memorySize')]" + } + } + ] + } + } + } + ], + "outputs": { + "runnerJobId": { + "type": "string", + "value": "[resourceId('Microsoft.App/jobs', parameters('runnerName'))]" + }, + "runnerLabel": { + "type": "string", + "value": "[parameters('runnerLabels')]" + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json b/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json new file mode 100644 index 0000000..fd2f7c0 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json @@ -0,0 +1,37 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "runnerName": { + "value": "git-ape-runner" + }, + "githubOwnerRepo": { + "value": "your-org/your-repo" + }, + "runnerScope": { + "value": "repo" + }, + "runnerLabels": { + "value": "git-ape-runner" + }, + "_comment_githubAccessToken": "Do NOT inline the token. Reference it from Key Vault as shown below, or pass it at deploy time with -p githubAccessToken=$(az keyvault secret show ...).", + "githubAccessToken": { + "reference": { + "keyVault": { + "id": "/subscriptions//resourceGroups//providers/Microsoft.KeyVault/vaults/" + }, + "secretName": "github-runner-token" + } + }, + "userAssignedIdentityId": { + "value": "" + }, + "_comment_subnetId": "Leave empty for a self-hosted (subscription) runner. 
For a VNet-injected runner, set the resource ID of a subnet delegated to Microsoft.ContainerInstance/containerGroups.", + "subnetId": { + "value": "" + }, + "ephemeral": { + "value": true + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json new file mode 100644 index 0000000..c1061ec --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json @@ -0,0 +1,171 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": "git-ape-onboarding", + "description": "Git-Ape self-hosted runner on Azure Container Instances (ACI). Provisions a container group that registers an ephemeral GitHub Actions runner with the label 'git-ape-runner' (override via parameter). Leave subnetId empty for a self-hosted (subscription) runner; set it to a delegated subnet for VNet injection. The GitHub credential is the only secret and must be sourced from Key Vault. Azure access uses an optional user-assigned managed identity, never stored keys.\n\nDeploy:\n az group create -n rg-git-ape-runners -l eastus\n az deployment group create -g rg-git-ape-runners -f template.json -p @parameters.json" + }, + "parameters": { + "location": { + "type": "string", + "defaultValue": "[resourceGroup().location]", + "metadata": { + "description": "Azure region for the runner. Defaults to the resource group location." + } + }, + "runnerName": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Name of the container group / runner instance." + } + }, + "githubOwnerRepo": { + "type": "string", + "metadata": { + "description": "GitHub target in / form for repo-scoped runners, or the org/owner name for org-scoped runners." 
+ } + }, + "runnerScope": { + "type": "string", + "defaultValue": "repo", + "allowedValues": [ + "repo", + "org" + ], + "metadata": { + "description": "Runner registration scope." + } + }, + "runnerLabels": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Comma-separated runner labels. Must include the value used for the GIT_APE_RUNNER_LABEL workflow variable." + } + }, + "githubAccessToken": { + "type": "securestring", + "metadata": { + "description": "GitHub credential used to register the runner: a fine-grained PAT (administration:write) or GitHub App token. Source from Key Vault - never commit it." + } + }, + "runnerImage": { + "type": "string", + "defaultValue": "myoung34/github-runner:latest", + "metadata": { + "description": "Runner container image. Swap for a hardened internal image as needed." + } + }, + "userAssignedIdentityId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of a user-assigned managed identity for the runner to access Azure. Leave empty for none." + } + }, + "subnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of a delegated subnet for VNet injection. Leave empty for a non-VNet (public-egress) runner." + } + }, + "ephemeral": { + "type": "bool", + "defaultValue": true, + "metadata": { + "description": "Register the runner as ephemeral (one job per registration, then re-register). Recommended." + } + }, + "cpuCores": { + "type": "int", + "defaultValue": 2, + "metadata": { + "description": "vCPU cores for the runner container." + } + }, + "memoryInGB": { + "type": "int", + "defaultValue": 4, + "metadata": { + "description": "Memory (GB) for the runner container." 
+ } + } + }, + "variables": { + "isOrgScope": "[equals(parameters('runnerScope'), 'org')]", + "hasIdentity": "[not(empty(parameters('userAssignedIdentityId')))]", + "isVnet": "[not(empty(parameters('subnetId')))]", + "scopeEnv": "[if(variables('isOrgScope'), createArray(createObject('name', 'ORG_NAME', 'value', parameters('githubOwnerRepo'))), createArray(createObject('name', 'REPO_URL', 'value', concat('https://github.com/', parameters('githubOwnerRepo')))))]", + "baseEnv": [ + { + "name": "RUNNER_SCOPE", + "value": "[parameters('runnerScope')]" + }, + { + "name": "RUNNER_NAME_PREFIX", + "value": "[parameters('runnerName')]" + }, + { + "name": "LABELS", + "value": "[parameters('runnerLabels')]" + }, + { + "name": "EPHEMERAL", + "value": "[if(parameters('ephemeral'), 'true', 'false')]" + }, + { + "name": "DISABLE_AUTO_UPDATE", + "value": "true" + }, + { + "name": "ACCESS_TOKEN", + "secureValue": "[parameters('githubAccessToken')]" + } + ], + "environmentVariables": "[concat(variables('scopeEnv'), variables('baseEnv'))]", + "identityBlock": "[if(variables('hasIdentity'), createObject('type', 'UserAssigned', 'userAssignedIdentities', createObject(parameters('userAssignedIdentityId'), createObject())), createObject('type', 'None'))]", + "subnetIds": "[if(variables('isVnet'), createArray(createObject('id', parameters('subnetId'))), json('null'))]" + }, + "resources": [ + { + "type": "Microsoft.ContainerInstance/containerGroups", + "apiVersion": "2023-05-01", + "name": "[parameters('runnerName')]", + "location": "[parameters('location')]", + "identity": "[variables('identityBlock')]", + "properties": { + "sku": "Standard", + "osType": "Linux", + "restartPolicy": "Always", + "containers": [ + { + "name": "[parameters('runnerName')]", + "properties": { + "image": "[parameters('runnerImage')]", + "resources": { + "requests": { + "cpu": "[parameters('cpuCores')]", + "memoryInGB": "[parameters('memoryInGB')]" + } + }, + "environmentVariables": 
"[variables('environmentVariables')]" + } + } + ], + "subnetIds": "[variables('subnetIds')]" + } + } + ], + "outputs": { + "runnerId": { + "type": "string", + "value": "[resourceId('Microsoft.ContainerInstance/containerGroups', parameters('runnerName'))]" + }, + "runnerLabel": { + "type": "string", + "value": "[parameters('runnerLabels')]" + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aks/README.md b/.github/skills/git-ape-onboarding/templates/runners/aks/README.md new file mode 100644 index 0000000..2b967ff --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aks/README.md @@ -0,0 +1,75 @@ +# Git-Ape self-hosted runners on AKS (Actions Runner Controller) + +On AKS, private runners are provisioned with the **Actions Runner Controller +(ARC)** using the official `gha-runner-scale-set` Helm chart. ARC runs +**ephemeral runner pods** that scale on demand and scale to zero between jobs. + +> There is no ARM-only path to install ARC — the controller and runner scale set +> are Kubernetes resources installed via Helm. The **AKS cluster itself** can be +> created with a standard Git-Ape ARM deployment (`/git-ape` → +> `Microsoft.ContainerService/managedClusters`); this folder covers the ARC layer +> that runs on top of it. + +## The label is the scale set name + +The runner scale set's name **is** the `runs-on` label. Set +`runnerScaleSetName: git-ape-runner` (below) and then set the repo variable +`GIT_APE_RUNNER_LABEL=git-ape-runner`. The two must match. + +## Prerequisites + +- An AKS cluster (self-hosted: any cluster; VNet-injected: a cluster on your + VNet/subnet, e.g. Azure CNI). +- `kubectl` context pointing at the cluster and `helm` installed. +- A GitHub credential (GitHub App recommended, or a fine-grained PAT) stored in + Key Vault. Do not commit it. + +## Install + +```bash +# 1. 
Install the ARC controller (once per cluster) +helm install arc \ + --namespace arc-systems --create-namespace \ + oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller + +# 2. Create the GitHub credential secret from Key Vault (never commit it) +GH_TOKEN=$(az keyvault secret show --vault-name \ + --name github-runner-token --query value -o tsv) + +kubectl create namespace arc-runners +kubectl create secret generic git-ape-runner-secret \ + --namespace arc-runners \ + --from-literal=github_token="$GH_TOKEN" + +# 3. Install the runner scale set with the Git-Ape values +helm install git-ape-runner \ + --namespace arc-runners \ + -f values.yaml \ + oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set +``` + +Edit `values.yaml` first: set `githubConfigUrl` to your repo (or org) URL and, +for VNet-injected clusters, schedule runner pods onto the VNet node pool via +`template.spec.nodeSelector`. + +## Verify + +Confirm the scale set is registered in +*GitHub → Settings → Actions → Runners → Runner scale sets* (or org-level), then +set the workflow variable: + +```bash +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" +``` + +Run **Git-Ape: Verify Setup** — its *Runner Configuration* step reports the +active runner mode. + +## Security notes + +- Prefer a **GitHub App** over a PAT for org-scale (`githubConfigSecret` then + carries the App id/installation id/private key instead of a token). +- Give runner pods Azure access with **AAD Workload Identity** (federated, no + stored keys) rather than mounting credentials. Git-Ape workflows still use + OIDC for `az` actions. +- Ephemeral runners are the ARC default — no state leaks between jobs. 
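Since the scale set name and `GIT_APE_RUNNER_LABEL` must match, a small pre-flight check before setting the variable may help. The sketch below is an assumption-laden example, not part of the chart or the templates: the function names `scale_set_name` and `labels_match` are hypothetical, it only handles a plain top-level `runnerScaleSetName: value` line (no trailing comments), and it runs against a throwaway file rather than the shipped `values.yaml`.

```shell
# Hypothetical pre-flight check (illustration only, not shipped with Git-Ape).
# Prints the value of the top-level runnerScaleSetName key, with quotes removed.
scale_set_name() {
  sed -n 's/^runnerScaleSetName:[[:space:]]*//p' "$1" | tr -d "\"'" | head -n 1
}

# Succeeds when the scale set name in the file equals the intended label.
labels_match() {
  [ "$(scale_set_name "$1")" = "$2" ]
}

# Demo against a temporary file that imitates the relevant values.yaml keys.
demo_values="$(mktemp)"
printf 'minRunners: 0\nrunnerScaleSetName: git-ape-runner\n' > "$demo_values"

if labels_match "$demo_values" "git-ape-runner"; then
  echo "label matches scale set name"
else
  echo "mismatch: fix runnerScaleSetName or the variable before switching"
fi
```

In practice the first argument would be the edited `values.yaml` and the second the value you intend to pass to `gh variable set`.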
diff --git a/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml b/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml new file mode 100644 index 0000000..4374d65 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml @@ -0,0 +1,37 @@ +# Git-Ape runner scale set values for the ARC `gha-runner-scale-set` Helm chart. +# Install: +# helm install git-ape-runner -n arc-runners -f values.yaml \ +# oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set +# +# The release name and `runnerScaleSetName` become the runs-on label. Keep them +# equal to the value of the GIT_APE_RUNNER_LABEL workflow variable. + +# Repo-scoped runners: https://github.com// +# Org-scoped runners: https://github.com/ +githubConfigUrl: "https://github.com/your-org/your-repo" + +# Reference the pre-created secret (see README) sourced from Key Vault. +# For a GitHub App instead of a PAT, populate github_app_id / +# github_app_installation_id / github_app_private_key in that secret. +githubConfigSecret: git-ape-runner-secret + +# This name IS the runs-on label. Must match GIT_APE_RUNNER_LABEL. +runnerScaleSetName: git-ape-runner + +minRunners: 0 +maxRunners: 10 + +# Ephemeral runner pods (ARC default). "dind" enables Docker-in-Docker if your +# deployment steps build images; use "kubernetes" for rootless pod-per-job. +containerMode: + type: "dind" + +template: + spec: + # VNet-injected: pin runner pods to the node pool on your VNet subnet. 
+ # nodeSelector: + # agentpool: vnetpool + containers: + - name: runner + image: ghcr.io/actions/actions-runner:latest + command: ["/home/runner/run.sh"] diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml index 1e85b8d..bd2dd91 100644 --- a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml @@ -36,7 +36,7 @@ concurrency: jobs: detect-deployments: name: Detect deployments to execute - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -80,7 +80,7 @@ jobs: name: "Deploy: ${{ matrix.deployment_id }}" needs: [detect-deployments] if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: azure-deploy strategy: matrix: diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml index 651432b..fb4bec1 100644 --- a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml @@ -40,7 +40,7 @@ concurrency: jobs: detect-destroys: name: Detect destroy requests - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_destroys: ${{ steps.find.outputs.has_destroys }} @@ -130,7 +130,7 @@ jobs: name: "Destroy: ${{ matrix.deployment_id }}" needs: detect-destroys if: needs.detect-destroys.outputs.has_destroys == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: 
azure-destroy strategy: matrix: diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml index c0d6d68..9a4eb5d 100644 --- a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml @@ -28,7 +28,7 @@ concurrency: jobs: detect-deployments: name: Detect changed deployments - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -77,7 +77,7 @@ jobs: name: "Plan Local: ${{ matrix.deployment_id }}" needs: detect-deployments if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -365,7 +365,7 @@ jobs: name: "Plan Azure: ${{ matrix.deployment_id }}" needs: detect-deployments if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -549,7 +549,7 @@ jobs: name: "Plan Comment: ${{ matrix.deployment_id }}" needs: [detect-deployments, plan-local, plan-azure] if: always() && needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml index 55770ce..8f91957 100644 --- 
a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml @@ -13,7 +13,7 @@ permissions: jobs: verify: name: Verify Git-Ape configuration - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} steps: - uses: actions/checkout@v6 @@ -139,6 +139,24 @@ jobs: fi done + - name: Check runner configuration + env: + RUNNER_LABEL: ${{ vars.GIT_APE_RUNNER_LABEL }} + run: | + echo "## Runner Configuration" + echo "" + # Git-Ape workflows resolve `runs-on` from the GIT_APE_RUNNER_LABEL + # variable, falling back to GitHub-hosted `ubuntu-latest` when unset. + if [[ -z "$RUNNER_LABEL" ]]; then + echo "ℹ️ GIT_APE_RUNNER_LABEL is unset — jobs run on GitHub-hosted runners (ubuntu-latest)." + echo " To switch to private/self-hosted runners, provision them via" + echo " @Git-Ape Onboarding and set the GIT_APE_RUNNER_LABEL variable." + else + echo "✅ GIT_APE_RUNNER_LABEL is set to '$RUNNER_LABEL' — jobs target self-hosted runners with this label." + echo " Ensure at least one online runner is registered with the '$RUNNER_LABEL' label," + echo " otherwise deployment jobs will queue indefinitely." + fi + - name: Print summary if: always() run: | diff --git a/website/docs/agents/git-ape-onboarding.md b/website/docs/agents/git-ape-onboarding.md index eef2c6e..17814a2 100644 --- a/website/docs/agents/git-ape-onboarding.md +++ b/website/docs/agents/git-ape-onboarding.md @@ -70,7 +70,7 @@ Always use the `/git-ape-onboarding` skill for procedure and command patterns. ## Required user inputs (gated step-1) -Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. 
At minimum, request the following **six** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): +Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. At minimum, request the following **seven** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): 1. **Target GitHub repository** — `/` plus confirmation of the default branch (assume `main`; only change if the user explicitly says otherwise — never silently substitute `master`). 2. **Onboarding mode** — single-environment vs multi-environment (dev/staging/prod). Even if the prompt names one, restate it explicitly for confirmation. @@ -78,14 +78,15 @@ Before any state-changing command runs, you MUST surface a checklist of the requ 4. **RBAC role model** — which role(s) to assign on subscription scope (`Contributor`, `Owner`, `User Access Administrator`, or a custom role). Default suggestion: `Contributor`. 5. **Default Azure region** — primary region for the workload (e.g., `eastus`, `westus2`). Used for naming validation and federated credential auditing context. 6. **Project / deployment name** — short slug used to name the App Registration (`sp--`), federated credentials (`fc---main-branch`), and downstream Git-Ape deployments. +7. **Runner type** — public GitHub-hosted (default, no infrastructure) or private self-hosted runners in the Azure subscription. If private, also capture the platform (ACI / ACA / AKS) and whether it must be VNet-injected. Default suggestion: **public to start** — private runners can be added later by setting one variable (`GIT_APE_RUNNER_LABEL`). 
-Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–6 are supplied and Azure auth is confirmed. +Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–7 are supplied and Azure auth is confirmed. ## Workflow 1. Confirm target repository URL **and default branch** (input #1 above). 2. Ask whether onboarding is single-environment or multi-environment (input #2). -3. Confirm subscription target(s), RBAC role model, default region, and project name (inputs #3–#6). +3. Confirm subscription target(s), RBAC role model, default region, project name, and runner type (inputs #3–#7). 4. Validate prerequisites: - `az`, `gh`, `jq` installed - Azure authenticated (`az account show`) @@ -100,7 +101,8 @@ Treat this as a **non-negotiable contract** for the gated first reply: regardles Both scripts produce byte-identical output. Report which files were created vs skipped. 9. Ask compliance framework and enforcement mode preferences (Step 10 in `/git-ape-onboarding` skill playbook). 10. Update the `## Compliance & Azure Policy` section in `.github/copilot-instructions.md` with the user's choices. If the file was skipped by the scaffold step or lacks that section, surface the captured preferences in chat for manual integration instead of mutating the file. -11. Summarize created/updated artifacts and next checks. +11. Select the runner type (input #7). 
If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 11 in `/git-ape-onboarding` skill playbook.) +12. Summarize created/updated artifacts and next checks. ## Output Requirements diff --git a/website/docs/getting-started/onboarding.md b/website/docs/getting-started/onboarding.md index 9e442f4..d3d0a13 100644 --- a/website/docs/getting-started/onboarding.md +++ b/website/docs/getting-started/onboarding.md @@ -113,13 +113,14 @@ or: /git-ape-onboarding ``` -The skill collects five inputs (or uses sensible defaults): +The skill collects six inputs (or uses sensible defaults): 1. **GitHub repository URL** — for example, `https://github.com/your-org/your-repo` 2. **Entra ID App Registration name** — for example, `sp-git-ape-your-repo` 3. **Mode** — single or multi-environment 4. **Azure subscription(s)** — defaults to your current `az` subscription 5. **RBAC role(s)** — Contributor (default) or Contributor + User Access Administrator +6. **Runner type** — public GitHub-hosted (default) or private self-hosted runners in your Azure subscription (ACI / ACA / AKS, optionally VNet-injected). You can start public and switch later. 
### Example: single environment @@ -467,6 +468,58 @@ This adds four workflows: | `git-ape-destroy.yml` | PR merge with `destroy-requested` status | Delete resource group | | `git-ape-verify.yml` | Manual dispatch | Verify OIDC and RBAC health | +All four workflows resolve their runner from a single variable, so they run on +public GitHub-hosted runners by default and can be pointed at private runners +later without editing the workflows: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +--- + +### Step 6: Choose your runner (optional) + +By default the workflows run on **public GitHub-hosted `ubuntu-latest`** — no +infrastructure required. To run them on **private self-hosted runners** inside +your Azure subscription (for private connectivity or policy reasons), provision +runners and set one variable. + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. | +| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +**Runner types:** + +- **Self-hosted (subscription)** — runners are Azure resources with outbound + internet; control over image, region, and identity without a VNet. +- **VNet-injected** — runners run inside a subnet you manage, for private + connectivity to Azure resources (no public egress except to GitHub). + +**Platforms** (on-demand reference IaC ships with the onboarding skill under +[`templates/runners/`](https://github.com/Azure/git-ape/tree/main/.github/skills/git-ape-onboarding/templates/runners)): + +| Platform | What it provisions | +|----------|--------------------| +| **ACI** | ARM `template.json` — a container group running an ephemeral runner. | +| **ACA** | ARM `template.json` — a KEDA `github-runner`-scaled Container Apps Job (scale-to-zero). | +| **AKS** | Actions Runner Controller (ARC) via Helm `values.yaml`. 
| + +Once a runner is online (with the `git-ape-runner` label), flip the switch: + +```bash +# Switch to private runners +gh variable set GIT_APE_RUNNER_LABEL --repo your-org/your-repo --body "git-ape-runner" + +# Clean fallback to GitHub-hosted runners +gh variable delete GIT_APE_RUNNER_LABEL --repo your-org/your-repo +``` + +The GitHub registration credential is the only secret — source it from Key Vault, +never inline it. Azure access uses a user-assigned managed identity. Run +`@Git-Ape Onboarding` and pick a private runner to be walked through provisioning. + --- ## Verify your setup {#verify-setup} diff --git a/website/docs/skills/git-ape-onboarding.md b/website/docs/skills/git-ape-onboarding.md index d8f826e..cf0ab59 100644 --- a/website/docs/skills/git-ape-onboarding.md +++ b/website/docs/skills/git-ape-onboarding.md @@ -44,8 +44,9 @@ This skill configures: 2. OIDC federated credentials for GitHub Actions 3. RBAC role assignment(s) on subscription scope 4. GitHub environments (`azure-deploy*`, `azure-destroy`) -5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable +5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable, plus the optional `GIT_APE_RUNNER_LABEL` variable that selects private runners 6. Scaffolded GitHub Actions workflow files (`git-ape-plan.yml`, `-deploy.yml`, `-destroy.yml`, `-verify.yml`, `-drift.{md,lock.yml}`) and deployment standards (`.github/copilot-instructions.md`) into the user's working copy +7. The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default) or private self-hosted runners in your Azure subscription (ACI / ACA / AKS, optionally VNet-injected). On-demand IaC for private runners ships at `./templates/runners/`. 
## Prerequisites @@ -123,11 +124,12 @@ OIDC_PREFIX="repository_owner_id::repository_id:" - `fc-azure-deploy` subject `"$OIDC_PREFIX:environment:azure-deploy"` (one per environment in multi-env mode) - `fc-azure-destroy` subject `"$OIDC_PREFIX:environment:azure-destroy"` 6. Assign RBAC on each target subscription. -7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. +7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. (The `GIT_APE_RUNNER_LABEL` variable is set later in Step 11 only if private runners are chosen.) 8. Create GitHub environments and branch policies when permissions allow. 9. Scaffold workflow files and deployment standards into the user's working copy (see below). 10. Capture compliance and Azure Policy preferences (see below). -11. Verify federated credentials, role assignments, and secrets. +11. Select the GitHub Actions runner type and, if private runners are chosen, provision them and set `GIT_APE_RUNNER_LABEL` (see below). +12. Verify federated credentials, role assignments, and secrets. ### Step 9: Scaffold workflow files and deployment standards @@ -202,6 +204,67 @@ After RBAC and environment setup, ask the user about compliance requirements and preferences and a suggested patch in chat so the user can apply it. - In all cases, leave changes unstaged and let the user commit them. +### Step 11: Runner Selection & Provisioning (optional) + +Git-Ape workflows resolve their runner from a single variable: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +Unset → public GitHub-hosted `ubuntu-latest` (the default; no infrastructure). +Set to a label → private self-hosted runners registered with that label. This is +the **bootstrap model: start public, switch to private later with one variable.** + +1. **Ask the runner type:** + ``` + What runner should the Git-Ape workflows run on? 
+ - Public GitHub-hosted (recommended to start — no infrastructure) + - Self-hosted in my Azure subscription + - VNet-injected (private connectivity, no public egress except to GitHub) + ``` + +2. **If public (default):** do nothing. Leave `GIT_APE_RUNNER_LABEL` unset. + Onboarding is complete; the user can switch to private runners any time by + repeating this step. + +3. **If self-hosted or VNet-injected, ask the platform:** + ``` + Which Azure platform should host the runners? + - ACI — Azure Container Instances (simplest; a handful of runners) + - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) + - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) + ``` + +4. **Point the user at the reference IaC** for the chosen type × platform under + `./templates/runners/` (`aci/` and `aca/` ship ARM `template.json` + + `parameters.json`; `aks/` ships an ARC Helm `values.yaml` + README). See + `./templates/runners/README.md` for the full matrix and security model. + - The GitHub registration credential is the only secret — source it from Key + Vault, never inline it. Azure access uses a user-assigned managed identity. + - For VNet-injected, set the subnet parameter (`subnetId` for ACI, + `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). + - Provision with `az deployment group create -f template.json -p @parameters.json` + (ACI/ACA) or `helm install` (AKS). Do NOT add these templates to the + scaffold helper — they are on-demand only. + +5. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* + with the `git-ape-runner` label. + +6. **Set the variable** so workflows target it (repo-wide or per environment): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + # per environment instead: + gh variable set GIT_APE_RUNNER_LABEL --repo / --env azure-deploy --body "git-ape-runner" + ``` + Clean fallback to GitHub-hosted runners is `gh variable delete GIT_APE_RUNNER_LABEL`. 
+ +7. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw + workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. To move drift onto a + private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter + and recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it + carries an integrity hash). The other four workflows need no recompile. + ## Safe-Execution Rules 1. Echo target repository and subscription(s) before execution. @@ -226,7 +289,8 @@ After RBAC and environment setup, ask the user about compliance requirements and 6. Scaffold workflow files and `copilot-instructions.md` via `./scripts/scaffold-repo.sh` on macOS/Linux/WSL, or `pwsh ./scripts/scaffold-repo.ps1` on Windows (Step 9 in playbook). Report which files were created vs skipped. 7. Ask compliance framework and enforcement mode preferences (Step 10 in playbook). 8. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -9. Summarize outcome (including scaffolded file counts) and suggest verification commands. +9. Ask the runner type (and platform if private), and — if private runners are chosen — point the user at `./templates/runners/` and set `GIT_APE_RUNNER_LABEL` (Step 11 in playbook). +10. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. 
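The single-variable switch used by the four workflows (`vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest'`) can be mirrored in shell when reasoning about which runner a job will request. This is a minimal sketch, not part of Git-Ape: the helper name `resolve_runner_label` is hypothetical. Like the GitHub expression, it treats an empty value the same as an unset one.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped by Git-Ape): mirrors the workflow expression
#   ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}
# An unset or empty label falls back to the GitHub-hosted default.
resolve_runner_label() {
  local label="${1:-}"
  printf '%s\n' "${label:-ubuntu-latest}"
}

resolve_runner_label ""                # variable unset -> ubuntu-latest
resolve_runner_label "git-ape-runner"  # variable set   -> git-ape-runner
```

Deleting the variable and setting it to an empty string behave the same way under this model, which is why `gh variable delete GIT_APE_RUNNER_LABEL` works as a clean fallback.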
## Known Gotchas diff --git a/website/docs/workflows/git-ape-deploy.md b/website/docs/workflows/git-ape-deploy.md index 6d6b96f..8840a85 100644 --- a/website/docs/workflows/git-ape-deploy.md +++ b/website/docs/workflows/git-ape-deploy.md @@ -36,7 +36,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Detect deployments to execute | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Steps** | 2 | ### `deploy` @@ -44,7 +44,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Deploy: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Environment** | `azure-deploy` | | **Depends On** | `detect-deployments` | | **Steps** | 17 | @@ -95,7 +95,7 @@ concurrency: jobs: detect-deployments: name: Detect deployments to execute - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -139,7 +139,7 @@ jobs: name: "Deploy: ${{ matrix.deployment_id }}" needs: [detect-deployments] if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: azure-deploy strategy: matrix: diff --git a/website/docs/workflows/git-ape-destroy.md b/website/docs/workflows/git-ape-destroy.md index 0da5928..1cac8c0 100644 --- a/website/docs/workflows/git-ape-destroy.md +++ b/website/docs/workflows/git-ape-destroy.md @@ -35,7 +35,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Detect destroy requests | -| **Runs 
On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Steps** | 2 | ### `destroy` @@ -43,7 +43,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Destroy: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Environment** | `azure-destroy` | | **Depends On** | `detect-destroys` | | **Steps** | 9 | @@ -98,7 +98,7 @@ concurrency: jobs: detect-destroys: name: Detect destroy requests - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_destroys: ${{ steps.find.outputs.has_destroys }} @@ -188,7 +188,7 @@ jobs: name: "Destroy: ${{ matrix.deployment_id }}" needs: detect-destroys if: needs.detect-destroys.outputs.has_destroys == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: azure-destroy strategy: matrix: diff --git a/website/docs/workflows/git-ape-plan.md b/website/docs/workflows/git-ape-plan.md index 5898701..3947ddb 100644 --- a/website/docs/workflows/git-ape-plan.md +++ b/website/docs/workflows/git-ape-plan.md @@ -35,7 +35,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Detect changed deployments | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Steps** | 2 | ### `plan-local` @@ -43,7 +43,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Plan Local: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Depends On** | 
`detect-deployments` | | **Steps** | 12 | @@ -52,7 +52,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Plan Azure: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Depends On** | `detect-deployments` | | **Steps** | 8 | @@ -61,7 +61,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Plan Comment: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Depends On** | `detect-deployments`, `plan-local`, `plan-azure` | | **Steps** | 3 | @@ -103,7 +103,7 @@ concurrency: jobs: detect-deployments: name: Detect changed deployments - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -152,7 +152,7 @@ jobs: name: "Plan Local: ${{ matrix.deployment_id }}" needs: detect-deployments if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -440,7 +440,7 @@ jobs: name: "Plan Azure: ${{ matrix.deployment_id }}" needs: detect-deployments if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -624,7 +624,7 @@ jobs: name: "Plan Comment: ${{ matrix.deployment_id }}" needs: [detect-deployments, plan-local, plan-azure] if: always() && 
needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} diff --git a/website/docs/workflows/git-ape-verify.md b/website/docs/workflows/git-ape-verify.md index fa42233..f60482b 100644 --- a/website/docs/workflows/git-ape-verify.md +++ b/website/docs/workflows/git-ape-verify.md @@ -32,8 +32,8 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Verify Git-Ape configuration | -| **Runs On** | `ubuntu-latest` | -| **Steps** | 6 | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | +| **Steps** | 7 | @@ -58,7 +58,7 @@ permissions: jobs: verify: name: Verify Git-Ape configuration - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} steps: - uses: actions/checkout@v6 @@ -184,6 +184,24 @@ jobs: fi done + - name: Check runner configuration + env: + RUNNER_LABEL: ${{ vars.GIT_APE_RUNNER_LABEL }} + run: | + echo "## Runner Configuration" + echo "" + # Git-Ape workflows resolve `runs-on` from the GIT_APE_RUNNER_LABEL + # variable, falling back to GitHub-hosted `ubuntu-latest` when unset. + if [[ -z "$RUNNER_LABEL" ]]; then + echo "ℹ️ GIT_APE_RUNNER_LABEL is unset — jobs run on GitHub-hosted runners (ubuntu-latest)." + echo " To switch to private/self-hosted runners, provision them via" + echo " @Git-Ape Onboarding and set the GIT_APE_RUNNER_LABEL variable." + else + echo "✅ GIT_APE_RUNNER_LABEL is set to '$RUNNER_LABEL' — jobs target self-hosted runners with this label." + echo " Ensure at least one online runner is registered with the '$RUNNER_LABEL' label," + echo " otherwise deployment jobs will queue indefinitely." 
+ fi + - name: Print summary if: always() run: | From 80d3956d4dfc846b226a879d8eaae9ebf4cdc5d4 Mon Sep 17 00:00:00 2001 From: Arnaud Lheureux Date: Wed, 17 Jun 2026 14:40:08 +0800 Subject: [PATCH 2/8] feat: add custom runner Dockerfile and document provisioning learnings - Add Dockerfile based on ghcr.io/actions/runner (GitHub official) with az, gh, jq - Update ACA/ACI templates: default image warning about missing tools - Document KEDA cold-start workaround (minExecutions=1) - Add Known Gotchas: missing tools, KEDA cold start, stale workflow files - Expand Step 11 with full provisioning flow (ACR + image + registry creds) - Update mermaid diagram to include ACR/image build step - Remove all references to community image myoung34/github-runner Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/git-ape-onboarding/SKILL.md | 118 +++++++++++++++--- .../templates/runners/Dockerfile | 50 ++++++++ .../templates/runners/README.md | 102 ++++++++++++--- .../templates/runners/aca/template.json | 4 +- .../templates/runners/aci/template.json | 4 +- 5 files changed, 243 insertions(+), 35 deletions(-) create mode 100644 .github/skills/git-ape-onboarding/templates/runners/Dockerfile diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md index af4d0f1..fb6a8f9 100644 --- a/.github/skills/git-ape-onboarding/SKILL.md +++ b/.github/skills/git-ape-onboarding/SKILL.md @@ -219,22 +219,58 @@ the **bootstrap model: start public, switch to private later with one variable.* - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) ``` -4. **Point the user at the reference IaC** for the chosen type × platform under - `./templates/runners/` (`aci/` and `aca/` ship ARM `template.json` + - `parameters.json`; `aks/` ships an ARC Helm `values.yaml` + README). See - `./templates/runners/README.md` for the full matrix and security model. +4. 
**Build the custom runner image.** The base `ghcr.io/actions/runner:latest` + (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`. + Workflows will fail with `Unable to locate executable file: az` without a + custom image. + ```bash + # Create ACR (one-time) + az acr create --name --resource-group --location --sku Basic --admin-enabled true + + # Build and push image (runs in Azure, ~3 min, no local Docker needed) + az acr build --registry --image git-ape-runner:latest \ + --file ./templates/runners/Dockerfile ./templates/runners/ + ``` + The `Dockerfile` at `./templates/runners/Dockerfile` extends the base runner + with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`). + +5. **Deploy the runner infrastructure** using the chosen platform template. + Pass the custom image via the `runnerImage` parameter: + ```bash + az deployment group create -g -f template.json \ + -p runnerImage='.azurecr.io/git-ape-runner:latest' \ + githubOwnerRepo='/' \ + githubAccessToken='' + ``` - The GitHub registration credential is the only secret — source it from Key Vault, never inline it. Azure access uses a user-assigned managed identity. - For VNet-injected, set the subnet parameter (`subnetId` for ACI, `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). - - Provision with `az deployment group create -f template.json -p @parameters.json` - (ACI/ACA) or `helm install` (AKS). Do NOT add these templates to the - scaffold helper — they are on-demand only. + - For AKS, use `helm install` instead of ARM. + - Do NOT add these templates to the scaffold helper — they are on-demand only. -5. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* - with the `git-ape-runner` label. +6. 
**Configure ACR pull credentials** on the ACA/ACI job (if using ACR): + ```bash + az containerapp job registry set --name git-ape-runner --resource-group \ + --server .azurecr.io --username \ + --password $(az acr credential show -n --query "passwords[0].value" -o tsv) + ``` -6. **Set the variable** so workflows target it (repo-wide or per environment): +7. **Set `minExecutions=1`** (recommended) so at least one runner is always + warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can + take 1–3 minutes on cold start, during which GitHub shows "No runners + configured": + ```bash + az containerapp job update --name git-ape-runner --resource-group --min-executions 1 + ``` + Leave at `0` only if you prefer true scale-to-zero and can tolerate cold-start + delays. + +8. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* + with the `git-ape-runner` label. (With `minExecutions=1`, a runner should + appear within 30–60 seconds of deployment.) + +9. **Set the variable** so workflows target it (repo-wide or per environment): ```bash gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" # per environment instead: @@ -242,11 +278,14 @@ the **bootstrap model: start public, switch to private later with one variable.* ``` Clean fallback to GitHub-hosted runners is `gh variable delete GIT_APE_RUNNER_LABEL`. -7. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw - workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. To move drift onto a - private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter - and recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it - carries an integrity hash). The other four workflows need no recompile. +10. **Verify** by triggering `Git-Ape: Verify Setup` and confirming all steps + pass on the private runner (especially "Test OIDC login" which requires `az`). + +11. 
**Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw + workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. To move drift onto a + private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter + and recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it + carries an integrity hash). The other four workflows need no recompile. ## Safe-Execution Rules @@ -272,11 +311,56 @@ the **bootstrap model: start public, switch to private later with one variable.* 6. Scaffold workflow files and `copilot-instructions.md` via `./scripts/scaffold-repo.sh` on macOS/Linux/WSL, or `pwsh ./scripts/scaffold-repo.ps1` on Windows (Step 9 in playbook). Report which files were created vs skipped. 7. Ask compliance framework and enforcement mode preferences (Step 10 in playbook). 8. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -9. Ask the runner type (and platform if private), and — if private runners are chosen — point the user at `./templates/runners/` and set `GIT_APE_RUNNER_LABEL` (Step 11 in playbook). -10. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. +9. Ask the runner type (and platform if private), and — if private runners are chosen — provision the full stack: ACR + custom image + ACA/ACI deployment + `minExecutions=1` + registry credentials + `GIT_APE_RUNNER_LABEL` (Step 11 in playbook). +10. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner. +11. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. ## Known Gotchas +### Default runner image lacks required tools + +The base image `ghcr.io/actions/runner:latest` (GitHub's official runner) is a +**minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`. 
If you +deploy without the custom image, workflows will fail with: + +``` +Error: Unable to locate executable file: az +``` + +**Fix:** Always build and use the custom image from `./templates/runners/Dockerfile`. +The onboarding flow must: +1. Create an ACR (`az acr create`) +2. Build the image (`az acr build --image git-ape-runner:latest`) +3. Configure pull credentials on the ACA/ACI job (`az containerapp job registry set`) +4. Set the `runnerImage` parameter to the ACR image + +### KEDA scale-from-zero cold start + +With `minExecutions=0` (the default), KEDA's `github-runner` scaler polls the +GitHub Actions queue every 30 seconds. On a fresh deployment or after long idle +periods, the first job can wait 1–3 minutes before a runner spins up. During +this time: +- GitHub shows the job as "Waiting for a runner to pick up this job" +- The Settings → Runners page shows "No runners configured" (ephemeral runners + only register while executing) + +**Fix:** Set `minExecutions=1` to keep one runner always warm. This costs +~$30–50/month on the Consumption plan but eliminates cold-start delays and +ensures a runner is always visible in GitHub Settings. + +### Stale workflow files in target repos + +If the target repo was onboarded before the `GIT_APE_RUNNER_LABEL` pattern was +introduced, its workflow files may have hardcoded `runs-on: ubuntu-latest`. The +private runner will never pick up jobs because workflows don't request its label. + +**Fix:** The scaffold helper (`scaffold-repo.sh` / `.ps1`) skips existing files. +To update stale workflows, the agent must either: +1. Detect the stale pattern (`grep 'runs-on: ubuntu-latest'`) and offer to + update all 4 workflow files with the dynamic pattern, OR +2. Advise the user to manually replace `runs-on: ubuntu-latest` with + `runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` in each job. + ### GitHub Org Custom OIDC Subject Template (e.g. 
Azure org) Some GitHub organizations (notably the `Azure` org) override the default OIDC subject diff --git a/.github/skills/git-ape-onboarding/templates/runners/Dockerfile b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile new file mode 100644 index 0000000..93daa18 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile @@ -0,0 +1,50 @@ +# Git-Ape self-hosted runner image +# +# Extends the official GitHub Actions runner with tools required by Git-Ape +# workflows: +# - az (Azure CLI) +# - gh (GitHub CLI) +# - jq (JSON processor) +# - git (already in base image) +# +# Base image: ghcr.io/actions/runner (GitHub's official runner image, Ubuntu-based) +# The official image includes the runner binary and basic OS tools but does NOT +# include az, gh, or jq. Without this custom image, workflows fail with +# "Unable to locate executable file: az". +# +# Build with ACR Tasks (no local Docker required): +# az acr build --registry --image git-ape-runner:latest . +# +# Or locally: +# docker build -t git-ape-runner:latest . 
+ +FROM ghcr.io/actions/runner:latest + +USER root + +# Install Azure CLI +RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash + +# Install GitHub CLI +RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \ + | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \ + && chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \ + && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \ + | tee /etc/apt/sources.list.d/github-cli-stable.list > /dev/null \ + && apt-get update \ + && apt-get install -y --no-install-recommends gh \ + && apt-get clean && rm -rf /var/lib/apt/lists/* + +# Install jq (git is already included in the base image) +RUN apt-get update \ + && apt-get install -y --no-install-recommends jq \ + && apt-get clean && rm -rf /var/lib/apt/lists/* + +# Switch back to the runner user +USER runner + +# Verify all required tools are present +RUN az version --output table \ + && gh --version \ + && jq --version \ + && git --version diff --git a/.github/skills/git-ape-onboarding/templates/runners/README.md b/.github/skills/git-ape-onboarding/templates/runners/README.md index ce0dedd..a7e0de9 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/README.md +++ b/.github/skills/git-ape-onboarding/templates/runners/README.md @@ -61,6 +61,76 @@ use them. | **ACA** | You want **event-driven, ephemeral** runners that scale to zero between jobs (KEDA `github-runner` scaler). Best cost/utilization. | | **AKS** | You already run AKS, need large-scale autoscaling, or want ARC's ephemeral runner pods and fine-grained scheduling. | +## Custom runner image (required) + +> **⚠️ The base `ghcr.io/actions/runner:latest` (GitHub's official runner image) +> does NOT include `az`, `gh`, or `jq`.** Git-Ape workflows will fail with +> `Unable to locate executable file: az` if you use it directly. 
+ +You **must** build and use the custom image from the [`Dockerfile`](./Dockerfile) +in this directory. It extends the base runner with all Git-Ape prerequisites. + +### Build with ACR Tasks (recommended — no local Docker required) + +```bash +# Create an ACR (one-time) +az acr create --name --resource-group --location --sku Basic --admin-enabled true + +# Build and push the image (runs in Azure, ~3 min) +az acr build --registry --image git-ape-runner:latest \ + --file .github/skills/git-ape-onboarding/templates/runners/Dockerfile \ + .github/skills/git-ape-onboarding/templates/runners/ + +# Configure ACR pull credentials on the ACA job +az containerapp job registry set --name git-ape-runner --resource-group \ + --server .azurecr.io \ + --username --password $(az acr credential show -n --query "passwords[0].value" -o tsv) + +# Update the job to use the custom image +az containerapp job update --name git-ape-runner --resource-group \ + --image .azurecr.io/git-ape-runner:latest +``` + +### Or pass it at deploy time + +```bash +az deployment group create -g -f template.json \ + -p runnerImage='.azurecr.io/git-ape-runner:latest' \ + githubOwnerRepo='org/repo' \ + githubAccessToken='...' +``` + +### Tools included in the custom image + +| Tool | Minimum version | Purpose | +|------|----------------|---------| +| `az` | 2.50+ | Azure CLI — OIDC login, deployments, resource management | +| `gh` | 2.0+ | GitHub CLI — PR comments, workflow dispatch | +| `jq` | 1.6+ | JSON processing in shell scripts | +| `git` | (any) | Checkout, commit state files | + +## KEDA cold-start considerations + +The KEDA `github-runner` scaler polls the GitHub Actions queue every 30 seconds. +On a fresh deployment, there can be a delay of 1–3 minutes before KEDA detects +queued jobs and spins up a runner. During this window, GitHub shows the job as +"Waiting for a runner" and the Settings page shows "No runners configured" +(ephemeral runners only exist during job execution). 
+ +**Recommendations:** + +- **Set `minExecutions=1`** if you want at least one runner always warm and + visible in GitHub Settings. This eliminates cold-start delays at the cost of + one always-running container (~$30–50/month on Consumption plan). + ```bash + az containerapp job update --name git-ape-runner --resource-group --min-executions 1 + ``` +- **Leave `minExecutions=0`** (default) for true scale-to-zero if you can + tolerate 1–3 minute cold starts. Runners will appear in GitHub only while + jobs are executing. +- **Fine-grained PATs** work with the KEDA scaler but require + `administration:write` permission on the target repo. + ## Security model - **Azure access uses a managed identity, never secrets.** Each template @@ -88,26 +158,30 @@ use them. ```mermaid flowchart LR - A[Choose type + platform] --> B[Copy template into
.azure/runners/] - B --> C[Provide GitHub creds
via Key Vault] - C --> D[Deploy IaC
az deployment / helm] - D --> E[Runner registers
with label git-ape-runner] - E --> F[Set GIT_APE_RUNNER_LABEL
variable] - F --> G[Workflows now run
on private runners] - G -.clean fallback.-> H[Unset variable →
back to ubuntu-latest] + A[Choose type + platform] --> B[Create ACR +
build custom image] + B --> C[Copy template into
.azure/runners/] + C --> D[Provide GitHub creds
via Key Vault] + D --> E[Deploy IaC
az deployment / helm] + E --> F[Set minExecutions=1
+ registry creds] + F --> G[Runner registers
with label git-ape-runner] + G --> H[Set GIT_APE_RUNNER_LABEL
variable] + H --> I[Workflows now run
on private runners] + I -.clean fallback.-> J[Unset variable →
back to ubuntu-latest] ``` 1. **Choose** the runner type and platform (the `/git-ape-onboarding` flow asks). -2. **Copy** the chosen platform folder into your repo under +2. **Create an ACR** and build the custom runner image (see above). +3. **Copy** the chosen platform folder into your repo under `.azure/runners//` and edit parameters for your repo/org, region, - labels, and (for VNet-injected) the target `subnetId`. -3. **Store the GitHub credential** in Key Vault and reference it from the + labels, image, and (for VNet-injected) the target `subnetId`. +4. **Store the GitHub credential** in Key Vault and reference it from the secure parameter — do not commit it. -4. **Deploy** the runner infrastructure (see each platform's README/notes). -5. **Confirm** the runner is **online** in +5. **Deploy** the runner infrastructure (see each platform's README/notes). +6. **Confirm** the runner is **online** in *GitHub → Settings → Actions → Runners* with the `git-ape-runner` label. -6. **Set** `GIT_APE_RUNNER_LABEL=git-ape-runner` (repo or per-environment). -7. **Verify** by running the `Git-Ape: Verify Setup` workflow — its *Runner + (With `minExecutions=0`, the runner only appears while a job is running.) +7. **Set** `GIT_APE_RUNNER_LABEL=git-ape-runner` (repo or per-environment). +8. **Verify** by running the `Git-Ape: Verify Setup` workflow — its *Runner Configuration* step reports the active runner mode. 
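The "confirm the runner is online" step in the list above can be scripted as a bounded polling loop. The following is a minimal sketch under stated assumptions: the helper `wait_for` and the stub `runner_online_stub` are hypothetical and not shipped with these templates. A real check would need to query the repository's runner list for an online runner carrying the `git-ape-runner` label; that query is left out here so the example runs without credentials.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Git-Ape): retry a check command until it
# succeeds or the attempt budget is exhausted.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Stub check for illustration only: succeeds on the third call. A real check
# would look for an online runner with the expected label.
calls=0
runner_online_stub() {
  calls=$((calls + 1))
  ((calls >= 3))
}

if wait_for 5 0 runner_online_stub; then
  echo "runner online after ${calls} checks"
else
  echo "runner did not appear in time" >&2
fi
```

With `minExecutions=0` an ephemeral runner is only registered while a job is executing, so a timeout from a loop like this is not necessarily a failure; it is most useful when `minExecutions=1` keeps one runner warm.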
 ## Note on the drift workflow
diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json
index cdaf36a..009bd1e 100644
--- a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json
+++ b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json
@@ -52,9 +52,9 @@
     },
     "runnerImage": {
       "type": "string",
-      "defaultValue": "myoung34/github-runner:latest",
+      "defaultValue": "ghcr.io/actions/runner:latest",
       "metadata": {
-        "description": "Runner container image. Swap for a hardened internal image as needed."
+        "description": "Runner container image. IMPORTANT: The default image does NOT include az, gh, or jq. Build the custom image from ../Dockerfile and push to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'."
       }
     },
     "userAssignedIdentityId": {
diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json
index c1061ec..1fd22a1 100644
--- a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json
+++ b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json
@@ -52,9 +52,9 @@
     },
     "runnerImage": {
       "type": "string",
-      "defaultValue": "myoung34/github-runner:latest",
+      "defaultValue": "ghcr.io/actions/runner:latest",
       "metadata": {
-        "description": "Runner container image. Swap for a hardened internal image as needed."
+        "description": "Runner container image. IMPORTANT: The default image does NOT include az, gh, or jq. Build the custom image from ../Dockerfile and push to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'."
       }
     },
     "userAssignedIdentityId": {

From fe22bb3fe3517daba705196a03c1ed42ba9de0ac Mon Sep 17 00:00:00 2001
From: Arnaud Lheureux
Date: Wed, 17 Jun 2026 16:06:38 +0800
Subject: [PATCH 3/8] feat: add hosted compute networking to runner onboarding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Replace 'VNet-injected' terminology with 'hosted compute networking'
  (GitHub's official term for GitHub-managed runners in Azure VNet)
- Add full Step 11a playbook: Azure networking → GitHub.Network resource →
  network config (using GitHubId tag) → runner group → hosted runner
- Consolidate all required GitHub token scopes into single upfront auth call
  (admin:org, admin:enterprise, manage_runners:org, read:enterprise,
  write:network_configurations) to avoid repeated device-code prompts
- Ask org vs enterprise scope upfront (determines businessId and API paths)
- Add 4 new Known Gotchas documenting non-obvious API behaviors:
  - network_settings_ids expects GitHubId tag, not Azure resource ID
  - businessId is immutable and scope-specific
  - Repeated auth prompts from missing scopes
  - Image/size IDs are numeric/GitHub-specific
- Restructure README: Option 1 (hosted compute) vs Option 2 (self-hosted)
- Keep self-hosted (ACI/ACA/AKS) as Step 11b for custom image scenarios

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/skills/git-ape-onboarding/SKILL.md | 258 ++++++++++++++++--
 .../templates/runners/README.md | 220 +++++++++++++--
 2 files changed, 436 insertions(+), 42 deletions(-)

diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md
index fb6a8f9..edd2315 100644
--- a/.github/skills/git-ape-onboarding/SKILL.md
+++ b/.github/skills/git-ape-onboarding/SKILL.md
@@ -29,7 +29,7 @@ This skill configures:
 4. GitHub environments (`azure-deploy*`, `azure-destroy`)
 5. 
Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable, plus the optional `GIT_APE_RUNNER_LABEL` variable that selects private runners 6. Scaffolded GitHub Actions workflow files (`git-ape-plan.yml`, `-deploy.yml`, `-destroy.yml`, `-verify.yml`, `-drift.{md,lock.yml}`) and deployment standards (`.github/copilot-instructions.md`) into the user's working copy -7. The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default) or private self-hosted runners in your Azure subscription (ACI / ACA / AKS, optionally VNet-injected). On-demand IaC for private runners ships at `./templates/runners/`. +7. The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default), **hosted compute networking** (GitHub-managed runners with Azure private networking, requires GHEC), or self-hosted runners in your Azure subscription (ACI / ACA / AKS). On-demand IaC for private runners ships at `./templates/runners/`. ## Prerequisites @@ -111,7 +111,7 @@ OIDC_PREFIX="repository_owner_id::repository_id:" 8. Create GitHub environments and branch policies when permissions allow. 9. Scaffold workflow files and deployment standards into the user's working copy (see below). 10. Capture compliance and Azure Policy preferences (see below). -11. Select the GitHub Actions runner type and, if private runners are chosen, provision them and set `GIT_APE_RUNNER_LABEL` (see below). +11. Select the GitHub Actions runner type (public / hosted compute networking / self-hosted) and, if private runners are chosen, provision them and set `GIT_APE_RUNNER_LABEL` (see below). 12. Verify federated credentials, role assignments, and secrets. ### Step 9: Scaffold workflow files and deployment standards @@ -196,22 +196,188 @@ runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} ``` Unset → public GitHub-hosted `ubuntu-latest` (the default; no infrastructure). 
-Set to a label → private self-hosted runners registered with that label. This is -the **bootstrap model: start public, switch to private later with one variable.** +Set to a label → private runners with that label. This is the **bootstrap model: +start public, switch to private later with one variable.** 1. **Ask the runner type:** ``` What runner should the Git-Ape workflows run on? - Public GitHub-hosted (recommended to start — no infrastructure) - - Self-hosted in my Azure subscription - - VNet-injected (private connectivity, no public egress except to GitHub) + - Hosted compute networking (GitHub-managed runners in your Azure VNet — requires GHEC) + - Self-hosted in my Azure subscription (you manage compute, image, scaling) ``` 2. **If public (default):** do nothing. Leave `GIT_APE_RUNNER_LABEL` unset. Onboarding is complete; the user can switch to private runners any time by repeating this step. -3. **If self-hosted or VNet-injected, ask the platform:** +3. **If hosted compute networking:** + Follow the hosted compute sub-flow (Step 11a below). + +4. **If self-hosted:** + Follow the self-hosted sub-flow (Step 11b below). + +--- + +### Step 11a: Hosted Compute Networking (GitHub-managed, Azure private networking) + +GitHub-hosted runners with Azure private networking. GitHub manages the compute +(full Ubuntu images with `az`, `gh`, `jq`, `git` pre-installed), runners execute +inside your Azure VNet for private connectivity. + +**Prerequisites:** GitHub Enterprise Cloud. + +**Reference:** [About networking for hosted compute products](https://docs.github.com/en/enterprise-cloud@latest/admin/configuring-settings/configuring-private-networking-for-hosted-compute-products/about-networking-for-hosted-compute-products-in-your-enterprise) + +#### a. 
Consolidate GitHub auth scopes first + +Before starting provisioning, authenticate with **all required scopes in one +call** to avoid repeated auth prompts: + +```bash +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +| Scope | Purpose | +|-------|---------| +| `admin:org` | Create org-level runner groups, assign repos | +| `admin:enterprise` | Enterprise-level runner groups and hosted runners | +| `manage_runners:org` | Create/manage hosted runners | +| `read:enterprise` | Query enterprise metadata (databaseId, org membership) | +| `write:network_configurations` | Create network configurations | + +#### b. Ask scope: organization or enterprise + +``` +Where should the network configuration live? +- Enterprise level (shared across all orgs in the enterprise) +- Organization level (scoped to this org only) +``` + +| Scope | `businessId` value | UI location | +|-------|-------------------|-------------| +| **Enterprise** | Enterprise `databaseId` (from GraphQL) | Enterprise Settings → Hosted compute networking | +| **Organization** | Org numeric ID (REST: `.id` field) | Org Settings → Hosted compute networking | + +Query the needed ID: +```bash +# Enterprise databaseId (for enterprise scope): +gh api graphql -f query='{enterprise(slug: "") { databaseId }}' --jq '.data.enterprise.databaseId' + +# Org numeric ID (for org scope): +gh api orgs/ --jq '.id' +``` + +#### c. Provision Azure networking + +1. Create resource group and VNet with a `/28` subnet (minimum 16 IPs): + ```bash + az group create --name --location + az network vnet create --name --resource-group \ + --address-prefix 10.0.0.0/16 --subnet-name snet-runners --subnet-prefix 10.0.0.0/28 + ``` + +2. Delegate subnet to `GitHub.Network/networkSettings`: + ```bash + az network vnet subnet update --name snet-runners --vnet-name \ + --resource-group --delegations GitHub.Network/networkSettings + ``` + +3. 
Register `GitHub.Network` resource provider: + ```bash + az provider register --namespace GitHub.Network + # Wait until Registered: + az provider show --namespace GitHub.Network --query "registrationState" -o tsv + ``` + +4. Create `GitHub.Network/networkSettings` resource: + ```bash + az rest --method PUT \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --body '{ + "location": "", + "properties": { + "businessId": "", + "subnetId": "" + } + }' + ``` + ⚠️ **`businessId` is immutable.** If wrong, you must delete and recreate. + +5. Extract the `GitHubId` tag from the resource — this is the ID GitHub uses: + ```bash + az rest --method GET \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --query "tags.GitHubId" -o tsv + ``` + +#### d. Create GitHub network configuration + +Use the **`GitHubId` tag value** (NOT the Azure resource ID): + +```bash +# Enterprise scope: +gh api --method POST enterprises//network-configurations \ + -f name="" -f compute_service="actions" \ + -f network_settings_ids[]="" + +# Organization scope: +gh api --method POST orgs//settings/network-configurations \ + -f name="" -f compute_service="actions" \ + -f network_settings_ids[]="" +``` + +Save the returned `id` — needed for the runner group. + +#### e. 
Create runner group and hosted runner + +```bash +# Enterprise scope: +gh api --method POST enterprises//actions/runner-groups \ + -f name="" -f visibility="selected" \ + -F allows_public_repositories=false \ + -f network_configuration_id="" + +# Assign the org to the enterprise runner group: +gh api --method PUT enterprises//actions/runner-groups//organizations/ + +# For enterprise groups: also assign the repo at org level (inherited group ID): +gh api orgs//actions/runner-groups --jq '.runner_groups[] | select(.name=="") | .id' +gh api --method PUT orgs//actions/runner-groups//repositories/ +``` + +```bash +# Query available images and sizes: +gh api orgs//actions/hosted-runners/images/github-owned --jq '.images[] | {id, display_name, platform}' +gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:5] | .[] | {id, cpu_cores, memory_gb}' + +# Create hosted runner (image IDs are NUMERIC, sizes are like "4-core"): +echo '{"name":"","runner_group_id":,"platform":"linux-x64","image":{"id":"","source":"github"},"size":"4-core","maximum_runners":5}' | \ + gh api --method POST enterprises//actions/hosted-runners --input - +``` + +Wait for `status: "Ready"`: +```bash +gh api enterprises//actions/hosted-runners --jq '.runners[] | {name, status}' +``` + +#### f. Set variable and verify + +```bash +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "" +gh workflow run git-ape-verify.yml --repo / +``` + +Confirm all steps pass — no custom image needed, GitHub provides everything. + +--- + +### Step 11b: Self-Hosted Runners (ACI / ACA / AKS) + +Self-hosted runners run in your Azure subscription. You manage compute, image, +scaling, and networking. + +1. **Ask the platform:** ``` Which Azure platform should host the runners? 
- ACI — Azure Container Instances (simplest; a handful of runners) @@ -219,7 +385,7 @@ the **bootstrap model: start public, switch to private later with one variable.* - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) ``` -4. **Build the custom runner image.** The base `ghcr.io/actions/runner:latest` +2. **Build the custom runner image.** The base `ghcr.io/actions/runner:latest` (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`. Workflows will fail with `Unable to locate executable file: az` without a custom image. @@ -234,7 +400,7 @@ the **bootstrap model: start public, switch to private later with one variable.* The `Dockerfile` at `./templates/runners/Dockerfile` extends the base runner with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`). -5. **Deploy the runner infrastructure** using the chosen platform template. +3. **Deploy the runner infrastructure** using the chosen platform template. Pass the custom image via the `runnerImage` parameter: ```bash az deployment group create -g -f template.json \ @@ -244,19 +410,18 @@ the **bootstrap model: start public, switch to private later with one variable.* ``` - The GitHub registration credential is the only secret — source it from Key Vault, never inline it. Azure access uses a user-assigned managed identity. - - For VNet-injected, set the subnet parameter (`subnetId` for ACI, + - For private networking, set the subnet parameter (`subnetId` for ACI, `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). - For AKS, use `helm install` instead of ARM. - - Do NOT add these templates to the scaffold helper — they are on-demand only. -6. **Configure ACR pull credentials** on the ACA/ACI job (if using ACR): +4. 
**Configure ACR pull credentials** on the ACA/ACI job (if using ACR): ```bash az containerapp job registry set --name git-ape-runner --resource-group \ --server .azurecr.io --username \ --password $(az acr credential show -n --query "passwords[0].value" -o tsv) ``` -7. **Set `minExecutions=1`** (recommended) so at least one runner is always +5. **Set `minExecutions=1`** (recommended) so at least one runner is always warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can take 1–3 minutes on cold start, during which GitHub shows "No runners configured": @@ -266,11 +431,11 @@ the **bootstrap model: start public, switch to private later with one variable.* Leave at `0` only if you prefer true scale-to-zero and can tolerate cold-start delays. -8. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* +6. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* with the `git-ape-runner` label. (With `minExecutions=1`, a runner should appear within 30–60 seconds of deployment.) -9. **Set the variable** so workflows target it (repo-wide or per environment): +7. **Set the variable** so workflows target it (repo-wide or per environment): ```bash gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" # per environment instead: @@ -278,10 +443,10 @@ the **bootstrap model: start public, switch to private later with one variable.* ``` Clean fallback to GitHub-hosted runners is `gh variable delete GIT_APE_RUNNER_LABEL`. -10. **Verify** by triggering `Git-Ape: Verify Setup` and confirming all steps +8. **Verify** by triggering `Git-Ape: Verify Setup` and confirming all steps pass on the private runner (especially "Test OIDC login" which requires `az`). -11. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw +9. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. 
To move drift onto a private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter and recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it @@ -311,13 +476,68 @@ the **bootstrap model: start public, switch to private later with one variable.* 6. Scaffold workflow files and `copilot-instructions.md` via `./scripts/scaffold-repo.sh` on macOS/Linux/WSL, or `pwsh ./scripts/scaffold-repo.ps1` on Windows (Step 9 in playbook). Report which files were created vs skipped. 7. Ask compliance framework and enforcement mode preferences (Step 10 in playbook). 8. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -9. Ask the runner type (and platform if private), and — if private runners are chosen — provision the full stack: ACR + custom image + ACA/ACI deployment + `minExecutions=1` + registry credentials + `GIT_APE_RUNNER_LABEL` (Step 11 in playbook). +9. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 11a). For **self-hosted**: ACR + custom image + ACA/ACI deployment + `minExecutions=1` + registry credentials + `GIT_APE_RUNNER_LABEL` (Step 11b). 10. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner. 11. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. 
## Known Gotchas -### Default runner image lacks required tools +### Hosted compute: `network_settings_ids` expects the GitHubId tag, not the Azure resource ID + +When creating a GitHub network configuration, the `network_settings_ids` field +expects the **`GitHubId` tag value** (a SHA-256 hash assigned by GitHub to the +Azure `GitHub.Network/networkSettings` resource), NOT the Azure resource ID path. + +```bash +# ❌ WRONG — Azure resource ID +-f network_settings_ids[]="/subscriptions/.../providers/GitHub.Network/networkSettings/my-resource" + +# ✅ CORRECT — GitHubId tag value from the Azure resource +-f network_settings_ids[]="FA1AD85973374477AF8C49119ADEA731EFD4B9BD6B7764A8FCD6B036CBA796F3" +``` + +Extract the GitHubId after creating the Azure resource: +```bash +az rest --method GET \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --query "tags.GitHubId" -o tsv +``` + +### Hosted compute: `businessId` is immutable and scope-specific + +The `businessId` on `GitHub.Network/networkSettings` determines whether the +resource works at enterprise or organization scope: +- **Enterprise scope:** use the enterprise `databaseId` (query via GraphQL) +- **Organization scope:** use the org's numeric ID (query via REST `.id` field) + +If wrong, the GitHub API returns `"The business ID is invalid or does not match"`. +The property is **immutable** — you cannot update it; you must delete and recreate. + +### Hosted compute: repeated auth prompts from missing scopes + +The hosted compute provisioning flow requires **5 distinct GitHub token scopes** +(`admin:org`, `admin:enterprise`, `manage_runners:org`, `read:enterprise`, +`write:network_configurations`). If not collected upfront, each missing scope +triggers a separate `gh auth refresh` device-code flow. 
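One way to avoid the repeated prompts is to compute, before provisioning, which of the five scopes are still missing and request them in a single refresh. The sketch below is an assumption-laden illustration, not part of the skill: the function name `missing_scopes` and the `REQUIRED_SCOPES` variable are hypothetical, and the granted scopes are passed in as a plain string, so how that list is obtained is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the required scopes missing from a granted list.
# The five scope names are the ones listed in this gotcha.
REQUIRED_SCOPES="admin:org admin:enterprise manage_runners:org read:enterprise write:network_configurations"

missing_scopes() {
  # $1: granted scopes, separated by spaces and/or commas.
  # Commas become spaces, and padding spaces make whole-word matching simple.
  local granted=" ${1//,/ } "
  local scope missing=""
  for scope in $REQUIRED_SCOPES; do
    case "$granted" in
      *" $scope "*) ;;                                # already granted
      *) missing="${missing:+$missing,}$scope" ;;     # append, comma-separated
    esac
  done
  printf '%s\n' "$missing"
}

# Example: only two of the five scopes are granted.
missing_scopes "admin:org, read:enterprise"
```

The comma-separated output has the same shape as the value passed to `-s` in the consolidated `gh auth refresh` call, so an empty result means no refresh is needed.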
+ +**Fix:** Always consolidate auth at the start of Step 11a: +```bash +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +### Hosted compute: image and size IDs are GitHub-specific + +The hosted runners API uses **numeric image IDs** (e.g., `"2295"` = Ubuntu 24.04) +and **GitHub-specific size IDs** (e.g., `"4-core"`, `"8-core"`), not Azure VM SKU +names or Ubuntu version strings. + +Always query available options first: +```bash +gh api orgs//actions/hosted-runners/images/github-owned --jq '.images[] | {id, display_name}' +gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:10] | .[] | {id, cpu_cores, memory_gb}' +``` + +### Default runner image lacks required tools (self-hosted only) The base image `ghcr.io/actions/runner:latest` (GitHub's official runner) is a **minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`. If you diff --git a/.github/skills/git-ape-onboarding/templates/runners/README.md b/.github/skills/git-ape-onboarding/templates/runners/README.md index a7e0de9..53bca8b 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/README.md +++ b/.github/skills/git-ape-onboarding/templates/runners/README.md @@ -1,15 +1,17 @@ -# Git-Ape self-hosted runner templates +# Git-Ape Private Runner Templates -These are **reference Infrastructure-as-Code templates** for provisioning private -GitHub Actions runners that execute the Git-Ape deployment workflows -(`git-ape-plan`, `-deploy`, `-destroy`, `-verify`) inside **your** Azure -subscription instead of on GitHub-hosted runners. +These are **reference templates** for provisioning private GitHub Actions +runners that execute the Git-Ape deployment workflows (`git-ape-plan`, +`-deploy`, `-destroy`, `-verify`) with private network connectivity. -They are **not** scaffolded into your repository automatically. 
The -`/git-ape-onboarding` flow copies and customizes the template for the runner -type and platform you choose, then provisions it. The bootstrap model is: +Git-Ape supports two private runner strategies: -> **Start on public runners, switch to private runners later — with one variable.** +| Strategy | Who manages compute? | Infrastructure you manage | Best for | +|----------|---------------------|--------------------------|----------| +| **Hosted compute networking** | GitHub | Azure VNet + subnet only | Private connectivity with zero runner management | +| **Self-hosted runners** | You | Full runner stack (ACI/ACA/AKS + image + scaling) | Custom images, air-gapped, compliance constraints | + +> **Bootstrap model: Start on public runners, switch to private later — with one variable.** ## The runner switch: `GIT_APE_RUNNER_LABEL` @@ -22,15 +24,19 @@ runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} | `GIT_APE_RUNNER_LABEL` | Effect | |------------------------|--------| | **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. No infrastructure. | -| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | +| Set to a hosted runner name (e.g. `git-ape-vnet-4vcpu`) | Jobs run on GitHub-hosted compute with Azure private networking. | +| Set to a self-hosted label (e.g. `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. 
| Switching is a one-line change and is fully reversible: ```bash -# Switch to private runners (after they are provisioned and online) +# Switch to hosted compute networking runner +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-vnet-4vcpu" + +# Switch to self-hosted runners gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" -# Clean fallback to GitHub-hosted runners +# Clean fallback to GitHub-hosted runners (public) gh variable delete GIT_APE_RUNNER_LABEL --repo / ``` @@ -38,20 +44,188 @@ In multi-environment mode, set the variable per environment (`--env azure-deploy-prod`) so only the environments that need private runners use them. -## Runner type × platform matrix +--- + +## Option 1: Hosted Compute Networking (recommended) + +**GitHub-hosted runners with Azure private networking.** GitHub manages the +compute (Ubuntu VMs with all standard tools pre-installed), but the runners +execute inside your Azure VNet for private connectivity to your resources. + +> **Requires:** GitHub Enterprise Cloud. No custom image, no ACR, no KEDA — +> GitHub provides full Ubuntu images with `az`, `gh`, `jq`, `git` pre-installed. 
+ +**Reference:** +[About networking for hosted compute products](https://docs.github.com/en/enterprise-cloud@latest/admin/configuring-settings/configuring-private-networking-for-hosted-compute-products/about-networking-for-hosted-compute-products-in-your-enterprise) + +### Scope: Organization vs Enterprise + +Hosted compute network configurations can be created at two levels: + +| Scope | `businessId` value | API endpoint | UI location | +|-------|-------------------|--------------|-------------| +| **Enterprise** | Enterprise `databaseId` (from GraphQL) | `enterprises/{slug}/network-configurations` | Enterprise Settings → Hosted compute networking | +| **Organization** | Org numeric ID (from REST API) | `orgs/{org}/settings/network-configurations` | Organization Settings → Hosted compute networking | + +Enterprise-scoped configs can be shared across all orgs in the enterprise. +Organization-scoped configs are independent (requires enterprise policy to allow). + +### Provisioning flow + +```mermaid +flowchart LR + A[Create Azure VNet
+ /28 subnet] --> B[Delegate subnet to
GitHub.Network/networkSettings] + B --> C[Register GitHub.Network
resource provider] + C --> D[Create networkSettings
Azure resource] + D --> E[Create network config
via GitHub API] + E --> F[Create runner group
linked to network config] + F --> G[Create hosted runner
in runner group] + G --> H[Assign org/repo
to runner group] + H --> I[Set GIT_APE_RUNNER_LABEL
= runner name] +``` + +### Step-by-step + +1. **Create Azure VNet and subnet** (minimum `/28` — 16 IPs): + ```bash + az group create --name --location + az network vnet create --name --resource-group \ + --address-prefix 10.0.0.0/16 --subnet-name snet-runners --subnet-prefix 10.0.0.0/28 + ``` + +2. **Delegate subnet** to `GitHub.Network/networkSettings`: + ```bash + az network vnet subnet update --name snet-runners --vnet-name \ + --resource-group --delegations GitHub.Network/networkSettings + ``` + +3. **Register the `GitHub.Network` resource provider** on the subscription: + ```bash + az provider register --namespace GitHub.Network + az provider show --namespace GitHub.Network --query "registrationState" -o tsv + # Wait until "Registered" + ``` + +4. **Create the `GitHub.Network/networkSettings` resource:** + ```bash + # businessId = enterprise databaseId (enterprise scope) or org numeric ID (org scope) + az rest --method PUT \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --body '{ + "location": "", + "properties": { + "businessId": "", + "subnetId": "/subscriptions//resourceGroups//providers/Microsoft.Network/virtualNetworks//subnets/snet-runners" + } + }' + ``` + ⚠️ **`businessId` is immutable** — if wrong, you must delete and recreate the resource. + + The resource will have a `GitHubId` tag (a SHA-256 hash) — this is the ID + GitHub uses to reference the network settings. + +5. **Create the network configuration** on GitHub (use the `GitHubId` tag value, + NOT the Azure resource ID): + ```bash + # Enterprise scope: + gh api --method POST enterprises//network-configurations \ + -f name="" \ + -f compute_service="actions" \ + -f network_settings_ids[]="" + + # Organization scope: + gh api --method POST orgs//settings/network-configurations \ + -f name="" \ + -f compute_service="actions" \ + -f network_settings_ids[]="" + ``` + +6. 
**Create a runner group** linked to the network configuration: + ```bash + # Enterprise scope: + gh api --method POST enterprises//actions/runner-groups \ + -f name="" -f visibility="selected" \ + -F allows_public_repositories=false \ + -f network_configuration_id="" + + # Organization scope: + gh api --method POST orgs//actions/runner-groups \ + -f name="" -f visibility="selected" \ + -F allows_public_repositories=false \ + -f network_configuration_id="" + ``` + +7. **Assign org/repo to the runner group:** + ```bash + # Enterprise: assign org + gh api --method PUT enterprises//actions/runner-groups//organizations/ + # Org: assign repo (for inherited enterprise groups, use the inherited group ID at org level) + gh api --method PUT orgs//actions/runner-groups//repositories/ + ``` + +8. **Create a hosted runner** in the group: + ```bash + # Query available images and sizes first: + gh api orgs//actions/hosted-runners/images/github-owned + gh api orgs//actions/hosted-runners/machine-sizes + + # Create runner (image IDs are NUMERIC, sizes are like "4-core"): + echo '{"name":"","runner_group_id":,"platform":"linux-x64","image":{"id":"","source":"github"},"size":"4-core","maximum_runners":5}' | \ + gh api --method POST enterprises//actions/hosted-runners --input - + ``` + +9. **Set the variable:** + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "" + ``` + +10. **Verify** by triggering `Git-Ape: Verify Setup`. + +### Key facts + +- **No custom image needed** — GitHub's hosted compute uses full Ubuntu images + with all standard tools (`az`, `gh`, `jq`, `git`, Docker, etc.) 
+- **No KEDA, no cold start** — runners are always available (status: "Ready") +- **`network_settings_ids`** expects the `GitHubId` tag value (SHA-256 hash + from the Azure resource), NOT the Azure resource ID +- **Image IDs are numeric** (e.g., `"2295"` for Ubuntu 24.04) — query them via + `GET orgs/{org}/actions/hosted-runners/images/github-owned` +- **Size IDs** are GitHub-specific (e.g., `"4-core"`, `"8-core"`) — query via + `GET orgs/{org}/actions/hosted-runners/machine-sizes` +- **`businessId` is immutable** on the Azure resource — getting it wrong means + delete + recreate + +### Required GitHub token scopes + +All scopes must be present **before** starting provisioning to avoid repeated +auth prompts: + +| Scope | Purpose | +|-------|---------| +| `admin:org` | Create runner groups, assign repos | +| `admin:enterprise` | Enterprise-level runner groups and hosted runners | +| `manage_runners:org` | Create/manage hosted runners | +| `read:enterprise` | Query enterprise metadata (databaseId) | +| `write:network_configurations` | Create network configurations | + +```bash +# Authenticate once with all required scopes: +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +--- + +## Option 2: Self-Hosted Runners (ACI / ACA / AKS) + +Self-hosted runners run in **your** Azure subscription. You manage the compute, +image, scaling, and networking. 
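For the networking part, the platform folders differ in which setting places runners on a subnet. As a rough sketch based on the platform matrix in this README, the lookup can be written as a small shell function. The function name `subnet_setting` is hypothetical; the ACI and ACA parameter names are the ones named in the matrix, and the AKS branch is only a descriptive note because the matrix lists no template parameter for it.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: which setting places runners on a subnet, per platform.
# ACI and ACA names come from the matrix; the AKS line is a plain note.
subnet_setting() {
  case "$1" in
    aci) printf '%s\n' "subnetId" ;;
    aca) printf '%s\n' "infrastructureSubnetId" ;;
    aks) printf '%s\n' "none (runners use the cluster node subnet)" ;;
    *)   printf 'unknown platform: %s\n' "$1" >&2; return 1 ;;
  esac
}

subnet_setting aca   # prints infrastructureSubnetId
```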
+
+### Platform matrix

 | | **Azure Container Instances (ACI)** | **Azure Container Apps (ACA)** | **Azure Kubernetes Service (AKS)** |
 |---|---|---|---|
-| **Self-hosted (subscription)** | [`aci/`](./aci) — single container group, simplest | [`aca/`](./aca) — KEDA-scaled ephemeral jobs | [`aks/`](./aks) — Actions Runner Controller (ARC) |
-| **VNet-injected** | [`aci/`](./aci) with `subnetId` set | [`aca/`](./aca) with `infrastructureSubnetId` set | [`aks/`](./aks) — runners on cluster node subnet |
-
-- **Self-hosted (subscription)** — runners are Azure resources in your
-  subscription with outbound internet. Gives you control over image, region,
-  and identity without managing a VNet.
-- **VNet-injected** — runners run inside a subnet of a VNet you manage, for
-  workloads that need private connectivity to Azure resources (private
-  endpoints, no public egress except to GitHub). Choose this when deployments
-  must reach VNet-isolated targets or when policy forbids public runners.
+| **Basic** | [`aci/`](./aci) — single container group, simplest | [`aca/`](./aca) — KEDA-scaled ephemeral jobs | [`aks/`](./aks) — Actions Runner Controller (ARC) |
+| **With private networking** | [`aci/`](./aci) with `subnetId` set | [`aca/`](./aca) with `infrastructureSubnetId` set | [`aks/`](./aks) — runners on cluster node subnet |

 ### Which platform?

From e5e8a0261a53fb9a7fd93008810149525177761e Mon Sep 17 00:00:00 2001
From: Arnaud Lheureux
Date: Wed, 17 Jun 2026 16:51:29 +0800
Subject: [PATCH 4/8] Fix onboarding agent step cross-references after merge renumbering

The merge that combined the drift-detector (Step 10) and runner (Step 12)
features renumbered the SKILL.md playbook, but the agent.md cross-references
were left pointing at the pre-merge numbers. Compliance is now Step 11 (was
Step 10) and runner selection is Step 12 (was Step 11). Regenerated docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/agents/git-ape-onboarding.agent.md | 4 ++--
 website/docs/agents/git-ape-onboarding.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/agents/git-ape-onboarding.agent.md b/.github/agents/git-ape-onboarding.agent.md
index d33a58f..75a3842 100644
--- a/.github/agents/git-ape-onboarding.agent.md
+++ b/.github/agents/git-ape-onboarding.agent.md
@@ -72,9 +72,9 @@ Treat this as a **non-negotiable contract** for the gated first reply: regardles
 - macOS / Linux / WSL: `./scripts/scaffold-repo.sh`
 - Windows (PowerShell 7+): `pwsh ./scripts/scaffold-repo.ps1`
 Both scripts produce byte-identical output. Report which files were created vs skipped.
-9. Ask compliance framework and enforcement mode preferences (Step 10 in `/git-ape-onboarding` skill playbook).
+9. Ask compliance framework and enforcement mode preferences (Step 11 in `/git-ape-onboarding` skill playbook).
 10. Update the `## Compliance & Azure Policy` section in `.github/copilot-instructions.md` with the user's choices. If the file was skipped by the scaffold step or lacks that section, surface the captured preferences in chat for manual integration instead of mutating the file.
-11. Select the runner type (input #7). If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 11 in `/git-ape-onboarding` skill playbook.)
+11. Select the runner type (input #7). If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 12 in `/git-ape-onboarding` skill playbook.)
 12. Summarize created/updated artifacts and next checks.

 ## Output Requirements
diff --git a/website/docs/agents/git-ape-onboarding.md b/website/docs/agents/git-ape-onboarding.md
index 17814a2..882ae69 100644
--- a/website/docs/agents/git-ape-onboarding.md
+++ b/website/docs/agents/git-ape-onboarding.md
@@ -99,9 +99,9 @@ Treat this as a **non-negotiable contract** for the gated first reply: regardles
 - macOS / Linux / WSL: `./scripts/scaffold-repo.sh`
 - Windows (PowerShell 7+): `pwsh ./scripts/scaffold-repo.ps1`
 Both scripts produce byte-identical output. Report which files were created vs skipped.
-9. Ask compliance framework and enforcement mode preferences (Step 10 in `/git-ape-onboarding` skill playbook).
+9. Ask compliance framework and enforcement mode preferences (Step 11 in `/git-ape-onboarding` skill playbook).
 10. Update the `## Compliance & Azure Policy` section in `.github/copilot-instructions.md` with the user's choices. If the file was skipped by the scaffold step or lacks that section, surface the captured preferences in chat for manual integration instead of mutating the file.
-11. Select the runner type (input #7). If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 11 in `/git-ape-onboarding` skill playbook.)
+11. Select the runner type (input #7). If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 12 in `/git-ape-onboarding` skill playbook.)
 12. Summarize created/updated artifacts and next checks.
## Output Requirements From 3319d54add0db83f92661acd46dd55556b6ad8bc Mon Sep 17 00:00:00 2001 From: Arnaud Lheureux Date: Wed, 17 Jun 2026 17:05:20 +0800 Subject: [PATCH 5/8] Fix self-hosted runner image so runners actually register MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The ACI/ACA templates drive registration through the env-var contract of a containerized runner (ACCESS_TOKEN, REPO_URL/ORG_NAME, RUNNER_SCOPE, LABELS, EPHEMERAL, ...) with no command override, but the image they pointed at — ghcr.io/actions/runner:latest — does not exist (404 on ghcr) and the official runner image ships no registration entrypoint. Result: runners never came online on ACI/ACA. Fixes, keeping everything on GitHub-official images (no third-party base): - Add entrypoint.sh that exchanges the PAT/App token for a registration token, configures an ephemeral runner, and deregisters on shutdown. Honors the same env-var contract the templates already set. - Dockerfile: base on the real ghcr.io/actions/actions-runner:latest, install curl/ca-certificates for the entrypoint, COPY + wire ENTRYPOINT. - ACI/ACA templates: correct runnerImage default to actions-runner and clarify that the custom image is required (tools + registration). - AKS (consistency): values.yaml now uses the custom image with imagePullSecrets (stock image lacks az/gh/jq); README documents the ACR build + pull secret. ARC overrides the command, so the baked entrypoint is unused on AKS. - Update README/SKILL image references and regenerate docs. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/git-ape-onboarding/SKILL.md | 17 +-- .../templates/runners/Dockerfile | 28 +++-- .../templates/runners/README.md | 11 +- .../templates/runners/aca/template.json | 4 +- .../templates/runners/aci/template.json | 4 +- .../templates/runners/aks/README.md | 33 +++++- .../templates/runners/aks/values.yaml | 14 ++- .../templates/runners/entrypoint.sh | 104 ++++++++++++++++++ website/docs/skills/git-ape-onboarding.md | 17 +-- 9 files changed, 198 insertions(+), 34 deletions(-) create mode 100644 .github/skills/git-ape-onboarding/templates/runners/entrypoint.sh diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md index 61e6ecc..5280fa7 100644 --- a/.github/skills/git-ape-onboarding/SKILL.md +++ b/.github/skills/git-ape-onboarding/SKILL.md @@ -441,9 +441,10 @@ scaling, and networking. - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) ``` -2. **Build the custom runner image.** The base `ghcr.io/actions/runner:latest` - (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`. - Workflows will fail with `Unable to locate executable file: az` without a +2. **Build the custom runner image.** The base `ghcr.io/actions/actions-runner:latest` + (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`, and + ships no registration entrypoint. Workflows fail with `Unable to locate + executable file: az` — and on ACI/ACA the runner never registers — without a custom image. ```bash # Create ACR (one-time) @@ -454,7 +455,8 @@ scaling, and networking. --file ./templates/runners/Dockerfile ./templates/runners/ ``` The `Dockerfile` at `./templates/runners/Dockerfile` extends the base runner - with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`). 
+ with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`) and an `entrypoint.sh` + that self-registers the runner on ACI/ACA (on AKS, ARC handles registration). 3. **Deploy the runner infrastructure** using the chosen platform template. Pass the custom image via the `runnerImage` parameter: @@ -599,9 +601,10 @@ gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:10] ### Default runner image lacks required tools (self-hosted only) -The base image `ghcr.io/actions/runner:latest` (GitHub's official runner) is a -**minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`. If you -deploy without the custom image, workflows will fail with: +The base image `ghcr.io/actions/actions-runner:latest` (GitHub's official runner) +is a **minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`, and +ships no registration entrypoint. If you deploy without the custom image, the +runner never registers on ACI/ACA and workflows fail with: ``` Error: Unable to locate executable file: az diff --git a/.github/skills/git-ape-onboarding/templates/runners/Dockerfile b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile index 93daa18..4563f57 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/Dockerfile +++ b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile @@ -1,15 +1,18 @@ # Git-Ape self-hosted runner image # -# Extends the official GitHub Actions runner with tools required by Git-Ape -# workflows: +# Extends the official GitHub Actions runner with the tools Git-Ape workflows need +# plus a registration entrypoint so the runner self-registers on standalone +# container hosts (ACI, ACA Jobs): # - az (Azure CLI) # - gh (GitHub CLI) # - jq (JSON processor) # - git (already in base image) +# - entrypoint.sh (PAT -> registration-token exchange + config.sh/run.sh) # -# Base image: ghcr.io/actions/runner (GitHub's official runner image, Ubuntu-based) -# The official image includes the runner binary and basic 
OS tools but does NOT -# include az, gh, or jq. Without this custom image, workflows fail with +# Base image: ghcr.io/actions/actions-runner (GitHub's official runner image, +# Ubuntu-based). It includes the runner binary but does NOT include az, gh, or jq, +# and ships no registration entrypoint. Without this custom image, container hosts +# never register a runner and workflows fail with # "Unable to locate executable file: az". # # Build with ACR Tasks (no local Docker required): @@ -18,7 +21,7 @@ # Or locally: # docker build -t git-ape-runner:latest . -FROM ghcr.io/actions/runner:latest +FROM ghcr.io/actions/actions-runner:latest USER root @@ -35,16 +38,25 @@ RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \ && apt-get install -y --no-install-recommends gh \ && apt-get clean && rm -rf /var/lib/apt/lists/* -# Install jq (git is already included in the base image) +# Install jq plus curl/ca-certificates used by the registration entrypoint +# (git is already included in the base image). RUN apt-get update \ - && apt-get install -y --no-install-recommends jq \ + && apt-get install -y --no-install-recommends jq curl ca-certificates \ && apt-get clean && rm -rf /var/lib/apt/lists/* +# Registration entrypoint (self-registers the runner on ACI/ACA; bypassed on +# AKS, where ARC overrides the container command). 
+COPY entrypoint.sh /home/runner/entrypoint.sh +RUN chmod +x /home/runner/entrypoint.sh + # Switch back to the runner user USER runner +WORKDIR /home/runner # Verify all required tools are present RUN az version --output table \ && gh --version \ && jq --version \ && git --version + +ENTRYPOINT ["/home/runner/entrypoint.sh"] diff --git a/.github/skills/git-ape-onboarding/templates/runners/README.md b/.github/skills/git-ape-onboarding/templates/runners/README.md index 53bca8b..0184d0f 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/README.md +++ b/.github/skills/git-ape-onboarding/templates/runners/README.md @@ -237,12 +237,15 @@ image, scaling, and networking. ## Custom runner image (required) -> **⚠️ The base `ghcr.io/actions/runner:latest` (GitHub's official runner image) -> does NOT include `az`, `gh`, or `jq`.** Git-Ape workflows will fail with -> `Unable to locate executable file: az` if you use it directly. +> **⚠️ The base `ghcr.io/actions/actions-runner:latest` (GitHub's official runner +> image) does NOT include `az`, `gh`, or `jq`, and ships no registration +> entrypoint.** Git-Ape workflows will fail with +> `Unable to locate executable file: az` — and on ACI/ACA the runner never even +> registers — if you use it directly. You **must** build and use the custom image from the [`Dockerfile`](./Dockerfile) -in this directory. It extends the base runner with all Git-Ape prerequisites. +in this directory. It extends the base runner with all Git-Ape prerequisites and +an [`entrypoint.sh`](./entrypoint.sh) that self-registers the runner on ACI/ACA. 
### Build with ACR Tasks (recommended — no local Docker required) diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json index 009bd1e..2ab7e54 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json @@ -52,9 +52,9 @@ }, "runnerImage": { "type": "string", - "defaultValue": "ghcr.io/actions/runner:latest", + "defaultValue": "ghcr.io/actions/actions-runner:latest", "metadata": { - "description": "Runner container image. IMPORTANT: The default image does NOT include az, gh, or jq. Build the custom image from ../Dockerfile and push to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'." + "description": "Runner container image. IMPORTANT: the stock image shown here neither includes az/gh/jq nor self-registers a runner. Build the custom image from ../Dockerfile (which adds those tools and a registration entrypoint) and push it to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'." } }, "userAssignedIdentityId": { diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json index 1fd22a1..40cc041 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json @@ -52,9 +52,9 @@ }, "runnerImage": { "type": "string", - "defaultValue": "ghcr.io/actions/runner:latest", + "defaultValue": "ghcr.io/actions/actions-runner:latest", "metadata": { - "description": "Runner container image. IMPORTANT: The default image does NOT include az, gh, or jq. Build the custom image from ../Dockerfile and push to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'." + "description": "Runner container image. 
IMPORTANT: the stock image shown here neither includes az/gh/jq nor self-registers a runner. Build the custom image from ../Dockerfile (which adds those tools and a registration entrypoint) and push it to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'." } }, "userAssignedIdentityId": { diff --git a/.github/skills/git-ape-onboarding/templates/runners/aks/README.md b/.github/skills/git-ape-onboarding/templates/runners/aks/README.md index 2b967ff..c39090e 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aks/README.md +++ b/.github/skills/git-ape-onboarding/templates/runners/aks/README.md @@ -23,6 +23,25 @@ The runner scale set's name **is** the `runs-on` label. Set - `kubectl` context pointing at the cluster and `helm` installed. - A GitHub credential (GitHub App recommended, or a fine-grained PAT) stored in Key Vault. Do not commit it. +- A **custom runner image** (see below) pushed to a registry the cluster can pull. + +## Custom runner image (required) + +Like the ACI/ACA paths, AKS runner pods need `az`, `gh`, and `jq` — the stock +`ghcr.io/actions/actions-runner` image has none of them, so Git-Ape steps fail +with `Unable to locate executable file: az`. Build the custom image from the +shared [`Dockerfile`](../Dockerfile) and push it to your ACR: + +```bash +az acr create --name --resource-group --location --sku Basic --admin-enabled true +az acr build --registry --image git-ape-runner:latest \ + --file ../Dockerfile .. +``` + +Set `template.spec.containers[0].image` in `values.yaml` to +`.azurecr.io/git-ape-runner:latest`. ARC overrides the container +command with `run.sh`, so the image's self-register entrypoint is unused on AKS +(the controller registers pods) — but the tools are still required. ## Install @@ -41,15 +60,23 @@ kubectl create secret generic git-ape-runner-secret \ --namespace arc-runners \ --from-literal=github_token="$GH_TOKEN" -# 3. Install the runner scale set with the Git-Ape values +# 3. 
Create the ACR pull secret so pods can pull the custom image +kubectl create secret docker-registry acr-pull \ + --namespace arc-runners \ + --docker-server=.azurecr.io \ + --docker-username= \ + --docker-password="$(az acr credential show -n --query passwords[0].value -o tsv)" + +# 4. Install the runner scale set with the Git-Ape values helm install git-ape-runner \ --namespace arc-runners \ -f values.yaml \ oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set ``` -Edit `values.yaml` first: set `githubConfigUrl` to your repo (or org) URL and, -for VNet-injected clusters, schedule runner pods onto the VNet node pool via +Edit `values.yaml` first: set `githubConfigUrl` to your repo (or org) URL, set +`template.spec.containers[0].image` to your custom ACR image, and, for +VNet-injected clusters, schedule runner pods onto the VNet node pool via `template.spec.nodeSelector`. ## Verify diff --git a/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml b/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml index 4374d65..8d80d0f 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml +++ b/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml @@ -28,10 +28,22 @@ containerMode: template: spec: + # Pull the custom image from a private ACR. Create the secret once: + # kubectl create secret docker-registry acr-pull -n arc-runners \ + # --docker-server=.azurecr.io \ + # --docker-username= \ + # --docker-password="$(az acr credential show -n --query passwords[0].value -o tsv)" + imagePullSecrets: + - name: acr-pull # VNet-injected: pin runner pods to the node pool on your VNet subnet. # nodeSelector: # agentpool: vnetpool containers: - name: runner - image: ghcr.io/actions/actions-runner:latest + # MUST be the Git-Ape custom image (az/gh/jq). The stock + # ghcr.io/actions/actions-runner image lacks those tools and every + # deployment step calling az/gh/jq will fail. 
Build it per the README + # ("Custom runner image") and push to your ACR. ARC overrides the + # command below, so the image's self-register entrypoint is unused here. + image: .azurecr.io/git-ape-runner:latest command: ["/home/runner/run.sh"] diff --git a/.github/skills/git-ape-onboarding/templates/runners/entrypoint.sh b/.github/skills/git-ape-onboarding/templates/runners/entrypoint.sh new file mode 100644 index 0000000..ec7bc50 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/entrypoint.sh @@ -0,0 +1,104 @@ +#!/usr/bin/env bash +# Git-Ape self-hosted runner entrypoint. +# +# The official GitHub Actions runner image (ghcr.io/actions/actions-runner) ships +# the runner binary but NO registration entrypoint — on Kubernetes the Actions +# Runner Controller supplies one, but standalone container hosts (ACI, ACA Jobs) +# have nothing to register the runner. This script is that missing layer: it +# exchanges ACCESS_TOKEN (a fine-grained PAT with administration:write, or a +# GitHub App installation token) for a short-lived registration token, configures +# an ephemeral runner, starts it, and deregisters on shutdown. +# +# It honors the same environment-variable contract the ACI/ACA templates set: +# ACCESS_TOKEN, RUNNER_SCOPE (repo|org), REPO_URL or ORG_NAME, LABELS, +# RUNNER_NAME_PREFIX, EPHEMERAL, DISABLE_AUTO_UPDATE. +# +# On AKS, ARC overrides the container command (command: ["/home/runner/run.sh"]), +# so this entrypoint is bypassed there. 
+set -euo pipefail + +RUNNER_HOME="${RUNNER_HOME:-/home/runner}" +cd "${RUNNER_HOME}" + +: "${ACCESS_TOKEN:?ACCESS_TOKEN (GitHub PAT or App installation token) is required}" +RUNNER_SCOPE="${RUNNER_SCOPE:-repo}" +LABELS="${LABELS:-git-ape-runner}" +EPHEMERAL="${EPHEMERAL:-true}" +DISABLE_AUTO_UPDATE="${DISABLE_AUTO_UPDATE:-true}" +GITHUB_API="${GITHUB_API_URL:-https://api.github.com}" +API_VERSION="2022-11-28" + +case "${RUNNER_SCOPE}" in + org) + : "${ORG_NAME:?ORG_NAME is required for org-scoped runners}" + REG_URL="https://github.com/${ORG_NAME}" + RUNNERS_API="${GITHUB_API}/orgs/${ORG_NAME}/actions/runners" + ;; + repo) + : "${REPO_URL:?REPO_URL is required for repo-scoped runners}" + REG_URL="${REPO_URL}" + owner_repo="${REPO_URL#https://github.com/}" + RUNNERS_API="${GITHUB_API}/repos/${owner_repo}/actions/runners" + ;; + *) + echo "Unsupported RUNNER_SCOPE '${RUNNER_SCOPE}' (expected 'repo' or 'org')" >&2 + exit 1 + ;; +esac + +# Exchange the PAT/App token for a short-lived registration or remove token. +# $1 = registration | remove +runner_token() { + curl -fsSL -X POST \ + -H "Authorization: Bearer ${ACCESS_TOKEN}" \ + -H "Accept: application/vnd.github+json" \ + -H "X-GitHub-Api-Version: ${API_VERSION}" \ + "${RUNNERS_API}/$1-token" | jq -r '.token' +} + +RUNNER_NAME="${RUNNER_NAME_PREFIX:-git-ape-runner}-$(hostname)-${RANDOM}" + +echo "Requesting registration token (${RUNNER_SCOPE} scope) ..." +REG_TOKEN="$(runner_token registration)" +if [ -z "${REG_TOKEN}" ] || [ "${REG_TOKEN}" = "null" ]; then + echo "Failed to obtain a registration token. Check that ACCESS_TOKEN has" >&2 + echo "administration:write (repo) or self-hosted runner admin (org) rights." 
>&2 + exit 1 +fi + +config_args=( + --url "${REG_URL}" + --token "${REG_TOKEN}" + --name "${RUNNER_NAME}" + --labels "${LABELS}" + --work _work + --unattended + --replace +) +[ "${EPHEMERAL}" = "true" ] && config_args+=(--ephemeral) +[ "${DISABLE_AUTO_UPDATE}" = "true" ] && config_args+=(--disableupdate) + +./config.sh "${config_args[@]}" + +deregister() { + echo "Removing runner registration ..." + local rm_token + rm_token="$(runner_token remove || true)" + if [ -n "${rm_token}" ] && [ "${rm_token}" != "null" ]; then + ./config.sh remove --token "${rm_token}" || true + fi +} + +./run.sh & +RUNNER_PID=$! +trap 'kill -TERM "${RUNNER_PID}" 2>/dev/null || true' INT TERM + +set +e +wait "${RUNNER_PID}" +EXIT_CODE=$? +set -e + +# Ephemeral runners deregister themselves after one job; this is a safety net for +# non-ephemeral runners and graceful-shutdown signals (idempotent on re-run). +deregister +exit "${EXIT_CODE}" diff --git a/website/docs/skills/git-ape-onboarding.md b/website/docs/skills/git-ape-onboarding.md index 17aa923..d52cb56 100644 --- a/website/docs/skills/git-ape-onboarding.md +++ b/website/docs/skills/git-ape-onboarding.md @@ -458,9 +458,10 @@ scaling, and networking. - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) ``` -2. **Build the custom runner image.** The base `ghcr.io/actions/runner:latest` - (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`. - Workflows will fail with `Unable to locate executable file: az` without a +2. **Build the custom runner image.** The base `ghcr.io/actions/actions-runner:latest` + (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`, and + ships no registration entrypoint. Workflows fail with `Unable to locate + executable file: az` — and on ACI/ACA the runner never registers — without a custom image. ```bash # Create ACR (one-time) @@ -471,7 +472,8 @@ scaling, and networking. 
--file ./templates/runners/Dockerfile ./templates/runners/ ``` The `Dockerfile` at `./templates/runners/Dockerfile` extends the base runner - with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`). + with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`) and an `entrypoint.sh` + that self-registers the runner on ACI/ACA (on AKS, ARC handles registration). 3. **Deploy the runner infrastructure** using the chosen platform template. Pass the custom image via the `runnerImage` parameter: @@ -616,9 +618,10 @@ gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:10] ### Default runner image lacks required tools (self-hosted only) -The base image `ghcr.io/actions/runner:latest` (GitHub's official runner) is a -**minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`. If you -deploy without the custom image, workflows will fail with: +The base image `ghcr.io/actions/actions-runner:latest` (GitHub's official runner) +is a **minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`, and +ships no registration entrypoint. 
If you deploy without the custom image, the +runner never registers on ACI/ACA and workflows fail with: ``` Error: Unable to locate executable file: az From d56b67dd9971ba370b060d589d4f2fe9e0224177 Mon Sep 17 00:00:00 2001 From: Arnaud Lheureux Date: Wed, 17 Jun 2026 18:35:08 +0800 Subject: [PATCH 6/8] feat(runners): managed identity ACR pull, CRLF safety net, Windows workarounds - ACA template: add acrServer param + identity-based registry auth (no admin creds) - Dockerfile: add sed CRLF strip after COPY entrypoint.sh - README: rewrite build/pull sections for cloud build + managed identity - SKILL.md Step 12b: managed identity flow, --no-logs on Windows - SKILL.md: 3 new Known Gotchas (CRLF, az acr build Windows crash, ACA env delay) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/git-ape-onboarding/SKILL.md | 134 +++++++++++++++--- .../templates/runners/Dockerfile | 6 +- .../templates/runners/README.md | 95 ++++++++++--- .../templates/runners/aca/parameters.json | 4 + .../templates/runners/aca/template.json | 12 +- 5 files changed, 203 insertions(+), 48 deletions(-) diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md index 5280fa7..8ee8416 100644 --- a/.github/skills/git-ape-onboarding/SKILL.md +++ b/.github/skills/git-ape-onboarding/SKILL.md @@ -441,43 +441,73 @@ scaling, and networking. - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) ``` -2. **Build the custom runner image.** The base `ghcr.io/actions/actions-runner:latest` - (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`, and - ships no registration entrypoint. Workflows fail with `Unable to locate - executable file: az` — and on ACI/ACA the runner never registers — without a - custom image. +2. 
**Build the custom runner image using ACR Tasks (cloud build).** The base + `ghcr.io/actions/actions-runner:latest` (GitHub's official runner image) does + **NOT** include `az`, `gh`, or `jq`, and ships no registration entrypoint. + Workflows fail with `Unable to locate executable file: az` — and on ACI/ACA + the runner never registers — without a custom image. + + Always build via **ACR Tasks** (cloud build) — never local Docker. This + avoids Windows CRLF line-ending corruption of `entrypoint.sh` and eliminates + the need for a local Docker install. ```bash - # Create ACR (one-time) - az acr create --name --resource-group --location --sku Basic --admin-enabled true + # Create ACR (one-time) — no --admin-enabled; use managed identity for pulls + az acr create --name --resource-group --location --sku Basic # Build and push image (runs in Azure, ~3 min, no local Docker needed) + # On Windows, add --no-logs to avoid a Unicode encoding crash in log streaming az acr build --registry --image git-ape-runner:latest \ - --file ./templates/runners/Dockerfile ./templates/runners/ + --file ./templates/runners/Dockerfile ./templates/runners/ --no-logs ``` The `Dockerfile` at `./templates/runners/Dockerfile` extends the base runner with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`) and an `entrypoint.sh` that self-registers the runner on ACI/ACA (on AKS, ARC handles registration). + It includes a `sed` safety net that strips CRLF line endings from + `entrypoint.sh` at build time. -3. **Deploy the runner infrastructure** using the chosen platform template. - Pass the custom image via the `runnerImage` parameter: + After the build, verify the image exists: + ```bash + az acr repository list --name -o table + ``` + +3. 
**Create a managed identity and assign `AcrPull` role** for image pulls: + ```bash + # Create identity + az identity create --name id-git-ape-runner --resource-group --location + + # Get IDs + IDENTITY_ID=$(az identity show --name id-git-ape-runner --resource-group --query id -o tsv) + PRINCIPAL_ID=$(az identity show --name id-git-ape-runner --resource-group --query principalId -o tsv) + ACR_ID=$(az acr show --name --query id -o tsv) + + # Assign AcrPull role (may take 30–60s to propagate) + az role assignment create --assignee-object-id $PRINCIPAL_ID --assignee-principal-type ServicePrincipal \ + --role AcrPull --scope $ACR_ID + ``` + **Do NOT use ACR admin credentials** (`--admin-enabled true` + username/password). + Managed identity is the secure, recommended approach. + +4. **Deploy the runner infrastructure** using the chosen platform template. + Pass the custom image, ACR server, and managed identity: ```bash az deployment group create -g -f template.json \ -p runnerImage='.azurecr.io/git-ape-runner:latest' \ + acrServer='.azurecr.io' \ + userAssignedIdentityId=$IDENTITY_ID \ githubOwnerRepo='/' \ githubAccessToken='' ``` + - The ACA template's `registries` block automatically uses identity-based + auth when both `acrServer` and `userAssignedIdentityId` are set. - The GitHub registration credential is the only secret — source it from Key - Vault, never inline it. Azure access uses a user-assigned managed identity. + Vault, never inline it. - For private networking, set the subnet parameter (`subnetId` for ACI, `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). - For AKS, use `helm install` instead of ARM. - -4. 
**Configure ACR pull credentials** on the ACA/ACI job (if using ACR): - ```bash - az containerapp job registry set --name git-ape-runner --resource-group \ - --server .azurecr.io --username \ - --password $(az acr credential show -n --query "passwords[0].value" -o tsv) - ``` + - **Note:** The ACA managed environment may take 1–2 minutes to fully + provision. If deploying step-by-step (not via ARM template), wait for the + environment's `provisioningState` to reach `Succeeded` before creating the + job. 5. **Set `minExecutions=1`** (recommended) so at least one runner is always warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can @@ -538,7 +568,7 @@ scaling, and networking. 7. *(Optional)* Offer to onboard the drift detector workflow by provisioning `COPILOT_GITHUB_TOKEN` (Step 10 in playbook). Skip if the user does not want scheduled drift detection. 8. Ask compliance framework and enforcement mode preferences (Step 11 in playbook). 9. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted**: ACR + custom image + ACA/ACI deployment + `minExecutions=1` + registry credentials + `GIT_APE_RUNNER_LABEL` (Step 12b). +10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. 
For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted**: ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACA/ACI deployment with identity-based registry auth + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b). 11. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner. 12. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. @@ -612,10 +642,10 @@ Error: Unable to locate executable file: az **Fix:** Always build and use the custom image from `./templates/runners/Dockerfile`. The onboarding flow must: -1. Create an ACR (`az acr create`) -2. Build the image (`az acr build --image git-ape-runner:latest`) -3. Configure pull credentials on the ACA/ACI job (`az containerapp job registry set`) -4. Set the `runnerImage` parameter to the ACR image +1. Create an ACR (`az acr create` — no `--admin-enabled`) +2. Build the image via ACR Tasks (`az acr build --no-logs` on Windows) +3. Create a managed identity with `AcrPull` role on the ACR +4. Deploy the template with `acrServer`, `userAssignedIdentityId`, and `runnerImage` ### KEDA scale-from-zero cold start @@ -631,6 +661,64 @@ this time: ~$30–50/month on the Consumption plan but eliminates cold-start delays and ensures a runner is always visible in GitHub Settings. +### Windows CRLF corrupts `entrypoint.sh` (self-hosted only) + +When the `Dockerfile` build context is uploaded from a Windows checkout (where +`core.autocrlf=true` converts LF to CRLF), `entrypoint.sh` gets `\r\n` line endings. 
+Linux interprets the shebang as `#!/usr/bin/env bash\r`, failing with: + +``` +'bash\r': No such file or directory +``` + +The runner container starts but never registers, and all executions fail +immediately. + +**Fix (belt-and-suspenders):** +1. The `Dockerfile` includes a `sed -i 's/\r$//'` line after `COPY entrypoint.sh` + that strips CRLF at build time — this is always safe and is a no-op on clean + LF files. +2. Prefer **ACR Tasks** (cloud build) over local `docker build` — ACR Tasks run + in Linux and handle the context correctly. +3. If building locally on Windows, ensure `.gitattributes` marks `*.sh` as + `text eol=lf`, or run `dos2unix entrypoint.sh` before building. + +### `az acr build` crashes on Windows (Unicode encoding) + +On Windows, `az acr build` may crash while streaming build logs with: + +``` +UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' +``` + +This is a known Azure CLI bug — the `colorama` library on Windows can't encode +Unicode characters (like `→`) in `apt-get` output. The build itself may or may +not have completed in Azure before the crash. + +**Fix:** Always use `--no-logs` when running `az acr build` on Windows: +```bash +az acr build --registry --image git-ape-runner:latest \ + --file ... ... --no-logs +``` +The build runs in Azure regardless; `--no-logs` just skips the local log +streaming. Verify success with `az acr repository list --name `. + +### ACA managed environment provisioning delay + +The `Microsoft.App/managedEnvironments` resource can take 1–2 minutes to +provision. If you create the ACA job immediately after the environment, the +deployment may fail with `ManagedEnvironmentNotProvisioned`. + +**Fix:** When deploying via ARM template (`az deployment group create`), the +`dependsOn` in the template handles ordering automatically. 
When deploying +step-by-step (e.g., `az containerapp env create` followed by +`az containerapp job create`), poll the environment status first: +```bash +az containerapp env show --name --resource-group \ + --query "properties.provisioningState" -o tsv +# Wait until "Succeeded" before creating the job +``` + ### Stale workflow files in target repos If the target repo was onboarded before the `GIT_APE_RUNNER_LABEL` pattern was diff --git a/.github/skills/git-ape-onboarding/templates/runners/Dockerfile b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile index 4563f57..1792567 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/Dockerfile +++ b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile @@ -47,7 +47,11 @@ RUN apt-get update \ # Registration entrypoint (self-registers the runner on ACI/ACA; bypassed on # AKS, where ARC overrides the container command). COPY entrypoint.sh /home/runner/entrypoint.sh -RUN chmod +x /home/runner/entrypoint.sh +# Strip Windows CRLF line endings if present. When the build context is +# uploaded from a Windows checkout (Git core.autocrlf) or via `az acr build` from a +# Windows host, the shebang becomes "#!/usr/bin/env bash\r" and Linux returns +# "bash\r: No such file or directory". This sed is a no-op on clean LF files. +RUN sed -i 's/\r$//' /home/runner/entrypoint.sh && chmod +x /home/runner/entrypoint.sh # Switch back to the runner user USER runner diff --git a/.github/skills/git-ape-onboarding/templates/runners/README.md b/.github/skills/git-ape-onboarding/templates/runners/README.md index 0184d0f..f0c4141 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/README.md +++ b/.github/skills/git-ape-onboarding/templates/runners/README.md @@ -247,34 +247,74 @@ You **must** build and use the custom image from the [`Dockerfile`](./Dockerfile in this directory.
It extends the base runner with all Git-Ape prerequisites and an [`entrypoint.sh`](./entrypoint.sh) that self-registers the runner on ACI/ACA. -### Build with ACR Tasks (recommended — no local Docker required) +### Build with ACR Tasks (recommended — cloud build, no local Docker) + +Always build the image in Azure using ACR Tasks. This avoids: +- Needing Docker installed locally +- **Windows CRLF line-ending corruption** — when the build context is uploaded + from a Windows checkout (Git's `core.autocrlf`), `entrypoint.sh` may have `\r\n` + endings. The Dockerfile includes a `sed` safety net, but cloud builds on ACR + Tasks run in Linux and handle this cleanly. ```bash -# Create an ACR (one-time) -az acr create --name --resource-group --location --sku Basic --admin-enabled true +# Create an ACR (one-time) — admin-enabled false; use managed identity for pulls +az acr create --name --resource-group --location --sku Basic # Build and push the image (runs in Azure, ~3 min) az acr build --registry --image git-ape-runner:latest \ --file .github/skills/git-ape-onboarding/templates/runners/Dockerfile \ .github/skills/git-ape-onboarding/templates/runners/ +``` -# Configure ACR pull credentials on the ACA job -az containerapp job registry set --name git-ape-runner --resource-group \ - --server .azurecr.io \ - --username --password $(az acr credential show -n --query "passwords[0].value" -o tsv) +> **Windows note:** `az acr build` may crash with a `charmap` codec error while +> streaming build logs (Unicode characters in `apt-get` output). Add `--no-logs` +> to skip log streaming — the build still runs in Azure: +> ```bash +> az acr build --registry --image git-ape-runner:latest \ +> --file ... ... --no-logs +> ``` +> Check the result with `az acr repository list --name `.
-# Update the job to use the custom image -az containerapp job update --name git-ape-runner --resource-group \ - --image .azurecr.io/git-ape-runner:latest -``` +### ACR pull authentication (managed identity — recommended) -### Or pass it at deploy time +Use a **user-assigned managed identity** with the `AcrPull` role to pull images +from your ACR. This eliminates admin credentials entirely. ```bash +# Create a managed identity (one-time) +az identity create --name id-git-ape-runner --resource-group --location + +# Get the identity's principal ID and resource ID +IDENTITY_ID=$(az identity show --name id-git-ape-runner --resource-group --query id -o tsv) +PRINCIPAL_ID=$(az identity show --name id-git-ape-runner --resource-group --query principalId -o tsv) +ACR_ID=$(az acr show --name --query id -o tsv) + +# Assign AcrPull role +az role assignment create --assignee-object-id $PRINCIPAL_ID --assignee-principal-type ServicePrincipal \ + --role AcrPull --scope $ACR_ID + +# Deploy the ACA template with managed identity + ACR server az deployment group create -g -f template.json \ -p runnerImage='.azurecr.io/git-ape-runner:latest' \ + acrServer='.azurecr.io' \ + userAssignedIdentityId=$IDENTITY_ID \ githubOwnerRepo='org/repo' \ - githubAccessToken='...' + githubAccessToken='' +``` + +The ACA template's `registries` block automatically uses identity-based auth +when both `acrServer` and `userAssignedIdentityId` are set — no username/password. + +### Legacy: ACR admin credentials (not recommended) + +If you cannot use managed identity, enable admin access and configure pull +credentials manually: + +```bash +az acr update --name --admin-enabled true +az containerapp job registry set --name git-ape-runner --resource-group \ + --server .azurecr.io --username \ + --password $(az acr credential show -n --query "passwords[0].value" -o tsv) ``` ### Tools included in the custom image @@ -316,6 +356,11 @@ queued jobs and spins up a runner. 
During this window, GitHub shows the job as still use **OIDC federation** for `az` actions, so the managed identity only needs what the runtime requires. Do not put subscription keys or connection strings on the runner. +- **ACR image pull uses managed identity, not admin credentials.** The managed + identity assigned to the runner should have the `AcrPull` role on the ACR. + The ACA template supports identity-based registry auth natively via the + `acrServer` + `userAssignedIdentityId` parameters — no username/password. + ACR admin credentials are a legacy fallback and should be avoided. - **The GitHub registration credential is the one unavoidable secret.** GitHub requires a credential to register a runner. Order of preference: 1. **GitHub App** installation token (recommended for org-scale; ARC supports @@ -335,19 +380,23 @@ queued jobs and spins up a runner. During this window, GitHub shows the job as ```mermaid flowchart LR - A[Choose type + platform] --> B[Create ACR +
build custom image] - B --> C[Copy template into
.azure/runners/] - C --> D[Provide GitHub creds
via Key Vault] - D --> E[Deploy IaC
az deployment / helm] - E --> F[Set minExecutions=1
+ registry creds] - F --> G[Runner registers
with label git-ape-runner] - G --> H[Set GIT_APE_RUNNER_LABEL
variable] - H --> I[Workflows now run
on private runners] - I -.clean fallback.-> J[Unset variable →
back to ubuntu-latest] + A[Choose type + platform] --> B[Create ACR +
build custom image
via ACR Tasks] + B --> C[Create managed identity
+ AcrPull role] + C --> D[Copy template into
.azure/runners/] + D --> E[Provide GitHub creds
via Key Vault] + E --> F[Deploy IaC
az deployment / helm] + F --> G[Set minExecutions=1
(recommended)] + G --> H[Runner registers
with label git-ape-runner] + H --> I[Set GIT_APE_RUNNER_LABEL
variable] + I --> J[Workflows now run
on private runners] + J -.clean fallback.-> K[Unset variable →
back to ubuntu-latest] ``` 1. **Choose** the runner type and platform (the `/git-ape-onboarding` flow asks). -2. **Create an ACR** and build the custom runner image (see above). +2. **Create an ACR** and build the custom runner image using ACR Tasks (cloud + build — avoids CRLF issues and requires no local Docker). Create a + **user-assigned managed identity** with `AcrPull` role for image pulls — do + not use ACR admin credentials. 3. **Copy** the chosen platform folder into your repo under `.azure/runners//` and edit parameters for your repo/org, region, labels, image, and (for VNet-injected) the target `subnetId`. diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json index 135541b..7697307 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json @@ -32,6 +32,10 @@ }, "maxRunners": { "value": 10 + }, + "_comment_acrServer": "Set to your ACR login server (e.g. myacr.azurecr.io) when using a custom runner image. When combined with userAssignedIdentityId, the job authenticates to ACR via managed identity (AcrPull role required) — no admin credentials needed.", + "acrServer": { + "value": "" } } } diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json index 2ab7e54..6ad8293 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json @@ -3,7 +3,7 @@ "contentVersion": "1.0.0.0", "metadata": { "_generator": "git-ape-onboarding", - "description": "Git-Ape self-hosted runner on Azure Container Apps (ACA). 
Provisions a managed environment and an event-driven Container Apps Job that scales ephemeral GitHub Actions runners on demand with the KEDA 'github-runner' scaler (scale-to-zero between jobs). Runners register with the label 'git-ape-runner' (override via parameter). Leave infrastructureSubnetId empty for a self-hosted (subscription) environment; set it for VNet injection (Consumption needs a /23 or larger). The GitHub credential is the only secret and must be sourced from Key Vault.\n\nDeploy:\n az group create -n rg-git-ape-runners -l eastus\n az deployment group create -g rg-git-ape-runners -f template.json -p @parameters.json" + "description": "Git-Ape self-hosted runner on Azure Container Apps (ACA). Provisions a managed environment and an event-driven Container Apps Job that scales ephemeral GitHub Actions runners on demand with the KEDA 'github-runner' scaler (scale-to-zero between jobs). Runners register with the label 'git-ape-runner' (override via parameter). Leave infrastructureSubnetId empty for a self-hosted (subscription) environment; set it for VNet injection (Consumption needs a /23 or larger). The GitHub credential is the only secret and must be sourced from Key Vault. ACR image pull uses managed identity (AcrPull role) when both acrServer and userAssignedIdentityId are set — no admin credentials needed.\n\nDeploy:\n az group create -n rg-git-ape-runners -l eastus\n az deployment group create -g rg-git-ape-runners -f template.json -p @parameters.json" }, "parameters": { "location": { @@ -91,11 +91,19 @@ "metadata": { "description": "Memory for the runner container (e.g. 2Gi)." } + }, + "acrServer": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "ACR login server (e.g. myacr.azurecr.io). When set together with userAssignedIdentityId, the job pulls images via managed identity (AcrPull role required) — no admin credentials needed. Leave empty if using a public image or configuring registry credentials out-of-band." 
+ } } }, "variables": { "isOrgScope": "[equals(parameters('runnerScope'), 'org')]", "hasIdentity": "[not(empty(parameters('userAssignedIdentityId')))]", + "hasAcr": "[not(empty(parameters('acrServer')))]", "isVnet": "[not(empty(parameters('infrastructureSubnetId')))]", "envName": "[format('{0}-env', parameters('runnerName'))]", "ownerName": "[if(variables('isOrgScope'), parameters('githubOwnerRepo'), first(split(parameters('githubOwnerRepo'), '/')))]", @@ -137,6 +145,7 @@ ], "containerEnv": "[concat(variables('scopeEnv'), variables('baseEnv'))]", "identityBlock": "[if(variables('hasIdentity'), createObject('type', 'UserAssigned', 'userAssignedIdentities', createObject(parameters('userAssignedIdentityId'), createObject())), createObject('type', 'None'))]", + "registriesBlock": "[if(and(variables('hasAcr'), variables('hasIdentity')), createArray(createObject('server', parameters('acrServer'), 'identity', parameters('userAssignedIdentityId'))), if(variables('hasAcr'), createArray(createObject('server', parameters('acrServer'))), createArray()))]", "vnetConfiguration": "[if(variables('isVnet'), createObject('infrastructureSubnetId', parameters('infrastructureSubnetId'), 'internal', false()), json('null'))]" }, "resources": [ @@ -164,6 +173,7 @@ "triggerType": "Event", "replicaTimeout": 1800, "replicaRetryLimit": 1, + "registries": "[if(empty(variables('registriesBlock')), json('null'), variables('registriesBlock'))]", "secrets": [ { "name": "github-pat", From 5e23f735ee1799c50808d9cd3d6d9f2c3d273146 Mon Sep 17 00:00:00 2001 From: Arnaud Lheureux Date: Wed, 17 Jun 2026 19:05:13 +0800 Subject: [PATCH 7/8] fix: require user-provided PAT instead of registration token for self-hosted runners Registration tokens from the GitHub API expire in ~1 hour, causing KEDA polling and ephemeral runner registration to fail with 401. The agent now asks the user for a long-lived PAT before deploying ACA/ACI runners. 
- Add Step 4 (Collect GitHub PAT) to SKILL.md Step 12b - Add Known Gotcha documenting registration token failure mode - Update parameter comments and descriptions in ACA/ACI templates - Update Suggested Agent Flow to mention PAT collection Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/git-ape-onboarding/SKILL.md | 76 ++++++++++++++++--- .../templates/runners/aca/parameters.json | 2 +- .../templates/runners/aca/template.json | 2 +- .../templates/runners/aci/parameters.json | 2 +- .../templates/runners/aci/template.json | 2 +- 5 files changed, 69 insertions(+), 15 deletions(-) diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md index 8ee8416..20dabf7 100644 --- a/.github/skills/git-ape-onboarding/SKILL.md +++ b/.github/skills/git-ape-onboarding/SKILL.md @@ -487,20 +487,54 @@ scaling, and networking. **Do NOT use ACR admin credentials** (`--admin-enabled true` + username/password). Managed identity is the secure, recommended approach. -4. **Deploy the runner infrastructure** using the chosen platform template. - Pass the custom image, ACR server, and managed identity: +4. **Collect a GitHub PAT from the user.** The ACA/ACI runner needs a + **long-lived GitHub Personal Access Token (PAT)** — NOT a short-lived + registration token from `POST /actions/runners/registration-token`. + Registration tokens expire in ~1 hour, but the KEDA `github-runner` scaler + continuously polls the Actions queue AND each ephemeral runner re-registers + on every scale-up, so a long-lived PAT is required. + + **Ask the user to create a PAT** before deploying: + ``` + The self-hosted runner needs a GitHub Personal Access Token (PAT) for + continuous queue polling and runner registration. 
+ + Please create a fine-grained PAT at: + https://github.com/settings/tokens?type=beta + + Required permissions (scoped to the target repo): + - Actions: Read & Write + - Administration: Read & Write (for runner registration) + + Alternatively, a classic PAT with the `repo` scope works. + + Paste the token when prompted — it will only be passed to the deployment + and will not be stored or displayed. + ``` + + **Do NOT generate a registration token** via the GitHub API + (`POST repos///actions/runners/registration-token`). These are + short-lived (~1 hour) and will cause the runner to fail with a 401 error + once expired. The KEDA scaler and ephemeral runner registration both need + a token that does not expire. + + Never print the token value in chat output (see Safe-Execution Rules). + +5. **Deploy the runner infrastructure** using the chosen platform template. + Pass the custom image, ACR server, managed identity, and user-provided PAT: ```bash az deployment group create -g -f template.json \ -p runnerImage='.azurecr.io/git-ape-runner:latest' \ acrServer='.azurecr.io' \ userAssignedIdentityId=$IDENTITY_ID \ githubOwnerRepo='/' \ - githubAccessToken='' + githubAccessToken='' ``` - The ACA template's `registries` block automatically uses identity-based auth when both `acrServer` and `userAssignedIdentityId` are set. - - The GitHub registration credential is the only secret — source it from Key - Vault, never inline it. + - The GitHub PAT is the only secret — for production, store it in Key Vault + and reference it; for initial setup, pass it directly at deploy time. + Never inline it in a committed `parameters.json`. - For private networking, set the subnet parameter (`subnetId` for ACI, `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). - For AKS, use `helm install` instead of ARM. @@ -509,7 +543,7 @@ scaling, and networking. environment's `provisioningState` to reach `Succeeded` before creating the job. -5. 
**Set `minExecutions=1`** (recommended) so at least one runner is always +6. **Set `minExecutions=1`** (recommended) so at least one runner is always warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can take 1–3 minutes on cold start, during which GitHub shows "No runners configured": @@ -519,11 +553,11 @@ scaling, and networking. Leave at `0` only if you prefer true scale-to-zero and can tolerate cold-start delays. -6. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* +7. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* with the `git-ape-runner` label. (With `minExecutions=1`, a runner should appear within 30–60 seconds of deployment.) -7. **Set the variable** so workflows target it (repo-wide or per environment): +8. **Set the variable** so workflows target it (repo-wide or per environment): ```bash gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" # per environment instead: @@ -531,10 +565,10 @@ scaling, and networking. ``` Clean fallback to GitHub-hosted runners is `gh variable delete GIT_APE_RUNNER_LABEL`. -8. **Verify** by triggering `Git-Ape: Verify Setup` and confirming all steps +9. **Verify** by triggering `Git-Ape: Verify Setup` and confirming all steps pass on the private runner (especially "Test OIDC login" which requires `az`). -9. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw +10. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. To move drift onto a private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter and recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it @@ -568,12 +602,32 @@ scaling, and networking. 7. *(Optional)* Offer to onboard the drift detector workflow by provisioning `COPILOT_GITHUB_TOKEN` (Step 10 in playbook). Skip if the user does not want scheduled drift detection. 8. 
Ask compliance framework and enforcement mode preferences (Step 11 in playbook). 9. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted**: ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACA/ACI deployment with identity-based registry auth + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b). +10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted**: ask the user for a GitHub PAT (never generate a registration token) → ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACA/ACI deployment with identity-based registry auth using user-provided PAT + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b). 11. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner. 12. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. 
## Known Gotchas +### Self-hosted: registration tokens don't work for KEDA-based runners + +**Never use `POST repos///actions/runners/registration-token`** to +generate the `githubAccessToken` for ACA/ACI runners. Registration tokens are +short-lived (~1 hour) and expire silently. Once expired: +- The KEDA `github-runner` scaler can no longer poll the Actions queue +- Each ephemeral runner fails to register on scale-up with a **401 Unauthorized** +- Runners appear as `offline` in GitHub Settings + +The `githubAccessToken` parameter requires a **long-lived GitHub PAT** because: +1. KEDA continuously polls the GitHub API every 30 seconds to detect queued jobs +2. Each ephemeral runner re-registers itself on every scale-up event +3. Both operations need a token that outlives any single job + +**Fix:** Always **ask the user** to create a fine-grained PAT +(`https://github.com/settings/tokens?type=beta`) with **Actions (Read & Write)** +and **Administration (Read & Write)** permissions scoped to the target repo. A +classic PAT with the `repo` scope also works. Never generate a registration +token programmatically — it will always fail after ~1 hour. + ### Hosted compute: `network_settings_ids` expects the GitHubId tag, not the Azure resource ID When creating a GitHub network configuration, the `network_settings_ids` field diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json index 7697307..2d3b97c 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json @@ -14,7 +14,7 @@ "runnerLabels": { "value": "git-ape-runner" }, - "_comment_githubAccessToken": "Do NOT inline the token. 
Reference it from Key Vault as shown, or pass it at deploy time with -p githubAccessToken=$(az keyvault secret show ...).", + "_comment_githubAccessToken": "Requires a long-lived GitHub PAT (fine-grained with Actions + Administration R/W, or classic with repo scope). Do NOT use a short-lived registration token from the GitHub API — it expires in ~1h and breaks KEDA polling and ephemeral runner registration. Reference the PAT from Key Vault as shown, or pass it at deploy time with -p githubAccessToken=.", "githubAccessToken": { "reference": { "keyVault": { diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json index 6ad8293..edd8d8b 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json @@ -47,7 +47,7 @@ "githubAccessToken": { "type": "securestring", "metadata": { - "description": "GitHub credential used to register the runner and poll the queue: a fine-grained PAT or GitHub App token. Source from Key Vault - never commit it." + "description": "Long-lived GitHub PAT used to register ephemeral runners and poll the Actions queue via KEDA. Requires a fine-grained PAT with Actions + Administration (Read & Write) permissions, or a classic PAT with repo scope. Do NOT use a short-lived registration token — it expires in ~1h. Source from Key Vault - never commit it." } }, "runnerImage": { diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json b/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json index fd2f7c0..e767aa2 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json @@ -14,7 +14,7 @@ "runnerLabels": { "value": "git-ape-runner" }, - "_comment_githubAccessToken": "Do NOT inline the token. 
Reference it from Key Vault as shown below, or pass it at deploy time with -p githubAccessToken=$(az keyvault secret show ...).", + "_comment_githubAccessToken": "Requires a long-lived GitHub PAT (fine-grained with Actions + Administration R/W, or classic with repo scope). Do NOT use a short-lived registration token from the GitHub API — it expires in ~1h and breaks runner registration. Reference the PAT from Key Vault as shown below, or pass it at deploy time with -p githubAccessToken=.", "githubAccessToken": { "reference": { "keyVault": { diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json index 40cc041..ccddb7d 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json +++ b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json @@ -47,7 +47,7 @@ "githubAccessToken": { "type": "securestring", "metadata": { - "description": "GitHub credential used to register the runner: a fine-grained PAT (administration:write) or GitHub App token. Source from Key Vault - never commit it." + "description": "Long-lived GitHub PAT used to register the runner. Requires a fine-grained PAT with Actions + Administration (Read & Write) permissions, or a classic PAT with repo scope. Do NOT use a short-lived registration token — it expires in ~1h. Source from Key Vault - never commit it." 
} }, "runnerImage": { From ca1888edf470084aab01739568a00d6d755b08b0 Mon Sep 17 00:00:00 2001 From: Arnaud Lheureux Date: Wed, 1 Jul 2026 09:43:58 +0800 Subject: [PATCH 8/8] Deploy ACA runners as a first-class Git-Ape deployment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Convert the ACA self-hosted runner path from imperative `az deployment group create` into a managed Git-Ape deployment artifact so the runner infrastructure flows through Git-Ape's own pipeline — architecture diagram, cost estimate, stack deploy, and single-command destroy. Git-Ape deploying Git-Ape. - Add templates/deployments/git-ape-runners/ (subscription-scoped Deployment Stack): RG + nested inner-scope deployment provisioning a user-assigned identity, ACR, AcrPull role, Key Vault (RBAC), Key Vault Secrets User role, ACA managed environment, and a KEDA-scaled ACA runner Job. The PAT is a Key Vault secret reference — never in git, ARM parameters, or deployment history. Validated with `az deployment sub validate` (Succeeded). - Rewire SKILL.md Step 12b: ACA now routes through the managed stack flow (/azure-stack-deploy -> az acr build -> az keyvault secret set -> GIT_APE_RUNNER_LABEL); ACI/AKS remain imperative. - Document the self-hosting bootstrap loop: the first deploy runs on ubuntu-latest/local, then subsequent runs (including updates to the runner stack itself) execute on the private runner it created. - Add cross-references from templates/runners/README.md and update getting-started/onboarding.md; regenerate the onboarding skill doc. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/skills/git-ape-onboarding/SKILL.md | 102 +++- .../deployments/git-ape-runners/README.md | 93 ++++ .../git-ape-runners/architecture.md | 78 +++ .../deployments/git-ape-runners/metadata.json | 14 + .../git-ape-runners/parameters.json | 47 ++ .../deployments/git-ape-runners/template.json | 494 ++++++++++++++++++ .../templates/runners/README.md | 22 +- website/docs/getting-started/onboarding.md | 26 +- website/docs/skills/git-ape-onboarding.md | 102 +++- 9 files changed, 951 insertions(+), 27 deletions(-) create mode 100644 .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/README.md create mode 100644 .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/architecture.md create mode 100644 .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/metadata.json create mode 100644 .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/parameters.json create mode 100644 .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/template.json diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md index 2ba995c..f5bc430 100644 --- a/.github/skills/git-ape-onboarding/SKILL.md +++ b/.github/skills/git-ape-onboarding/SKILL.md @@ -54,7 +54,7 @@ This skill configures: 5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable, plus the optional `GIT_APE_RUNNER_LABEL` variable that selects private runners 6. Scaffolded GitHub Actions workflow files (`git-ape-plan.yml`, `-deploy.yml`, `-destroy.yml`, `-verify.yml`, `-drift.{md,lock.yml}`) and deployment standards (`.github/copilot-instructions.md`) into the user's working copy 7. *(Optional)* The `COPILOT_GITHUB_TOKEN` repository secret that powers the agentic drift-detection workflow (`git-ape-drift.lock.yml`) — only when the user opts into scheduled drift detection -8. 
The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default), **hosted compute networking** (GitHub-managed runners with Azure private networking, requires GHEC), or self-hosted runners in your Azure subscription (ACI / ACA / AKS). On-demand IaC for private runners ships at `./templates/runners/`. +8. The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default), **hosted compute networking** (GitHub-managed runners with Azure private networking, requires GHEC), or self-hosted runners in your Azure subscription (ACI / ACA / AKS). On-demand IaC for private runners ships at `./templates/runners/`. ACA runners are deployed as a first-class **Git-Ape deployment** (`./templates/deployments/git-ape-runners/`) — Git-Ape deploying Git-Ape — so they get an architecture diagram, cost estimate, managed deploy, and single-command destroy. ## Prerequisites @@ -470,12 +470,88 @@ scaling, and networking. 1. **Ask the platform:** ``` Which Azure platform should host the runners? + - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) — RECOMMENDED - ACI — Azure Container Instances (simplest; a handful of runners) - - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) ``` -2. **Build the custom runner image using ACR Tasks (cloud build).** The base + **ACA is deployed as a Git-Ape deployment** (Git-Ape deploying Git-Ape), so the + runner infrastructure gets an architecture diagram, cost estimate, managed + deploy, and single-command destroy — see the ACA path directly below. **ACI** + and **AKS** use the imperative provisioning steps further down. + +#### ACA — deploy runners as a Git-Ape deployment (recommended) + +Instead of an imperative `az deployment group create`, scaffold the runner +infrastructure as a first-class Git-Ape deployment and deploy it through the +normal Git-Ape stack flow. 
The template is a subscription-scoped Deployment Stack, +so destroy is a single idempotent command and the PAT lives in Key Vault (never in +git or ARM parameters). + +1. **Scaffold the deployment artifact** into the working copy and set inputs: + ```bash + mkdir -p .azure/deployments/git-ape-runners + cp -R .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/. \ + .azure/deployments/git-ape-runners/ + # set githubOwnerRepo (+ any overrides) — NEVER put the PAT here: + $EDITOR .azure/deployments/git-ape-runners/parameters.json + ``` + `template.json` creates, in one self-contained stack: the resource group, a + user-assigned identity, an ACR, an `AcrPull` role assignment, a Key Vault, a + `Key Vault Secrets User` role assignment, an ACA managed environment, and the + event-driven ACA Job. + +2. **Deploy the stack.** The first deploy runs on a public runner or locally, + because the private runner does not exist yet: + ```bash + /azure-stack-deploy git-ape-runners # local (VS Code / terminal) + ``` + In CI, open a PR that adds `.azure/deployments/git-ape-runners/`; the + `git-ape-deploy.yml` workflow deploys it on `ubuntu-latest` and writes + `state.json` plus the architecture/cost artifacts, exactly like any other + Git-Ape deployment. + +3. **Build & push the runner image** into the ACR the stack just created (the + stock `actions-runner` image lacks `az`/`gh`/`jq` and self-registration — see + the Dockerfile note in the imperative path below): + ```bash + ACR=$(jq -r '.acrLoginServer.value' .azure/deployments/git-ape-runners/state.json) + az acr build --registry "${ACR%%.*}" --image git-ape-runner:latest \ + --file .github/skills/git-ape-onboarding/templates/runners/Dockerfile \ + .github/skills/git-ape-onboarding/templates/runners/ --no-logs + ``` + +4. **Write the GitHub PAT into Key Vault** — never in git, ARM params, or chat + output. 
Collect a long-lived fine-grained PAT exactly as in the imperative + "Collect a GitHub PAT" step below (never a short-lived registration token): + ```bash + KV=$(jq -r '.keyVaultName.value' .azure/deployments/git-ape-runners/state.json) + az keyvault secret set --vault-name "$KV" --name github-pat \ + --value '' --output none + ``` + The ACA Job reads it at runtime through a Key Vault secret reference + (`keyVaultUrl` + user-assigned `identity`), enabled by the in-template + `Key Vault Secrets User` role assignment. + +5. **Point workflows at the runner** — after this, re-deploys of this very stack + run on it (the self-hosting loop): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + ``` + +6. **Destroy** with one command when no longer needed (tears down the whole stack + and purges the soft-deleted Key Vault): + ```bash + /azure-stack-destroy git-ape-runners + ``` + +See `templates/deployments/git-ape-runners/README.md` for the full walkthrough and +`templates/deployments/git-ape-runners/architecture.md` for the topology and +bootstrap sequence. + +#### ACI / AKS — imperative provisioning + +1. **Build the custom runner image using ACR Tasks (cloud build).** The base `ghcr.io/actions/actions-runner:latest` (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`, and ships no registration entrypoint. Workflows fail with `Unable to locate executable file: az` — and on ACI/ACA @@ -504,7 +580,7 @@ scaling, and networking. az acr repository list --name -o table ``` -3. **Create a managed identity and assign `AcrPull` role** for image pulls: +2. **Create a managed identity and assign `AcrPull` role** for image pulls: ```bash # Create identity az identity create --name id-git-ape-runner --resource-group --location @@ -521,7 +597,7 @@ scaling, and networking. **Do NOT use ACR admin credentials** (`--admin-enabled true` + username/password). Managed identity is the secure, recommended approach. -4. 
**Collect a GitHub PAT from the user.** The ACA/ACI runner needs a +3. **Collect a GitHub PAT from the user.** The ACA/ACI runner needs a **long-lived GitHub Personal Access Token (PAT)** — NOT a short-lived registration token from `POST /actions/runners/registration-token`. Registration tokens expire in ~1 hour, but the KEDA `github-runner` scaler @@ -554,10 +630,12 @@ scaling, and networking. Never print the token value in chat output (see Safe-Execution Rules). -5. **Deploy the runner infrastructure** using the chosen platform template. - Pass the custom image, ACR server, managed identity, and user-provided PAT: +4. **Deploy the runner infrastructure (ACI).** Use the ACI template + (`templates/runners/aci/template.json`) — ACA is covered by the Git-Ape-managed + path above. Pass the custom image, ACR server, managed identity, and + user-provided PAT: ```bash - az deployment group create -g -f template.json \ + az deployment group create -g -f ./templates/runners/aci/template.json \ -p runnerImage='.azurecr.io/git-ape-runner:latest' \ acrServer='.azurecr.io' \ userAssignedIdentityId=$IDENTITY_ID \ @@ -577,7 +655,7 @@ scaling, and networking. environment's `provisioningState` to reach `Succeeded` before creating the job. -6. **Set `minExecutions=1`** (recommended) so at least one runner is always +5. **Set `minExecutions=1`** (recommended) so at least one runner is always warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can take 1–3 minutes on cold start, during which GitHub shows "No runners configured": @@ -587,11 +665,11 @@ scaling, and networking. Leave at `0` only if you prefer true scale-to-zero and can tolerate cold-start delays. -7. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* +6. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* with the `git-ape-runner` label. (With `minExecutions=1`, a runner should appear within 30–60 seconds of deployment.) -8. 
**Set the variable** so workflows target it (repo-wide or per environment): +7. **Set the variable** so workflows target it (repo-wide or per environment): ```bash gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" # per environment instead: @@ -765,7 +843,7 @@ OIDC, RBAC, environments, and workflows. 7. *(Optional)* Offer to onboard the drift detector workflow by provisioning `COPILOT_GITHUB_TOKEN` (Step 10 in playbook). Skip if the user does not want scheduled drift detection. 8. Ask compliance framework and enforcement mode preferences (Step 11 in playbook). 9. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted**: ask the user for a GitHub PAT (never generate a registration token) → ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACA/ACI deployment with identity-based registry auth using user-provided PAT + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b). +10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). 
For **self-hosted ACA (recommended)**: scaffold `.azure/deployments/git-ape-runners/` → deploy the subscription-scoped stack via `/azure-stack-deploy git-ape-runners` (first deploy on `ubuntu-latest`) → `az acr build` the runner image into the stack's ACR → `az keyvault secret set` the PAT (never a registration token) → set `GIT_APE_RUNNER_LABEL` (Step 12b, ACA path) — Git-Ape deploying Git-Ape. For **self-hosted ACI/AKS**: ask the user for a GitHub PAT → ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACI deployment with identity-based registry auth using user-provided PAT + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b, imperative path). 11. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner. 12. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/README.md b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/README.md new file mode 100644 index 0000000..2e27a06 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/README.md @@ -0,0 +1,93 @@ +# git-ape-runners — Git-Ape deploying Git-Ape + +This is a **Git-Ape deployment artifact** for the private GitHub Actions runners +that run Git-Ape's own workflows. Instead of provisioning runners with imperative +`az deployment group create`, this folder is scaffolded into your working copy at +`.azure/deployments/git-ape-runners/` and deployed through the normal Git-Ape +flow — so you get an **architecture diagram, cost estimate, managed deploy, and +single-command destroy** for the runner infrastructure itself. 
+ +| File | Purpose | +|------|---------| +| `template.json` | Subscription-scoped Deployment Stack: RG + nested inner deployment (UAMI, ACR, AcrPull, Key Vault, KV Secrets User, ACA managed env, ACA job) | +| `parameters.json` | Non-secret parameters. Set `githubOwnerRepo`. **Never** put the PAT here. | +| `architecture.md` | Mermaid topology + bootstrap sequence | +| `metadata.json` | Git-Ape deployment metadata (`deploymentId: git-ape-runners`) | + +## Why this exists + +The runner infra is the one part of onboarding that is pure Azure IaC, so it is +the natural thing to hand to Git-Ape. What stays imperative (and why): + +- **Entra app + OIDC + subscription RBAC** for the deploy identity — this is the + identity Git-Ape's own OIDC login uses, so it must exist *before* Git-Ape can + deploy anything (circular dependency). +- **GitHub environments / secrets / variables** — GitHub API, not ARM. +- **`az acr build`** — a build action, not IaC. Runs *after* the stack creates + the ACR. +- **AKS runners** — stay Helm / Actions Runner Controller managed. + +## Deploy (Git-Ape flow) + +> Prereq: onboarding has already created the Entra app, OIDC, RBAC, and GitHub +> environments/secrets. This deployment only provisions the runner compute. + +1. **Set `githubOwnerRepo`** in `parameters.json` (e.g. `Azure/git-ape`). Adjust + `runnerScope`, `location`, `maxRunners`, `minRunners` as needed. Defaults for + `acrName` / `keyVaultName` are deterministic and globally unique. + +2. **Deploy the stack.** The *first* deploy runs on a public runner or locally, + because the private runner doesn't exist yet. + + Local: + ```bash + /azure-stack-deploy git-ape-runners + ``` + CI: merge a PR that adds `.azure/deployments/git-ape-runners/` — the + `git-ape-deploy.yml` workflow deploys it (keep `GIT_APE_RUNNER_LABEL` unset or + on `ubuntu-latest` for this first run). + +3. 
**Build & push the runner image** into the ACR the stack just created + (the stock `actions-runner` image lacks `az`/`gh`/`jq` and a registration + entrypoint — see `../../runners/Dockerfile`): + ```bash + ACR=$(jq -r '.acrLoginServer.value' .azure/deployments/git-ape-runners/state.json 2>/dev/null || echo '.azurecr.io') + az acr build --registry "${ACR%%.*}" --image git-ape-runner:latest \ + --file ../../runners/Dockerfile ../../runners/ --no-logs + ``` + +4. **Set the GitHub PAT into Key Vault** (never committed, never in ARM params). + Use a fine-grained PAT with Actions + Administration (Read & Write), or a + classic PAT with `repo` scope: + ```bash + KV=$(jq -r '.keyVaultName.value' .azure/deployments/git-ape-runners/state.json) + az keyvault secret set --vault-name "$KV" --name github-pat --value "" --output none + ``` + +5. **Point Git-Ape workflows at the runner:** + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + ``` + From here, every Git-Ape deploy — including re-deploys of *this* stack — runs + on the private runner. That is Git-Ape deploying Git-Ape. + +## Destroy + +```bash +/azure-stack-destroy git-ape-runners +``` +Runs `az stack sub delete --action-on-unmanage deleteAll` (removes RG, ACA job, +environment, ACR, identity, role assignments, Key Vault in one call) and purges +the soft-deleted Key Vault. Then clear the variable to fall back to hosted +runners: `gh variable delete GIT_APE_RUNNER_LABEL`. + +## Notes + +- **Key Vault** uses RBAC authorization, soft-delete on, and purge protection + **off** so the destroy flow can fully purge it between deploy/destroy cycles. +- **`minRunners: 1`** keeps one runner warm and visible in GitHub (no cold-start + gap). Set `0` for true scale-to-zero. +- The template validates with `az deployment sub validate` and deploys with + `az stack sub create` — identical to every other Git-Ape deployment. 
+- ACI remains available as a raw template under `../../runners/aci/` for users + who don't want the managed-stack flow. diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/architecture.md b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/architecture.md new file mode 100644 index 0000000..6fea003 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/architecture.md @@ -0,0 +1,78 @@ +# Architecture — Git-Ape self-hosted runners + +This deployment provisions the private GitHub Actions runner that runs **future +Git-Ape workflows** — i.e. Git-Ape deploying Git-Ape. It is a single +subscription-scoped **Azure Deployment Stack** (`template.json`), so it gets an +architecture diagram, cost estimate, managed deploy, and single-command destroy +like any other Git-Ape deployment. + +## Resource topology + +```mermaid +%%{init: {'theme':'base','themeVariables':{'fontSize':'13px'}}}%% +flowchart TB + subgraph SUB["Subscription scope — Deployment Stack: git-ape-runners"] + RG["Resource Group
<br/>rg-git-ape-runners"] + subgraph INNER["Nested inner-scope deployment (rg-git-ape-runners)"] + UAMI["User-Assigned Managed Identity<br/>id-git-ape-runner"] + ACR["Container Registry (Basic)<br/>holds git-ape-runner:latest"] + KV["Key Vault (RBAC)<br/>secret: github-pat"] + ENV["ACA Managed Environment<br/>git-ape-runner-env"] + JOB["ACA Job (Event/KEDA github-runner)<br/>ephemeral runners, scale-to-zero"] + RA1(["roleAssignment: AcrPull<br/>UAMI → ACR"]) + RA2(["roleAssignment: Key Vault Secrets User<br/>UAMI → KV"]) + end + end + + RG --> INNER + UAMI -. AcrPull .-> ACR + UAMI -. Secrets User .-> KV + RA1 --- UAMI + RA1 --- ACR + RA2 --- UAMI + RA2 --- KV + JOB -->|environmentId| ENV + JOB -->|image pull via identity| ACR + JOB -->|secret ref via identity| KV + JOB -->|registers ephemeral runners| GH["GitHub Actions<br/>
label: git-ape-runner"] +``` + +## Bootstrap ordering (self-hosting) + +Git-Ape workflows run on `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}`. +The **first** deploy of this stack can't run on the private runner (it doesn't +exist yet), so it runs on a public runner or locally. Once it's up and +`GIT_APE_RUNNER_LABEL` is set, every later Git-Ape run — **including updates to +this runner stack itself** — executes on the private runner it created. + +```mermaid +%%{init: {'theme':'base','themeVariables':{'fontSize':'13px'}}}%% +sequenceDiagram + participant U as Operator / Onboarding + participant GA as Git-Ape (stack deploy) + participant AZ as Azure (rg-git-ape-runners) + participant GH as GitHub Actions + + U->>GA: 1. Deploy git-ape-runners (on ubuntu-latest / local) + GA->>AZ: az stack sub create — RG, UAMI, ACR, KV, ACA job + U->>AZ: 2. az acr build (push git-ape-runner:latest) + U->>AZ: 3. az keyvault secret set --name github-pat + AZ->>GH: 4. Job registers ephemeral runner (label git-ape-runner) + U->>GH: 5. gh variable set GIT_APE_RUNNER_LABEL=git-ape-runner + Note over GA,GH: 6. All later Git-Ape deploys run on the private runner +``` + +## Secrets + +The GitHub PAT is **never** in git, ARM parameters, or deployment history. The +stack creates an empty Key Vault; the PAT is written post-deploy with +`az keyvault secret set`. The ACA Job reads it at runtime through a Key Vault +secret reference (`keyVaultUrl` + user-assigned `identity`), which requires the +in-template **Key Vault Secrets User** role assignment. + +## Destroy + +`/azure-stack-destroy git-ape-runners` (or the `git-ape-destroy.yml` flow) runs +`az stack sub delete --action-on-unmanage deleteAll`, removing the RG, ACA job, +environment, ACR, identity, role assignments, and Key Vault in one call, then +purges the soft-deleted Key Vault (purge protection is intentionally off). 
diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/metadata.json b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/metadata.json new file mode 100644 index 0000000..7d451ca --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/metadata.json @@ -0,0 +1,14 @@ +{ + "deploymentId": "git-ape-runners", + "timestamp": "", + "status": "initialized", + "scope": "subscription", + "region": "eastus", + "project": "git-ape", + "environment": "prod", + "deployMethod": "stack", + "resourceGroup": "rg-git-ape-runners", + "resourceGroups": ["rg-git-ape-runners"], + "description": "Git-Ape self-hosted GitHub Actions runners (ACA event-driven job) managed as a first-class Git-Ape deployment. Provides the private runner that executes future Git-Ape workflows — Git-Ape deploying Git-Ape.", + "createdBy": "git-ape-onboarding" +} diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/parameters.json b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/parameters.json new file mode 100644 index 0000000..d2ae5bf --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/parameters.json @@ -0,0 +1,47 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "location": { + "value": "eastus" + }, + "resourceGroupName": { + "value": "rg-git-ape-runners" + }, + "runnerName": { + "value": "git-ape-runner" + }, + "githubOwnerRepo": { + "value": "/" + }, + "runnerScope": { + "value": "repo" + }, + "runnerLabels": { + "value": "git-ape-runner" + }, + "maxRunners": { + "value": 10 + }, + "minRunners": { + "value": 1 + }, + "cpuCores": { + "value": "1.0" + }, + "memorySize": { + "value": "2Gi" + }, + "infrastructureSubnetId": { + "value": "" + }, + "tags": { + "value": { + "ManagedBy": "git-ape", + "Component": 
"git-ape-runners", + "Environment": "prod", + "Project": "git-ape" + } + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/template.json b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/template.json new file mode 100644 index 0000000..2a364c5 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/template.json @@ -0,0 +1,494 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": "git-ape-onboarding", + "description": "Git-Ape self-hosted runner infrastructure as a first-class Git-Ape deployment. Subscription-scoped Deployment Stack template that creates the resource group and, via a nested inner-scope deployment, a fully self-contained runner stack: a user-assigned managed identity, an Azure Container Registry (for the custom runner image), an AcrPull role assignment, a Key Vault (for the GitHub PAT), a Key Vault Secrets User role assignment, an Azure Container Apps managed environment, and an event-driven Container Apps Job that scales ephemeral GitHub Actions runners with the KEDA 'github-runner' scaler (scale-to-zero). The GitHub PAT is referenced from Key Vault (never inline, never in git); set it post-deploy with `az keyvault secret set`. Deploy through the Git-Ape stack flow so you get an architecture diagram, cost estimate, managed deploy, and single-command destroy.\n\nDeploy (local): /azure-stack-deploy git-ape-runners\nDeploy (CI): merge a PR that adds .azure/deployments/git-ape-runners/ -> git-ape-deploy.yml\nDestroy: /azure-stack-destroy git-ape-runners" + }, + "parameters": { + "location": { + "type": "string", + "defaultValue": "eastus", + "metadata": { + "description": "Azure region for the resource group and all runner resources." 
+ } + }, + "resourceGroupName": { + "type": "string", + "defaultValue": "rg-git-ape-runners", + "metadata": { + "description": "Resource group that will own the runner stack. Created by this template." + } + }, + "runnerName": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Base name for the managed environment and the Container Apps Job." + } + }, + "githubOwnerRepo": { + "type": "string", + "metadata": { + "description": "GitHub target in / form for repo-scoped runners, or the org/owner name for org-scoped runners." + } + }, + "runnerScope": { + "type": "string", + "defaultValue": "repo", + "allowedValues": [ + "repo", + "org" + ], + "metadata": { + "description": "Runner registration scope." + } + }, + "runnerLabels": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Comma-separated runner labels. Must include the value used for the GIT_APE_RUNNER_LABEL workflow variable." + } + }, + "acrName": { + "type": "string", + "defaultValue": "[toLower(format('acrgitape{0}', uniqueString(subscription().id, parameters('resourceGroupName'))))]", + "metadata": { + "description": "Azure Container Registry name (5-50 alphanumeric, globally unique). Holds the custom runner image built with `az acr build`. Defaults to a deterministic unique name." + } + }, + "keyVaultName": { + "type": "string", + "defaultValue": "[format('kv-gitape-{0}', substring(uniqueString(subscription().id, parameters('resourceGroupName')), 0, 8))]", + "metadata": { + "description": "Key Vault name (3-24 chars) that stores the GitHub PAT secret 'github-pat'. Defaults to a deterministic unique name." + } + }, + "runnerImage": { + "type": "string", + "defaultValue": "[format('{0}.azurecr.io/git-ape-runner:latest', parameters('acrName'))]", + "metadata": { + "description": "Runner container image. Defaults to the custom image in the ACR created by this template. 
Build it with `az acr build` after the stack deploys (see README)." + } + }, + "maxRunners": { + "type": "int", + "defaultValue": 10, + "metadata": { + "description": "Maximum concurrent runner executions (KEDA maxExecutions)." + } + }, + "minRunners": { + "type": "int", + "defaultValue": 0, + "metadata": { + "description": "Minimum warm runner executions (KEDA minExecutions). Set to 1 to keep one runner always visible in GitHub and avoid cold-start delays; 0 for true scale-to-zero." + } + }, + "cpuCores": { + "type": "string", + "defaultValue": "1.0", + "metadata": { + "description": "vCPU for the runner container." + } + }, + "memorySize": { + "type": "string", + "defaultValue": "2Gi", + "metadata": { + "description": "Memory for the runner container (e.g. 2Gi)." + } + }, + "infrastructureSubnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of an infrastructure subnet for VNet injection of the ACA environment. Leave empty for a non-VNet environment (Consumption needs a /23 or larger)." + } + }, + "tags": { + "type": "object", + "defaultValue": { + "ManagedBy": "git-ape", + "Component": "git-ape-runners" + }, + "metadata": { + "description": "Tags applied to the resource group and all runner resources." 
+ } + } + }, + "variables": { + "runnerIdentityName": "[format('id-{0}', parameters('runnerName'))]" + }, + "resources": [ + { + "type": "Microsoft.Resources/resourceGroups", + "apiVersion": "2021-04-01", + "name": "[parameters('resourceGroupName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]" + }, + { + "type": "Microsoft.Resources/deployments", + "apiVersion": "2022-09-01", + "name": "git-ape-runners-inner", + "resourceGroup": "[parameters('resourceGroupName')]", + "dependsOn": [ + "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]" + ], + "properties": { + "mode": "Incremental", + "expressionEvaluationOptions": { + "scope": "inner" + }, + "parameters": { + "location": { + "value": "[parameters('location')]" + }, + "runnerName": { + "value": "[parameters('runnerName')]" + }, + "runnerIdentityName": { + "value": "[variables('runnerIdentityName')]" + }, + "githubOwnerRepo": { + "value": "[parameters('githubOwnerRepo')]" + }, + "runnerScope": { + "value": "[parameters('runnerScope')]" + }, + "runnerLabels": { + "value": "[parameters('runnerLabels')]" + }, + "acrName": { + "value": "[parameters('acrName')]" + }, + "keyVaultName": { + "value": "[parameters('keyVaultName')]" + }, + "runnerImage": { + "value": "[parameters('runnerImage')]" + }, + "maxRunners": { + "value": "[parameters('maxRunners')]" + }, + "minRunners": { + "value": "[parameters('minRunners')]" + }, + "cpuCores": { + "value": "[parameters('cpuCores')]" + }, + "memorySize": { + "value": "[parameters('memorySize')]" + }, + "infrastructureSubnetId": { + "value": "[parameters('infrastructureSubnetId')]" + }, + "tags": { + "value": "[parameters('tags')]" + } + }, + "template": { + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "location": { + "type": "string" + }, + "runnerName": { + "type": "string" + }, + "runnerIdentityName": { + 
"type": "string" + }, + "githubOwnerRepo": { + "type": "string" + }, + "runnerScope": { + "type": "string" + }, + "runnerLabels": { + "type": "string" + }, + "acrName": { + "type": "string" + }, + "keyVaultName": { + "type": "string" + }, + "runnerImage": { + "type": "string" + }, + "maxRunners": { + "type": "int" + }, + "minRunners": { + "type": "int" + }, + "cpuCores": { + "type": "string" + }, + "memorySize": { + "type": "string" + }, + "infrastructureSubnetId": { + "type": "string" + }, + "tags": { + "type": "object" + } + }, + "variables": { + "isOrgScope": "[equals(parameters('runnerScope'), 'org')]", + "isVnet": "[not(empty(parameters('infrastructureSubnetId')))]", + "envName": "[format('{0}-env', parameters('runnerName'))]", + "acrPullRoleId": "7f951dda-4ed3-4680-a7ca-43fe172d538d", + "keyVaultSecretsUserRoleId": "4633458b-17de-408a-b874-0445c86b69e6", + "identityResourceId": "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('runnerIdentityName'))]", + "ownerName": "[if(variables('isOrgScope'), parameters('githubOwnerRepo'), first(split(parameters('githubOwnerRepo'), '/')))]", + "repoName": "[if(variables('isOrgScope'), '', last(split(parameters('githubOwnerRepo'), '/')))]", + "scalerMetadataBase": { + "owner": "[variables('ownerName')]", + "runnerScope": "[parameters('runnerScope')]", + "labels": "[parameters('runnerLabels')]", + "targetWorkflowQueueLength": "1", + "applicationID": "" + }, + "scalerMetadata": "[if(variables('isOrgScope'), variables('scalerMetadataBase'), union(variables('scalerMetadataBase'), createObject('repos', variables('repoName'))))]", + "scopeEnv": "[if(variables('isOrgScope'), createArray(createObject('name', 'ORG_NAME', 'value', parameters('githubOwnerRepo'))), createArray(createObject('name', 'REPO_URL', 'value', concat('https://github.com/', parameters('githubOwnerRepo')))))]", + "baseEnv": [ + { + "name": "RUNNER_SCOPE", + "value": "[parameters('runnerScope')]" + }, + { + "name": "RUNNER_NAME_PREFIX", 
+ "value": "[parameters('runnerName')]" + }, + { + "name": "LABELS", + "value": "[parameters('runnerLabels')]" + }, + { + "name": "EPHEMERAL", + "value": "true" + }, + { + "name": "DISABLE_AUTO_UPDATE", + "value": "true" + }, + { + "name": "ACCESS_TOKEN", + "secretRef": "github-pat" + } + ], + "containerEnv": "[concat(variables('scopeEnv'), variables('baseEnv'))]", + "vnetConfiguration": "[if(variables('isVnet'), createObject('infrastructureSubnetId', parameters('infrastructureSubnetId'), 'internal', false()), json('null'))]" + }, + "resources": [ + { + "type": "Microsoft.ManagedIdentity/userAssignedIdentities", + "apiVersion": "2023-01-31", + "name": "[parameters('runnerIdentityName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]" + }, + { + "type": "Microsoft.ContainerRegistry/registries", + "apiVersion": "2023-07-01", + "name": "[parameters('acrName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]", + "sku": { + "name": "Basic" + }, + "properties": { + "adminUserEnabled": false, + "anonymousPullEnabled": false + } + }, + { + "type": "Microsoft.Authorization/roleAssignments", + "apiVersion": "2022-04-01", + "name": "[guid(resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName')), variables('identityResourceId'), variables('acrPullRoleId'))]", + "scope": "[format('Microsoft.ContainerRegistry/registries/{0}', parameters('acrName'))]", + "dependsOn": [ + "[resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName'))]", + "[variables('identityResourceId')]" + ], + "properties": { + "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', variables('acrPullRoleId'))]", + "principalId": "[reference(variables('identityResourceId'), '2023-01-31').principalId]", + "principalType": "ServicePrincipal" + } + }, + { + "type": "Microsoft.KeyVault/vaults", + "apiVersion": "2023-07-01", + "name": "[parameters('keyVaultName')]", + "location": 
"[parameters('location')]", + "tags": "[parameters('tags')]", + "properties": { + "tenantId": "[subscription().tenantId]", + "sku": { + "family": "A", + "name": "standard" + }, + "enableRbacAuthorization": true, + "enableSoftDelete": true, + "softDeleteRetentionInDays": 7, + "publicNetworkAccess": "Enabled" + } + }, + { + "type": "Microsoft.Authorization/roleAssignments", + "apiVersion": "2022-04-01", + "name": "[guid(resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName')), variables('identityResourceId'), variables('keyVaultSecretsUserRoleId'))]", + "scope": "[format('Microsoft.KeyVault/vaults/{0}', parameters('keyVaultName'))]", + "dependsOn": [ + "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]", + "[variables('identityResourceId')]" + ], + "properties": { + "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', variables('keyVaultSecretsUserRoleId'))]", + "principalId": "[reference(variables('identityResourceId'), '2023-01-31').principalId]", + "principalType": "ServicePrincipal" + } + }, + { + "type": "Microsoft.App/managedEnvironments", + "apiVersion": "2024-03-01", + "name": "[variables('envName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]", + "properties": { + "vnetConfiguration": "[variables('vnetConfiguration')]" + } + }, + { + "type": "Microsoft.App/jobs", + "apiVersion": "2024-03-01", + "name": "[parameters('runnerName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]", + "identity": { + "type": "UserAssigned", + "userAssignedIdentities": { + "[variables('identityResourceId')]": {} + } + }, + "dependsOn": [ + "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]", + "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]", + "[resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName'))]", + "[guid(resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName')), 
variables('identityResourceId'), variables('keyVaultSecretsUserRoleId'))]", + "[guid(resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName')), variables('identityResourceId'), variables('acrPullRoleId'))]" + ], + "properties": { + "environmentId": "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]", + "configuration": { + "triggerType": "Event", + "replicaTimeout": 1800, + "replicaRetryLimit": 1, + "registries": [ + { + "server": "[format('{0}.azurecr.io', parameters('acrName'))]", + "identity": "[variables('identityResourceId')]" + } + ], + "secrets": [ + { + "name": "github-pat", + "keyVaultUrl": "[format('{0}secrets/github-pat', reference(resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName')), '2023-07-01').vaultUri)]", + "identity": "[variables('identityResourceId')]" + } + ], + "eventTriggerConfig": { + "parallelism": 1, + "replicaCompletionCount": 1, + "scale": { + "minExecutions": "[parameters('minRunners')]", + "maxExecutions": "[parameters('maxRunners')]", + "pollingInterval": 30, + "rules": [ + { + "name": "github-runner", + "type": "github-runner", + "metadata": "[variables('scalerMetadata')]", + "auth": [ + { + "secretRef": "github-pat", + "triggerParameter": "personalAccessToken" + } + ] + } + ] + } + } + }, + "template": { + "containers": [ + { + "name": "[parameters('runnerName')]", + "image": "[parameters('runnerImage')]", + "env": "[variables('containerEnv')]", + "resources": { + "cpu": "[json(parameters('cpuCores'))]", + "memory": "[parameters('memorySize')]" + } + } + ] + } + } + } + ], + "outputs": { + "runnerJobId": { + "type": "string", + "value": "[resourceId('Microsoft.App/jobs', parameters('runnerName'))]" + }, + "acrLoginServer": { + "type": "string", + "value": "[format('{0}.azurecr.io', parameters('acrName'))]" + }, + "keyVaultName": { + "type": "string", + "value": "[parameters('keyVaultName')]" + }, + "identityClientId": { + "type": "string", + "value": 
"[reference(variables('identityResourceId'), '2023-01-31').clientId]" + }, + "runnerLabel": { + "type": "string", + "value": "[parameters('runnerLabels')]" + } + } + } + } + } + ], + "outputs": { + "resourceGroupName": { + "type": "string", + "value": "[parameters('resourceGroupName')]" + }, + "acrLoginServer": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.acrLoginServer.value]" + }, + "keyVaultName": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.keyVaultName.value]" + }, + "runnerJobId": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.runnerJobId.value]" + }, + "runnerLabel": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.runnerLabel.value]" + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/README.md b/.github/skills/git-ape-onboarding/templates/runners/README.md index f0c4141..9aae0c3 100644 --- a/.github/skills/git-ape-onboarding/templates/runners/README.md +++ b/.github/skills/git-ape-onboarding/templates/runners/README.md @@ -220,11 +220,22 @@ gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,r Self-hosted runners run in **your** Azure subscription. You manage the compute, image, scaling, and networking. +> **⭐ ACA is deployed as a first-class Git-Ape deployment — Git-Ape deploying +> Git-Ape.** Instead of an imperative `az deployment group create`, the ACA +> runner stack ships as a subscription-scoped deployment artifact at +> [`../deployments/git-ape-runners/`](../deployments/git-ape-runners). Onboarding +> scaffolds it to your repo's `.azure/deployments/git-ape-runners/`, so the +> runner infrastructure flows through Git-Ape's managed pipeline: **architecture +> diagram, cost estimate, `az stack sub create` deploy, and single-command +> destroy**. The raw `aca/` template below remains as the reference IaC that the +> deployment artifact is built from. 
See that folder's `README.md` for the +> deploy/destroy walkthrough and bootstrap ordering. + ### Platform matrix | | **Azure Container Instances (ACI)** | **Azure Container Apps (ACA)** | **Azure Kubernetes Service (AKS)** | |---|---|---|---| -| **Basic** | [`aci/`](./aci) — single container group, simplest | [`aca/`](./aca) — KEDA-scaled ephemeral jobs | [`aks/`](./aks) — Actions Runner Controller (ARC) | +| **Basic** | [`aci/`](./aci) — single container group, simplest | [`aca/`](./aca) — KEDA-scaled ephemeral jobs · **deploy via [`deployments/git-ape-runners/`](../deployments/git-ape-runners)** | [`aks/`](./aks) — Actions Runner Controller (ARC) | | **With private networking** | [`aci/`](./aci) with `subnetId` set | [`aca/`](./aca) with `infrastructureSubnetId` set | [`aks/`](./aks) — runners on cluster node subnet | ### Which platform? @@ -232,7 +243,7 @@ image, scaling, and networking. | Choose | When | |--------|------| | **ACI** | Fewest moving parts. A handful of runners, simple scaling, fast to stand up. | -| **ACA** | You want **event-driven, ephemeral** runners that scale to zero between jobs (KEDA `github-runner` scaler). Best cost/utilization. | +| **ACA** (recommended) | You want **event-driven, ephemeral** runners that scale to zero between jobs (KEDA `github-runner` scaler). Best cost/utilization — and it's deployed as a managed Git-Ape deployment (diagram/cost/deploy/destroy). | | **AKS** | You already run AKS, need large-scale autoscaling, or want ARC's ephemeral runner pods and fine-grained scheduling. 
| ## Custom runner image (required) @@ -277,6 +288,13 @@ az acr build --registry --image git-ape-runner:latest \ ### ACR pull authentication (managed identity — recommended) +> **For ACA, prefer the managed Git-Ape deployment** at +> [`../deployments/git-ape-runners/`](../deployments/git-ape-runners), which +> provisions the managed identity, `AcrPull` role assignment, Key Vault, and ACA +> job for you inside one subscription-scoped stack (`az stack sub create`). The +> manual `az` steps below are the imperative equivalent, kept for reference and +> for the ACI path. + Use a **user-assigned managed identity** with the `AcrPull` role to pull images from your ACR. This eliminates admin credentials entirely. diff --git a/website/docs/getting-started/onboarding.md b/website/docs/getting-started/onboarding.md index cac65ce..b0118cf 100644 --- a/website/docs/getting-started/onboarding.md +++ b/website/docs/getting-started/onboarding.md @@ -524,9 +524,33 @@ runners and set one variable. | Platform | What it provisions | |----------|--------------------| | **ACI** | ARM `template.json` — a container group running an ephemeral runner. | -| **ACA** | ARM `template.json` — a KEDA `github-runner`-scaled Container Apps Job (scale-to-zero). | +| **ACA** (recommended) | Deployed as a **Git-Ape deployment** — see below. A KEDA `github-runner`-scaled Container Apps Job (scale-to-zero) with managed identity, ACR, and Key Vault. | | **AKS** | Actions Runner Controller (ARC) via Helm `values.yaml`. | +#### Git-Ape deploying Git-Ape (ACA runners) + +The ACA runner path is itself a **first-class Git-Ape deployment**. Rather than an +imperative `az deployment group create`, onboarding scaffolds a subscription-scoped +deployment artifact to `.azure/deployments/git-ape-runners/` +(source: [`templates/deployments/git-ape-runners/`](https://github.com/Azure/git-ape/tree/main/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners)). 
+The runner infrastructure then flows through Git-Ape's own managed pipeline: + +- **Architecture diagram** — `architecture.md` (topology + bootstrap sequence). +- **Cost estimate** — via the `azure-cost-estimator` skill on the template. +- **Deploy** — `az stack sub create --action-on-unmanage deleteAll` (the same + Deployment Stack primitive every Git-Ape deployment uses), producing `state.json`. +- **Destroy** — single-command teardown via the `azure-stack-destroy` skill. + +The PAT is never in git, ARM parameters, or deployment history — the stack creates +an empty Key Vault and the token is set post-deploy with `az keyvault secret set`; +the ACA Job reads it as a Key Vault secret reference. + +**Bootstrap ordering (the self-hosting loop):** the first `git-ape-runners` deploy +runs on `ubuntu-latest` (or your local machine) because the private runner does not +exist yet. Once it is online and `GIT_APE_RUNNER_LABEL` is set, every subsequent +Git-Ape run — including updates to the runner stack itself — executes on the private +runner. Git-Ape ends up deploying and maintaining the very runners that run Git-Ape. + Once a runner is online (with the `git-ape-runner` label), flip the switch: ```bash diff --git a/website/docs/skills/git-ape-onboarding.md b/website/docs/skills/git-ape-onboarding.md index 0dcd02c..e284dac 100644 --- a/website/docs/skills/git-ape-onboarding.md +++ b/website/docs/skills/git-ape-onboarding.md @@ -71,7 +71,7 @@ This skill configures: 5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable, plus the optional `GIT_APE_RUNNER_LABEL` variable that selects private runners 6. Scaffolded GitHub Actions workflow files (`git-ape-plan.yml`, `-deploy.yml`, `-destroy.yml`, `-verify.yml`, `-drift.{md,lock.yml}`) and deployment standards (`.github/copilot-instructions.md`) into the user's working copy 7. 
*(Optional)* The `COPILOT_GITHUB_TOKEN` repository secret that powers the agentic drift-detection workflow (`git-ape-drift.lock.yml`) — only when the user opts into scheduled drift detection -8. The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default), **hosted compute networking** (GitHub-managed runners with Azure private networking, requires GHEC), or self-hosted runners in your Azure subscription (ACI / ACA / AKS). On-demand IaC for private runners ships at `./templates/runners/`. +8. The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default), **hosted compute networking** (GitHub-managed runners with Azure private networking, requires GHEC), or self-hosted runners in your Azure subscription (ACI / ACA / AKS). On-demand IaC for private runners ships at `./templates/runners/`. ACA runners are deployed as a first-class **Git-Ape deployment** (`./templates/deployments/git-ape-runners/`) — Git-Ape deploying Git-Ape — so they get an architecture diagram, cost estimate, managed deploy, and single-command destroy. ## Prerequisites @@ -487,12 +487,88 @@ scaling, and networking. 1. **Ask the platform:** ``` Which Azure platform should host the runners? + - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) — RECOMMENDED - ACI — Azure Container Instances (simplest; a handful of runners) - - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) ``` -2. **Build the custom runner image using ACR Tasks (cloud build).** The base + **ACA is deployed as a Git-Ape deployment** (Git-Ape deploying Git-Ape), so the + runner infrastructure gets an architecture diagram, cost estimate, managed + deploy, and single-command destroy — see the ACA path directly below. **ACI** + and **AKS** use the imperative provisioning steps further down. 
+ +#### ACA — deploy runners as a Git-Ape deployment (recommended) + +Instead of an imperative `az deployment group create`, scaffold the runner +infrastructure as a first-class Git-Ape deployment and deploy it through the +normal Git-Ape stack flow. The template is a subscription-scoped Deployment Stack, +so destroy is a single idempotent command and the PAT lives in Key Vault (never in +git or ARM parameters). + +1. **Scaffold the deployment artifact** into the working copy and set inputs: + ```bash + mkdir -p .azure/deployments/git-ape-runners + cp -R .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/. \ + .azure/deployments/git-ape-runners/ + # set githubOwnerRepo (+ any overrides) — NEVER put the PAT here: + $EDITOR .azure/deployments/git-ape-runners/parameters.json + ``` + `template.json` creates, in one self-contained stack: the resource group, a + user-assigned identity, an ACR, an `AcrPull` role assignment, a Key Vault, a + `Key Vault Secrets User` role assignment, an ACA managed environment, and the + event-driven ACA Job. + +2. **Deploy the stack.** The first deploy runs on a public runner or locally, + because the private runner does not exist yet: + ```bash + /azure-stack-deploy git-ape-runners # local (VS Code / terminal) + ``` + In CI, open a PR that adds `.azure/deployments/git-ape-runners/`; the + `git-ape-deploy.yml` workflow deploys it on `ubuntu-latest` and writes + `state.json` plus the architecture/cost artifacts, exactly like any other + Git-Ape deployment. + +3. 
**Build & push the runner image** into the ACR the stack just created (the + stock `actions-runner` image lacks `az`/`gh`/`jq` and self-registration — see + the Dockerfile note in the imperative path below): + ```bash + ACR=$(jq -r '.acrLoginServer.value' .azure/deployments/git-ape-runners/state.json) + az acr build --registry "${ACR%%.*}" --image git-ape-runner:latest \ + --file .github/skills/git-ape-onboarding/templates/runners/Dockerfile \ + .github/skills/git-ape-onboarding/templates/runners/ --no-logs + ``` + +4. **Write the GitHub PAT into Key Vault** — never in git, ARM params, or chat + output. Collect a long-lived fine-grained PAT exactly as in the imperative + "Collect a GitHub PAT" step below (never a short-lived registration token): + ```bash + KV=$(jq -r '.keyVaultName.value' .azure/deployments/git-ape-runners/state.json) + az keyvault secret set --vault-name "$KV" --name github-pat \ + --value '' --output none + ``` + The ACA Job reads it at runtime through a Key Vault secret reference + (`keyVaultUrl` + user-assigned `identity`), enabled by the in-template + `Key Vault Secrets User` role assignment. + +5. **Point workflows at the runner** — after this, re-deploys of this very stack + run on it (the self-hosting loop): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + ``` + +6. **Destroy** with one command when no longer needed (tears down the whole stack + and purges the soft-deleted Key Vault): + ```bash + /azure-stack-destroy git-ape-runners + ``` + +See `templates/deployments/git-ape-runners/README.md` for the full walkthrough and +`templates/deployments/git-ape-runners/architecture.md` for the topology and +bootstrap sequence. + +#### ACI / AKS — imperative provisioning + +1. **Build the custom runner image using ACR Tasks (cloud build).** The base `ghcr.io/actions/actions-runner:latest` (GitHub's official runner image) does **NOT** include `az`, `gh`, or `jq`, and ships no registration entrypoint. 
Workflows fail with `Unable to locate executable file: az` — and on ACI/ACA @@ -521,7 +597,7 @@ scaling, and networking. az acr repository list --name -o table ``` -3. **Create a managed identity and assign `AcrPull` role** for image pulls: +2. **Create a managed identity and assign `AcrPull` role** for image pulls: ```bash # Create identity az identity create --name id-git-ape-runner --resource-group --location @@ -538,7 +614,7 @@ scaling, and networking. **Do NOT use ACR admin credentials** (`--admin-enabled true` + username/password). Managed identity is the secure, recommended approach. -4. **Collect a GitHub PAT from the user.** The ACA/ACI runner needs a +3. **Collect a GitHub PAT from the user.** The ACA/ACI runner needs a **long-lived GitHub Personal Access Token (PAT)** — NOT a short-lived registration token from `POST /actions/runners/registration-token`. Registration tokens expire in ~1 hour, but the KEDA `github-runner` scaler @@ -571,10 +647,12 @@ scaling, and networking. Never print the token value in chat output (see Safe-Execution Rules). -5. **Deploy the runner infrastructure** using the chosen platform template. - Pass the custom image, ACR server, managed identity, and user-provided PAT: +4. **Deploy the runner infrastructure (ACI).** Use the ACI template + (`templates/runners/aci/template.json`) — ACA is covered by the Git-Ape-managed + path above. Pass the custom image, ACR server, managed identity, and + user-provided PAT: ```bash - az deployment group create -g -f template.json \ + az deployment group create -g -f ./templates/runners/aci/template.json \ -p runnerImage='.azurecr.io/git-ape-runner:latest' \ acrServer='.azurecr.io' \ userAssignedIdentityId=$IDENTITY_ID \ @@ -594,7 +672,7 @@ scaling, and networking. environment's `provisioningState` to reach `Succeeded` before creating the job. -6. **Set `minExecutions=1`** (recommended) so at least one runner is always +5. 
**Set `minExecutions=1`** (recommended) so at least one runner is always warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can take 1–3 minutes on cold start, during which GitHub shows "No runners configured": @@ -604,11 +682,11 @@ scaling, and networking. Leave at `0` only if you prefer true scale-to-zero and can tolerate cold-start delays. -7. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* +6. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* with the `git-ape-runner` label. (With `minExecutions=1`, a runner should appear within 30–60 seconds of deployment.) -8. **Set the variable** so workflows target it (repo-wide or per environment): +7. **Set the variable** so workflows target it (repo-wide or per environment): ```bash gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" # per environment instead: @@ -782,7 +860,7 @@ OIDC, RBAC, environments, and workflows. 7. *(Optional)* Offer to onboard the drift detector workflow by provisioning `COPILOT_GITHUB_TOKEN` (Step 10 in playbook). Skip if the user does not want scheduled drift detection. 8. Ask compliance framework and enforcement mode preferences (Step 11 in playbook). 9. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). 
For **self-hosted**: ask the user for a GitHub PAT (never generate a registration token) → ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACA/ACI deployment with identity-based registry auth using user-provided PAT + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b).
+10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted ACA (recommended)**: scaffold `.azure/deployments/git-ape-runners/` → deploy the subscription-scoped stack via `/azure-stack-deploy git-ape-runners` (first deploy on `ubuntu-latest`) → `az acr build` the runner image into the stack's ACR → `az keyvault secret set` the PAT (never a registration token) → set `GIT_APE_RUNNER_LABEL` (Step 12b, ACA path) — Git-Ape deploying Git-Ape. For **self-hosted ACI/AKS**: ask the user for a GitHub PAT → ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACI deployment with identity-based registry auth using user-provided PAT + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b, imperative path).
 11. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner.
 12. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands.
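
Two small pieces of the self-hosted ACA path in step 10 can be sketched as plain shell helpers: the fallback to `ubuntu-latest` while `GIT_APE_RUNNER_LABEL` is unset (the bootstrap deploy), and the derivation of the ACR registry name and image reference from the stack's `acrLoginServer` output. The function names below are hypothetical and are not part of Git-Ape; `myacr.azurecr.io` is an illustrative value. This is a minimal sketch under those assumptions, not the workflows' actual implementation.

```bash
#!/usr/bin/env bash

# Hypothetical helper: mirrors the documented fallback, where an unset or
# empty GIT_APE_RUNNER_LABEL keeps the job on the public ubuntu-latest runner.
resolve_runner_label() {
  printf '%s\n' "${GIT_APE_RUNNER_LABEL:-ubuntu-latest}"
}

# Hypothetical helper: strip everything from the first dot of an ACR login
# server, the same expansion the ACA path uses for
# `az acr build --registry "${ACR%%.*}"`.
acr_name_from_login_server() {
  local login_server="$1"
  printf '%s\n' "${login_server%%.*}"
}

# Hypothetical helper: compose the runner image reference that the image
# build step pushes (repository git-ape-runner, tag latest).
runner_image_ref() {
  local login_server="$1"
  printf '%s/git-ape-runner:latest\n' "$login_server"
}

# Usage (illustrative values only):
#   resolve_runner_label
#   acr_name_from_login_server "myacr.azurecr.io"
#   runner_image_ref "myacr.azurecr.io"
```

Keeping these as pure string helpers means they can be checked without an Azure login. The real values would come from the `state.json` written by the stack deploy.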