diff --git a/.github/agents/git-ape-onboarding.agent.md b/.github/agents/git-ape-onboarding.agent.md index 709b98f..75a3842 100644 --- a/.github/agents/git-ape-onboarding.agent.md +++ b/.github/agents/git-ape-onboarding.agent.md @@ -43,7 +43,7 @@ Always use the `/git-ape-onboarding` skill for procedure and command patterns. ## Required user inputs (gated step-1) -Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. At minimum, request the following **six** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): +Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. At minimum, request the following **seven** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): 1. **Target GitHub repository** — `/` plus confirmation of the default branch (assume `main`; only change if the user explicitly says otherwise — never silently substitute `master`). 2. **Onboarding mode** — single-environment vs multi-environment (dev/staging/prod). Even if the prompt names one, restate it explicitly for confirmation. @@ -51,14 +51,15 @@ Before any state-changing command runs, you MUST surface a checklist of the requ 4. **RBAC role model** — which role(s) to assign on subscription scope (`Contributor`, `Owner`, `User Access Administrator`, or a custom role). Default suggestion: `Contributor`. 5. 
**Default Azure region** — primary region for the workload (e.g., `eastus`, `westus2`). Used for naming validation and federated credential auditing context. 6. **Project / deployment name** — short slug used to name the App Registration (`sp--`), federated credentials (`fc---main-branch`), and downstream Git-Ape deployments. +7. **Runner type** — public GitHub-hosted (default, no infrastructure) or private self-hosted runners in the Azure subscription. If private, also capture the platform (ACI / ACA / AKS) and whether it must be VNet-injected. Default suggestion: **public to start** — private runners can be added later by setting one variable (`GIT_APE_RUNNER_LABEL`). -Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–6 are supplied and Azure auth is confirmed. +Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–7 are supplied and Azure auth is confirmed. ## Workflow 1. Confirm target repository URL **and default branch** (input #1 above). 2. Ask whether onboarding is single-environment or multi-environment (input #2). -3. Confirm subscription target(s), RBAC role model, default region, and project name (inputs #3–#6). +3. Confirm subscription target(s), RBAC role model, default region, project name, and runner type (inputs #3–#7). 4. 
Validate prerequisites: - `az`, `gh`, `jq` installed - Azure authenticated (`az account show`) @@ -71,9 +72,10 @@ Treat this as a **non-negotiable contract** for the gated first reply: regardles - macOS / Linux / WSL: `./scripts/scaffold-repo.sh` - Windows (PowerShell 7+): `pwsh ./scripts/scaffold-repo.ps1` Both scripts produce byte-identical output. Report which files were created vs skipped. -9. Ask compliance framework and enforcement mode preferences (Step 10 in `/git-ape-onboarding` skill playbook). +9. Ask compliance framework and enforcement mode preferences (Step 11 in `/git-ape-onboarding` skill playbook). 10. Update the `## Compliance & Azure Policy` section in `.github/copilot-instructions.md` with the user's choices. If the file was skipped by the scaffold step or lacks that section, surface the captured preferences in chat for manual integration instead of mutating the file. -11. Summarize created/updated artifacts and next checks. +11. Select the runner type (input #7). If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 12 in `/git-ape-onboarding` skill playbook.) +12. Summarize created/updated artifacts and next checks. 
## Output Requirements diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index a6fe66a..a943f2b 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -298,6 +298,76 @@ Create two GitHub environments for protection rules: - Required reviewers (recommended — destructive action) - Deployment branches: `main` only (triggered on PR merge) +## GitHub Actions Runners + +Git-Ape workflows run on **public GitHub-hosted runners by default** and can be +switched to **private self-hosted runners** in your Azure subscription with a +single repository variable — no workflow edits required. + +### The runner switch: `GIT_APE_RUNNER_LABEL` + +Every scaffolded workflow (`git-ape-plan`, `-deploy`, `-destroy`, `-verify`) +resolves its runner like this: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. No infrastructure. | +| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +```bash +# Switch to private runners (after they are provisioned and online) +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" +# Clean fallback to GitHub-hosted runners +gh variable delete GIT_APE_RUNNER_LABEL --repo / +``` + +In multi-environment mode, set the variable per environment (`--env azure-deploy`) +so only the environments that need private runners use them. + +### Runner types and platforms + +- **Public GitHub-hosted** — default; nothing to provision. +- **Self-hosted (subscription)** — runners are Azure resources with outbound + internet. Control over image, region, and identity without a VNet. +- **VNet-injected** — runners run inside a subnet you manage, for private + connectivity to Azure resources (private endpoints, no public egress except + to GitHub). 
+ +Private runners are provisioned from on-demand reference IaC shipped with the +onboarding skill at `templates/runners/`: + +| Platform | What it provisions | +|----------|--------------------| +| **ACI** | ARM `template.json` — a container group running an ephemeral runner. | +| **ACA** | ARM `template.json` — a KEDA `github-runner`-scaled Container Apps Job (scale-to-zero). | +| **AKS** | Actions Runner Controller (ARC) via Helm `values.yaml` (cluster created with a standard Git-Ape ARM deployment). | + +These templates are **not** auto-scaffolded — the public bootstrap stays the +default. Run `/git-ape-onboarding` (Runner Selection step) to choose and +provision them. + +### Runner security model + +- **Azure access uses a user-assigned managed identity, never stored keys.** +- **The GitHub registration credential is the only secret** — source it from + Key Vault (`securestring` parameter or pre-created Kubernetes secret), never + inline it in a committed `parameters.json` or `values.yaml`. +- **Ephemeral runners by default** — one job per registration, so no state leaks + between deployments. +- The runner label (`git-ape-runner`) must equal `GIT_APE_RUNNER_LABEL`. + +### Drift workflow caveat + +`git-ape-drift.lock.yml` is a compiled GitHub Agentic Workflow (gh-aw) and does +**not** honor `GIT_APE_RUNNER_LABEL`. To run continuous drift detection on a +private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter and +recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it carries an +integrity hash). 
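A rough local analogue of the `runs-on` fallback from the runner switch section, useful for reasoning about which runner a job would target. This is a sketch in plain shell, not part of the scaffolded workflows, and the function name `resolve_runner` is hypothetical. It assumes the `||` operator in GitHub expressions treats an unset or empty variable as falsy, which `${VAR:-default}` mirrors.

```bash
# Hypothetical helper: mirrors `vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest'`.
# ${VAR:-default} falls back when the variable is unset OR empty.
resolve_runner() {
  printf '%s\n' "${GIT_APE_RUNNER_LABEL:-ubuntu-latest}"
}

# Unset: falls back to the GitHub-hosted default.
unset GIT_APE_RUNNER_LABEL
resolve_runner

# Set: jobs would target self-hosted runners carrying this label.
GIT_APE_RUNNER_LABEL="git-ape-runner"
resolve_runner
```

Deleting the variable and setting it to an empty string behave the same way in this sketch, which matches the "clean fallback" described for `gh variable delete`.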
+ ## Security Baseline - Enable HTTPS-only for all web-facing resources diff --git a/.github/evals/git-ape-onboarding/tasks/positive-private-runner.yaml b/.github/evals/git-ape-onboarding/tasks/positive-private-runner.yaml new file mode 100644 index 0000000..09090c4 --- /dev/null +++ b/.github/evals/git-ape-onboarding/tasks/positive-private-runner.yaml @@ -0,0 +1,62 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/microsoft/waza/main/schemas/task.schema.json + +id: positive-private-runner +name: Positive — Onboard with private VNet-injected runner +description: Onboard request that selects private self-hosted runners (ACA, VNet-injected) — clear trigger plus runner-selection handling. +tags: [trigger, positive, runner] +inputs: + prompt: "Onboard this repository for Git-Ape, but I don't want to use GitHub-hosted runners — the deployment jobs must run on private self-hosted runners inside my Azure subscription, VNet-injected, on Azure Container Apps. Set up OIDC, RBAC, the GitHub environments, and tell me how the runner side works. My repo is a fresh local Git repo." +graders: + - name: trigger_relevance_positive + type: trigger + config: + skill_path: .github/skills/git-ape-onboarding/SKILL.md + mode: positive + threshold: 0.5 + + # answer_quality (LLM-as-judge): scoped per-task on positives so a flaky + # judge call only zeroes out this task, not the whole leg. See eval.yaml. + # + # IMPORTANT: waza prompt graders are binary (set_waza_grade_pass = 1.0, + # set_waza_grade_fail = 0.0). They are NOT 1–5 rubrics. The judge has NO + # access to the agent's response unless continue_session: true is set — it + # resumes the agent's own session so it can read the response. + - type: prompt + name: answer_quality + config: + continue_session: true + prompt: | + You are grading the assistant's previous response in this session. 
+ The user asked to onboard a fresh repo for Git-Ape AND to run the + deployment workflows on PRIVATE self-hosted runners (VNet-injected, + on Azure Container Apps) instead of GitHub-hosted runners. + + The skill GATES on prerequisites and required user inputs before + executing any state-changing commands. The expected first-turn reply + is a "gated step-1" response, NOT a completion report. Runner + selection is one of the required inputs. + + PASS criteria — the response must satisfy ALL FOUR of: + 1. It is a gated handoff: prereq/auth status is shown OR required + inputs are requested before any execution (a numbered list or + explicit question block counts). + 2. It requests at least THREE required inputs from: target GitHub + repository, Azure subscription ID, RBAC role, default region, + project / deployment name, onboarding mode. + 3. It correctly handles the runner request — satisfy ANY ONE of: + (a) acknowledges the chosen private runner (self-hosted / + VNet-injected and/or Azure Container Apps / ACA) as the runner + type to configure; (b) explains the GIT_APE_RUNNER_LABEL + variable switch (public ubuntu-latest by default → private when + the label is set); or (c) points the user at the runner + reference IaC under templates/runners/ (ACI/ACA/AKS). It must + NOT ignore the runner request or claim GitHub-hosted is the only + option. + 4. It does NOT claim to have already provisioned runners, set + GIT_APE_RUNNER_LABEL, configured OIDC, created federated + credentials/environments, or assigned RBAC. The reply waits for + user input + auth before continuing. (Fabricated "I've + configured X" / "I provisioned Y" before inputs → FAIL.) + + If ALL FOUR are met, call `set_waza_grade_pass`. + Otherwise, call `set_waza_grade_fail` and list which criteria are missing. 
diff --git a/.github/skills/git-ape-onboarding/SKILL.md b/.github/skills/git-ape-onboarding/SKILL.md index 93a6751..f5bc430 100644 --- a/.github/skills/git-ape-onboarding/SKILL.md +++ b/.github/skills/git-ape-onboarding/SKILL.md @@ -51,9 +51,10 @@ This skill configures: 2. OIDC federated credentials for GitHub Actions 3. RBAC role assignment(s) on subscription scope 4. GitHub environments (`azure-deploy*`, `azure-destroy`) -5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable +5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable, plus the optional `GIT_APE_RUNNER_LABEL` variable that selects private runners 6. Scaffolded GitHub Actions workflow files (`git-ape-plan.yml`, `-deploy.yml`, `-destroy.yml`, `-verify.yml`, `-drift.{md,lock.yml}`) and deployment standards (`.github/copilot-instructions.md`) into the user's working copy 7. *(Optional)* The `COPILOT_GITHUB_TOKEN` repository secret that powers the agentic drift-detection workflow (`git-ape-drift.lock.yml`) — only when the user opts into scheduled drift detection +8. The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default), **hosted compute networking** (GitHub-managed runners with Azure private networking, requires GHEC), or self-hosted runners in your Azure subscription (ACI / ACA / AKS). On-demand IaC for private runners ships at `./templates/runners/`. ACA runners are deployed as a first-class **Git-Ape deployment** (`./templates/deployments/git-ape-runners/`) — Git-Ape deploying Git-Ape — so they get an architecture diagram, cost estimate, managed deploy, and single-command destroy. ## Prerequisites @@ -143,12 +144,13 @@ OIDC_PREFIX="repository_owner_id::repository_id:" - `fc-azure-deploy` subject `"$OIDC_PREFIX:environment:azure-deploy"` (one per environment in multi-env mode) - `fc-azure-destroy` subject `"$OIDC_PREFIX:environment:azure-destroy"` 6. 
Assign RBAC on each target subscription. -7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. +7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. (The `GIT_APE_RUNNER_LABEL` variable is set later in Step 12 only if private runners are chosen.) 8. Create GitHub environments and branch policies when permissions allow. 9. Scaffold workflow files and deployment standards into the user's working copy (see below). 10. *(Optional)* Provision the drift detector engine credential (`COPILOT_GITHUB_TOKEN`) so the agentic drift workflow can run (see below). 11. Capture compliance and Azure Policy preferences (see below). -12. Verify federated credentials, role assignments, and secrets. +12. Select the GitHub Actions runner type (public / hosted compute networking / self-hosted) and, if private runners are chosen, provision them and set `GIT_APE_RUNNER_LABEL` (see below). +13. Verify federated credentials, role assignments, and secrets. ### Step 9: Scaffold workflow files and deployment standards @@ -275,6 +277,415 @@ After RBAC and environment setup, ask the user about compliance requirements and preferences and a suggested patch in chat so the user can apply it. - In all cases, leave changes unstaged and let the user commit them. +### Step 12: Runner Selection & Provisioning (optional) + +Git-Ape workflows resolve their runner from a single variable: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +Unset → public GitHub-hosted `ubuntu-latest` (the default; no infrastructure). +Set to a label → private runners with that label. This is the **bootstrap model: +start public, switch to private later with one variable.** + +1. **Ask the runner type:** + ``` + What runner should the Git-Ape workflows run on? 
+ - Public GitHub-hosted (recommended to start — no infrastructure) + - Hosted compute networking (GitHub-managed runners in your Azure VNet — requires GHEC) + - Self-hosted in my Azure subscription (you manage compute, image, scaling) + ``` + +2. **If public (default):** do nothing. Leave `GIT_APE_RUNNER_LABEL` unset. + Onboarding is complete; the user can switch to private runners any time by + repeating this step. + +3. **If hosted compute networking:** + Follow the hosted compute sub-flow (Step 12a below). + +4. **If self-hosted:** + Follow the self-hosted sub-flow (Step 12b below). + +--- + +### Step 12a: Hosted Compute Networking (GitHub-managed, Azure private networking) + +GitHub-hosted runners with Azure private networking. GitHub manages the compute +(full Ubuntu images with `az`, `gh`, `jq`, `git` pre-installed), runners execute +inside your Azure VNet for private connectivity. + +**Prerequisites:** GitHub Enterprise Cloud. + +**Reference:** [About networking for hosted compute products](https://docs.github.com/en/enterprise-cloud@latest/admin/configuring-settings/configuring-private-networking-for-hosted-compute-products/about-networking-for-hosted-compute-products-in-your-enterprise) + +#### a. Consolidate GitHub auth scopes first + +Before starting provisioning, authenticate with **all required scopes in one +call** to avoid repeated auth prompts: + +```bash +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +| Scope | Purpose | +|-------|---------| +| `admin:org` | Create org-level runner groups, assign repos | +| `admin:enterprise` | Enterprise-level runner groups and hosted runners | +| `manage_runners:org` | Create/manage hosted runners | +| `read:enterprise` | Query enterprise metadata (databaseId, org membership) | +| `write:network_configurations` | Create network configurations | + +#### b. 
Ask scope: organization or enterprise + +``` +Where should the network configuration live? +- Enterprise level (shared across all orgs in the enterprise) +- Organization level (scoped to this org only) +``` + +| Scope | `businessId` value | UI location | +|-------|-------------------|-------------| +| **Enterprise** | Enterprise `databaseId` (from GraphQL) | Enterprise Settings → Hosted compute networking | +| **Organization** | Org numeric ID (REST: `.id` field) | Org Settings → Hosted compute networking | + +Query the needed ID: +```bash +# Enterprise databaseId (for enterprise scope): +gh api graphql -f query='{enterprise(slug: "") { databaseId }}' --jq '.data.enterprise.databaseId' + +# Org numeric ID (for org scope): +gh api orgs/ --jq '.id' +``` + +#### c. Provision Azure networking + +1. Create resource group and VNet with a `/28` subnet (minimum 16 IPs): + ```bash + az group create --name --location + az network vnet create --name --resource-group \ + --address-prefix 10.0.0.0/16 --subnet-name snet-runners --subnet-prefix 10.0.0.0/28 + ``` + +2. Delegate subnet to `GitHub.Network/networkSettings`: + ```bash + az network vnet subnet update --name snet-runners --vnet-name \ + --resource-group --delegations GitHub.Network/networkSettings + ``` + +3. Register `GitHub.Network` resource provider: + ```bash + az provider register --namespace GitHub.Network + # Wait until Registered: + az provider show --namespace GitHub.Network --query "registrationState" -o tsv + ``` + +4. Create `GitHub.Network/networkSettings` resource: + ```bash + az rest --method PUT \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --body '{ + "location": "", + "properties": { + "businessId": "", + "subnetId": "" + } + }' + ``` + ⚠️ **`businessId` is immutable.** If wrong, you must delete and recreate. + +5. 
Extract the `GitHubId` tag from the resource — this is the ID GitHub uses: + ```bash + az rest --method GET \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --query "tags.GitHubId" -o tsv + ``` + +#### d. Create GitHub network configuration + +Use the **`GitHubId` tag value** (NOT the Azure resource ID): + +```bash +# Enterprise scope: +gh api --method POST enterprises//network-configurations \ + -f name="" -f compute_service="actions" \ + -f network_settings_ids[]="" + +# Organization scope: +gh api --method POST orgs//settings/network-configurations \ + -f name="" -f compute_service="actions" \ + -f network_settings_ids[]="" +``` + +Save the returned `id` — needed for the runner group. + +#### e. Create runner group and hosted runner + +```bash +# Enterprise scope: +gh api --method POST enterprises//actions/runner-groups \ + -f name="" -f visibility="selected" \ + -F allows_public_repositories=false \ + -f network_configuration_id="" + +# Assign the org to the enterprise runner group: +gh api --method PUT enterprises//actions/runner-groups//organizations/ + +# For enterprise groups: also assign the repo at org level (inherited group ID): +gh api orgs//actions/runner-groups --jq '.runner_groups[] | select(.name=="") | .id' +gh api --method PUT orgs//actions/runner-groups//repositories/ +``` + +```bash +# Query available images and sizes: +gh api orgs//actions/hosted-runners/images/github-owned --jq '.images[] | {id, display_name, platform}' +gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:5] | .[] | {id, cpu_cores, memory_gb}' + +# Create hosted runner (image IDs are NUMERIC, sizes are like "4-core"): +echo '{"name":"","runner_group_id":,"platform":"linux-x64","image":{"id":"","source":"github"},"size":"4-core","maximum_runners":5}' | \ + gh api --method POST enterprises//actions/hosted-runners --input - +``` + +Wait for `status: "Ready"`: +```bash 
+gh api enterprises//actions/hosted-runners --jq '.runners[] | {name, status}' +``` + +#### f. Set variable and verify + +```bash +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "" +gh workflow run git-ape-verify.yml --repo / +``` + +Confirm all steps pass — no custom image needed, GitHub provides everything. + +--- + +### Step 12b: Self-Hosted Runners (ACI / ACA / AKS) + +Self-hosted runners run in your Azure subscription. You manage compute, image, +scaling, and networking. + +1. **Ask the platform:** + ``` + Which Azure platform should host the runners? + - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) — RECOMMENDED + - ACI — Azure Container Instances (simplest; a handful of runners) + - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) + ``` + + **ACA is deployed as a Git-Ape deployment** (Git-Ape deploying Git-Ape), so the + runner infrastructure gets an architecture diagram, cost estimate, managed + deploy, and single-command destroy — see the ACA path directly below. **ACI** + and **AKS** use the imperative provisioning steps further down. + +#### ACA — deploy runners as a Git-Ape deployment (recommended) + +Instead of an imperative `az deployment group create`, scaffold the runner +infrastructure as a first-class Git-Ape deployment and deploy it through the +normal Git-Ape stack flow. The template is a subscription-scoped Deployment Stack, +so destroy is a single idempotent command and the PAT lives in Key Vault (never in +git or ARM parameters). + +1. **Scaffold the deployment artifact** into the working copy and set inputs: + ```bash + mkdir -p .azure/deployments/git-ape-runners + cp -R .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/. 
\ + .azure/deployments/git-ape-runners/ + # set githubOwnerRepo (+ any overrides) — NEVER put the PAT here: + $EDITOR .azure/deployments/git-ape-runners/parameters.json + ``` + `template.json` creates, in one self-contained stack: the resource group, a + user-assigned identity, an ACR, an `AcrPull` role assignment, a Key Vault, a + `Key Vault Secrets User` role assignment, an ACA managed environment, and the + event-driven ACA Job. + +2. **Deploy the stack.** The first deploy runs on a public runner or locally, + because the private runner does not exist yet: + ```bash + /azure-stack-deploy git-ape-runners # local (VS Code / terminal) + ``` + In CI, open a PR that adds `.azure/deployments/git-ape-runners/`; the + `git-ape-deploy.yml` workflow deploys it on `ubuntu-latest` and writes + `state.json` plus the architecture/cost artifacts, exactly like any other + Git-Ape deployment. + +3. **Build & push the runner image** into the ACR the stack just created (the + stock `actions-runner` image lacks `az`/`gh`/`jq` and self-registration — see + the Dockerfile note in the imperative path below): + ```bash + ACR=$(jq -r '.acrLoginServer.value' .azure/deployments/git-ape-runners/state.json) + az acr build --registry "${ACR%%.*}" --image git-ape-runner:latest \ + --file .github/skills/git-ape-onboarding/templates/runners/Dockerfile \ + .github/skills/git-ape-onboarding/templates/runners/ --no-logs + ``` + +4. **Write the GitHub PAT into Key Vault** — never in git, ARM params, or chat + output. 
Collect a long-lived fine-grained PAT exactly as in the imperative + "Collect a GitHub PAT" step below (never a short-lived registration token): + ```bash + KV=$(jq -r '.keyVaultName.value' .azure/deployments/git-ape-runners/state.json) + az keyvault secret set --vault-name "$KV" --name github-pat \ + --value '' --output none + ``` + The ACA Job reads it at runtime through a Key Vault secret reference + (`keyVaultUrl` + user-assigned `identity`), enabled by the in-template + `Key Vault Secrets User` role assignment. + +5. **Point workflows at the runner** — after this, re-deploys of this very stack + run on it (the self-hosting loop): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + ``` + +6. **Destroy** with one command when no longer needed (tears down the whole stack + and purges the soft-deleted Key Vault): + ```bash + /azure-stack-destroy git-ape-runners + ``` + +See `templates/deployments/git-ape-runners/README.md` for the full walkthrough and +`templates/deployments/git-ape-runners/architecture.md` for the topology and +bootstrap sequence. + +#### ACI / AKS — imperative provisioning + +1. **Build the custom runner image using ACR Tasks (cloud build).** The base + `ghcr.io/actions/actions-runner:latest` (GitHub's official runner image) does + **NOT** include `az`, `gh`, or `jq`, and ships no registration entrypoint. + Workflows fail with `Unable to locate executable file: az` — and on ACI/ACA + the runner never registers — without a custom image. + + Always build via **ACR Tasks** (cloud build) — never local Docker. This + avoids Windows CRLF line-ending corruption of `entrypoint.sh` and eliminates + the need for a local Docker install. 
+ ```bash + # Create ACR (one-time) — no --admin-enabled; use managed identity for pulls + az acr create --name --resource-group --location --sku Basic + + # Build and push image (runs in Azure, ~3 min, no local Docker needed) + # On Windows, add --no-logs to avoid a Unicode encoding crash in log streaming + az acr build --registry --image git-ape-runner:latest \ + --file ./templates/runners/Dockerfile ./templates/runners/ --no-logs + ``` + The `Dockerfile` at `./templates/runners/Dockerfile` extends the base runner + with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`) and an `entrypoint.sh` + that self-registers the runner on ACI/ACA (on AKS, ARC handles registration). + It includes a `sed` safety net that strips CRLF line endings from + `entrypoint.sh` at build time. + + After the build, verify the image exists: + ```bash + az acr repository list --name -o table + ``` + +2. **Create a managed identity and assign `AcrPull` role** for image pulls: + ```bash + # Create identity + az identity create --name id-git-ape-runner --resource-group --location + + # Get IDs + IDENTITY_ID=$(az identity show --name id-git-ape-runner --resource-group --query id -o tsv) + PRINCIPAL_ID=$(az identity show --name id-git-ape-runner --resource-group --query principalId -o tsv) + ACR_ID=$(az acr show --name --query id -o tsv) + + # Assign AcrPull role (may take 30–60s to propagate) + az role assignment create --assignee-object-id $PRINCIPAL_ID --assignee-principal-type ServicePrincipal \ + --role AcrPull --scope $ACR_ID + ``` + **Do NOT use ACR admin credentials** (`--admin-enabled true` + username/password). + Managed identity is the secure, recommended approach. + +3. **Collect a GitHub PAT from the user.** The ACA/ACI runner needs a + **long-lived GitHub Personal Access Token (PAT)** — NOT a short-lived + registration token from `POST /actions/runners/registration-token`. 
+ Registration tokens expire in ~1 hour, but the KEDA `github-runner` scaler + continuously polls the Actions queue AND each ephemeral runner re-registers + on every scale-up, so a long-lived PAT is required. + + **Ask the user to create a PAT** before deploying: + ``` + The self-hosted runner needs a GitHub Personal Access Token (PAT) for + continuous queue polling and runner registration. + + Please create a fine-grained PAT at: + https://github.com/settings/tokens?type=beta + + Required permissions (scoped to the target repo): + - Actions: Read & Write + - Administration: Read & Write (for runner registration) + + Alternatively, a classic PAT with the `repo` scope works. + + Paste the token when prompted — it will only be passed to the deployment + and will not be stored or displayed. + ``` + + **Do NOT generate a registration token** via the GitHub API + (`POST repos///actions/runners/registration-token`). These are + short-lived (~1 hour) and will cause the runner to fail with a 401 error + once expired. The KEDA scaler and ephemeral runner registration both need + a token that does not expire. + + Never print the token value in chat output (see Safe-Execution Rules). + +4. **Deploy the runner infrastructure (ACI).** Use the ACI template + (`templates/runners/aci/template.json`) — ACA is covered by the Git-Ape-managed + path above. Pass the custom image, ACR server, managed identity, and + user-provided PAT: + ```bash + az deployment group create -g -f ./templates/runners/aci/template.json \ + -p runnerImage='.azurecr.io/git-ape-runner:latest' \ + acrServer='.azurecr.io' \ + userAssignedIdentityId=$IDENTITY_ID \ + githubOwnerRepo='/' \ + githubAccessToken='' + ``` + - The ACA template's `registries` block automatically uses identity-based + auth when both `acrServer` and `userAssignedIdentityId` are set. + - The GitHub PAT is the only secret — for production, store it in Key Vault + and reference it; for initial setup, pass it directly at deploy time. 
+ Never inline it in a committed `parameters.json`. + - For private networking, set the subnet parameter (`subnetId` for ACI, + `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). + - For AKS, use `helm install` instead of ARM. + - **Note:** The ACA managed environment may take 1–2 minutes to fully + provision. If deploying step-by-step (not via ARM template), wait for the + environment's `provisioningState` to reach `Succeeded` before creating the + job. + +5. **Set `minExecutions=1`** (recommended) so at least one runner is always + warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can + take 1–3 minutes on cold start, during which GitHub shows "No runners + configured": + ```bash + az containerapp job update --name git-ape-runner --resource-group --min-executions 1 + ``` + Leave at `0` only if you prefer true scale-to-zero and can tolerate cold-start + delays. + +6. **Confirm the runner is online** in *GitHub → Settings → Actions → Runners* + with the `git-ape-runner` label. (With `minExecutions=1`, a runner should + appear within 30–60 seconds of deployment.) + +7. **Set the variable** so workflows target it (repo-wide or per environment): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + # per environment instead: + gh variable set GIT_APE_RUNNER_LABEL --repo / --env azure-deploy --body "git-ape-runner" + ``` + Clean fallback to GitHub-hosted runners is `gh variable delete GIT_APE_RUNNER_LABEL`. + +8. **Verify** by triggering `Git-Ape: Verify Setup` and confirming all steps + pass on the private runner (especially "Test OIDC login" which requires `az`). + +9. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw + workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. To move drift onto a + private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter + and recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it + carries an integrity hash). 
The other four workflows need no recompile. + ## Mode: Enterprise Distribution (`.github-private`) Use this mode to distribute Git-Ape to **everyone on an organization's or @@ -432,7 +843,9 @@ OIDC, RBAC, environments, and workflows. 7. *(Optional)* Offer to onboard the drift detector workflow by provisioning `COPILOT_GITHUB_TOKEN` (Step 10 in playbook). Skip if the user does not want scheduled drift detection. 8. Ask compliance framework and enforcement mode preferences (Step 11 in playbook). 9. Update `copilot-instructions.md` with compliance preferences — or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -10. Summarize outcome (including scaffolded file counts) and suggest verification commands. +10. Ask the runner type (and platform/scope if private), and — if private runners are chosen — provision the full stack. For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted ACA (recommended)**: scaffold `.azure/deployments/git-ape-runners/` → deploy the subscription-scoped stack via `/azure-stack-deploy git-ape-runners` (first deploy on `ubuntu-latest`) → `az acr build` the runner image into the stack's ACR → `az keyvault secret set` the PAT (never a registration token) → set `GIT_APE_RUNNER_LABEL` (Step 12b, ACA path) — Git-Ape deploying Git-Ape. For **self-hosted ACI/AKS**: ask the user for a GitHub PAT → ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACI deployment with identity-based registry auth using user-provided PAT + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b, imperative path). +11. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner. +12. 
Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. ### Enterprise distribution @@ -445,6 +858,184 @@ OIDC, RBAC, environments, and workflows. ## Known Gotchas +### Self-hosted: registration tokens don't work for KEDA-based runners + +**Never use `POST repos///actions/runners/registration-token`** to +generate the `githubAccessToken` for ACA/ACI runners. Registration tokens are +short-lived (~1 hour) and expire silently. Once expired: +- The KEDA `github-runner` scaler can no longer poll the Actions queue +- Each ephemeral runner fails to register on scale-up with a **401 Unauthorized** +- Runners appear as `offline` in GitHub Settings + +The `githubAccessToken` parameter requires a **long-lived GitHub PAT** because: +1. KEDA continuously polls the GitHub API every 30 seconds to detect queued jobs +2. Each ephemeral runner re-registers itself on every scale-up event +3. Both operations need a token that outlives any single job + +**Fix:** Always **ask the user** to create a fine-grained PAT +(`https://github.com/settings/tokens?type=beta`) with **Actions (Read & Write)** +and **Administration (Read & Write)** permissions scoped to the target repo. A +classic PAT with the `repo` scope also works. Never generate a registration +token programmatically — it will always fail after ~1 hour. + +### Hosted compute: `network_settings_ids` expects the GitHubId tag, not the Azure resource ID + +When creating a GitHub network configuration, the `network_settings_ids` field +expects the **`GitHubId` tag value** (a SHA-256 hash assigned by GitHub to the +Azure `GitHub.Network/networkSettings` resource), NOT the Azure resource ID path. 
+ +```bash +# ❌ WRONG — Azure resource ID +-f network_settings_ids[]="/subscriptions/.../providers/GitHub.Network/networkSettings/my-resource" + +# ✅ CORRECT — GitHubId tag value from the Azure resource +-f network_settings_ids[]="FA1AD85973374477AF8C49119ADEA731EFD4B9BD6B7764A8FCD6B036CBA796F3" +``` + +Extract the GitHubId after creating the Azure resource: +```bash +az rest --method GET \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --query "tags.GitHubId" -o tsv +``` + +### Hosted compute: `businessId` is immutable and scope-specific + +The `businessId` on `GitHub.Network/networkSettings` determines whether the +resource works at enterprise or organization scope: +- **Enterprise scope:** use the enterprise `databaseId` (query via GraphQL) +- **Organization scope:** use the org's numeric ID (query via REST `.id` field) + +If wrong, the GitHub API returns `"The business ID is invalid or does not match"`. +The property is **immutable** — you cannot update it; you must delete and recreate. + +### Hosted compute: repeated auth prompts from missing scopes + +The hosted compute provisioning flow requires **5 distinct GitHub token scopes** +(`admin:org`, `admin:enterprise`, `manage_runners:org`, `read:enterprise`, +`write:network_configurations`). If not collected upfront, each missing scope +triggers a separate `gh auth refresh` device-code flow. + +**Fix:** Always consolidate auth at the start of Step 12a: +```bash +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +### Hosted compute: image and size IDs are GitHub-specific + +The hosted runners API uses **numeric image IDs** (e.g., `"2295"` = Ubuntu 24.04) +and **GitHub-specific size IDs** (e.g., `"4-core"`, `"8-core"`), not Azure VM SKU +names or Ubuntu version strings. 
+ +Always query available options first: +```bash +gh api orgs//actions/hosted-runners/images/github-owned --jq '.images[] | {id, display_name}' +gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:10] | .[] | {id, cpu_cores, memory_gb}' +``` + +### Default runner image lacks required tools (self-hosted only) + +The base image `ghcr.io/actions/actions-runner:latest` (GitHub's official runner) +is a **minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`, and +ships no registration entrypoint. If you deploy without the custom image, the +runner never registers on ACI/ACA and workflows fail with: + +``` +Error: Unable to locate executable file: az +``` + +**Fix:** Always build and use the custom image from `./templates/runners/Dockerfile`. +The onboarding flow must: +1. Create an ACR (`az acr create` — no `--admin-enabled`) +2. Build the image via ACR Tasks (`az acr build --no-logs` on Windows) +3. Create a managed identity with `AcrPull` role on the ACR +4. Deploy the template with `acrServer`, `userAssignedIdentityId`, and `runnerImage` + +### KEDA scale-from-zero cold start + +With `minExecutions=0` (the default), KEDA's `github-runner` scaler polls the +GitHub Actions queue every 30 seconds. On a fresh deployment or after long idle +periods, the first job can wait 1–3 minutes before a runner spins up. During +this time: +- GitHub shows the job as "Waiting for a runner to pick up this job" +- The Settings → Runners page shows "No runners configured" (ephemeral runners + only register while executing) + +**Fix:** Set `minExecutions=1` to keep one runner always warm. This costs +~$30–50/month on the Consumption plan but eliminates cold-start delays and +ensures a runner is always visible in GitHub Settings. + +### Windows CRLF corrupts `entrypoint.sh` (self-hosted only) + +When the `Dockerfile` build context is uploaded from a Windows checkout (where +`core.autocrlf=true` converts LF to CRLF), `entrypoint.sh` gets `\r\n` line endings. 
+Linux interprets the shebang as `#!/usr/bin/env bash\r`, failing with: + +``` +'bash\r': No such file or directory +``` + +The runner container starts but never registers, and all executions fail +immediately. + +**Fix (belt-and-suspenders):** +1. The `Dockerfile` includes a `sed -i 's/\r$//'` line after `COPY entrypoint.sh` + that strips CRLF at build time — this is always safe and is a no-op on clean + LF files. +2. Prefer **ACR Tasks** (cloud build) over local `docker build` — ACR Tasks run + in Linux and handle the context correctly. +3. If building locally on Windows, ensure `.gitattributes` marks `*.sh` as + `text eol=lf`, or run `dos2unix entrypoint.sh` before building. + +### `az acr build` crashes on Windows (Unicode encoding) + +On Windows, `az acr build` may crash while streaming build logs with: + +``` +UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' +``` + +This is a known Azure CLI bug — the `colorama` library on Windows can't encode +Unicode characters (like `→`) in `apt-get` output. The build itself may or may +not have completed in Azure before the crash. + +**Fix:** Always use `--no-logs` when running `az acr build` on Windows: +```bash +az acr build --registry --image git-ape-runner:latest \ + --file ... ... --no-logs +``` +The build runs in Azure regardless; `--no-logs` just skips the local log +streaming. Verify success with `az acr repository list --name `. + +### ACA managed environment provisioning delay + +The `Microsoft.App/managedEnvironments` resource can take 1–2 minutes to +provision. If you create the ACA job immediately after the environment, the +deployment may fail with `ManagedEnvironmentNotProvisioned`. + +**Fix:** When deploying via ARM template (`az deployment group create`), the +`dependsOn` in the template handles ordering automatically. 
When deploying +step-by-step (e.g., `az containerapp env create` followed by +`az containerapp job create`), poll the environment status first: +```bash +az containerapp env show --name --resource-group \ + --query "properties.provisioningState" -o tsv +# Wait until "Succeeded" before creating the job +``` + +### Stale workflow files in target repos + +If the target repo was onboarded before the `GIT_APE_RUNNER_LABEL` pattern was +introduced, its workflow files may have hardcoded `runs-on: ubuntu-latest`. The +private runner will never pick up jobs because workflows don't request its label. + +**Fix:** The scaffold helper (`scaffold-repo.sh` / `.ps1`) skips existing files. +To update stale workflows, the agent must either: +1. Detect the stale pattern (`grep 'runs-on: ubuntu-latest'`) and offer to + update all 4 workflow files with the dynamic pattern, OR +2. Advise the user to manually replace `runs-on: ubuntu-latest` with + `runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` in each job. + ### GitHub Org Custom OIDC Subject Template (e.g. Azure org) Some GitHub organizations (notably the `Azure` org) override the default OIDC subject diff --git a/.github/skills/git-ape-onboarding/templates/copilot-instructions.md b/.github/skills/git-ape-onboarding/templates/copilot-instructions.md index a6fe66a..a943f2b 100644 --- a/.github/skills/git-ape-onboarding/templates/copilot-instructions.md +++ b/.github/skills/git-ape-onboarding/templates/copilot-instructions.md @@ -298,6 +298,76 @@ Create two GitHub environments for protection rules: - Required reviewers (recommended — destructive action) - Deployment branches: `main` only (triggered on PR merge) +## GitHub Actions Runners + +Git-Ape workflows run on **public GitHub-hosted runners by default** and can be +switched to **private self-hosted runners** in your Azure subscription with a +single repository variable — no workflow edits required. 
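The fallback behind that single variable can be reasoned about locally. This is a minimal sketch, assuming the workflows resolve their runner with an expression of the form `vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest'`; `resolve_runner_label` is a hypothetical helper for illustration and is not part of Git-Ape.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Git-Ape): mirrors the expression
# `vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest'` so the fallback is easy to see.
# An unset or empty value falls back to the GitHub-hosted runner.
resolve_runner_label() {
  local label="${1:-}"
  printf '%s\n' "${label:-ubuntu-latest}"
}

resolve_runner_label ""                # variable unset: GitHub-hosted runner
resolve_runner_label "git-ape-runner"  # variable set: private runner label
```

The empty-string case matters because deleting the variable and leaving it blank should both land on the hosted runner.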
+ +### The runner switch: `GIT_APE_RUNNER_LABEL` + +Every scaffolded workflow (`git-ape-plan`, `-deploy`, `-destroy`, `-verify`) +resolves its runner like this: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. No infrastructure. | +| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +```bash +# Switch to private runners (after they are provisioned and online) +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" +# Clean fallback to GitHub-hosted runners +gh variable delete GIT_APE_RUNNER_LABEL --repo / +``` + +In multi-environment mode, set the variable per environment (`--env azure-deploy`) +so only the environments that need private runners use them. + +### Runner types and platforms + +- **Public GitHub-hosted** — default; nothing to provision. +- **Self-hosted (subscription)** — runners are Azure resources with outbound + internet. Control over image, region, and identity without a VNet. +- **VNet-injected** — runners run inside a subnet you manage, for private + connectivity to Azure resources (private endpoints, no public egress except + to GitHub). + +Private runners are provisioned from on-demand reference IaC shipped with the +onboarding skill at `templates/runners/`: + +| Platform | What it provisions | +|----------|--------------------| +| **ACI** | ARM `template.json` — a container group running an ephemeral runner. | +| **ACA** | ARM `template.json` — a KEDA `github-runner`-scaled Container Apps Job (scale-to-zero). | +| **AKS** | Actions Runner Controller (ARC) via Helm `values.yaml` (cluster created with a standard Git-Ape ARM deployment). | + +These templates are **not** auto-scaffolded — the public bootstrap stays the +default. 
Run `/git-ape-onboarding` (Runner Selection step) to choose and +provision them. + +### Runner security model + +- **Azure access uses a user-assigned managed identity, never stored keys.** +- **The GitHub registration credential is the only secret** — source it from + Key Vault (`securestring` parameter or pre-created Kubernetes secret), never + inline it in a committed `parameters.json` or `values.yaml`. +- **Ephemeral runners by default** — one job per registration, so no state leaks + between deployments. +- The runner label (`git-ape-runner`) must equal `GIT_APE_RUNNER_LABEL`. + +### Drift workflow caveat + +`git-ape-drift.lock.yml` is a compiled GitHub Agentic Workflow (gh-aw) and does +**not** honor `GIT_APE_RUNNER_LABEL`. To run continuous drift detection on a +private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter and +recompile with `gh aw compile` — never hand-edit the `.lock.yml` (it carries an +integrity hash). + ## Security Baseline - Enable HTTPS-only for all web-facing resources diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/README.md b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/README.md new file mode 100644 index 0000000..2e27a06 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/README.md @@ -0,0 +1,93 @@ +# git-ape-runners — Git-Ape deploying Git-Ape + +This is a **Git-Ape deployment artifact** for the private GitHub Actions runners +that run Git-Ape's own workflows. Instead of provisioning runners with imperative +`az deployment group create`, this folder is scaffolded into your working copy at +`.azure/deployments/git-ape-runners/` and deployed through the normal Git-Ape +flow — so you get an **architecture diagram, cost estimate, managed deploy, and +single-command destroy** for the runner infrastructure itself. 
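Because the scaffolded folder is committed to the repository, a quick local check can help keep the PAT out of it. This is a rough sketch, assuming tokens carry the `ghp_` (classic) or `github_pat_` (fine-grained) prefix; `scan_for_pat` is a hypothetical helper, and the pattern is a heuristic rather than an exhaustive secret scanner.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not part of Git-Ape): lists files under a directory that
# appear to contain a GitHub token. The prefix patterns are an assumption and
# only a heuristic; a dedicated secret scanner is more thorough.
scan_for_pat() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Directory not found: $dir" >&2
    return 2
  fi
  # -r recurse, -I skip binary files, -l list matching files, -E extended regex
  if grep -rIlE '(ghp_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})' "$dir"; then
    echo "Possible GitHub token found in the files listed above" >&2
    return 1
  fi
  return 0
}

# Example (run from the repository root before committing):
#   scan_for_pat .azure/deployments/git-ape-runners
```

A non-zero exit status makes the function usable as a pre-commit or CI gate.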
+ +| File | Purpose | +|------|---------| +| `template.json` | Subscription-scoped Deployment Stack: RG + nested inner deployment (UAMI, ACR, AcrPull, Key Vault, KV Secrets User, ACA managed env, ACA job) | +| `parameters.json` | Non-secret parameters. Set `githubOwnerRepo`. **Never** put the PAT here. | +| `architecture.md` | Mermaid topology + bootstrap sequence | +| `metadata.json` | Git-Ape deployment metadata (`deploymentId: git-ape-runners`) | + +## Why this exists + +The runner infra is the one part of onboarding that is pure Azure IaC, so it is +the natural thing to hand to Git-Ape. What stays imperative (and why): + +- **Entra app + OIDC + subscription RBAC** for the deploy identity — this is the + identity Git-Ape's own OIDC login uses, so it must exist *before* Git-Ape can + deploy anything (circular dependency). +- **GitHub environments / secrets / variables** — GitHub API, not ARM. +- **`az acr build`** — a build action, not IaC. Runs *after* the stack creates + the ACR. +- **AKS runners** — stay Helm / Actions Runner Controller managed. + +## Deploy (Git-Ape flow) + +> Prereq: onboarding has already created the Entra app, OIDC, RBAC, and GitHub +> environments/secrets. This deployment only provisions the runner compute. + +1. **Set `githubOwnerRepo`** in `parameters.json` (e.g. `Azure/git-ape`). Adjust + `runnerScope`, `location`, `maxRunners`, `minRunners` as needed. Defaults for + `acrName` / `keyVaultName` are deterministic and globally unique. + +2. **Deploy the stack.** The *first* deploy runs on a public runner or locally, + because the private runner doesn't exist yet. + + Local: + ```bash + /azure-stack-deploy git-ape-runners + ``` + CI: merge a PR that adds `.azure/deployments/git-ape-runners/` — the + `git-ape-deploy.yml` workflow deploys it (keep `GIT_APE_RUNNER_LABEL` unset or + on `ubuntu-latest` for this first run). + +3. 
**Build & push the runner image** into the ACR the stack just created + (the stock `actions-runner` image lacks `az`/`gh`/`jq` and a registration + entrypoint — see `../../runners/Dockerfile`): + ```bash + ACR=$(jq -r '.acrLoginServer.value' .azure/deployments/git-ape-runners/state.json 2>/dev/null || echo '.azurecr.io') + az acr build --registry "${ACR%%.*}" --image git-ape-runner:latest \ + --file ../../runners/Dockerfile ../../runners/ --no-logs + ``` + +4. **Set the GitHub PAT into Key Vault** (never committed, never in ARM params). + Use a fine-grained PAT with Actions + Administration (Read & Write), or a + classic PAT with `repo` scope: + ```bash + KV=$(jq -r '.keyVaultName.value' .azure/deployments/git-ape-runners/state.json) + az keyvault secret set --vault-name "$KV" --name github-pat --value "" --output none + ``` + +5. **Point Git-Ape workflows at the runner:** + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + ``` + From here, every Git-Ape deploy — including re-deploys of *this* stack — runs + on the private runner. That is Git-Ape deploying Git-Ape. + +## Destroy + +```bash +/azure-stack-destroy git-ape-runners +``` +Runs `az stack sub delete --action-on-unmanage deleteAll` (removes RG, ACA job, +environment, ACR, identity, role assignments, Key Vault in one call) and purges +the soft-deleted Key Vault. Then clear the variable to fall back to hosted +runners: `gh variable delete GIT_APE_RUNNER_LABEL`. + +## Notes + +- **Key Vault** uses RBAC authorization, soft-delete on, and purge protection + **off** so the destroy flow can fully purge it between deploy/destroy cycles. +- **`minRunners: 1`** keeps one runner warm and visible in GitHub (no cold-start + gap). Set `0` for true scale-to-zero. +- The template validates with `az deployment sub validate` and deploys with + `az stack sub create` — identical to every other Git-Ape deployment. 
+- ACI remains available as a raw template under `../../runners/aci/` for users + who don't want the managed-stack flow. diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/architecture.md b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/architecture.md new file mode 100644 index 0000000..6fea003 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/architecture.md @@ -0,0 +1,78 @@ +# Architecture — Git-Ape self-hosted runners + +This deployment provisions the private GitHub Actions runner that runs **future +Git-Ape workflows** — i.e. Git-Ape deploying Git-Ape. It is a single +subscription-scoped **Azure Deployment Stack** (`template.json`), so it gets an +architecture diagram, cost estimate, managed deploy, and single-command destroy +like any other Git-Ape deployment. + +## Resource topology + +```mermaid +%%{init: {'theme':'base','themeVariables':{'fontSize':'13px'}}}%% +flowchart TB + subgraph SUB["Subscription scope — Deployment Stack: git-ape-runners"] + RG["Resource Group<br/>rg-git-ape-runners"] + subgraph INNER["Nested inner-scope deployment (rg-git-ape-runners)"] + UAMI["User-Assigned Managed Identity<br/>id-git-ape-runner"] + ACR["Container Registry (Basic)<br/>holds git-ape-runner:latest"] + KV["Key Vault (RBAC)<br/>secret: github-pat"] + ENV["ACA Managed Environment<br/>git-ape-runner-env"] + JOB["ACA Job (Event/KEDA github-runner)<br/>ephemeral runners, scale-to-zero"] + RA1(["roleAssignment: AcrPull<br/>UAMI → ACR"]) + RA2(["roleAssignment: Key Vault Secrets User<br/>UAMI → KV"]) + end + end + + RG --> INNER + UAMI -. AcrPull .-> ACR + UAMI -. 
Secrets User .-> KV + RA1 --- UAMI + RA1 --- ACR + RA2 --- UAMI + RA2 --- KV + JOB -->|environmentId| ENV + JOB -->|image pull via identity| ACR + JOB -->|secret ref via identity| KV + JOB -->|registers ephemeral runners| GH["GitHub Actions<br/>label: git-ape-runner"] +``` + +## Bootstrap ordering (self-hosting) + +Git-Ape workflows run on `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}`. +The **first** deploy of this stack can't run on the private runner (it doesn't +exist yet), so it runs on a public runner or locally. Once it's up and +`GIT_APE_RUNNER_LABEL` is set, every later Git-Ape run — **including updates to +this runner stack itself** — executes on the private runner it created. + +```mermaid +%%{init: {'theme':'base','themeVariables':{'fontSize':'13px'}}}%% +sequenceDiagram + participant U as Operator / Onboarding + participant GA as Git-Ape (stack deploy) + participant AZ as Azure (rg-git-ape-runners) + participant GH as GitHub Actions + + U->>GA: 1. Deploy git-ape-runners (on ubuntu-latest / local) + GA->>AZ: az stack sub create — RG, UAMI, ACR, KV, ACA job + U->>AZ: 2. az acr build (push git-ape-runner:latest) + U->>AZ: 3. az keyvault secret set --name github-pat + AZ->>GH: 4. Job registers ephemeral runner (label git-ape-runner) + U->>GH: 5. gh variable set GIT_APE_RUNNER_LABEL=git-ape-runner + Note over GA,GH: 6. All later Git-Ape deploys run on the private runner +``` + +## Secrets + +The GitHub PAT is **never** in git, ARM parameters, or deployment history. The +stack creates an empty Key Vault; the PAT is written post-deploy with +`az keyvault secret set`. The ACA Job reads it at runtime through a Key Vault +secret reference (`keyVaultUrl` + user-assigned `identity`), which requires the +in-template **Key Vault Secrets User** role assignment. 
+ +## Destroy + +`/azure-stack-destroy git-ape-runners` (or the `git-ape-destroy.yml` flow) runs +`az stack sub delete --action-on-unmanage deleteAll`, removing the RG, ACA job, +environment, ACR, identity, role assignments, and Key Vault in one call, then +purges the soft-deleted Key Vault (purge protection is intentionally off). 
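The two-phase destroy described above can be previewed before anything is deleted. This is a hedged sketch, assuming the stack is named after the deployment ID (`git-ape-runners`) and that the caller supplies the Key Vault name; `print_destroy_plan` is a hypothetical dry-run helper that only prints commands, and `kv-gitape-example` is a placeholder rather than a real vault.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of Git-Ape): prints the two destroy
# commands instead of running them, so the operator can review them first.
# Assumption: the stack name matches the deployment ID; the vault name is
# passed in by the caller.
print_destroy_plan() {
  local stack="${1:?stack name required}"
  local vault="${2:?key vault name required}"
  # Phase 1: delete the stack and every resource it manages.
  printf 'az stack sub delete --name %s --action-on-unmanage deleteAll --yes\n' "$stack"
  # Phase 2: purge the soft-deleted vault so its name can be reused.
  printf 'az keyvault purge --name %s\n' "$vault"
}

print_destroy_plan git-ape-runners kv-gitape-example
```

Printing rather than executing keeps the sketch safe to run anywhere; the purge step only works because purge protection is off on this vault.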
diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/metadata.json b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/metadata.json new file mode 100644 index 0000000..7d451ca --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/metadata.json @@ -0,0 +1,14 @@ +{ + "deploymentId": "git-ape-runners", + "timestamp": "", + "status": "initialized", + "scope": "subscription", + "region": "eastus", + "project": "git-ape", + "environment": "prod", + "deployMethod": "stack", + "resourceGroup": "rg-git-ape-runners", + "resourceGroups": ["rg-git-ape-runners"], + "description": "Git-Ape self-hosted GitHub Actions runners (ACA event-driven job) managed as a first-class Git-Ape deployment. Provides the private runner that executes future Git-Ape workflows — Git-Ape deploying Git-Ape.", + "createdBy": "git-ape-onboarding" +} diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/parameters.json b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/parameters.json new file mode 100644 index 0000000..d2ae5bf --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/parameters.json @@ -0,0 +1,47 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "location": { + "value": "eastus" + }, + "resourceGroupName": { + "value": "rg-git-ape-runners" + }, + "runnerName": { + "value": "git-ape-runner" + }, + "githubOwnerRepo": { + "value": "/" + }, + "runnerScope": { + "value": "repo" + }, + "runnerLabels": { + "value": "git-ape-runner" + }, + "maxRunners": { + "value": 10 + }, + "minRunners": { + "value": 1 + }, + "cpuCores": { + "value": "1.0" + }, + "memorySize": { + "value": "2Gi" + }, + "infrastructureSubnetId": { + "value": "" + }, + "tags": { + "value": { + "ManagedBy": "git-ape", + "Component": 
"git-ape-runners", + "Environment": "prod", + "Project": "git-ape" + } + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/template.json b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/template.json new file mode 100644 index 0000000..2a364c5 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/template.json @@ -0,0 +1,494 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": "git-ape-onboarding", + "description": "Git-Ape self-hosted runner infrastructure as a first-class Git-Ape deployment. Subscription-scoped Deployment Stack template that creates the resource group and, via a nested inner-scope deployment, a fully self-contained runner stack: a user-assigned managed identity, an Azure Container Registry (for the custom runner image), an AcrPull role assignment, a Key Vault (for the GitHub PAT), a Key Vault Secrets User role assignment, an Azure Container Apps managed environment, and an event-driven Container Apps Job that scales ephemeral GitHub Actions runners with the KEDA 'github-runner' scaler (scale-to-zero). The GitHub PAT is referenced from Key Vault (never inline, never in git); set it post-deploy with `az keyvault secret set`. Deploy through the Git-Ape stack flow so you get an architecture diagram, cost estimate, managed deploy, and single-command destroy.\n\nDeploy (local): /azure-stack-deploy git-ape-runners\nDeploy (CI): merge a PR that adds .azure/deployments/git-ape-runners/ -> git-ape-deploy.yml\nDestroy: /azure-stack-destroy git-ape-runners" + }, + "parameters": { + "location": { + "type": "string", + "defaultValue": "eastus", + "metadata": { + "description": "Azure region for the resource group and all runner resources." 
+ } + }, + "resourceGroupName": { + "type": "string", + "defaultValue": "rg-git-ape-runners", + "metadata": { + "description": "Resource group that will own the runner stack. Created by this template." + } + }, + "runnerName": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Base name for the managed environment and the Container Apps Job." + } + }, + "githubOwnerRepo": { + "type": "string", + "metadata": { + "description": "GitHub target in / form for repo-scoped runners, or the org/owner name for org-scoped runners." + } + }, + "runnerScope": { + "type": "string", + "defaultValue": "repo", + "allowedValues": [ + "repo", + "org" + ], + "metadata": { + "description": "Runner registration scope." + } + }, + "runnerLabels": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Comma-separated runner labels. Must include the value used for the GIT_APE_RUNNER_LABEL workflow variable." + } + }, + "acrName": { + "type": "string", + "defaultValue": "[toLower(format('acrgitape{0}', uniqueString(subscription().id, parameters('resourceGroupName'))))]", + "metadata": { + "description": "Azure Container Registry name (5-50 alphanumeric, globally unique). Holds the custom runner image built with `az acr build`. Defaults to a deterministic unique name." + } + }, + "keyVaultName": { + "type": "string", + "defaultValue": "[format('kv-gitape-{0}', substring(uniqueString(subscription().id, parameters('resourceGroupName')), 0, 8))]", + "metadata": { + "description": "Key Vault name (3-24 chars) that stores the GitHub PAT secret 'github-pat'. Defaults to a deterministic unique name." + } + }, + "runnerImage": { + "type": "string", + "defaultValue": "[format('{0}.azurecr.io/git-ape-runner:latest', parameters('acrName'))]", + "metadata": { + "description": "Runner container image. Defaults to the custom image in the ACR created by this template. 
Build it with `az acr build` after the stack deploys (see README)." + } + }, + "maxRunners": { + "type": "int", + "defaultValue": 10, + "metadata": { + "description": "Maximum concurrent runner executions (KEDA maxExecutions)." + } + }, + "minRunners": { + "type": "int", + "defaultValue": 0, + "metadata": { + "description": "Minimum warm runner executions (KEDA minExecutions). Set to 1 to keep one runner always visible in GitHub and avoid cold-start delays; 0 for true scale-to-zero." + } + }, + "cpuCores": { + "type": "string", + "defaultValue": "1.0", + "metadata": { + "description": "vCPU for the runner container." + } + }, + "memorySize": { + "type": "string", + "defaultValue": "2Gi", + "metadata": { + "description": "Memory for the runner container (e.g. 2Gi)." + } + }, + "infrastructureSubnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of an infrastructure subnet for VNet injection of the ACA environment. Leave empty for a non-VNet environment (Consumption needs a /23 or larger)." + } + }, + "tags": { + "type": "object", + "defaultValue": { + "ManagedBy": "git-ape", + "Component": "git-ape-runners" + }, + "metadata": { + "description": "Tags applied to the resource group and all runner resources." 
+ } + } + }, + "variables": { + "runnerIdentityName": "[format('id-{0}', parameters('runnerName'))]" + }, + "resources": [ + { + "type": "Microsoft.Resources/resourceGroups", + "apiVersion": "2021-04-01", + "name": "[parameters('resourceGroupName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]" + }, + { + "type": "Microsoft.Resources/deployments", + "apiVersion": "2022-09-01", + "name": "git-ape-runners-inner", + "resourceGroup": "[parameters('resourceGroupName')]", + "dependsOn": [ + "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]" + ], + "properties": { + "mode": "Incremental", + "expressionEvaluationOptions": { + "scope": "inner" + }, + "parameters": { + "location": { + "value": "[parameters('location')]" + }, + "runnerName": { + "value": "[parameters('runnerName')]" + }, + "runnerIdentityName": { + "value": "[variables('runnerIdentityName')]" + }, + "githubOwnerRepo": { + "value": "[parameters('githubOwnerRepo')]" + }, + "runnerScope": { + "value": "[parameters('runnerScope')]" + }, + "runnerLabels": { + "value": "[parameters('runnerLabels')]" + }, + "acrName": { + "value": "[parameters('acrName')]" + }, + "keyVaultName": { + "value": "[parameters('keyVaultName')]" + }, + "runnerImage": { + "value": "[parameters('runnerImage')]" + }, + "maxRunners": { + "value": "[parameters('maxRunners')]" + }, + "minRunners": { + "value": "[parameters('minRunners')]" + }, + "cpuCores": { + "value": "[parameters('cpuCores')]" + }, + "memorySize": { + "value": "[parameters('memorySize')]" + }, + "infrastructureSubnetId": { + "value": "[parameters('infrastructureSubnetId')]" + }, + "tags": { + "value": "[parameters('tags')]" + } + }, + "template": { + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "location": { + "type": "string" + }, + "runnerName": { + "type": "string" + }, + "runnerIdentityName": { + 
"type": "string" + }, + "githubOwnerRepo": { + "type": "string" + }, + "runnerScope": { + "type": "string" + }, + "runnerLabels": { + "type": "string" + }, + "acrName": { + "type": "string" + }, + "keyVaultName": { + "type": "string" + }, + "runnerImage": { + "type": "string" + }, + "maxRunners": { + "type": "int" + }, + "minRunners": { + "type": "int" + }, + "cpuCores": { + "type": "string" + }, + "memorySize": { + "type": "string" + }, + "infrastructureSubnetId": { + "type": "string" + }, + "tags": { + "type": "object" + } + }, + "variables": { + "isOrgScope": "[equals(parameters('runnerScope'), 'org')]", + "isVnet": "[not(empty(parameters('infrastructureSubnetId')))]", + "envName": "[format('{0}-env', parameters('runnerName'))]", + "acrPullRoleId": "7f951dda-4ed3-4680-a7ca-43fe172d538d", + "keyVaultSecretsUserRoleId": "4633458b-17de-408a-b874-0445c86b69e6", + "identityResourceId": "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('runnerIdentityName'))]", + "ownerName": "[if(variables('isOrgScope'), parameters('githubOwnerRepo'), first(split(parameters('githubOwnerRepo'), '/')))]", + "repoName": "[if(variables('isOrgScope'), '', last(split(parameters('githubOwnerRepo'), '/')))]", + "scalerMetadataBase": { + "owner": "[variables('ownerName')]", + "runnerScope": "[parameters('runnerScope')]", + "labels": "[parameters('runnerLabels')]", + "targetWorkflowQueueLength": "1", + "applicationID": "" + }, + "scalerMetadata": "[if(variables('isOrgScope'), variables('scalerMetadataBase'), union(variables('scalerMetadataBase'), createObject('repos', variables('repoName'))))]", + "scopeEnv": "[if(variables('isOrgScope'), createArray(createObject('name', 'ORG_NAME', 'value', parameters('githubOwnerRepo'))), createArray(createObject('name', 'REPO_URL', 'value', concat('https://github.com/', parameters('githubOwnerRepo')))))]", + "baseEnv": [ + { + "name": "RUNNER_SCOPE", + "value": "[parameters('runnerScope')]" + }, + { + "name": "RUNNER_NAME_PREFIX", 
+ "value": "[parameters('runnerName')]" + }, + { + "name": "LABELS", + "value": "[parameters('runnerLabels')]" + }, + { + "name": "EPHEMERAL", + "value": "true" + }, + { + "name": "DISABLE_AUTO_UPDATE", + "value": "true" + }, + { + "name": "ACCESS_TOKEN", + "secretRef": "github-pat" + } + ], + "containerEnv": "[concat(variables('scopeEnv'), variables('baseEnv'))]", + "vnetConfiguration": "[if(variables('isVnet'), createObject('infrastructureSubnetId', parameters('infrastructureSubnetId'), 'internal', false()), json('null'))]" + }, + "resources": [ + { + "type": "Microsoft.ManagedIdentity/userAssignedIdentities", + "apiVersion": "2023-01-31", + "name": "[parameters('runnerIdentityName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]" + }, + { + "type": "Microsoft.ContainerRegistry/registries", + "apiVersion": "2023-07-01", + "name": "[parameters('acrName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]", + "sku": { + "name": "Basic" + }, + "properties": { + "adminUserEnabled": false, + "anonymousPullEnabled": false + } + }, + { + "type": "Microsoft.Authorization/roleAssignments", + "apiVersion": "2022-04-01", + "name": "[guid(resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName')), variables('identityResourceId'), variables('acrPullRoleId'))]", + "scope": "[format('Microsoft.ContainerRegistry/registries/{0}', parameters('acrName'))]", + "dependsOn": [ + "[resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName'))]", + "[variables('identityResourceId')]" + ], + "properties": { + "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', variables('acrPullRoleId'))]", + "principalId": "[reference(variables('identityResourceId'), '2023-01-31').principalId]", + "principalType": "ServicePrincipal" + } + }, + { + "type": "Microsoft.KeyVault/vaults", + "apiVersion": "2023-07-01", + "name": "[parameters('keyVaultName')]", + "location": 
"[parameters('location')]", + "tags": "[parameters('tags')]", + "properties": { + "tenantId": "[subscription().tenantId]", + "sku": { + "family": "A", + "name": "standard" + }, + "enableRbacAuthorization": true, + "enableSoftDelete": true, + "softDeleteRetentionInDays": 7, + "publicNetworkAccess": "Enabled" + } + }, + { + "type": "Microsoft.Authorization/roleAssignments", + "apiVersion": "2022-04-01", + "name": "[guid(resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName')), variables('identityResourceId'), variables('keyVaultSecretsUserRoleId'))]", + "scope": "[format('Microsoft.KeyVault/vaults/{0}', parameters('keyVaultName'))]", + "dependsOn": [ + "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]", + "[variables('identityResourceId')]" + ], + "properties": { + "roleDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', variables('keyVaultSecretsUserRoleId'))]", + "principalId": "[reference(variables('identityResourceId'), '2023-01-31').principalId]", + "principalType": "ServicePrincipal" + } + }, + { + "type": "Microsoft.App/managedEnvironments", + "apiVersion": "2024-03-01", + "name": "[variables('envName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]", + "properties": { + "vnetConfiguration": "[variables('vnetConfiguration')]" + } + }, + { + "type": "Microsoft.App/jobs", + "apiVersion": "2024-03-01", + "name": "[parameters('runnerName')]", + "location": "[parameters('location')]", + "tags": "[parameters('tags')]", + "identity": { + "type": "UserAssigned", + "userAssignedIdentities": { + "[variables('identityResourceId')]": {} + } + }, + "dependsOn": [ + "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]", + "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]", + "[resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName'))]", + "[guid(resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName')), 
variables('identityResourceId'), variables('keyVaultSecretsUserRoleId'))]", + "[guid(resourceId('Microsoft.ContainerRegistry/registries', parameters('acrName')), variables('identityResourceId'), variables('acrPullRoleId'))]" + ], + "properties": { + "environmentId": "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]", + "configuration": { + "triggerType": "Event", + "replicaTimeout": 1800, + "replicaRetryLimit": 1, + "registries": [ + { + "server": "[format('{0}.azurecr.io', parameters('acrName'))]", + "identity": "[variables('identityResourceId')]" + } + ], + "secrets": [ + { + "name": "github-pat", + "keyVaultUrl": "[format('{0}secrets/github-pat', reference(resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName')), '2023-07-01').vaultUri)]", + "identity": "[variables('identityResourceId')]" + } + ], + "eventTriggerConfig": { + "parallelism": 1, + "replicaCompletionCount": 1, + "scale": { + "minExecutions": "[parameters('minRunners')]", + "maxExecutions": "[parameters('maxRunners')]", + "pollingInterval": 30, + "rules": [ + { + "name": "github-runner", + "type": "github-runner", + "metadata": "[variables('scalerMetadata')]", + "auth": [ + { + "secretRef": "github-pat", + "triggerParameter": "personalAccessToken" + } + ] + } + ] + } + } + }, + "template": { + "containers": [ + { + "name": "[parameters('runnerName')]", + "image": "[parameters('runnerImage')]", + "env": "[variables('containerEnv')]", + "resources": { + "cpu": "[json(parameters('cpuCores'))]", + "memory": "[parameters('memorySize')]" + } + } + ] + } + } + } + ], + "outputs": { + "runnerJobId": { + "type": "string", + "value": "[resourceId('Microsoft.App/jobs', parameters('runnerName'))]" + }, + "acrLoginServer": { + "type": "string", + "value": "[format('{0}.azurecr.io', parameters('acrName'))]" + }, + "keyVaultName": { + "type": "string", + "value": "[parameters('keyVaultName')]" + }, + "identityClientId": { + "type": "string", + "value": 
"[reference(variables('identityResourceId'), '2023-01-31').clientId]" + }, + "runnerLabel": { + "type": "string", + "value": "[parameters('runnerLabels')]" + } + } + } + } + } + ], + "outputs": { + "resourceGroupName": { + "type": "string", + "value": "[parameters('resourceGroupName')]" + }, + "acrLoginServer": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.acrLoginServer.value]" + }, + "keyVaultName": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.keyVaultName.value]" + }, + "runnerJobId": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.runnerJobId.value]" + }, + "runnerLabel": { + "type": "string", + "value": "[reference('git-ape-runners-inner').outputs.runnerLabel.value]" + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/Dockerfile b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile new file mode 100644 index 0000000..1792567 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/Dockerfile @@ -0,0 +1,66 @@ +# Git-Ape self-hosted runner image +# +# Extends the official GitHub Actions runner with the tools Git-Ape workflows need +# plus a registration entrypoint so the runner self-registers on standalone +# container hosts (ACI, ACA Jobs): +# - az (Azure CLI) +# - gh (GitHub CLI) +# - jq (JSON processor) +# - git (already in base image) +# - entrypoint.sh (PAT -> registration-token exchange + config.sh/run.sh) +# +# Base image: ghcr.io/actions/actions-runner (GitHub's official runner image, +# Ubuntu-based). It includes the runner binary but does NOT include az, gh, or jq, +# and ships no registration entrypoint. Without this custom image, container hosts +# never register a runner and workflows fail with +# "Unable to locate executable file: az". +# +# Build with ACR Tasks (no local Docker required): +# az acr build --registry --image git-ape-runner:latest . 
+# +# Or locally: +# docker build -t git-ape-runner:latest . + +FROM ghcr.io/actions/actions-runner:latest + +USER root + +# Install Azure CLI +RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash + +# Install GitHub CLI +RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \ + | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \ + && chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \ + && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \ + | tee /etc/apt/sources.list.d/github-cli-stable.list > /dev/null \ + && apt-get update \ + && apt-get install -y --no-install-recommends gh \ + && apt-get clean && rm -rf /var/lib/apt/lists/* + +# Install jq plus curl/ca-certificates used by the registration entrypoint +# (git is already included in the base image). +RUN apt-get update \ + && apt-get install -y --no-install-recommends jq curl ca-certificates \ + && apt-get clean && rm -rf /var/lib/apt/lists/* + +# Registration entrypoint (self-registers the runner on ACI/ACA; bypassed on +# AKS, where ARC overrides the container command). +COPY entrypoint.sh /home/runner/entrypoint.sh +# Strip Windows CRLF line endings if present. When the build context is +# uploaded from a Windows checkout (Git `core.autocrlf`) or via `az acr build` from a +# Windows host, the shebang becomes "#!/usr/bin/env bash\r" and Linux returns +# "bash\r: No such file or directory". This sed is a no-op on clean LF files. 
+RUN sed -i 's/\r$//' /home/runner/entrypoint.sh && chmod +x /home/runner/entrypoint.sh + +# Switch back to the runner user +USER runner +WORKDIR /home/runner + +# Verify all required tools are present +RUN az version --output table \ + && gh --version \ + && jq --version \ + && git --version + +ENTRYPOINT ["/home/runner/entrypoint.sh"] diff --git a/.github/skills/git-ape-onboarding/templates/runners/README.md b/.github/skills/git-ape-onboarding/templates/runners/README.md new file mode 100644 index 0000000..9aae0c3 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/README.md @@ -0,0 +1,440 @@ +# Git-Ape Private Runner Templates + +These are **reference templates** for provisioning private GitHub Actions +runners that execute the Git-Ape deployment workflows (`git-ape-plan`, +`-deploy`, `-destroy`, `-verify`) with private network connectivity. + +Git-Ape supports two private runner strategies: + +| Strategy | Who manages compute? | Infrastructure you manage | Best for | +|----------|---------------------|--------------------------|----------| +| **Hosted compute networking** | GitHub | Azure VNet + subnet only | Private connectivity with zero runner management | +| **Self-hosted runners** | You | Full runner stack (ACI/ACA/AKS + image + scaling) | Custom images, air-gapped, compliance constraints | + +> **Bootstrap model: Start on public runners, switch to private later — with one variable.** + +## The runner switch: `GIT_APE_RUNNER_LABEL` + +Every scaffolded Git-Ape workflow resolves its runner like this: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. No infrastructure. | +| Set to a hosted runner name (e.g. `git-ape-vnet-4vcpu`) | Jobs run on GitHub-hosted compute with Azure private networking. | +| Set to a self-hosted label (e.g. 
`git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +Switching is a one-line change and is fully reversible: + +```bash +# Switch to hosted compute networking runner +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-vnet-4vcpu" + +# Switch to self-hosted runners +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + +# Clean fallback to GitHub-hosted runners (public) +gh variable delete GIT_APE_RUNNER_LABEL --repo / +``` + +In multi-environment mode, set the variable per environment +(`--env azure-deploy-prod`) so only the environments that need private runners +use them. + +--- + +## Option 1: Hosted Compute Networking (recommended) + +**GitHub-hosted runners with Azure private networking.** GitHub manages the +compute (Ubuntu VMs with all standard tools pre-installed), but the runners +execute inside your Azure VNet for private connectivity to your resources. + +> **Requires:** GitHub Enterprise Cloud. No custom image, no ACR, no KEDA — +> GitHub provides full Ubuntu images with `az`, `gh`, `jq`, `git` pre-installed. + +**Reference:** +[About networking for hosted compute products](https://docs.github.com/en/enterprise-cloud@latest/admin/configuring-settings/configuring-private-networking-for-hosted-compute-products/about-networking-for-hosted-compute-products-in-your-enterprise) + +### Scope: Organization vs Enterprise + +Hosted compute network configurations can be created at two levels: + +| Scope | `businessId` value | API endpoint | UI location | +|-------|-------------------|--------------|-------------| +| **Enterprise** | Enterprise `databaseId` (from GraphQL) | `enterprises/{slug}/network-configurations` | Enterprise Settings → Hosted compute networking | +| **Organization** | Org numeric ID (from REST API) | `orgs/{org}/settings/network-configurations` | Organization Settings → Hosted compute networking | + +Enterprise-scoped configs can be shared across all orgs in the enterprise. 
+Organization-scoped configs are independent (requires enterprise policy to allow). + +### Provisioning flow + +```mermaid +flowchart LR + A[Create Azure VNet
+ /28 subnet] --> B[Delegate subnet to
GitHub.Network/networkSettings] + B --> C[Register GitHub.Network
resource provider] + C --> D[Create networkSettings
Azure resource] + D --> E[Create network config
via GitHub API] + E --> F[Create runner group
linked to network config] + F --> G[Create hosted runner
in runner group] + G --> H[Assign org/repo
to runner group] + H --> I[Set GIT_APE_RUNNER_LABEL
= runner name] +``` + +### Step-by-step + +1. **Create Azure VNet and subnet** (minimum `/28` — 16 IPs): + ```bash + az group create --name --location + az network vnet create --name --resource-group \ + --address-prefix 10.0.0.0/16 --subnet-name snet-runners --subnet-prefix 10.0.0.0/28 + ``` + +2. **Delegate subnet** to `GitHub.Network/networkSettings`: + ```bash + az network vnet subnet update --name snet-runners --vnet-name \ + --resource-group --delegations GitHub.Network/networkSettings + ``` + +3. **Register the `GitHub.Network` resource provider** on the subscription: + ```bash + az provider register --namespace GitHub.Network + az provider show --namespace GitHub.Network --query "registrationState" -o tsv + # Wait until "Registered" + ``` + +4. **Create the `GitHub.Network/networkSettings` resource:** + ```bash + # businessId = enterprise databaseId (enterprise scope) or org numeric ID (org scope) + az rest --method PUT \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --body '{ + "location": "", + "properties": { + "businessId": "", + "subnetId": "/subscriptions//resourceGroups//providers/Microsoft.Network/virtualNetworks//subnets/snet-runners" + } + }' + ``` + ⚠️ **`businessId` is immutable** — if wrong, you must delete and recreate the resource. + + The resource will have a `GitHubId` tag (a SHA-256 hash) — this is the ID + GitHub uses to reference the network settings. + +5. **Create the network configuration** on GitHub (use the `GitHubId` tag value, + NOT the Azure resource ID): + ```bash + # Enterprise scope: + gh api --method POST enterprises//network-configurations \ + -f name="" \ + -f compute_service="actions" \ + -f network_settings_ids[]="" + + # Organization scope: + gh api --method POST orgs//settings/network-configurations \ + -f name="" \ + -f compute_service="actions" \ + -f network_settings_ids[]="" + ``` + +6. 
**Create a runner group** linked to the network configuration: + ```bash + # Enterprise scope: + gh api --method POST enterprises//actions/runner-groups \ + -f name="" -f visibility="selected" \ + -F allows_public_repositories=false \ + -f network_configuration_id="" + + # Organization scope: + gh api --method POST orgs//actions/runner-groups \ + -f name="" -f visibility="selected" \ + -F allows_public_repositories=false \ + -f network_configuration_id="" + ``` + +7. **Assign org/repo to the runner group:** + ```bash + # Enterprise: assign org + gh api --method PUT enterprises//actions/runner-groups//organizations/ + # Org: assign repo (for inherited enterprise groups, use the inherited group ID at org level) + gh api --method PUT orgs//actions/runner-groups//repositories/ + ``` + +8. **Create a hosted runner** in the group: + ```bash + # Query available images and sizes first: + gh api orgs//actions/hosted-runners/images/github-owned + gh api orgs//actions/hosted-runners/machine-sizes + + # Create runner (image IDs are NUMERIC, sizes are like "4-core"): + echo '{"name":"","runner_group_id":,"platform":"linux-x64","image":{"id":"","source":"github"},"size":"4-core","maximum_runners":5}' | \ + gh api --method POST enterprises//actions/hosted-runners --input - + ``` + +9. **Set the variable:** + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "" + ``` + +10. **Verify** by triggering `Git-Ape: Verify Setup`. + +### Key facts + +- **No custom image needed** — GitHub's hosted compute uses full Ubuntu images + with all standard tools (`az`, `gh`, `jq`, `git`, Docker, etc.) 
+- **No KEDA, no cold start** — runners are always available (status: "Ready") +- **`network_settings_ids`** expects the `GitHubId` tag value (SHA-256 hash + from the Azure resource), NOT the Azure resource ID +- **Image IDs are numeric** (e.g., `"2295"` for Ubuntu 24.04) — query them via + `GET orgs/{org}/actions/hosted-runners/images/github-owned` +- **Size IDs** are GitHub-specific (e.g., `"4-core"`, `"8-core"`) — query via + `GET orgs/{org}/actions/hosted-runners/machine-sizes` +- **`businessId` is immutable** on the Azure resource — getting it wrong means + delete + recreate + +### Required GitHub token scopes + +All scopes must be present **before** starting provisioning to avoid repeated +auth prompts: + +| Scope | Purpose | +|-------|---------| +| `admin:org` | Create runner groups, assign repos | +| `admin:enterprise` | Enterprise-level runner groups and hosted runners | +| `manage_runners:org` | Create/manage hosted runners | +| `read:enterprise` | Query enterprise metadata (databaseId) | +| `write:network_configurations` | Create network configurations | + +```bash +# Authenticate once with all required scopes: +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +--- + +## Option 2: Self-Hosted Runners (ACI / ACA / AKS) + +Self-hosted runners run in **your** Azure subscription. You manage the compute, +image, scaling, and networking. + +> **⭐ ACA is deployed as a first-class Git-Ape deployment — Git-Ape deploying +> Git-Ape.** Instead of an imperative `az deployment group create`, the ACA +> runner stack ships as a subscription-scoped deployment artifact at +> [`../deployments/git-ape-runners/`](../deployments/git-ape-runners). 
Onboarding +> scaffolds it to your repo's `.azure/deployments/git-ape-runners/`, so the +> runner infrastructure flows through Git-Ape's managed pipeline: **architecture +> diagram, cost estimate, `az stack sub create` deploy, and single-command +> destroy**. The raw `aca/` template below remains as the reference IaC that the +> deployment artifact is built from. See that folder's `README.md` for the +> deploy/destroy walkthrough and bootstrap ordering. + +### Platform matrix + +| | **Azure Container Instances (ACI)** | **Azure Container Apps (ACA)** | **Azure Kubernetes Service (AKS)** | +|---|---|---|---| +| **Basic** | [`aci/`](./aci) — single container group, simplest | [`aca/`](./aca) — KEDA-scaled ephemeral jobs · **deploy via [`deployments/git-ape-runners/`](../deployments/git-ape-runners)** | [`aks/`](./aks) — Actions Runner Controller (ARC) | +| **With private networking** | [`aci/`](./aci) with `subnetId` set | [`aca/`](./aca) with `infrastructureSubnetId` set | [`aks/`](./aks) — runners on cluster node subnet | + +### Which platform? + +| Choose | When | +|--------|------| +| **ACI** | Fewest moving parts. A handful of runners, simple scaling, fast to stand up. | +| **ACA** (recommended) | You want **event-driven, ephemeral** runners that scale to zero between jobs (KEDA `github-runner` scaler). Best cost/utilization — and it's deployed as a managed Git-Ape deployment (diagram/cost/deploy/destroy). | +| **AKS** | You already run AKS, need large-scale autoscaling, or want ARC's ephemeral runner pods and fine-grained scheduling. | + +## Custom runner image (required) + +> **⚠️ The base `ghcr.io/actions/actions-runner:latest` (GitHub's official runner +> image) does NOT include `az`, `gh`, or `jq`, and ships no registration +> entrypoint.** Git-Ape workflows will fail with +> `Unable to locate executable file: az` — and on ACI/ACA the runner never even +> registers — if you use it directly. 
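One way to confirm that an image carries the prerequisites named in the warning above is to probe for each tool on `PATH` before the runner registers. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, not part of the shipped `entrypoint.sh`, and it checks presence only, not versions.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the Git-Ape templates).
# Prints the name of every requested tool that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: report whichever Git-Ape prerequisites this host lacks.
missing="$(missing_tools az gh jq git)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All Git-Ape prerequisites found"
fi
```

Run inside the stock `ghcr.io/actions/actions-runner` image, a check like this should flag `az`, `gh`, and `jq`, which is the gap the custom image closes.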
+ +You **must** build and use the custom image from the [`Dockerfile`](./Dockerfile) +in this directory. It extends the base runner with all Git-Ape prerequisites and +an [`entrypoint.sh`](./entrypoint.sh) that self-registers the runner on ACI/ACA. + +### Build with ACR Tasks (recommended — cloud build, no local Docker) + +Always build the image in Azure using ACR Tasks. This avoids: +- Needing Docker installed locally +- **Windows CRLF line-ending corruption** — when the build context is uploaded + from a Windows checkout (`git autocrlf`), `entrypoint.sh` may have `\r\n` + endings. The Dockerfile includes a `sed` safety net, but cloud builds on ACR + Tasks run in Linux and handle this cleanly. + +```bash +# Create an ACR (one-time) — admin-enabled false; use managed identity for pulls +az acr create --name --resource-group --location --sku Basic + +# Build and push the image (runs in Azure, ~3 min) +az acr build --registry --image git-ape-runner:latest \ + --file .github/skills/git-ape-onboarding/templates/runners/Dockerfile \ + .github/skills/git-ape-onboarding/templates/runners/ +``` + +> **Windows note:** `az acr build` may crash with a `charmap` codec error while +> streaming build logs (Unicode characters in `apt-get` output). Add `--no-logs` +> to skip log streaming — the build still runs in Azure: +> ```bash +> az acr build --registry --image git-ape-runner:latest \ +> --file ... ... --no-logs +> ``` +> Check the result with `az acr repository list --name `. + +### ACR pull authentication (managed identity — recommended) + +> **For ACA, prefer the managed Git-Ape deployment** at +> [`../deployments/git-ape-runners/`](../deployments/git-ape-runners), which +> provisions the managed identity, `AcrPull` role assignment, Key Vault, and ACA +> job for you inside one subscription-scoped stack (`az stack sub create`). The +> manual `az` steps below are the imperative equivalent, kept for reference and +> for the ACI path. 
+ +Use a **user-assigned managed identity** with the `AcrPull` role to pull images +from your ACR. This eliminates admin credentials entirely. + +```bash +# Create a managed identity (one-time) +az identity create --name id-git-ape-runner --resource-group --location + +# Get the identity's principal ID and resource ID +IDENTITY_ID=$(az identity show --name id-git-ape-runner --resource-group --query id -o tsv) +PRINCIPAL_ID=$(az identity show --name id-git-ape-runner --resource-group --query principalId -o tsv) +ACR_ID=$(az acr show --name --query id -o tsv) + +# Assign AcrPull role +az role assignment create --assignee-object-id $PRINCIPAL_ID --assignee-principal-type ServicePrincipal \ + --role AcrPull --scope $ACR_ID + +# Deploy the ACA template with managed identity + ACR server +az deployment group create -g -f template.json \ + -p runnerImage='.azurecr.io/git-ape-runner:latest' \ + acrServer='.azurecr.io' \ + userAssignedIdentityId=$IDENTITY_ID \ + githubOwnerRepo='org/repo' \ + githubAccessToken='' +``` + +The ACA template's `registries` block automatically uses identity-based auth +when both `acrServer` and `userAssignedIdentityId` are set — no username/password. 
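The "both parameters set" rule is easy to get wrong at deploy time, so a small guard before calling `az deployment group create` can help. This is a minimal sketch under stated assumptions: `acr_pull_mode` is a hypothetical helper, the identity path is a made-up example value, and the `not-configured` label is only this sketch's name for every case the README does not describe as identity-based.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented rule: identity-based ACR pull
# applies only when BOTH acrServer and userAssignedIdentityId are non-empty.
acr_pull_mode() {
  local acr_server="$1" identity_id="$2"
  if [ -n "$acr_server" ] && [ -n "$identity_id" ]; then
    echo "managed-identity"
  else
    echo "not-configured"
  fi
}

# Example values only; substitute your own login server and identity ID.
acr_pull_mode "myacr.azurecr.io" "/subscriptions/0000/example-identity"
acr_pull_mode "myacr.azurecr.io" ""
```

A deploy script could refuse to continue when the result is `not-configured` and the image lives in a private ACR.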
+ +### Legacy: ACR admin credentials (not recommended) + +If you cannot use managed identity, enable admin access and configure pull +credentials manually: + +```bash +az acr update --name --admin-enabled true +az containerapp job registry set --name git-ape-runner --resource-group \ + --server .azurecr.io --username \ + --password $(az acr credential show -n --query "passwords[0].value" -o tsv) +``` + +### Tools included in the custom image + +| Tool | Minimum version | Purpose | +|------|----------------|---------| +| `az` | 2.50+ | Azure CLI — OIDC login, deployments, resource management | +| `gh` | 2.0+ | GitHub CLI — PR comments, workflow dispatch | +| `jq` | 1.6+ | JSON processing in shell scripts | +| `git` | (any) | Checkout, commit state files | + +## KEDA cold-start considerations + +The KEDA `github-runner` scaler polls the GitHub Actions queue every 30 seconds. +On a fresh deployment, there can be a delay of 1–3 minutes before KEDA detects +queued jobs and spins up a runner. During this window, GitHub shows the job as +"Waiting for a runner" and the Settings page shows "No runners configured" +(ephemeral runners only exist during job execution). + +**Recommendations:** + +- **Set `minExecutions=1`** if you want at least one runner always warm and + visible in GitHub Settings. This eliminates cold-start delays at the cost of + one always-running container (~$30–50/month on Consumption plan). + ```bash + az containerapp job update --name git-ape-runner --resource-group --min-executions 1 + ``` +- **Leave `minExecutions=0`** (default) for true scale-to-zero if you can + tolerate 1–3 minute cold starts. Runners will appear in GitHub only while + jobs are executing. +- **Fine-grained PATs** work with the KEDA scaler but require + `administration:write` permission on the target repo. 
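The queue that the KEDA scaler polls is identified by the `owner` and, for repo scope, `repos` metadata, which the ACA template derives from `githubOwnerRepo` and `runnerScope`. The sketch below restates that derivation in shell so the expected values can be checked before deploying. `scaler_target` is a hypothetical helper, not something the templates ship.

```shell
#!/usr/bin/env bash
# Sketch of the template's ownerName/repoName derivation (hypothetical helper).
# Usage: scaler_target <repo|org> <githubOwnerRepo>
scaler_target() {
  local scope="$1" owner_repo="$2"
  if [ "$scope" = "org" ]; then
    # Org scope: the whole value is the owner and no repos filter is added.
    printf 'owner=%s repos=\n' "$owner_repo"
  else
    # Repo scope: first path segment is the owner, last segment is the repo.
    printf 'owner=%s repos=%s\n' "${owner_repo%%/*}" "${owner_repo##*/}"
  fi
}

scaler_target repo "your-org/your-repo"
scaler_target org "your-org"
```

If the derived owner or repo does not match the repository whose jobs are queued, the scaler has nothing to react to and jobs stay in "Waiting for a runner" regardless of `minExecutions`.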
+ +## Security model + +- **Azure access uses a managed identity, never secrets.** Each template + attaches a **user-assigned managed identity** to the runner so the workflows + can authenticate to Azure (the runner host identity) — but Git-Ape workflows + still use **OIDC federation** for `az` actions, so the managed identity only + needs what the runtime requires. Do not put subscription keys or connection + strings on the runner. +- **ACR image pull uses managed identity, not admin credentials.** The managed + identity assigned to the runner should have the `AcrPull` role on the ACR. + The ACA template supports identity-based registry auth natively via the + `acrServer` + `userAssignedIdentityId` parameters — no username/password. + ACR admin credentials are a legacy fallback and should be avoided. +- **The GitHub registration credential is the one unavoidable secret.** GitHub + requires a credential to register a runner. Order of preference: + 1. **GitHub App** installation token (recommended for org-scale; ARC supports + this natively). + 2. **Fine-grained PAT** with `administration:write` (repo runners) or + organization `self-hosted runners` write (org runners). + Source it from **Azure Key Vault** (`securestring` params + Key Vault + reference), never inline it in a committed `parameters.json`. +- **Ephemeral runners by default.** Templates register **ephemeral** runners + (one job per runner, then re-register). This prevents state leaking between + jobs — important when runners are shared across deployments. +- **Label scoping.** All templates register the runner with the label + `git-ape-runner` (override via parameter). That label is what + `GIT_APE_RUNNER_LABEL` must match. + +## Provisioning flow (all platforms) + +```mermaid +flowchart LR + A[Choose type + platform] --> B[Create ACR +
build custom image
via ACR Tasks] + B --> C[Create managed identity
+ AcrPull role] + C --> D[Copy template into
.azure/runners/] + D --> E[Provide GitHub creds
via Key Vault] + E --> F[Deploy IaC
az deployment / helm] + F --> G[Set minExecutions=1
runner registers] + G --> H[Runner registers
with label git-ape-runner] + H --> I[Set GIT_APE_RUNNER_LABEL
variable] + I --> J[Workflows now run
on private runners] + J -.clean fallback.-> K[Unset variable →
back to ubuntu-latest] +``` + +1. **Choose** the runner type and platform (the `/git-ape-onboarding` flow asks). +2. **Create an ACR** and build the custom runner image using ACR Tasks (cloud + build — avoids CRLF issues and requires no local Docker). Create a + **user-assigned managed identity** with `AcrPull` role for image pulls — do + not use ACR admin credentials. +3. **Copy** the chosen platform folder into your repo under + `.azure/runners//` and edit parameters for your repo/org, region, + labels, image, and (for VNet-injected) the target `subnetId`. +4. **Store the GitHub credential** in Key Vault and reference it from the + secure parameter — do not commit it. +5. **Deploy** the runner infrastructure (see each platform's README/notes). +6. **Confirm** the runner is **online** in + *GitHub → Settings → Actions → Runners* with the `git-ape-runner` label. + (With `minExecutions=0`, the runner only appears while a job is running.) +7. **Set** `GIT_APE_RUNNER_LABEL=git-ape-runner` (repo or per-environment). +8. **Verify** by running the `Git-Ape: Verify Setup` workflow — its *Runner + Configuration* step reports the active runner mode. + +## Note on the drift workflow + +`git-ape-drift.lock.yml` is a **compiled GitHub Agentic Workflow** (gh-aw). Its +runner is fixed at compile time and gh-aw only supports GitHub-hosted Ubuntu +labels for its agent job. To run continuous drift detection on a private runner, +set `runs-on:` (and optionally `runs-on-slim:`) in the **source** +`git-ape-drift.md` frontmatter to a supported label and recompile with +`gh aw compile`. Do **not** hand-edit the `.lock.yml` — it carries an integrity +hash and will fail its stale-lock check. The other four workflows honor +`GIT_APE_RUNNER_LABEL` directly with no recompile. 
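For the workflows that honor `GIT_APE_RUNNER_LABEL`, the fallback in `runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` can be reasoned about with a shell analogue. This is only an illustration of the expression's behavior, assuming an unset or empty variable is treated as falsy. `resolve_runner` is a hypothetical name and nothing in the scaffolded workflows calls it.

```shell
#!/usr/bin/env bash
# Shell analogue of: runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}
# resolve_runner is a hypothetical helper for illustration only.
resolve_runner() {
  local label="${1:-}"
  # An unset or empty variable falls back to the GitHub-hosted default.
  echo "${label:-ubuntu-latest}"
}

resolve_runner ""                 # variable unset or deleted
resolve_runner "git-ape-runner"   # self-hosted label
```

Deleting the variable with `gh variable delete` therefore restores the public default without touching any workflow file.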
diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json new file mode 100644 index 0000000..2d3b97c --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/parameters.json @@ -0,0 +1,41 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "runnerName": { + "value": "git-ape-runner" + }, + "githubOwnerRepo": { + "value": "your-org/your-repo" + }, + "runnerScope": { + "value": "repo" + }, + "runnerLabels": { + "value": "git-ape-runner" + }, + "_comment_githubAccessToken": "Requires a long-lived GitHub PAT (fine-grained with Actions + Administration R/W, or classic with repo scope). Do NOT use a short-lived registration token from the GitHub API — it expires in ~1h and breaks KEDA polling and ephemeral runner registration. Reference the PAT from Key Vault as shown, or pass it at deploy time with -p githubAccessToken=.", + "githubAccessToken": { + "reference": { + "keyVault": { + "id": "/subscriptions//resourceGroups//providers/Microsoft.KeyVault/vaults/" + }, + "secretName": "github-runner-token" + } + }, + "userAssignedIdentityId": { + "value": "" + }, + "_comment_infrastructureSubnetId": "Leave empty for a self-hosted (subscription) environment. For VNet-injected, set a subnet resource ID (Consumption needs a /23 or larger).", + "infrastructureSubnetId": { + "value": "" + }, + "maxRunners": { + "value": 10 + }, + "_comment_acrServer": "Set to your ACR login server (e.g. myacr.azurecr.io) when using a custom runner image. 
When combined with userAssignedIdentityId, the job authenticates to ACR via managed identity (AcrPull role required) — no admin credentials needed.", + "acrServer": { + "value": "" + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aca/template.json b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json new file mode 100644 index 0000000..edd8d8b --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aca/template.json @@ -0,0 +1,232 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": "git-ape-onboarding", + "description": "Git-Ape self-hosted runner on Azure Container Apps (ACA). Provisions a managed environment and an event-driven Container Apps Job that scales ephemeral GitHub Actions runners on demand with the KEDA 'github-runner' scaler (scale-to-zero between jobs). Runners register with the label 'git-ape-runner' (override via parameter). Leave infrastructureSubnetId empty for a self-hosted (subscription) environment; set it for VNet injection (Consumption needs a /23 or larger). The GitHub credential is the only secret and must be sourced from Key Vault. ACR image pull uses managed identity (AcrPull role) when both acrServer and userAssignedIdentityId are set — no admin credentials needed.\n\nDeploy:\n az group create -n rg-git-ape-runners -l eastus\n az deployment group create -g rg-git-ape-runners -f template.json -p @parameters.json" + }, + "parameters": { + "location": { + "type": "string", + "defaultValue": "[resourceGroup().location]", + "metadata": { + "description": "Azure region. Defaults to the resource group location." + } + }, + "runnerName": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Base name for the managed environment and job." 
+ } + }, + "githubOwnerRepo": { + "type": "string", + "metadata": { + "description": "GitHub target in / form for repo-scoped runners, or the org/owner name for org-scoped runners." + } + }, + "runnerScope": { + "type": "string", + "defaultValue": "repo", + "allowedValues": [ + "repo", + "org" + ], + "metadata": { + "description": "Runner registration scope." + } + }, + "runnerLabels": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Comma-separated runner labels. Must include the value used for the GIT_APE_RUNNER_LABEL workflow variable." + } + }, + "githubAccessToken": { + "type": "securestring", + "metadata": { + "description": "Long-lived GitHub PAT used to register ephemeral runners and poll the Actions queue via KEDA. Requires a fine-grained PAT with Actions + Administration (Read & Write) permissions, or a classic PAT with repo scope. Do NOT use a short-lived registration token — it expires in ~1h. Source from Key Vault - never commit it." + } + }, + "runnerImage": { + "type": "string", + "defaultValue": "ghcr.io/actions/actions-runner:latest", + "metadata": { + "description": "Runner container image. IMPORTANT: the stock image shown here neither includes az/gh/jq nor self-registers a runner. Build the custom image from ../Dockerfile (which adds those tools and a registration entrypoint) and push it to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'." + } + }, + "userAssignedIdentityId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of a user-assigned managed identity for the runner to access Azure. Leave empty for none." + } + }, + "infrastructureSubnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of an infrastructure subnet for VNet injection. Leave empty for a non-VNet environment." 
+ } + }, + "maxRunners": { + "type": "int", + "defaultValue": 10, + "metadata": { + "description": "Maximum concurrent runner executions (KEDA maxExecutions)." + } + }, + "cpuCores": { + "type": "string", + "defaultValue": "1.0", + "metadata": { + "description": "vCPU for the runner container." + } + }, + "memorySize": { + "type": "string", + "defaultValue": "2Gi", + "metadata": { + "description": "Memory for the runner container (e.g. 2Gi)." + } + }, + "acrServer": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "ACR login server (e.g. myacr.azurecr.io). When set together with userAssignedIdentityId, the job pulls images via managed identity (AcrPull role required) — no admin credentials needed. Leave empty if using a public image or configuring registry credentials out-of-band." + } + } + }, + "variables": { + "isOrgScope": "[equals(parameters('runnerScope'), 'org')]", + "hasIdentity": "[not(empty(parameters('userAssignedIdentityId')))]", + "hasAcr": "[not(empty(parameters('acrServer')))]", + "isVnet": "[not(empty(parameters('infrastructureSubnetId')))]", + "envName": "[format('{0}-env', parameters('runnerName'))]", + "ownerName": "[if(variables('isOrgScope'), parameters('githubOwnerRepo'), first(split(parameters('githubOwnerRepo'), '/')))]", + "repoName": "[if(variables('isOrgScope'), '', last(split(parameters('githubOwnerRepo'), '/')))]", + "scalerMetadataBase": { + "owner": "[variables('ownerName')]", + "runnerScope": "[parameters('runnerScope')]", + "labels": "[parameters('runnerLabels')]", + "targetWorkflowQueueLength": "1", + "applicationID": "" + }, + "scalerMetadata": "[if(variables('isOrgScope'), variables('scalerMetadataBase'), union(variables('scalerMetadataBase'), createObject('repos', variables('repoName'))))]", + "scopeEnv": "[if(variables('isOrgScope'), createArray(createObject('name', 'ORG_NAME', 'value', parameters('githubOwnerRepo'))), createArray(createObject('name', 'REPO_URL', 'value', 
concat('https://github.com/', parameters('githubOwnerRepo')))))]", + "baseEnv": [ + { + "name": "RUNNER_SCOPE", + "value": "[parameters('runnerScope')]" + }, + { + "name": "RUNNER_NAME_PREFIX", + "value": "[parameters('runnerName')]" + }, + { + "name": "LABELS", + "value": "[parameters('runnerLabels')]" + }, + { + "name": "EPHEMERAL", + "value": "true" + }, + { + "name": "DISABLE_AUTO_UPDATE", + "value": "true" + }, + { + "name": "ACCESS_TOKEN", + "secretRef": "github-pat" + } + ], + "containerEnv": "[concat(variables('scopeEnv'), variables('baseEnv'))]", + "identityBlock": "[if(variables('hasIdentity'), createObject('type', 'UserAssigned', 'userAssignedIdentities', createObject(parameters('userAssignedIdentityId'), createObject())), createObject('type', 'None'))]", + "registriesBlock": "[if(and(variables('hasAcr'), variables('hasIdentity')), createArray(createObject('server', parameters('acrServer'), 'identity', parameters('userAssignedIdentityId'))), if(variables('hasAcr'), createArray(createObject('server', parameters('acrServer'))), createArray()))]", + "vnetConfiguration": "[if(variables('isVnet'), createObject('infrastructureSubnetId', parameters('infrastructureSubnetId'), 'internal', false()), json('null'))]" + }, + "resources": [ + { + "type": "Microsoft.App/managedEnvironments", + "apiVersion": "2024-03-01", + "name": "[variables('envName')]", + "location": "[parameters('location')]", + "properties": { + "vnetConfiguration": "[variables('vnetConfiguration')]" + } + }, + { + "type": "Microsoft.App/jobs", + "apiVersion": "2024-03-01", + "name": "[parameters('runnerName')]", + "location": "[parameters('location')]", + "identity": "[variables('identityBlock')]", + "dependsOn": [ + "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]" + ], + "properties": { + "environmentId": "[resourceId('Microsoft.App/managedEnvironments', variables('envName'))]", + "configuration": { + "triggerType": "Event", + "replicaTimeout": 1800, + 
"replicaRetryLimit": 1, + "registries": "[if(empty(variables('registriesBlock')), json('null'), variables('registriesBlock'))]", + "secrets": [ + { + "name": "github-pat", + "value": "[parameters('githubAccessToken')]" + } + ], + "eventTriggerConfig": { + "parallelism": 1, + "replicaCompletionCount": 1, + "scale": { + "minExecutions": 0, + "maxExecutions": "[parameters('maxRunners')]", + "pollingInterval": 30, + "rules": [ + { + "name": "github-runner", + "type": "github-runner", + "metadata": "[variables('scalerMetadata')]", + "auth": [ + { + "secretRef": "github-pat", + "triggerParameter": "personalAccessToken" + } + ] + } + ] + } + } + }, + "template": { + "containers": [ + { + "name": "[parameters('runnerName')]", + "image": "[parameters('runnerImage')]", + "env": "[variables('containerEnv')]", + "resources": { + "cpu": "[json(parameters('cpuCores'))]", + "memory": "[parameters('memorySize')]" + } + } + ] + } + } + } + ], + "outputs": { + "runnerJobId": { + "type": "string", + "value": "[resourceId('Microsoft.App/jobs', parameters('runnerName'))]" + }, + "runnerLabel": { + "type": "string", + "value": "[parameters('runnerLabels')]" + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json b/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json new file mode 100644 index 0000000..e767aa2 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aci/parameters.json @@ -0,0 +1,37 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "runnerName": { + "value": "git-ape-runner" + }, + "githubOwnerRepo": { + "value": "your-org/your-repo" + }, + "runnerScope": { + "value": "repo" + }, + "runnerLabels": { + "value": "git-ape-runner" + }, + "_comment_githubAccessToken": "Requires a long-lived GitHub PAT (fine-grained with Actions + Administration R/W, or classic with repo scope). 
Do NOT use a short-lived registration token from the GitHub API — it expires in ~1h and breaks runner registration. Reference the PAT from Key Vault as shown below, or pass it at deploy time with -p githubAccessToken=.", + "githubAccessToken": { + "reference": { + "keyVault": { + "id": "/subscriptions//resourceGroups//providers/Microsoft.KeyVault/vaults/" + }, + "secretName": "github-runner-token" + } + }, + "userAssignedIdentityId": { + "value": "" + }, + "_comment_subnetId": "Leave empty for a self-hosted (subscription) runner. For a VNet-injected runner, set the resource ID of a subnet delegated to Microsoft.ContainerInstance/containerGroups.", + "subnetId": { + "value": "" + }, + "ephemeral": { + "value": true + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aci/template.json b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json new file mode 100644 index 0000000..ccddb7d --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aci/template.json @@ -0,0 +1,171 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": "git-ape-onboarding", + "description": "Git-Ape self-hosted runner on Azure Container Instances (ACI). Provisions a container group that registers an ephemeral GitHub Actions runner with the label 'git-ape-runner' (override via parameter). Leave subnetId empty for a self-hosted (subscription) runner; set it to a delegated subnet for VNet injection. The GitHub credential is the only secret and must be sourced from Key Vault. 
Azure access uses an optional user-assigned managed identity, never stored keys.\n\nDeploy:\n az group create -n rg-git-ape-runners -l eastus\n az deployment group create -g rg-git-ape-runners -f template.json -p @parameters.json" + }, + "parameters": { + "location": { + "type": "string", + "defaultValue": "[resourceGroup().location]", + "metadata": { + "description": "Azure region for the runner. Defaults to the resource group location." + } + }, + "runnerName": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Name of the container group / runner instance." + } + }, + "githubOwnerRepo": { + "type": "string", + "metadata": { + "description": "GitHub target in / form for repo-scoped runners, or the org/owner name for org-scoped runners." + } + }, + "runnerScope": { + "type": "string", + "defaultValue": "repo", + "allowedValues": [ + "repo", + "org" + ], + "metadata": { + "description": "Runner registration scope." + } + }, + "runnerLabels": { + "type": "string", + "defaultValue": "git-ape-runner", + "metadata": { + "description": "Comma-separated runner labels. Must include the value used for the GIT_APE_RUNNER_LABEL workflow variable." + } + }, + "githubAccessToken": { + "type": "securestring", + "metadata": { + "description": "Long-lived GitHub PAT used to register the runner. Requires a fine-grained PAT with Actions + Administration (Read & Write) permissions, or a classic PAT with repo scope. Do NOT use a short-lived registration token — it expires in ~1h. Source from Key Vault - never commit it." + } + }, + "runnerImage": { + "type": "string", + "defaultValue": "ghcr.io/actions/actions-runner:latest", + "metadata": { + "description": "Runner container image. IMPORTANT: the stock image shown here neither includes az/gh/jq nor self-registers a runner. 
Build the custom image from ../Dockerfile (which adds those tools and a registration entrypoint) and push it to your ACR, then set this parameter to '.azurecr.io/git-ape-runner:latest'." + } + }, + "userAssignedIdentityId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of a user-assigned managed identity for the runner to access Azure. Leave empty for none." + } + }, + "subnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource ID of a delegated subnet for VNet injection. Leave empty for a non-VNet (public-egress) runner." + } + }, + "ephemeral": { + "type": "bool", + "defaultValue": true, + "metadata": { + "description": "Register the runner as ephemeral (one job per registration, then re-register). Recommended." + } + }, + "cpuCores": { + "type": "int", + "defaultValue": 2, + "metadata": { + "description": "vCPU cores for the runner container." + } + }, + "memoryInGB": { + "type": "int", + "defaultValue": 4, + "metadata": { + "description": "Memory (GB) for the runner container." 
+ } + } + }, + "variables": { + "isOrgScope": "[equals(parameters('runnerScope'), 'org')]", + "hasIdentity": "[not(empty(parameters('userAssignedIdentityId')))]", + "isVnet": "[not(empty(parameters('subnetId')))]", + "scopeEnv": "[if(variables('isOrgScope'), createArray(createObject('name', 'ORG_NAME', 'value', parameters('githubOwnerRepo'))), createArray(createObject('name', 'REPO_URL', 'value', concat('https://github.com/', parameters('githubOwnerRepo')))))]", + "baseEnv": [ + { + "name": "RUNNER_SCOPE", + "value": "[parameters('runnerScope')]" + }, + { + "name": "RUNNER_NAME_PREFIX", + "value": "[parameters('runnerName')]" + }, + { + "name": "LABELS", + "value": "[parameters('runnerLabels')]" + }, + { + "name": "EPHEMERAL", + "value": "[if(parameters('ephemeral'), 'true', 'false')]" + }, + { + "name": "DISABLE_AUTO_UPDATE", + "value": "true" + }, + { + "name": "ACCESS_TOKEN", + "secureValue": "[parameters('githubAccessToken')]" + } + ], + "environmentVariables": "[concat(variables('scopeEnv'), variables('baseEnv'))]", + "identityBlock": "[if(variables('hasIdentity'), createObject('type', 'UserAssigned', 'userAssignedIdentities', createObject(parameters('userAssignedIdentityId'), createObject())), createObject('type', 'None'))]", + "subnetIds": "[if(variables('isVnet'), createArray(createObject('id', parameters('subnetId'))), json('null'))]" + }, + "resources": [ + { + "type": "Microsoft.ContainerInstance/containerGroups", + "apiVersion": "2023-05-01", + "name": "[parameters('runnerName')]", + "location": "[parameters('location')]", + "identity": "[variables('identityBlock')]", + "properties": { + "sku": "Standard", + "osType": "Linux", + "restartPolicy": "Always", + "containers": [ + { + "name": "[parameters('runnerName')]", + "properties": { + "image": "[parameters('runnerImage')]", + "resources": { + "requests": { + "cpu": "[parameters('cpuCores')]", + "memoryInGB": "[parameters('memoryInGB')]" + } + }, + "environmentVariables": 
"[variables('environmentVariables')]" + } + } + ], + "subnetIds": "[variables('subnetIds')]" + } + } + ], + "outputs": { + "runnerId": { + "type": "string", + "value": "[resourceId('Microsoft.ContainerInstance/containerGroups', parameters('runnerName'))]" + }, + "runnerLabel": { + "type": "string", + "value": "[parameters('runnerLabels')]" + } + } +} diff --git a/.github/skills/git-ape-onboarding/templates/runners/aks/README.md b/.github/skills/git-ape-onboarding/templates/runners/aks/README.md new file mode 100644 index 0000000..c39090e --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aks/README.md @@ -0,0 +1,102 @@ +# Git-Ape self-hosted runners on AKS (Actions Runner Controller) + +On AKS, private runners are provisioned with the **Actions Runner Controller +(ARC)** using the official `gha-runner-scale-set` Helm chart. ARC runs +**ephemeral runner pods** that scale on demand and scale to zero between jobs. + +> There is no ARM-only path to install ARC — the controller and runner scale set +> are Kubernetes resources installed via Helm. The **AKS cluster itself** can be +> created with a standard Git-Ape ARM deployment (`/git-ape` → +> `Microsoft.ContainerService/managedClusters`); this folder covers the ARC layer +> that runs on top of it. + +## The label is the scale set name + +The runner scale set's name **is** the `runs-on` label. Set +`runnerScaleSetName: git-ape-runner` (below) and then set the repo variable +`GIT_APE_RUNNER_LABEL=git-ape-runner`. The two must match. + +## Prerequisites + +- An AKS cluster (self-hosted: any cluster; VNet-injected: a cluster on your + VNet/subnet, e.g. Azure CNI). +- `kubectl` context pointing at the cluster and `helm` installed. +- A GitHub credential (GitHub App recommended, or a fine-grained PAT) stored in + Key Vault. Do not commit it. +- A **custom runner image** (see below) pushed to a registry the cluster can pull. 
+ +## Custom runner image (required) + +Like the ACI/ACA paths, AKS runner pods need `az`, `gh`, and `jq` — the stock +`ghcr.io/actions/actions-runner` image has none of them, so Git-Ape steps fail +with `Unable to locate executable file: az`. Build the custom image from the +shared [`Dockerfile`](../Dockerfile) and push it to your ACR: + +```bash +az acr create --name --resource-group --location --sku Basic --admin-enabled true +az acr build --registry --image git-ape-runner:latest \ + --file ../Dockerfile .. +``` + +Set `template.spec.containers[0].image` in `values.yaml` to +`.azurecr.io/git-ape-runner:latest`. ARC overrides the container +command with `run.sh`, so the image's self-register entrypoint is unused on AKS +(the controller registers pods) — but the tools are still required. + +## Install + +```bash +# 1. Install the ARC controller (once per cluster) +helm install arc \ + --namespace arc-systems --create-namespace \ + oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller + +# 2. Create the GitHub credential secret from Key Vault (never commit it) +GH_TOKEN=$(az keyvault secret show --vault-name \ + --name github-runner-token --query value -o tsv) + +kubectl create namespace arc-runners +kubectl create secret generic git-ape-runner-secret \ + --namespace arc-runners \ + --from-literal=github_token="$GH_TOKEN" + +# 3. Create the ACR pull secret so pods can pull the custom image +kubectl create secret docker-registry acr-pull \ + --namespace arc-runners \ + --docker-server=.azurecr.io \ + --docker-username= \ + --docker-password="$(az acr credential show -n --query "passwords[0].value" -o tsv)" + +# 4. 
Install the runner scale set with the Git-Ape values +helm install git-ape-runner \ + --namespace arc-runners \ + -f values.yaml \ + oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set +``` + +Edit `values.yaml` first: set `githubConfigUrl` to your repo (or org) URL, set +`template.spec.containers[0].image` to your custom ACR image, and, for +VNet-injected clusters, schedule runner pods onto the VNet node pool via +`template.spec.nodeSelector`. + +## Verify + +Confirm the scale set is registered in +*GitHub → Settings → Actions → Runners → Runner scale sets* (or org-level), then +set the workflow variable: + +```bash +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" +``` + +Run **Git-Ape: Verify Setup** — its *Runner Configuration* step reports the +active runner mode. + +## Security notes + +- Prefer a **GitHub App** over a PAT for org-scale (`githubConfigSecret` then + carries the App id/installation id/private key instead of a token). +- Give runner pods Azure access with **AAD Workload Identity** (federated, no + stored keys) rather than mounting credentials. Git-Ape workflows still use + OIDC for `az` actions. +- Ephemeral runners are the ARC default — no state leaks between jobs. diff --git a/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml b/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml new file mode 100644 index 0000000..8d80d0f --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/aks/values.yaml @@ -0,0 +1,49 @@ +# Git-Ape runner scale set values for the ARC `gha-runner-scale-set` Helm chart. +# Install: +# helm install git-ape-runner -n arc-runners -f values.yaml \ +# oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set +# +# The release name and `runnerScaleSetName` become the runs-on label. Keep them +# equal to the value of the GIT_APE_RUNNER_LABEL workflow variable. 
+ +# Repo-scoped runners: https://github.com// +# Org-scoped runners: https://github.com/ +githubConfigUrl: "https://github.com/your-org/your-repo" + +# Reference the pre-created secret (see README) sourced from Key Vault. +# For a GitHub App instead of a PAT, populate github_app_id / +# github_app_installation_id / github_app_private_key in that secret. +githubConfigSecret: git-ape-runner-secret + +# This name IS the runs-on label. Must match GIT_APE_RUNNER_LABEL. +runnerScaleSetName: git-ape-runner + +minRunners: 0 +maxRunners: 10 + +# Ephemeral runner pods (ARC default). "dind" enables Docker-in-Docker if your +# deployment steps build images; use "kubernetes" for rootless pod-per-job. +containerMode: + type: "dind" + +template: + spec: + # Pull the custom image from a private ACR. Create the secret once: + # kubectl create secret docker-registry acr-pull -n arc-runners \ + # --docker-server=.azurecr.io \ + # --docker-username= \ + # --docker-password="$(az acr credential show -n --query "passwords[0].value" -o tsv)" + imagePullSecrets: + - name: acr-pull + # VNet-injected: pin runner pods to the node pool on your VNet subnet. + # nodeSelector: + # agentpool: vnetpool + containers: + - name: runner + # MUST be the Git-Ape custom image (az/gh/jq). The stock + # ghcr.io/actions/actions-runner image lacks those tools and every + # deployment step calling az/gh/jq will fail. Build it per the README + # ("Custom runner image") and push to your ACR. ARC overrides the + # command below, so the image's self-register entrypoint is unused here. + image: .azurecr.io/git-ape-runner:latest + command: ["/home/runner/run.sh"] diff --git a/.github/skills/git-ape-onboarding/templates/runners/entrypoint.sh b/.github/skills/git-ape-onboarding/templates/runners/entrypoint.sh new file mode 100644 index 0000000..ec7bc50 --- /dev/null +++ b/.github/skills/git-ape-onboarding/templates/runners/entrypoint.sh @@ -0,0 +1,104 @@ +#!/usr/bin/env bash +# Git-Ape self-hosted runner entrypoint. 
+# +# The official GitHub Actions runner image (ghcr.io/actions/actions-runner) ships +# the runner binary but NO registration entrypoint — on Kubernetes the Actions +# Runner Controller supplies one, but standalone container hosts (ACI, ACA Jobs) +# have nothing to register the runner. This script is that missing layer: it +# exchanges ACCESS_TOKEN (a fine-grained PAT with administration:write, or a +# GitHub App installation token) for a short-lived registration token, configures +# an ephemeral runner, starts it, and deregisters on shutdown. +# +# It honors the same environment-variable contract the ACI/ACA templates set: +# ACCESS_TOKEN, RUNNER_SCOPE (repo|org), REPO_URL or ORG_NAME, LABELS, +# RUNNER_NAME_PREFIX, EPHEMERAL, DISABLE_AUTO_UPDATE. +# +# On AKS, ARC overrides the container command (command: ["/home/runner/run.sh"]), +# so this entrypoint is bypassed there. +set -euo pipefail + +RUNNER_HOME="${RUNNER_HOME:-/home/runner}" +cd "${RUNNER_HOME}" + +: "${ACCESS_TOKEN:?ACCESS_TOKEN (GitHub PAT or App installation token) is required}" +RUNNER_SCOPE="${RUNNER_SCOPE:-repo}" +LABELS="${LABELS:-git-ape-runner}" +EPHEMERAL="${EPHEMERAL:-true}" +DISABLE_AUTO_UPDATE="${DISABLE_AUTO_UPDATE:-true}" +GITHUB_API="${GITHUB_API_URL:-https://api.github.com}" +API_VERSION="2022-11-28" + +case "${RUNNER_SCOPE}" in + org) + : "${ORG_NAME:?ORG_NAME is required for org-scoped runners}" + REG_URL="https://github.com/${ORG_NAME}" + RUNNERS_API="${GITHUB_API}/orgs/${ORG_NAME}/actions/runners" + ;; + repo) + : "${REPO_URL:?REPO_URL is required for repo-scoped runners}" + REG_URL="${REPO_URL}" + owner_repo="${REPO_URL#https://github.com/}" + RUNNERS_API="${GITHUB_API}/repos/${owner_repo}/actions/runners" + ;; + *) + echo "Unsupported RUNNER_SCOPE '${RUNNER_SCOPE}' (expected 'repo' or 'org')" >&2 + exit 1 + ;; +esac + +# Exchange the PAT/App token for a short-lived registration or remove token. 
+# $1 = registration | remove +runner_token() { + curl -fsSL -X POST \ + -H "Authorization: Bearer ${ACCESS_TOKEN}" \ + -H "Accept: application/vnd.github+json" \ + -H "X-GitHub-Api-Version: ${API_VERSION}" \ + "${RUNNERS_API}/$1-token" | jq -r '.token' +} + +RUNNER_NAME="${RUNNER_NAME_PREFIX:-git-ape-runner}-$(hostname)-${RANDOM}" + +echo "Requesting registration token (${RUNNER_SCOPE} scope) ..." +REG_TOKEN="$(runner_token registration || true)" +if [ -z "${REG_TOKEN}" ] || [ "${REG_TOKEN}" = "null" ]; then + echo "Failed to obtain a registration token. Check that ACCESS_TOKEN has" >&2 + echo "administration:write (repo) or self-hosted runner admin (org) rights." >&2 + exit 1 +fi + +config_args=( + --url "${REG_URL}" + --token "${REG_TOKEN}" + --name "${RUNNER_NAME}" + --labels "${LABELS}" + --work _work + --unattended + --replace +) +[ "${EPHEMERAL}" = "true" ] && config_args+=(--ephemeral) +[ "${DISABLE_AUTO_UPDATE}" = "true" ] && config_args+=(--disableupdate) + +./config.sh "${config_args[@]}" + +deregister() { + echo "Removing runner registration ..." + local rm_token + rm_token="$(runner_token remove || true)" + if [ -n "${rm_token}" ] && [ "${rm_token}" != "null" ]; then + ./config.sh remove --token "${rm_token}" || true + fi +} + +./run.sh & +RUNNER_PID=$! +trap 'kill -TERM "${RUNNER_PID}" 2>/dev/null || true' INT TERM + +set +e +wait "${RUNNER_PID}" +EXIT_CODE=$? +set -e + +# Ephemeral runners deregister themselves after one job; this is a safety net for +# non-ephemeral runners and graceful-shutdown signals (idempotent on re-run). 
+deregister +exit "${EXIT_CODE}" diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml index 0b2c3d3..52c7e5c 100644 --- a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-deploy.yml @@ -36,7 +36,7 @@ concurrency: jobs: detect-deployments: name: Detect deployments to execute - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -80,7 +80,7 @@ jobs: name: "Deploy: ${{ matrix.deployment_id }}" needs: [detect-deployments] if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: azure-deploy strategy: matrix: diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml index 651432b..fb4bec1 100644 --- a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-destroy.yml @@ -40,7 +40,7 @@ concurrency: jobs: detect-destroys: name: Detect destroy requests - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_destroys: ${{ steps.find.outputs.has_destroys }} @@ -130,7 +130,7 @@ jobs: name: "Destroy: ${{ matrix.deployment_id }}" needs: detect-destroys if: needs.detect-destroys.outputs.has_destroys == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: azure-destroy strategy: matrix: diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml 
b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml index c0d6d68..9a4eb5d 100644 --- a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-plan.yml @@ -28,7 +28,7 @@ concurrency: jobs: detect-deployments: name: Detect changed deployments - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -77,7 +77,7 @@ jobs: name: "Plan Local: ${{ matrix.deployment_id }}" needs: detect-deployments if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -365,7 +365,7 @@ jobs: name: "Plan Azure: ${{ matrix.deployment_id }}" needs: detect-deployments if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -549,7 +549,7 @@ jobs: name: "Plan Comment: ${{ matrix.deployment_id }}" needs: [detect-deployments, plan-local, plan-azure] if: always() && needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} diff --git a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml index b838f60..72a81fb 100644 --- a/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml +++ b/.github/skills/git-ape-onboarding/templates/workflows/git-ape-verify.yml @@ 
-13,7 +13,7 @@ permissions: jobs: verify: name: Verify Git-Ape configuration - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} steps: - uses: actions/checkout@v6 @@ -151,6 +151,24 @@ jobs: fi done + - name: Check runner configuration + env: + RUNNER_LABEL: ${{ vars.GIT_APE_RUNNER_LABEL }} + run: | + echo "## Runner Configuration" + echo "" + # Git-Ape workflows resolve `runs-on` from the GIT_APE_RUNNER_LABEL + # variable, falling back to GitHub-hosted `ubuntu-latest` when unset. + if [[ -z "$RUNNER_LABEL" ]]; then + echo "ℹ️ GIT_APE_RUNNER_LABEL is unset — jobs run on GitHub-hosted runners (ubuntu-latest)." + echo " To switch to private/self-hosted runners, provision them via" + echo " @Git-Ape Onboarding and set the GIT_APE_RUNNER_LABEL variable." + else + echo "✅ GIT_APE_RUNNER_LABEL is set to '$RUNNER_LABEL' — jobs target self-hosted runners with this label." + echo " Ensure at least one online runner is registered with the '$RUNNER_LABEL' label," + echo " otherwise deployment jobs will queue indefinitely." + fi + - name: Print summary if: always() run: | diff --git a/website/docs/agents/git-ape-onboarding.md b/website/docs/agents/git-ape-onboarding.md index eef2c6e..882ae69 100644 --- a/website/docs/agents/git-ape-onboarding.md +++ b/website/docs/agents/git-ape-onboarding.md @@ -70,7 +70,7 @@ Always use the `/git-ape-onboarding` skill for procedure and command patterns. ## Required user inputs (gated step-1) -Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. 
At minimum, request the following **six** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): +Before any state-changing command runs, you MUST surface a checklist of the required inputs in your first reply and wait for the user to supply any that are missing. Even when the user's opening prompt already names a few (e.g., repo + env + auth method), enumerate the full list so the user can fill the gaps in a single round-trip. At minimum, request the following **seven** inputs (rendered as a numbered list, table, or explicit question block — never inferred silently): 1. **Target GitHub repository** — `/` plus confirmation of the default branch (assume `main`; only change if the user explicitly says otherwise — never silently substitute `master`). 2. **Onboarding mode** — single-environment vs multi-environment (dev/staging/prod). Even if the prompt names one, restate it explicitly for confirmation. @@ -78,14 +78,15 @@ Before any state-changing command runs, you MUST surface a checklist of the requ 4. **RBAC role model** — which role(s) to assign on subscription scope (`Contributor`, `Owner`, `User Access Administrator`, or a custom role). Default suggestion: `Contributor`. 5. **Default Azure region** — primary region for the workload (e.g., `eastus`, `westus2`). Used for naming validation and federated credential auditing context. 6. **Project / deployment name** — short slug used to name the App Registration (`sp--`), federated credentials (`fc---main-branch`), and downstream Git-Ape deployments. +7. **Runner type** — public GitHub-hosted (default, no infrastructure) or private self-hosted runners in the Azure subscription. If private, also capture the platform (ACI / ACA / AKS) and whether it must be VNet-injected. Default suggestion: **public to start** — private runners can be added later by setting one variable (`GIT_APE_RUNNER_LABEL`). 
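As a rough sketch of how the runner-type answer (input #7) could translate into a next step: the helpers below map the chosen platform to the reference template folder added in this change and print the deploy command that each template documents. The function names `runner_template_dir` and `print_deploy_hint` are hypothetical and not part of the skill; the paths and commands are the ones that appear in the template metadata and the AKS README.

```bash
# Hypothetical helper (not part of the skill): map the platform captured in
# input #7 to its reference template folder under templates/runners/.
runner_template_dir() {
  local base=".github/skills/git-ape-onboarding/templates/runners"
  case "$1" in
    aci|aca|aks) printf '%s/%s\n' "$base" "$1" ;;
    *) echo "unknown platform '$1' (expected aci, aca, or aks)" >&2; return 1 ;;
  esac
}

# Print (do not run) the deploy command documented for each platform.
print_deploy_hint() {
  case "$1" in
    aci|aca)
      echo "az deployment group create -g rg-git-ape-runners -f template.json -p @parameters.json" ;;
    aks)
      echo "helm install git-ape-runner --namespace arc-runners -f values.yaml oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set" ;;
    *) return 1 ;;
  esac
}
```

Printing the command rather than executing it keeps the state-changing step behind the confirmation gate described above.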
-Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–6 are supplied and Azure auth is confirmed. +Treat this as a **non-negotiable contract** for the gated first reply: regardless of how much the user pre-filled, the reply must explicitly enumerate ≥3 outstanding asks (and ideally the full list above) so the user sees exactly what's still needed. Do not race ahead to OIDC / federated-credential output until inputs 1–7 are supplied and Azure auth is confirmed. ## Workflow 1. Confirm target repository URL **and default branch** (input #1 above). 2. Ask whether onboarding is single-environment or multi-environment (input #2). -3. Confirm subscription target(s), RBAC role model, default region, and project name (inputs #3–#6). +3. Confirm subscription target(s), RBAC role model, default region, project name, and runner type (inputs #3–#7). 4. Validate prerequisites: - `az`, `gh`, `jq` installed - Azure authenticated (`az account show`) @@ -98,9 +99,10 @@ Treat this as a **non-negotiable contract** for the gated first reply: regardles - macOS / Linux / WSL: `./scripts/scaffold-repo.sh` - Windows (PowerShell 7+): `pwsh ./scripts/scaffold-repo.ps1` Both scripts produce byte-identical output. Report which files were created vs skipped. -9. Ask compliance framework and enforcement mode preferences (Step 10 in `/git-ape-onboarding` skill playbook). +9. Ask compliance framework and enforcement mode preferences (Step 11 in `/git-ape-onboarding` skill playbook). 10. Update the `## Compliance & Azure Policy` section in `.github/copilot-instructions.md` with the user's choices. 
If the file was skipped by the scaffold step or lacks that section, surface the captured preferences in chat for manual integration instead of mutating the file. -11. Summarize created/updated artifacts and next checks. +11. Select the runner type (input #7). If private runners were chosen, point the user at `./templates/runners//` for the reference IaC, have them provision it (sourcing the GitHub credential from Key Vault, never inlined), confirm the runner is online, and set the `GIT_APE_RUNNER_LABEL` variable. If public, leave the variable unset. (Step 12 in `/git-ape-onboarding` skill playbook.) +12. Summarize created/updated artifacts and next checks. ## Output Requirements diff --git a/website/docs/getting-started/onboarding.md b/website/docs/getting-started/onboarding.md index 665b963..b0118cf 100644 --- a/website/docs/getting-started/onboarding.md +++ b/website/docs/getting-started/onboarding.md @@ -113,13 +113,14 @@ or: /git-ape-onboarding ``` -The skill collects five inputs (or uses sensible defaults): +The skill collects six inputs (or uses sensible defaults): 1. **GitHub repository URL** — for example, `https://github.com/your-org/your-repo` 2. **Entra ID App Registration name** — for example, `sp-git-ape-your-repo` 3. **Mode** — single or multi-environment 4. **Azure subscription(s)** — defaults to your current `az` subscription 5. **RBAC role(s)** — Contributor (default) or Contributor + User Access Administrator +6. **Runner type** — public GitHub-hosted (default) or private self-hosted runners in your Azure subscription (ACI / ACA / AKS, optionally VNet-injected). You can start public and switch later. 
### Example: single environment @@ -488,6 +489,82 @@ This adds four workflows: | `git-ape-destroy.yml` | PR merge with `destroy-requested` status | Delete resource group | | `git-ape-verify.yml` | Manual dispatch | Verify OIDC and RBAC health | +All four workflows resolve their runner from a single variable, so they run on +public GitHub-hosted runners by default and can be pointed at private runners +later without editing the workflows: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +--- + +### Step 6: Choose your runner (optional) + +By default the workflows run on **public GitHub-hosted `ubuntu-latest`** — no +infrastructure required. To run them on **private self-hosted runners** inside +your Azure subscription (for private connectivity or policy reasons), provision +runners and set one variable. + +| `GIT_APE_RUNNER_LABEL` | Effect | +|------------------------|--------| +| **unset** (default) | Jobs run on GitHub-hosted `ubuntu-latest`. | +| set to a label (default `git-ape-runner`) | Jobs target your self-hosted runners registered with that label. | + +**Runner types:** + +- **Self-hosted (subscription)** — runners are Azure resources with outbound + internet; control over image, region, and identity without a VNet. +- **VNet-injected** — runners run inside a subnet you manage, for private + connectivity to Azure resources (no public egress except to GitHub). + +**Platforms** (on-demand reference IaC ships with the onboarding skill under +[`templates/runners/`](https://github.com/Azure/git-ape/tree/main/.github/skills/git-ape-onboarding/templates/runners)): + +| Platform | What it provisions | +|----------|--------------------| +| **ACI** | ARM `template.json` — a container group running an ephemeral runner. | +| **ACA** (recommended) | Deployed as a **Git-Ape deployment** — see below. A KEDA `github-runner`-scaled Container Apps Job (scale-to-zero) with managed identity, ACR, and Key Vault. 
| +| **AKS** | Actions Runner Controller (ARC) via Helm `values.yaml`. | + +#### Git-Ape deploying Git-Ape (ACA runners) + +The ACA runner path is itself a **first-class Git-Ape deployment**. Rather than an +imperative `az deployment group create`, onboarding scaffolds a subscription-scoped +deployment artifact to `.azure/deployments/git-ape-runners/` +(source: [`templates/deployments/git-ape-runners/`](https://github.com/Azure/git-ape/tree/main/.github/skills/git-ape-onboarding/templates/deployments/git-ape-runners)). +The runner infrastructure then flows through Git-Ape's own managed pipeline: + +- **Architecture diagram** — `architecture.md` (topology + bootstrap sequence). +- **Cost estimate** — via the `azure-cost-estimator` skill on the template. +- **Deploy** — `az stack sub create --action-on-unmanage deleteAll` (the same + Deployment Stack primitive every Git-Ape deployment uses), producing `state.json`. +- **Destroy** — single-command teardown via the `azure-stack-destroy` skill. + +The PAT is never in git, ARM parameters, or deployment history — the stack creates +an empty Key Vault and the token is set post-deploy with `az keyvault secret set`; +the ACA Job reads it as a Key Vault secret reference. + +**Bootstrap ordering (the self-hosting loop):** the first `git-ape-runners` deploy +runs on `ubuntu-latest` (or your local machine) because the private runner does not +exist yet. Once it is online and `GIT_APE_RUNNER_LABEL` is set, every subsequent +Git-Ape run — including updates to the runner stack itself — executes on the private +runner. Git-Ape ends up deploying and maintaining the very runners that run Git-Ape. 
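As a rough illustration of the fallback behind this bootstrap ordering, the `runs-on` expression behaves like a shell default: an unset or empty `GIT_APE_RUNNER_LABEL` yields `ubuntu-latest`, and any other value is used as the runner label. The `resolve_runner` function below is a hypothetical helper for reasoning about the switch locally. It is not part of Git-Ape and does not call GitHub.

```bash
# Hypothetical helper (not part of Git-Ape): mirrors the workflow expression
#   runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}
# An unset or empty label falls back to the GitHub-hosted default.
resolve_runner() {
  label="${1:-}"
  if [ -z "$label" ]; then
    echo "ubuntu-latest"
  else
    echo "$label"
  fi
}

resolve_runner ""                # first deploy: the private runner does not exist yet
resolve_runner "git-ape-runner"  # after the variable has been set
```

The first call corresponds to the initial `git-ape-runners` deploy, the second to every run after the variable is set.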
+ +Once a runner is online (with the `git-ape-runner` label), flip the switch: + +```bash +# Switch to private runners +gh variable set GIT_APE_RUNNER_LABEL --repo your-org/your-repo --body "git-ape-runner" + +# Clean fallback to GitHub-hosted runners +gh variable delete GIT_APE_RUNNER_LABEL --repo your-org/your-repo +``` + +The GitHub registration credential is the only secret — source it from Key Vault, +never inline it. Azure access uses a user-assigned managed identity. Run +`@Git-Ape Onboarding` and pick a private runner to be walked through provisioning. + --- ## Verify your setup {#verify-setup} diff --git a/website/docs/skills/git-ape-onboarding.md b/website/docs/skills/git-ape-onboarding.md index 37b9804..e284dac 100644 --- a/website/docs/skills/git-ape-onboarding.md +++ b/website/docs/skills/git-ape-onboarding.md @@ -68,9 +68,10 @@ This skill configures: 2. OIDC federated credentials for GitHub Actions 3. RBAC role assignment(s) on subscription scope 4. GitHub environments (`azure-deploy*`, `azure-destroy`) -5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable +5. Required GitHub secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable, plus the optional `GIT_APE_RUNNER_LABEL` variable that selects private runners 6. Scaffolded GitHub Actions workflow files (`git-ape-plan.yml`, `-deploy.yml`, `-destroy.yml`, `-verify.yml`, `-drift.{md,lock.yml}`) and deployment standards (`.github/copilot-instructions.md`) into the user's working copy 7. *(Optional)* The `COPILOT_GITHUB_TOKEN` repository secret that powers the agentic drift-detection workflow (`git-ape-drift.lock.yml`) — only when the user opts into scheduled drift detection +8. 
The GitHub Actions **runner type** the workflows run on — public GitHub-hosted (default), **hosted compute networking** (GitHub-managed runners with Azure private networking, requires GHEC), or self-hosted runners in your Azure subscription (ACI / ACA / AKS). On-demand IaC for private runners ships at `./templates/runners/`. ACA runners are deployed as a first-class **Git-Ape deployment** (`./templates/deployments/git-ape-runners/`) — Git-Ape deploying Git-Ape — so they get an architecture diagram, cost estimate, managed deploy, and single-command destroy. ## Prerequisites @@ -160,12 +161,13 @@ OIDC_PREFIX="repository_owner_id::repository_id:" - `fc-azure-deploy` subject `"$OIDC_PREFIX:environment:azure-deploy"` (one per environment in multi-env mode) - `fc-azure-destroy` subject `"$OIDC_PREFIX:environment:azure-destroy"` 6. Assign RBAC on each target subscription. -7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. +7. Set GitHub repo or environment secrets (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`) and the `AZURE_SUBSCRIPTION_ID` variable. (The `GIT_APE_RUNNER_LABEL` variable is set later in Step 12 only if private runners are chosen.) 8. Create GitHub environments and branch policies when permissions allow. 9. Scaffold workflow files and deployment standards into the user's working copy (see below). 10. *(Optional)* Provision the drift detector engine credential (`COPILOT_GITHUB_TOKEN`) so the agentic drift workflow can run (see below). 11. Capture compliance and Azure Policy preferences (see below). -12. Verify federated credentials, role assignments, and secrets. +12. Select the GitHub Actions runner type (public / hosted compute networking / self-hosted) and, if private runners are chosen, provision them and set `GIT_APE_RUNNER_LABEL` (see below). +13. Verify federated credentials, role assignments, and secrets. 
### Step 9: Scaffold workflow files and deployment standards @@ -292,6 +294,415 @@ After RBAC and environment setup, ask the user about compliance requirements and preferences and a suggested patch in chat so the user can apply it. - In all cases, leave changes unstaged and let the user commit them. +### Step 12: Runner Selection & Provisioning (optional) + +Git-Ape workflows resolve their runner from a single variable: + +```yaml +runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} +``` + +Unset → public GitHub-hosted `ubuntu-latest` (the default; no infrastructure). +Set to a label → private runners with that label. This is the **bootstrap model: +start public, switch to private later with one variable.** + +1. **Ask the runner type:** + ``` + What runner should the Git-Ape workflows run on? + - Public GitHub-hosted (recommended to start — no infrastructure) + - Hosted compute networking (GitHub-managed runners in your Azure VNet — requires GHEC) + - Self-hosted in my Azure subscription (you manage compute, image, scaling) + ``` + +2. **If public (default):** do nothing. Leave `GIT_APE_RUNNER_LABEL` unset. + Onboarding is complete; the user can switch to private runners any time by + repeating this step. + +3. **If hosted compute networking:** + Follow the hosted compute sub-flow (Step 12a below). + +4. **If self-hosted:** + Follow the self-hosted sub-flow (Step 12b below). + +--- + +### Step 12a: Hosted Compute Networking (GitHub-managed, Azure private networking) + +GitHub-hosted runners with Azure private networking. GitHub manages the compute +(full Ubuntu images with `az`, `gh`, `jq`, `git` pre-installed), runners execute +inside your Azure VNet for private connectivity. + +**Prerequisites:** GitHub Enterprise Cloud. 
+ +**Reference:** [About networking for hosted compute products](https://docs.github.com/en/enterprise-cloud@latest/admin/configuring-settings/configuring-private-networking-for-hosted-compute-products/about-networking-for-hosted-compute-products-in-your-enterprise) + +#### a. Consolidate GitHub auth scopes first + +Before starting provisioning, authenticate with **all required scopes in one +call** to avoid repeated auth prompts: + +```bash +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +| Scope | Purpose | +|-------|---------| +| `admin:org` | Create org-level runner groups, assign repos | +| `admin:enterprise` | Enterprise-level runner groups and hosted runners | +| `manage_runners:org` | Create/manage hosted runners | +| `read:enterprise` | Query enterprise metadata (databaseId, org membership) | +| `write:network_configurations` | Create network configurations | + +#### b. Ask scope: organization or enterprise + +``` +Where should the network configuration live? +- Enterprise level (shared across all orgs in the enterprise) +- Organization level (scoped to this org only) +``` + +| Scope | `businessId` value | UI location | +|-------|-------------------|-------------| +| **Enterprise** | Enterprise `databaseId` (from GraphQL) | Enterprise Settings → Hosted compute networking | +| **Organization** | Org numeric ID (REST: `.id` field) | Org Settings → Hosted compute networking | + +Query the needed ID: +```bash +# Enterprise databaseId (for enterprise scope): +gh api graphql -f query='{enterprise(slug: "") { databaseId }}' --jq '.data.enterprise.databaseId' + +# Org numeric ID (for org scope): +gh api orgs/ --jq '.id' +``` + +#### c. Provision Azure networking + +1. 
Create resource group and VNet with a `/28` subnet (minimum 16 IPs): + ```bash + az group create --name --location + az network vnet create --name --resource-group \ + --address-prefix 10.0.0.0/16 --subnet-name snet-runners --subnet-prefix 10.0.0.0/28 + ``` + +2. Delegate subnet to `GitHub.Network/networkSettings`: + ```bash + az network vnet subnet update --name snet-runners --vnet-name \ + --resource-group --delegations GitHub.Network/networkSettings + ``` + +3. Register `GitHub.Network` resource provider: + ```bash + az provider register --namespace GitHub.Network + # Wait until Registered: + az provider show --namespace GitHub.Network --query "registrationState" -o tsv + ``` + +4. Create `GitHub.Network/networkSettings` resource: + ```bash + az rest --method PUT \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --body '{ + "location": "", + "properties": { + "businessId": "", + "subnetId": "" + } + }' + ``` + ⚠️ **`businessId` is immutable.** If wrong, you must delete and recreate. + +5. Extract the `GitHubId` tag from the resource — this is the ID GitHub uses: + ```bash + az rest --method GET \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --query "tags.GitHubId" -o tsv + ``` + +#### d. Create GitHub network configuration + +Use the **`GitHubId` tag value** (NOT the Azure resource ID): + +```bash +# Enterprise scope: +gh api --method POST enterprises//network-configurations \ + -f name="" -f compute_service="actions" \ + -f network_settings_ids[]="" + +# Organization scope: +gh api --method POST orgs//settings/network-configurations \ + -f name="" -f compute_service="actions" \ + -f network_settings_ids[]="" +``` + +Save the returned `id` — needed for the runner group. + +#### e. 
Create runner group and hosted runner + +```bash +# Enterprise scope: +gh api --method POST enterprises//actions/runner-groups \ + -f name="" -f visibility="selected" \ + -F allows_public_repositories=false \ + -f network_configuration_id="" + +# Assign the org to the enterprise runner group: +gh api --method PUT enterprises//actions/runner-groups//organizations/ + +# For enterprise groups: also assign the repo at org level (inherited group ID): +gh api orgs//actions/runner-groups --jq '.runner_groups[] | select(.name=="") | .id' +gh api --method PUT orgs//actions/runner-groups//repositories/ +``` + +```bash +# Query available images and sizes: +gh api orgs//actions/hosted-runners/images/github-owned --jq '.images[] | {id, display_name, platform}' +gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:5] | .[] | {id, cpu_cores, memory_gb}' + +# Create hosted runner (image IDs are NUMERIC, sizes are like "4-core"): +echo '{"name":"","runner_group_id":,"platform":"linux-x64","image":{"id":"","source":"github"},"size":"4-core","maximum_runners":5}' | \ + gh api --method POST enterprises//actions/hosted-runners --input - +``` + +Wait for `status: "Ready"`: +```bash +gh api enterprises//actions/hosted-runners --jq '.runners[] | {name, status}' +``` + +#### f. Set variable and verify + +```bash +gh variable set GIT_APE_RUNNER_LABEL --repo / --body "" +gh workflow run git-ape-verify.yml --repo / +``` + +Confirm all steps pass — no custom image needed, GitHub provides everything. + +--- + +### Step 12b: Self-Hosted Runners (ACI / ACA / AKS) + +Self-hosted runners run in your Azure subscription. You manage compute, image, +scaling, and networking. + +1. **Ask the platform:** + ``` + Which Azure platform should host the runners? 
+ - ACA — Azure Container Apps (event-driven, ephemeral, scale-to-zero) — RECOMMENDED + - ACI — Azure Container Instances (simplest; a handful of runners) + - AKS — Azure Kubernetes Service (Actions Runner Controller; large scale) + ``` + + **ACA is deployed as a Git-Ape deployment** (Git-Ape deploying Git-Ape), so the + runner infrastructure gets an architecture diagram, cost estimate, managed + deploy, and single-command destroy — see the ACA path directly below. **ACI** + and **AKS** use the imperative provisioning steps further down. + +#### ACA — deploy runners as a Git-Ape deployment (recommended) + +Instead of an imperative `az deployment group create`, scaffold the runner +infrastructure as a first-class Git-Ape deployment and deploy it through the +normal Git-Ape stack flow. The template is a subscription-scoped Deployment Stack, +so destroy is a single idempotent command and the PAT lives in Key Vault (never in +git or ARM parameters). + +1. **Scaffold the deployment artifact** into the working copy and set inputs: + ```bash + mkdir -p .azure/deployments/git-ape-runners + cp -R .github/skills/git-ape-onboarding/templates/deployments/git-ape-runners/. \ + .azure/deployments/git-ape-runners/ + # set githubOwnerRepo (+ any overrides) — NEVER put the PAT here: + $EDITOR .azure/deployments/git-ape-runners/parameters.json + ``` + `template.json` creates, in one self-contained stack: the resource group, a + user-assigned identity, an ACR, an `AcrPull` role assignment, a Key Vault, a + `Key Vault Secrets User` role assignment, an ACA managed environment, and the + event-driven ACA Job. + +2. 
**Deploy the stack.** The first deploy runs on a public runner or locally, + because the private runner does not exist yet: + ```bash + /azure-stack-deploy git-ape-runners # local (VS Code / terminal) + ``` + In CI, open a PR that adds `.azure/deployments/git-ape-runners/`; the + `git-ape-deploy.yml` workflow deploys it on `ubuntu-latest` and writes + `state.json` plus the architecture/cost artifacts, exactly like any other + Git-Ape deployment. + +3. **Build & push the runner image** into the ACR the stack just created (the + stock `actions-runner` image lacks `az`/`gh`/`jq` and self-registration — see + the Dockerfile note in the imperative path below): + ```bash + ACR=$(jq -r '.acrLoginServer.value' .azure/deployments/git-ape-runners/state.json) + az acr build --registry "${ACR%%.*}" --image git-ape-runner:latest \ + --file .github/skills/git-ape-onboarding/templates/runners/Dockerfile \ + .github/skills/git-ape-onboarding/templates/runners/ --no-logs + ``` + +4. **Write the GitHub PAT into Key Vault** — never in git, ARM params, or chat + output. Collect a long-lived fine-grained PAT exactly as in the imperative + "Collect a GitHub PAT" step below (never a short-lived registration token): + ```bash + KV=$(jq -r '.keyVaultName.value' .azure/deployments/git-ape-runners/state.json) + az keyvault secret set --vault-name "$KV" --name github-pat \ + --value '' --output none + ``` + The ACA Job reads it at runtime through a Key Vault secret reference + (`keyVaultUrl` + user-assigned `identity`), enabled by the in-template + `Key Vault Secrets User` role assignment. + +5. **Point workflows at the runner** — after this, re-deploys of this very stack + run on it (the self-hosting loop): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + ``` + +6. 
**Destroy** with one command when no longer needed (tears down the whole stack + and purges the soft-deleted Key Vault): + ```bash + /azure-stack-destroy git-ape-runners + ``` + +See `templates/deployments/git-ape-runners/README.md` for the full walkthrough and +`templates/deployments/git-ape-runners/architecture.md` for the topology and +bootstrap sequence. + +#### ACI / AKS — imperative provisioning + +1. **Build the custom runner image using ACR Tasks (cloud build).** The base + `ghcr.io/actions/actions-runner:latest` (GitHub's official runner image) does + **NOT** include `az`, `gh`, or `jq`, and ships no registration entrypoint. + Workflows fail with `Unable to locate executable file: az` — and on ACI/ACA + the runner never registers — without a custom image. + + Always build via **ACR Tasks** (cloud build) — never local Docker. This + avoids Windows CRLF line-ending corruption of `entrypoint.sh` and eliminates + the need for a local Docker install. + ```bash + # Create ACR (one-time) — no --admin-enabled; use managed identity for pulls + az acr create --name --resource-group --location --sku Basic + + # Build and push image (runs in Azure, ~3 min, no local Docker needed) + # On Windows, add --no-logs to avoid a Unicode encoding crash in log streaming + az acr build --registry --image git-ape-runner:latest \ + --file ./templates/runners/Dockerfile ./templates/runners/ --no-logs + ``` + The `Dockerfile` at `./templates/runners/Dockerfile` extends the base runner + with all Git-Ape prerequisites (`az`, `gh`, `jq`, `git`) and an `entrypoint.sh` + that self-registers the runner on ACI/ACA (on AKS, ARC handles registration). + It includes a `sed` safety net that strips CRLF line endings from + `entrypoint.sh` at build time. + + After the build, verify the image exists: + ```bash + az acr repository list --name -o table + ``` + +2. 
**Create a managed identity and assign `AcrPull` role** for image pulls: + ```bash + # Create identity + az identity create --name id-git-ape-runner --resource-group --location + + # Get IDs + IDENTITY_ID=$(az identity show --name id-git-ape-runner --resource-group --query id -o tsv) + PRINCIPAL_ID=$(az identity show --name id-git-ape-runner --resource-group --query principalId -o tsv) + ACR_ID=$(az acr show --name --query id -o tsv) + + # Assign AcrPull role (may take 30–60s to propagate) + az role assignment create --assignee-object-id $PRINCIPAL_ID --assignee-principal-type ServicePrincipal \ + --role AcrPull --scope $ACR_ID + ``` + **Do NOT use ACR admin credentials** (`--admin-enabled true` + username/password). + Managed identity is the secure, recommended approach. + +3. **Collect a GitHub PAT from the user.** The ACA/ACI runner needs a + **long-lived GitHub Personal Access Token (PAT)** — NOT a short-lived + registration token from `POST /actions/runners/registration-token`. + Registration tokens expire in ~1 hour, but the KEDA `github-runner` scaler + continuously polls the Actions queue AND each ephemeral runner re-registers + on every scale-up, so a long-lived PAT is required. + + **Ask the user to create a PAT** before deploying: + ``` + The self-hosted runner needs a GitHub Personal Access Token (PAT) for + continuous queue polling and runner registration. + + Please create a fine-grained PAT at: + https://github.com/settings/tokens?type=beta + + Required permissions (scoped to the target repo): + - Actions: Read & Write + - Administration: Read & Write (for runner registration) + + Alternatively, a classic PAT with the `repo` scope works. + + Paste the token when prompted — it will only be passed to the deployment + and will not be stored or displayed. + ``` + + **Do NOT generate a registration token** via the GitHub API + (`POST repos///actions/runners/registration-token`). 
These are + short-lived (~1 hour) and will cause the runner to fail with a 401 error + once expired. The KEDA scaler and ephemeral runner registration both need + a token that does not expire. + + Never print the token value in chat output (see Safe-Execution Rules). + +4. **Deploy the runner infrastructure (ACI).** Use the ACI template + (`templates/runners/aci/template.json`) — ACA is covered by the Git-Ape-managed + path above. Pass the custom image, ACR server, managed identity, and + user-provided PAT: + ```bash + az deployment group create -g -f ./templates/runners/aci/template.json \ + -p runnerImage='.azurecr.io/git-ape-runner:latest' \ + acrServer='.azurecr.io' \ + userAssignedIdentityId=$IDENTITY_ID \ + githubOwnerRepo='/' \ + githubAccessToken='' + ``` + - The ACA template's `registries` block automatically uses identity-based + auth when both `acrServer` and `userAssignedIdentityId` are set. + - The GitHub PAT is the only secret — for production, store it in Key Vault + and reference it; for initial setup, pass it directly at deploy time. + Never inline it in a committed `parameters.json`. + - For private networking, set the subnet parameter (`subnetId` for ACI, + `infrastructureSubnetId` for ACA, or a VNet node pool for AKS). + - For AKS, use `helm install` instead of ARM. + - **Note:** The ACA managed environment may take 1–2 minutes to fully + provision. If deploying step-by-step (not via ARM template), wait for the + environment's `provisioningState` to reach `Succeeded` before creating the + job. + +5. **Set `minExecutions=1`** (recommended) so at least one runner is always + warm and visible in GitHub Settings. Without this, KEDA scale-from-zero can + take 1–3 minutes on cold start, during which GitHub shows "No runners + configured": + ```bash + az containerapp job update --name git-ape-runner --resource-group --min-executions 1 + ``` + Leave at `0` only if you prefer true scale-to-zero and can tolerate cold-start + delays. + +6. 
**Confirm the runner is online** in *GitHub → Settings → Actions → Runners* + with the `git-ape-runner` label. (With `minExecutions=1`, a runner should + appear within 30–60 seconds of deployment.) + +7. **Set the variable** so workflows target it (repo-wide or per environment): + ```bash + gh variable set GIT_APE_RUNNER_LABEL --repo / --body "git-ape-runner" + # per environment instead: + gh variable set GIT_APE_RUNNER_LABEL --repo / --env azure-deploy --body "git-ape-runner" + ``` + Clean fallback to GitHub-hosted runners is `gh variable delete GIT_APE_RUNNER_LABEL`. + +8. **Verify** by triggering `Git-Ape: Verify Setup` and confirming all steps + pass on the private runner (especially "Test OIDC login" which requires `az`). + +9. **Continuous drift detection** (`git-ape-drift.lock.yml`) is a compiled gh-aw + workflow and does NOT honor `GIT_APE_RUNNER_LABEL`. To move drift onto a + private runner, set `runs-on:` in the source `git-ape-drift.md` frontmatter + and recompile with `gh aw compile`; never hand-edit the `.lock.yml` (it + carries an integrity hash). The other four workflows need no recompile. + ## Mode: Enterprise Distribution (`.github-private`) Use this mode to distribute Git-Ape to **everyone on an organization's or @@ -449,7 +860,9 @@ OIDC, RBAC, environments, and workflows. 7. *(Optional)* Offer to onboard the drift detector workflow by provisioning `COPILOT_GITHUB_TOKEN` (Step 10 in playbook). Skip if the user does not want scheduled drift detection. 8. Ask compliance framework and enforcement mode preferences (Step 11 in playbook). 9. Update `copilot-instructions.md` with compliance preferences, or, if the file was skipped by the scaffold step, surface the preferences in chat for manual integration. -10. Summarize outcome (including scaffolded file counts) and suggest verification commands. +10. Ask the runner type (and platform/scope if private), and, if private runners are chosen, provision the full stack.
For **hosted compute networking**: consolidate gh auth scopes → ask org vs enterprise scope → provision Azure VNet + subnet → create GitHub.Network/networkSettings → create network config + runner group + hosted runner → assign repo → set `GIT_APE_RUNNER_LABEL` (Step 12a). For **self-hosted ACA (recommended)**: scaffold `.azure/deployments/git-ape-runners/` → deploy the subscription-scoped stack via `/azure-stack-deploy git-ape-runners` (first deploy on `ubuntu-latest`) → `az acr build` the runner image into the stack's ACR → `az keyvault secret set` the PAT (never a registration token) → set `GIT_APE_RUNNER_LABEL` (Step 12b, ACA path) — Git-Ape deploying Git-Ape. For **self-hosted ACI/AKS**: ask the user for a GitHub PAT → ACR (no admin) + cloud build via ACR Tasks (`--no-logs` on Windows) + managed identity with `AcrPull` role + ACI deployment with identity-based registry auth using user-provided PAT + `minExecutions=1` + `GIT_APE_RUNNER_LABEL` (Step 12b, imperative path). +11. **Verify** by triggering `Git-Ape: Verify Setup` and confirming ALL steps pass on the private runner. +12. Summarize outcome (including scaffolded file counts and the chosen runner type) and suggest verification commands. ### Enterprise distribution @@ -462,6 +875,184 @@ OIDC, RBAC, environments, and workflows. ## Known Gotchas +### Self-hosted: registration tokens don't work for KEDA-based runners + +**Never use `POST repos///actions/runners/registration-token`** to +generate the `githubAccessToken` for ACA/ACI runners. Registration tokens are +short-lived (~1 hour) and expire silently. Once expired: +- The KEDA `github-runner` scaler can no longer poll the Actions queue +- Each ephemeral runner fails to register on scale-up with a **401 Unauthorized** +- Runners appear as `offline` in GitHub Settings + +The `githubAccessToken` parameter requires a **long-lived GitHub PAT** because: +1. KEDA continuously polls the GitHub API every 30 seconds to detect queued jobs +2. 
Each ephemeral runner re-registers itself on every scale-up event +3. Both operations need a token that outlives any single job + +**Fix:** Always **ask the user** to create a fine-grained PAT +(`https://github.com/settings/tokens?type=beta`) with **Actions (Read & Write)** +and **Administration (Read & Write)** permissions scoped to the target repo. A +classic PAT with the `repo` scope also works. Never generate a registration +token programmatically — it will always fail after ~1 hour. + +### Hosted compute: `network_settings_ids` expects the GitHubId tag, not the Azure resource ID + +When creating a GitHub network configuration, the `network_settings_ids` field +expects the **`GitHubId` tag value** (a SHA-256 hash assigned by GitHub to the +Azure `GitHub.Network/networkSettings` resource), NOT the Azure resource ID path. + +```bash +# ❌ WRONG — Azure resource ID +-f network_settings_ids[]="/subscriptions/.../providers/GitHub.Network/networkSettings/my-resource" + +# ✅ CORRECT — GitHubId tag value from the Azure resource +-f network_settings_ids[]="FA1AD85973374477AF8C49119ADEA731EFD4B9BD6B7764A8FCD6B036CBA796F3" +``` + +Extract the GitHubId after creating the Azure resource: +```bash +az rest --method GET \ + --url "https://management.azure.com/subscriptions//resourceGroups//providers/GitHub.Network/networkSettings/?api-version=2024-04-02" \ + --query "tags.GitHubId" -o tsv +``` + +### Hosted compute: `businessId` is immutable and scope-specific + +The `businessId` on `GitHub.Network/networkSettings` determines whether the +resource works at enterprise or organization scope: +- **Enterprise scope:** use the enterprise `databaseId` (query via GraphQL) +- **Organization scope:** use the org's numeric ID (query via REST `.id` field) + +If wrong, the GitHub API returns `"The business ID is invalid or does not match"`. +The property is **immutable** — you cannot update it; you must delete and recreate. 
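Mixing up the hosted-compute identifiers can be caught before any API call is made. The sketch below is a hypothetical pre-flight check for the `network_settings_ids` value. It assumes, based only on the single example shown above, that a `GitHubId` tag is 64 hexadecimal characters; `is_github_id` is not part of Git-Ape or the `gh` CLI.

```bash
# Hypothetical pre-flight check (not part of Git-Ape or the gh CLI).
# Assumption: a GitHubId tag value is exactly 64 hexadecimal characters,
# as in the example above. An Azure resource ID path never matches.
is_github_id() {
  printf '%s' "$1" | grep -Eq '^[0-9A-Fa-f]{64}$'
}

candidate="FA1AD85973374477AF8C49119ADEA731EFD4B9BD6B7764A8FCD6B036CBA796F3"
if is_github_id "$candidate"; then
  echo "looks like a GitHubId tag value"
else
  echo "not a GitHubId; did you pass the Azure resource ID?" >&2
fi
```

A check like this only guards the format. It cannot tell whether the `businessId` behind the resource matches the intended enterprise or organization scope.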
+ +### Hosted compute: repeated auth prompts from missing scopes + +The hosted compute provisioning flow requires **5 distinct GitHub token scopes** +(`admin:org`, `admin:enterprise`, `manage_runners:org`, `read:enterprise`, +`write:network_configurations`). If not collected upfront, each missing scope +triggers a separate `gh auth refresh` device-code flow. + +**Fix:** Always consolidate auth at the start of Step 12a: +```bash +gh auth refresh -h github.com -s admin:org,admin:enterprise,manage_runners:org,read:enterprise,write:network_configurations +``` + +### Hosted compute: image and size IDs are GitHub-specific + +The hosted runners API uses **numeric image IDs** (e.g., `"2295"` = Ubuntu 24.04) +and **GitHub-specific size IDs** (e.g., `"4-core"`, `"8-core"`), not Azure VM SKU +names or Ubuntu version strings. + +Always query available options first: +```bash +gh api orgs//actions/hosted-runners/images/github-owned --jq '.images[] | {id, display_name}' +gh api orgs//actions/hosted-runners/machine-sizes --jq '.machine_specs[:10] | .[] | {id, cpu_cores, memory_gb}' +``` + +### Default runner image lacks required tools (self-hosted only) + +The base image `ghcr.io/actions/actions-runner:latest` (GitHub's official runner) +is a **minimal** self-hosted runner — it does NOT include `az`, `gh`, or `jq`, and +ships no registration entrypoint. If you deploy without the custom image, the +runner never registers on ACI/ACA and workflows fail with: + +``` +Error: Unable to locate executable file: az +``` + +**Fix:** Always build and use the custom image from `./templates/runners/Dockerfile`. +The onboarding flow must: +1. Create an ACR (`az acr create` — no `--admin-enabled`) +2. Build the image via ACR Tasks (`az acr build --no-logs` on Windows) +3. Create a managed identity with `AcrPull` role on the ACR +4. 
Deploy the template with `acrServer`, `userAssignedIdentityId`, and `runnerImage` + +### KEDA scale-from-zero cold start + +With `minExecutions=0` (the default), KEDA's `github-runner` scaler polls the +GitHub Actions queue every 30 seconds. On a fresh deployment or after long idle +periods, the first job can wait 1–3 minutes before a runner spins up. During +this time: +- GitHub shows the job as "Waiting for a runner to pick up this job" +- The Settings → Runners page shows "No runners configured" (ephemeral runners + only register while executing) + +**Fix:** Set `minExecutions=1` to keep one runner always warm. This costs +~$30–50/month on the Consumption plan but eliminates cold-start delays and +ensures a runner is always visible in GitHub Settings. + +### Windows CRLF corrupts `entrypoint.sh` (self-hosted only) + +When the `Dockerfile` build context is uploaded from a Windows checkout (where +Git's `core.autocrlf=true` setting converts LF to CRLF), `entrypoint.sh` gets `\r\n` line endings. +Linux interprets the shebang as `#!/usr/bin/env bash\r`, failing with: + +``` +'bash\r': No such file or directory +``` + +The runner container starts but never registers, and all executions fail +immediately. + +**Fix (belt-and-suspenders):** +1. The `Dockerfile` includes a `sed -i 's/\r$//'` line after `COPY entrypoint.sh` + that strips CRLF at build time; this is always safe and is a no-op on clean + LF files. +2. Prefer **ACR Tasks** (cloud build) over local `docker build`, because ACR Tasks run + in Linux and handle the context correctly. +3. If building locally on Windows, ensure `.gitattributes` marks `*.sh` as + `text eol=lf`, or run `dos2unix entrypoint.sh` before building. 
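The CRLF check and fix described above can be tried locally before building. This is a minimal sketch, assuming bash with GNU `sed` and `grep`; the helper names `has_crlf` and `strip_crlf` are hypothetical and do not exist in the Git-Ape templates.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds if the file contains at least one CR byte.
has_crlf() {
  grep -q $'\r' "$1"
}

# Hypothetical helper: strip a trailing CR from every line, in place.
# Uses the same sed expression the section describes for the Dockerfile;
# it leaves files that already use LF unchanged.
strip_crlf() {
  sed -i 's/\r$//' "$1"
}
```

For example, `has_crlf entrypoint.sh && strip_crlf entrypoint.sh` normalizes the script only when it needs it, which gives an early signal before the image is built.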
+ +### `az acr build` crashes on Windows (Unicode encoding) + +On Windows, `az acr build` may crash while streaming build logs with: + +``` +UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' +``` + +This is a known Azure CLI bug — the `colorama` library on Windows can't encode +Unicode characters (like `→`) in `apt-get` output. The build itself may or may +not have completed in Azure before the crash. + +**Fix:** Always use `--no-logs` when running `az acr build` on Windows: +```bash +az acr build --registry --image git-ape-runner:latest \ + --file ... ... --no-logs +``` +The build runs in Azure regardless; `--no-logs` just skips the local log +streaming. Verify success with `az acr repository list --name `. + +### ACA managed environment provisioning delay + +The `Microsoft.App/managedEnvironments` resource can take 1–2 minutes to +provision. If you create the ACA job immediately after the environment, the +deployment may fail with `ManagedEnvironmentNotProvisioned`. + +**Fix:** When deploying via ARM template (`az deployment group create`), the +`dependsOn` in the template handles ordering automatically. When deploying +step-by-step (e.g., `az containerapp env create` followed by +`az containerapp job create`), poll the environment status first: +```bash +az containerapp env show --name --resource-group \ + --query "properties.provisioningState" -o tsv +# Wait until "Succeeded" before creating the job +``` + +### Stale workflow files in target repos + +If the target repo was onboarded before the `GIT_APE_RUNNER_LABEL` pattern was +introduced, its workflow files may have hardcoded `runs-on: ubuntu-latest`. The +private runner will never pick up jobs because workflows don't request its label. + +**Fix:** The scaffold helper (`scaffold-repo.sh` / `.ps1`) skips existing files. +To update stale workflows, the agent must either: +1. 
Detect the stale pattern (`grep 'runs-on: ubuntu-latest'`) and offer to + update all 4 workflow files with the dynamic pattern, OR +2. Advise the user to manually replace `runs-on: ubuntu-latest` with + `runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` in each job. + ### GitHub Org Custom OIDC Subject Template (e.g. Azure org) Some GitHub organizations (notably the `Azure` org) override the default OIDC subject diff --git a/website/docs/workflows/git-ape-deploy.md b/website/docs/workflows/git-ape-deploy.md index 672748b..a08522a 100644 --- a/website/docs/workflows/git-ape-deploy.md +++ b/website/docs/workflows/git-ape-deploy.md @@ -36,7 +36,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Detect deployments to execute | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Steps** | 2 | ### `deploy` @@ -44,7 +44,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Deploy: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Environment** | `azure-deploy` | | **Depends On** | `detect-deployments` | | **Steps** | 17 | @@ -95,7 +95,7 @@ concurrency: jobs: detect-deployments: name: Detect deployments to execute - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -139,7 +139,7 @@ jobs: name: "Deploy: ${{ matrix.deployment_id }}" needs: [detect-deployments] if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: azure-deploy strategy: matrix: diff --git 
a/website/docs/workflows/git-ape-destroy.md b/website/docs/workflows/git-ape-destroy.md index 0da5928..1cac8c0 100644 --- a/website/docs/workflows/git-ape-destroy.md +++ b/website/docs/workflows/git-ape-destroy.md @@ -35,7 +35,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Detect destroy requests | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Steps** | 2 | ### `destroy` @@ -43,7 +43,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Destroy: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Environment** | `azure-destroy` | | **Depends On** | `detect-destroys` | | **Steps** | 9 | @@ -98,7 +98,7 @@ concurrency: jobs: detect-destroys: name: Detect destroy requests - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_destroys: ${{ steps.find.outputs.has_destroys }} @@ -188,7 +188,7 @@ jobs: name: "Destroy: ${{ matrix.deployment_id }}" needs: detect-destroys if: needs.detect-destroys.outputs.has_destroys == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} environment: azure-destroy strategy: matrix: diff --git a/website/docs/workflows/git-ape-plan.md b/website/docs/workflows/git-ape-plan.md index 5898701..3947ddb 100644 --- a/website/docs/workflows/git-ape-plan.md +++ b/website/docs/workflows/git-ape-plan.md @@ -35,7 +35,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Detect changed deployments | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ 
vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Steps** | 2 | ### `plan-local` @@ -43,7 +43,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Plan Local: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Depends On** | `detect-deployments` | | **Steps** | 12 | @@ -52,7 +52,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Plan Azure: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Depends On** | `detect-deployments` | | **Steps** | 8 | @@ -61,7 +61,7 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Plan Comment: ${{ matrix.deployment_id }} | -| **Runs On** | `ubuntu-latest` | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | | **Depends On** | `detect-deployments`, `plan-local`, `plan-azure` | | **Steps** | 3 | @@ -103,7 +103,7 @@ concurrency: jobs: detect-deployments: name: Detect changed deployments - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} outputs: deployment_ids: ${{ steps.find.outputs.deployment_ids }} has_deployments: ${{ steps.find.outputs.has_deployments }} @@ -152,7 +152,7 @@ jobs: name: "Plan Local: ${{ matrix.deployment_id }}" needs: detect-deployments if: needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -440,7 +440,7 @@ jobs: name: "Plan Azure: ${{ matrix.deployment_id }}" needs: detect-deployments if: 
needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} @@ -624,7 +624,7 @@ jobs: name: "Plan Comment: ${{ matrix.deployment_id }}" needs: [detect-deployments, plan-local, plan-azure] if: always() && needs.detect-deployments.outputs.has_deployments == 'true' - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} strategy: matrix: deployment_id: ${{ fromJson(needs.detect-deployments.outputs.deployment_ids) }} diff --git a/website/docs/workflows/git-ape-verify.md b/website/docs/workflows/git-ape-verify.md index f094204..3eddf9b 100644 --- a/website/docs/workflows/git-ape-verify.md +++ b/website/docs/workflows/git-ape-verify.md @@ -32,8 +32,8 @@ This workflow is **shipped as a template** under `.github/skills/git-ape-onboard | Property | Value | |----------|-------| | **Display Name** | Verify Git-Ape configuration | -| **Runs On** | `ubuntu-latest` | -| **Steps** | 6 | +| **Runs On** | `${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}` | +| **Steps** | 7 | @@ -58,7 +58,7 @@ permissions: jobs: verify: name: Verify Git-Ape configuration - runs-on: ubuntu-latest + runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }} steps: - uses: actions/checkout@v6 @@ -196,6 +196,24 @@ jobs: fi done + - name: Check runner configuration + env: + RUNNER_LABEL: ${{ vars.GIT_APE_RUNNER_LABEL }} + run: | + echo "## Runner Configuration" + echo "" + # Git-Ape workflows resolve `runs-on` from the GIT_APE_RUNNER_LABEL + # variable, falling back to GitHub-hosted `ubuntu-latest` when unset. + if [[ -z "$RUNNER_LABEL" ]]; then + echo "ℹ️ GIT_APE_RUNNER_LABEL is unset — jobs run on GitHub-hosted runners (ubuntu-latest)." + echo " To switch to private/self-hosted runners, provision them via" + echo " @Git-Ape Onboarding and set the GIT_APE_RUNNER_LABEL variable." 
+ else + echo "✅ GIT_APE_RUNNER_LABEL is set to '$RUNNER_LABEL' — jobs target self-hosted runners with this label." + echo " Ensure at least one online runner is registered with the '$RUNNER_LABEL' label," + echo " otherwise deployment jobs will queue indefinitely." + fi + - name: Print summary if: always() run: |