From 48b2ebe3a12248adf0a0c9031fffe54d91c5c233 Mon Sep 17 00:00:00 2001 From: Luke Murray <276872131+Luke-Murray_atprod@users.noreply.github.com> Date: Mon, 25 May 2026 10:46:15 +1200 Subject: [PATCH 1/3] feat(agents): add session resumption for interrupted deployments - Implement check for interrupted deployments on startup - Provide user options to resume, start fresh, or view details - Track phase state to allow resuming at specific points - Update output formatting for resumable deployments --- .github/agents/git-ape.agent.md | 120 +++++++++++++++++++++++++++++++- 1 file changed, 119 insertions(+), 1 deletion(-) diff --git a/.github/agents/git-ape.agent.md b/.github/agents/git-ape.agent.md index d206482..6be3d5c 100644 --- a/.github/agents/git-ape.agent.md +++ b/.github/agents/git-ape.agent.md @@ -168,6 +168,68 @@ Coordinate the deployment of Azure resources by delegating to specialized subage - This is a brand new deployment (no existing resources) - User explicitly says "deploy fresh" or "new deployment" +## Session Resumption + +Before starting a new deployment workflow, check whether any previous deployment was interrupted. + +**On startup (before asking what to deploy):** +```bash +# List any deployments with a resumable phase state +_jq_read() { local file=$1 field=$2 + jq -r ".$field" "$file" 2>/dev/null || \ + python3 -c "import json,sys; print(json.load(sys.stdin)['$field'])" < "$file" 2>/dev/null +} + +for dir in .azure/deployments/*/; do + state_file="${dir}tracking/phase-state.json" + if [ -f "$state_file" ]; then + status=$(_jq_read "$state_file" status) + case "$status" in + suspended|blocked|awaiting-confirmation) + dep=$(_jq_read "$state_file" deploymentId) + phase=$(_jq_read "$state_file" phase) + updated=$(_jq_read "$state_file" updatedAt) + echo "Found: $dep | $phase | $status | $updated" + ;; + esac + fi +done +``` + +**If one or more resumable deployments are found**, show a resume prompt before asking what to deploy: + +```markdown +⏸️ In-Progress Deployment Found + +| Deployment | Last Stage | Status | Last Updated | +|------------|-----------|--------|--------------| +| {deploymentId} | {phase} | {status} | {updatedAt} | + +Would you like to: +R. Resume — continue from {phase} +N. Start a new deployment +V. View details of the interrupted deployment +``` + +**If user chooses R:** +1. Set `DEPLOYMENT_ID` to the found deployment's ID +2. Read existing artifacts from `.azure/deployments/$DEPLOYMENT_ID/` to reconstruct context +3. Determine the resume point from `completedPhases` in `phase-state.json`: + - Skip all phases listed in `completedPhases` + - Re-enter at the phase listed in `phase` with `status` of `suspended`, `blocked`, or `awaiting-confirmation` +4. For `blocked` at the security gate: re-show the blocking findings from `security-gate.json` and present options A/B/C/D again +5. For `awaiting-confirmation`: re-display the deployment summary and wait for user approval +6. For `suspended` mid-phase: re-invoke the subagent for that phase with the existing artifact context + +**If user chooses N:** Proceed with a fresh deployment. Do not modify the interrupted deployment's state. + +**If user chooses V:** Read and display `requirements.json` summary and the current `phase-state.json`, then ask R or N. + +**Skip resume check if:** +- User's opening message explicitly says "new deployment", "start fresh", "skip resume", or "ignore previous" +- No `.azure/deployments/` directory exists +- All found `phase-state.json` files have terminal statuses (`completed`, `failed`, `aborted`) + ## Workflow Stages ### Stage 1: Requirements Gathering @@ -489,10 +551,66 @@ For each deployment, save: **In Headless Mode:** Commit these files to the branch so the PR shows full deployment artifacts. The PR diff becomes the deployment review. +**Phase State Tracking:** + +Write `tracking/phase-state.json` at every stage boundary so interrupted sessions can be resumed. + +**Pattern — write BEFORE invoking subagent (`status: in-progress`), then AFTER it returns (`status: completed`):** + +```bash +# Before delegating to a subagent: +cat > .azure/deployments/$DEPLOYMENT_ID/tracking/phase-state.json << 'EOF' +{ + "deploymentId": "DEPLOYMENT_ID_PLACEHOLDER", + "phase": "PHASE_PLACEHOLDER", + "status": "in-progress", + "startedAt": "STARTED_AT_PLACEHOLDER", + "updatedAt": "UPDATED_AT_PLACEHOLDER", + "completedPhases": [], + "mode": "MODE_PLACEHOLDER" +} +EOF +# Replace placeholders with actual values using jq or shell substitution at write time + +# After the subagent returns successfully: +# Re-write with status: completed and the current phase added to completedPhases +``` + +**Phase names and transition events:** + +| Phase | Write `in-progress` when | Write `completed` when | +|-------|--------------------------|------------------------| +| `stage-1-requirements` | Gatherer invoked | `requirements.json` saved + user confirmed | +| `stage-2-template` | Generator invoked | `template.json`, `security-analysis.md`, `preflight-report.md`, `cost-estimate.json` saved | +| `stage-2.5-security-gate` | Gate evaluated | `security-gate.json` shows `PASSED` or `OVERRIDDEN` | +| `stage-2.75-waf-review` | Architect invoked | `waf-review.md` saved | +| `stage-2.85-confirmation` | Confirmation shown to user | User types yes/proceed/deploy | +| `stage-3-deployment` | Deployer invoked | `deployment.log` saved with succeeded status | +| `stage-4-validation` | Integration tests start | `tests.json` saved | + +**Terminal statuses (never resumed):** `completed`, `failed`, `aborted` + +**Resumable statuses:** `suspended`, `blocked`, `awaiting-confirmation` + +**On any abort or unrecoverable error — write suspended state before ending:** +```bash +# Write status: suspended with a suspendReason before the session closes +# Include all phases completed so far in completedPhases +``` + +**Special case — Stage 2.5 Security Gate BLOCKED, user chose D (abort for now):** +Write `status: blocked` (not `suspended`) — resume will re-show the blocking findings from `security-gate.json` and offer A/B/C/D again. + +**Special case — Stage 2.85 awaiting confirmation:** +Write `status: awaiting-confirmation` immediately after showing the deployment summary, before waiting for the user's yes/no. Resume will re-display the summary. + +**Final state — all stages complete:** +Write `status: completed` with all 7 phase keys in `completedPhases`. This deployment is never offered for resumption. + **Before Starting:** ```bash DEPLOYMENT_ID="deploy-$(date +%Y%m%d-%H%M%S)" -mkdir -p .azure/deployments/$DEPLOYMENT_ID +mkdir -p .azure/deployments/$DEPLOYMENT_ID/tracking ``` **After Each Stage:** From 8ac9dcfdf01d1c2abb3a40d7d4693485d9c18beb Mon Sep 17 00:00:00 2001 From: Luke Murray <24467442+lukemurraynz@users.noreply.github.com> Date: Mon, 25 May 2026 11:59:03 +1200 Subject: [PATCH 2/3] Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --- .github/agents/git-ape.agent.md | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/.github/agents/git-ape.agent.md b/.github/agents/git-ape.agent.md index 6be3d5c..47ddc19 100644 --- a/.github/agents/git-ape.agent.md +++ b/.github/agents/git-ape.agent.md @@ -180,20 +180,22 @@ _jq_read() { local file=$1 field=$2 python3 -c "import json,sys; print(json.load(sys.stdin)['$field'])" < "$file" 2>/dev/null } -for dir in .azure/deployments/*/; do - state_file="${dir}tracking/phase-state.json" - if [ -f "$state_file" ]; then - status=$(_jq_read "$state_file" status) - case "$status" in - suspended|blocked|awaiting-confirmation) - dep=$(_jq_read "$state_file" deploymentId) - phase=$(_jq_read "$state_file" phase) - updated=$(_jq_read "$state_file" updatedAt) - echo "Found: $dep | $phase | $status | $updated" - ;; - esac - fi -done +if [ -d .azure/deployments ]; then + for dir in .azure/deployments/*/; do + state_file="${dir}tracking/phase-state.json" + if [ -f "$state_file" ]; then + status=$(_jq_read "$state_file" status) + case "$status" in + suspended|blocked|awaiting-confirmation) + dep=$(_jq_read "$state_file" deploymentId) + phase=$(_jq_read "$state_file" phase) + updated=$(_jq_read "$state_file" updatedAt) + echo "Found: $dep | $phase | $status | $updated" + ;; + esac + fi + done +fi ``` **If one or more resumable deployments are found**, show a resume prompt before asking what to deploy: From 672e488e77196605de9ee73e0b2879fc71712702 Mon Sep 17 00:00:00 2001 From: Luke Murray <276872131+Luke-Murray_atprod@users.noreply.github.com> Date: Mon, 25 May 2026 12:17:30 +1200 Subject: [PATCH 3/3] feat(agents): enhance resumable deployment handling - Allow 'in-progress' status to be treated as resumable - Update documentation to reflect changes in resumable statuses - Improve phase-state.json creation for tracking deployments --- .github/agents/git-ape.agent.md | 47 +++++++++++++++++++++------------ 1 file changed, 30 insertions(+), 17 deletions(-) diff --git a/.github/agents/git-ape.agent.md b/.github/agents/git-ape.agent.md index 6be3d5c..ab065c6 100644 --- a/.github/agents/git-ape.agent.md +++ b/.github/agents/git-ape.agent.md @@ -174,7 +174,9 @@ Before starting a new deployment workflow, check whether any previous deployment **On startup (before asking what to deploy):** ```bash -# List any deployments with a resumable phase state +# List any deployments with a resumable phase state. +# Treat in-progress as resumable because an agent crash/kill may leave the +# phase-state file before it can be rewritten to suspended. _jq_read() { local file=$1 field=$2 jq -r ".$field" "$file" 2>/dev/null || \ python3 -c "import json,sys; print(json.load(sys.stdin)['$field'])" < "$file" 2>/dev/null @@ -185,7 +187,7 @@ for dir in .azure/deployments/*/; do if [ -f "$state_file" ]; then status=$(_jq_read "$state_file" status) case "$status" in - suspended|blocked|awaiting-confirmation) + in-progress|suspended|blocked|awaiting-confirmation) dep=$(_jq_read "$state_file" deploymentId) phase=$(_jq_read "$state_file" phase) updated=$(_jq_read "$state_file" updatedAt) @@ -216,10 +218,10 @@ V. View details of the interrupted deployment 2. Read existing artifacts from `.azure/deployments/$DEPLOYMENT_ID/` to reconstruct context 3. Determine the resume point from `completedPhases` in `phase-state.json`: - Skip all phases listed in `completedPhases` - - Re-enter at the phase listed in `phase` with `status` of `suspended`, `blocked`, or `awaiting-confirmation` + - Re-enter at the phase listed in `phase` with `status` of `in-progress`, `suspended`, `blocked`, or `awaiting-confirmation` 4. For `blocked` at the security gate: re-show the blocking findings from `security-gate.json` and present options A/B/C/D again 5. For `awaiting-confirmation`: re-display the deployment summary and wait for user approval -6. For `suspended` mid-phase: re-invoke the subagent for that phase with the existing artifact context +6. For `in-progress` or `suspended` mid-phase: re-invoke the subagent for that phase with the existing artifact context **If user chooses N:** Proceed with a fresh deployment. Do not modify the interrupted deployment's state. @@ -559,18 +561,27 @@ Write `tracking/phase-state.json` at every stage boundary so interrupted session ```bash # Before delegating to a subagent: -cat > .azure/deployments/$DEPLOYMENT_ID/tracking/phase-state.json << 'EOF' -{ - "deploymentId": "DEPLOYMENT_ID_PLACEHOLDER", - "phase": "PHASE_PLACEHOLDER", - "status": "in-progress", - "startedAt": "STARTED_AT_PLACEHOLDER", - "updatedAt": "UPDATED_AT_PLACEHOLDER", - "completedPhases": [], - "mode": "MODE_PLACEHOLDER" -} -EOF -# Replace placeholders with actual values using jq or shell substitution at write time +mkdir -p ".azure/deployments/$DEPLOYMENT_ID/tracking" +PHASE_STARTED_AT="$(date -u +%Y-%m-%dT%H:%M:%SZ)" +COMPLETED_PHASES_JSON="${COMPLETED_PHASES_JSON:-[]}" + +jq -n \ + --arg deploymentId "$DEPLOYMENT_ID" \ + --arg phase "$PHASE" \ + --arg status "in-progress" \ + --arg startedAt "$PHASE_STARTED_AT" \ + --arg updatedAt "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ + --arg mode "$MODE" \ + --argjson completedPhases "$COMPLETED_PHASES_JSON" \ + '{ + deploymentId: $deploymentId, + phase: $phase, + status: $status, + startedAt: $startedAt, + updatedAt: $updatedAt, + completedPhases: $completedPhases, + mode: $mode + }' > ".azure/deployments/$DEPLOYMENT_ID/tracking/phase-state.json" # After the subagent returns successfully: # Re-write with status: completed and the current phase added to completedPhases @@ -590,7 +601,9 @@ EOF **Terminal statuses (never resumed):** `completed`, `failed`, `aborted` -**Resumable statuses:** `suspended`, `blocked`, `awaiting-confirmation` +**Resumable statuses:** `in-progress`, `suspended`, `blocked`, `awaiting-confirmation` + +`in-progress` is resumable because a crash, kill, or host interruption can stop the agent before it writes `suspended`. If `updatedAt` is very recent, tell the user it may represent a still-running session before offering R/N/V. **On any abort or unrecoverable error — write suspended state before ending:** ```bash