Forward SIGTERM to bundle steps for graceful shutdown#3603
Forward SIGTERM to bundle steps for graceful shutdown#3603kichristensen wants to merge 3 commits into
Conversation
|
Does the fix in cnab-go only apply to explicit outputs not being captured, ot does it also apply to state, in which case I guess that one would have to be fixed as well in order for this to fully work? |
|
State goes through the same path. |
…r#3602) When a user cancels porter with SIGINT, the cancellation now propagates gracefully to the running bundle step instead of hard-killing it. - Porter runtime (PID 1 in the invocation image) now handles SIGTERM in addition to SIGINT, so Docker's container stop signal reaches it. - The mixin runner sends SIGTERM to the mixin process on context cancellation and waits up to 30 s before falling back to SIGKILL, giving steps like terraform time to flush state. - Mixin outputs are read even when the mixin exits with a non-zero code, so state written before shutdown is not silently discarded. - Finalize (which packs the state bag) now runs with a non-cancelled context so state is always persisted after a graceful stop. - Docker containers are configured with a 30 s stop timeout so SIGKILL is not sent before the step has had a chance to clean up. - action.go makes a best-effort save of any outputs returned by cnab-go after context cancellation. Note: full end-to-end output capture also requires an upstream fix to cnab-go so that fetchOutputs is called after ContainerStop in the context-cancelled path of getContainerResult. Signed-off-by: Kim Christensen <kimworking@gmail.com>
…ture) - pkg/signals: verify InterruptSignals includes os.Interrupt on all platforms - pkg/pkgmgmt/client: integration-style test that cancel() sends SIGTERM (not SIGKILL) to the child process on POSIX - pkg/cnab/provider: verify Docker driver is configured with a graceful stop timeout on container.Config.StopTimeout - pkg/runtime: verify mixin outputs are captured even when the mixin exits with an error (e.g. after receiving SIGTERM and flushing state) Signed-off-by: Kim Christensen <kimworking@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR improves cancellation behavior for Porter bundle executions so that a cancelled step receives SIGTERM and has time to flush state/outputs before being force-killed, enabling better state persistence (e.g. terraform state) during Ctrl+C / container stop scenarios.
Changes:
- Forward SIGTERM on context cancellation for mixin/package executions (POSIX) with a 30s grace period before force-kill; keep current behavior on Windows.
- Capture and apply mixin outputs even when a step exits with a non-zero status, and ensure runtime finalization (state bag packing) still runs after cancellation.
- Configure the CNAB Docker driver to use a 30s container stop timeout and attempt best-effort output persistence on cancellation.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/signals/signal.go | Exposes the set of OS signals used to trigger graceful shutdown. |
| pkg/signals/signal_test.go | Adds a unit test to ensure interrupt signals include os.Interrupt. |
| cmd/porter/main.go | Updates interrupt handling to subscribe to SIGTERM as well as SIGINT (platform-dependent). |
| pkg/pkgmgmt/client/runner.go | Hooks graceful shutdown configuration into the package/mixin runner command creation. |
| pkg/pkgmgmt/client/runner_posix.go | Implements SIGTERM forwarding + 30s grace period on POSIX via exec.Cmd cancellation behavior. |
| pkg/pkgmgmt/client/runner_windows.go | Adds Windows no-op implementation for graceful shutdown forwarding. |
| pkg/pkgmgmt/client/runner_graceful_shutdown_test.go | Adds a POSIX-only test validating SIGTERM forwarding on cancellation. |
| pkg/runtime/runtime.go | Ensures outputs are read/applied even on mixin failure; runs Finalize even when cancelled. |
| pkg/runtime/runtime_test.go | Adds a test validating output capture/application after mixin failure. |
| pkg/runtime/helpers.go | Exposes TestConfig from the test runtime helper for use in new tests. |
| pkg/cnab/provider/driver.go | Configures Docker driver containers with a 30s stop timeout for graceful shutdown. |
| pkg/cnab/provider/driver_test.go | Adds coverage asserting Docker driver stop timeout configuration is applied. |
| pkg/cnab/provider/action.go | Attempts best-effort saving of operation outputs when execution ends due to context cancellation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Kim Christensen <kimworking@gmail.com>
What does this change
When a user cancels a
porter install/upgrade/uninstallwith Ctrl+C, the running bundle step now receives SIGTERM and has up to 30 seconds to clean up before SIGKILL is sent. Outputs written during that window (e.g. a terraform state file) are captured and persisted as in a normal run.Previously the cascade looked like this:
docker stop(SIGTERM to container PID 1)os.Interrupt, so the OS terminated it immediately on SIGTERMAfter this change:
Finalizepacks the state bag using a non-cancelled contextWhat issue does it fix
Closes #3602
Notes for the reviewer
Upstream dependency: full end-to-end output capture also requires a fix to cnab-go. In
driver/docker/docker.go, thecase <-ctx.Done()branch ofgetContainerResultcallsContainerStopbut then returnsdriver.OperationResult{}(empty) without callingfetchOutputs. That means outputs written inside the container to/cnab/app/outputs/are never copied back to the host. The Porter-side changes here lay the groundwork so that fix lands cleanly.Windows:
configureGracefulShutdownis a no-op on Windows (build-tagged) since SIGTERM is not a real process signal there; the existing SIGKILL behaviour is preserved.Grace period: 30 seconds was chosen to match common terraform state-write times; this could be made configurable in a follow-up.
Checklist