Skip to content

handlers: StewardHandler returns SKIPPED when no merge actually happened#153

Open
Antawari wants to merge 2 commits into
mainfrom
ishtar/steward-handler-skipped-when-no-merge
Open

handlers: StewardHandler returns SKIPPED when no merge actually happened#153
Antawari wants to merge 2 commits into
mainfrom
ishtar/steward-handler-skipped-when-no-merge

Conversation

@Antawari
Copy link
Copy Markdown
Contributor

Summary

Pre-fix, StewardHandler.handle returned TaskStatus.COMPLETED for every non-error path — even when verdict was reject, verdict was missing, or pr_number couldn't be identified. The closer's contract is to seal the work; reporting "completed" for cases where no merge happened makes the pipeline lie about its actual outcomes. Downstream gates and CLI summaries see COMPLETED → assume the PR landed → propagate the false-positive.

Existing tests already verified the behavioral side of the cold path (verdict != "approve" does NOT call merge_pr on the github_client mock) — but never asserted the resulting TaskStatus. That's the gap this PR closes.

Fix

src/bonfire/handlers/steward.py — three-line addition right before the model_copy:

merge_actually_happened = verdict == "approve" and pr_number is not None
final_status = (
    TaskStatus.COMPLETED if merge_actually_happened else TaskStatus.SKIPPED
)
Case Pre-fix Post-fix
verdict="approve" + pr_number set COMPLETED COMPLETED (unchanged)
verdict="reject" + pr_number set COMPLETED (bug) SKIPPED
missing/empty verdict + pr_number COMPLETED (bug) SKIPPED
verdict="approve" + no pr_number COMPLETED (bug) SKIPPED

All other return-envelope behavior preserved: same metadata (steward_verdict + steward_pr), same result string, same error path. Only status changes.

SKIPPED was chosen over FAILED per the ticket — the closer evaluated its preconditions and intentionally declined; it didn't crash. SKIPPED conveys "intentionally did not run" exactly.

Tests

tests/unit/test_steward_handler_skipped_on_non_approve.py (new · Knight RED) — 5 tests:

Class Test What it pins
TestStatusReflectsActualMerge reject verdict → SKIPPED Primary bug case
TestStatusReflectsActualMerge missing verdict → SKIPPED Empty-prior-results case
TestStatusReflectsActualMerge approve without pr_number → SKIPPED Can't-identify case
TestStatusReflectsActualMerge approve + pr_number → COMPLETED Regression guard for the only true-merge path
TestStatusPreservesEnvelopeStructure SKIPPED path still attaches metadata + result Only status changes

TDD verification

  • Knight RED state confirmed against pre-fix code: 4 fail / 1 pass (the 1 passing is the approve+pr_number regression guard).
  • Warrior GREEN state confirmed after fix: 5/5 pass in 1.00s.
  • Existing regression: tests/unit/test_steward_handler.py (570-line suite) — 30/30 still green.
  • ruff check + ruff format --check on changed files: clean.

Test plan

  • pytest tests/unit/test_steward_handler_skipped_on_non_approve.py — confirm 5/5 green.
  • pytest tests/unit/test_steward_handler.py — confirm 30/30 still green.
  • (Optional manual) Run a pipeline with a request_changes Wizard verdict; confirm the closer stage now shows SKIPPED in CLI summaries rather than COMPLETED.

🤖 Generated with Claude Code

Antawari and others added 2 commits May 27, 2026 17:09
Pre-fix, StewardHandler.handle returned TaskStatus.COMPLETED for every
non-error path — even when verdict was "reject", verdict was missing,
or pr_number couldn't be identified. The closer's contract is to seal
the work; reporting "completed" for cases where no merge happened
makes the pipeline lie about its actual outcomes. Downstream gates and
CLI summaries see COMPLETED → assume the PR landed → propagate the
false-positive.

## Fix

src/bonfire/handlers/steward.py:

  - Compute final_status from the actual-merge precondition:
      merge_actually_happened = verdict == "approve" and pr_number is not None
      final_status = COMPLETED if merge_actually_happened else SKIPPED
  - Reject verdict + pr_number → SKIPPED
  - Empty verdict + pr_number → SKIPPED
  - Approve verdict without pr_number → SKIPPED (nothing to merge)
  - Approve + pr_number → COMPLETED (the only true-merge path)
  - All other return-envelope behavior preserved: same metadata
    (steward_verdict + steward_pr), same result string, same error path.

SKIPPED is the right semantics over FAILED — the closer evaluated its
preconditions and intentionally declined; it didn't crash.

## Tests

tests/unit/test_steward_handler_skipped_on_non_approve.py (new · Knight RED):

  Five tests across two classes:
  - TestStatusReflectsActualMerge (4):
      * reject verdict → SKIPPED
      * missing verdict → SKIPPED
      * approve verdict without pr_number → SKIPPED
      * approve + pr_number → COMPLETED (regression guard for the only
        true-merge path, plus a sanity check that merge_pr was actually
        invoked on the github_client mock)
  - TestStatusPreservesEnvelopeStructure (1):
      * SKIPPED path still attaches steward_verdict + steward_pr metadata
        + preserves the result string — only the status changes.

## Verification

  pytest tests/unit/test_steward_handler_skipped_on_non_approve.py
    5 passed (Knight RED → GREEN verified — 4 failed pre-fix, all
    green post-fix; the 1 pre-fix-passing test is the COMPLETED
    regression guard)

  pytest tests/unit/test_steward_handler.py (existing 570-line suite)
    30 passed (zero regressions — the existing reject-verdict tests
    only asserted "merge_pr was not called", not the resulting
    TaskStatus, so they're unchanged by this fix)

  ruff check + format on changed files: clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the prior commit. ruff format was reported as needing a
re-flow on steward.py after the new merge_actually_happened block;
running it now closes the format check that CI would otherwise flag.
No behavior change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant