diff --git a/.claude/learning-seeds.md b/.claude/learning-seeds.md new file mode 100644 index 0000000..9641ba1 --- /dev/null +++ b/.claude/learning-seeds.md @@ -0,0 +1,20 @@ +# Learning Seeds (auto-generated before compaction) +# Generated: 2026-04-28T03:49 +# Trigger: auto compaction + +## Context +This session had debugging/error-resolution activity that wasn't captured in LESSONS.md. +Run `/learn` to document the lessons while context is still available. + +## Signals Detected +### Recent Fix Commits +- ee3887a fix: drop set -e from check-completion.sh to match hooks/ mirror +- 96ccb0c test: fix false-positive in tmp-file-leak assertion (T1.2) +- 0b07183 fix: gate mv on jq exit; lock-protect additive legacy migration (T1.2) +### Debugging Workflow Used +Systematic debugging markers found in transcript +### Error Resolution Pattern +Errors were encountered and subsequently resolved + +## Action +Run `/learn` to capture the debugging lessons from this session. diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..28478a1 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,90 @@ +# Changelog + +All notable changes to Taskmaster are documented here. + +## v4.3.0 — 2026-04-28 + +### Added +- `TASKMASTER_VERIFY_COMMAND` env var: opt-in shell verifier that gates + stop after the done token is seen. Pairs with test suites, type-checkers, + or any repo-local check. Companion knobs: `TASKMASTER_VERIFY_TIMEOUT` + (default 60s), `TASKMASTER_VERIFY_MAX_OUTPUT` (default 4000 bytes), + `TASKMASTER_VERIFY_CWD`. (T1.1) +- Tagged hook-injected prompts: every prompt the hook injects starts + with `[taskmaster:injected v=1 kind=]`. New + `taskmaster-prompt-detect.sh` lib lets downstream consumers + distinguish injected reprompts from real user goals. Legacy substring + detection preserved for back-compat. (T1.3) +- JSON session state file at + `${TASKMASTER_STATE_DIR:-${TMPDIR}/taskmaster/state}/.json`, + flock-protected, atomic writes. 
Schema v1 with `stop_count`, + `latest_user_prompt`, `last_verifier_run`, `metadata` fields ready for + T2/T3. (T1.2) + +### Changed +- Stop-count tracking moved from the bare counter file at + `${TMPDIR}/taskmaster/` to the new JSON state file. + Legacy counter files are absorbed on first read and deleted — + no user action required. The migration is flock-protected and + additive (a peer's increments are not rewound). + +### References +- Design: `docs/designs/2026-04-28-072245-fork-pattern-adoption.md` +- Plan: `docs/plans/2026-04-28-083546-t1-fork-pattern-adoption.md` +- Source review: `docs/upstream-reviews/blader-taskmaster-forks.md` + +## [2.4.0] - 2026-03-30 + +### Changed +- Install script now also copies hook to `~/.claude/hooks/taskmaster-check-completion.sh` + (user-level hooks directory), consistent with standard Claude Code hook layout. +- Settings.json registration now points to `~/.claude/hooks/` path by default. +- Uninstall script updated to clean up from both locations. + +## [2.3.0] - 2026-02-25 + +### Changed +- Hook `reason` field now contains only the TASKMASTER_DONE signal token instead + of the full completion checklist. This keeps user-visible terminal output + minimal — one collapsed line rather than a wall of text. +- Full completion checklist lives exclusively in SKILL.md, which is always + loaded as system context. The agent already has all instructions; the reason + field no longer needs to duplicate them. +- Added `last_assistant_message` as the primary done-signal detection path + (faster, no transcript file parsing required). Transcript-based check is + retained as fallback. +- Removed `HAS_RECENT_ERRORS` / `stop_hook_active` escape-hatch logic in favor + of the explicit TASKMASTER_DONE signal protocol. +- `hooks/check-completion.sh` brought in sync with root-level canonical source. + +## [2.2.0] - 2026-02-19 + +### Changed +- Default `TASKMASTER_MAX` set to 100 (previously 0 / infinite). 
+- Moved full completion checklist from hook `reason` into SKILL.md system + context (first pass; reason still contained a short prompt). +- `install.sh` made POSIX-portable (`sh` shebang, conditional `pipefail`). + +### Fixed +- Resolved infinite loop caused by `set -euo pipefail` in sh-sourced contexts. + +## [2.1.0] + +### Added +- Session-scoped counter with configurable `TASKMASTER_MAX` escape hatch. +- Subagent skip: transcripts shorter than 20 lines are ignored. +- `TASKMASTER_DONE_PREFIX` env var for customising the done token prefix. + +## [2.0.0] + +### Added +- TASKMASTER_DONE signal protocol: stop is allowed only after the agent emits + `TASKMASTER_DONE::` in its response. +- Transcript-based done-signal detection. + +## [1.0.0] + +### Added +- Initial release: stop hook that blocks agent from stopping prematurely. +- Completion checklist injected via hook `reason` field. +- `TASKMASTER_MAX` loop guard. diff --git a/LESSONS.md b/LESSONS.md new file mode 100644 index 0000000..9c4c221 --- /dev/null +++ b/LESSONS.md @@ -0,0 +1,121 @@ +# Lessons Learned + +Append-only log of debugging insights and non-obvious solutions. + +--- + +## 2026-02-25T14:00 - Claude Code hook `reason` is dual-use: user-visible AND AI context + +**Problem**: The taskmaster stop hook embedded a full 5-item completion checklist in the `reason` field of `{ "decision": "block", "reason": "..." }`. Every stop attempt printed the entire checklist to the user's terminal. + +**Root Cause**: Claude Code's stop hook `reason` field serves two purposes simultaneously — it is displayed to the user in the terminal UI ("Stop hook error: ...") AND injected back into the AI conversation as context. Putting verbose instructions in `reason` to inform the AI caused them to also appear as user-visible output. + +**Lesson**: The `reason` field is not a private AI channel. Anything in `reason` is shown to the human. 
Persistent AI instructions belong in SKILL.md (system context loaded at session start), not in transient hook `reason` values. The `reason` should carry only the minimum transient signal the agent needs right now. + +**Code Issue**: +```bash +# Before (verbose — full checklist in reason, shown to user) +REASON="${LABEL}: ${PREAMBLE} + +Before stopping, do each of these checks: +1. RE-READ THE ORIGINAL USER MESSAGE(S)... +2. CHECK THE TASK LIST... +[etc]" +jq -n --arg reason "$REASON" '{ decision: "block", reason: $reason }' + +# After (minimal — only the done signal; checklist lives in SKILL.md) +DONE_SIGNAL="${DONE_PREFIX}::${SESSION_ID}" +jq -n --arg reason "$DONE_SIGNAL" '{ decision: "block", reason: $reason }' +``` + +**Solution**: Strip the checklist from `reason`. Put it only in SKILL.md, which is always loaded as system context. The `reason` now contains only the done signal token the agent must emit. + +**Prevention**: When designing Claude Code hooks, ask: "Does this text need to be in the reason, or is it already in system context?" If it's in a skill file, it doesn't belong in `reason`. + +--- + +## 2026-02-25T14:30 - `last_assistant_message` is faster than transcript scanning for done-signal detection + +**Problem**: The hook was opening and scanning potentially large transcript JSON files on every stop attempt to detect whether the agent had emitted the done signal. + +**Root Cause**: The hook relied exclusively on transcript-file parsing, which requires disk I/O and JSON scanning on every invocation. + +**Lesson**: The Claude Code hook input JSON exposes `last_assistant_message` directly. Checking that field is O(1) and avoids the file read in the common case (agent just emitted the signal in its latest message). 
+ +**Code Issue**: +```bash +# Before (always scans transcript file) +if tail -400 "$TRANSCRIPT" 2>/dev/null | grep -Fq "$DONE_SIGNAL"; then + HAS_DONE_SIGNAL=true +fi + +# After (fast path via last_assistant_message, transcript as fallback) +LAST_MSG=$(echo "$INPUT" | jq -r '.last_assistant_message // ""') +if [ -n "$LAST_MSG" ] && echo "$LAST_MSG" | grep -Fq "$DONE_SIGNAL" 2>/dev/null; then + HAS_DONE_SIGNAL=true +fi +if [ "$HAS_DONE_SIGNAL" = false ] && [ -f "$TRANSCRIPT" ]; then + if tail -400 "$TRANSCRIPT" 2>/dev/null | grep -Fq "$DONE_SIGNAL"; then + HAS_DONE_SIGNAL=true + fi +fi +``` + +**Prevention**: Always check `last_assistant_message` before falling back to transcript file parsing in stop hooks. + +## 2026-05-05T06:02 - --force-with-lease rejection saves remote work after rebase divergence + +**Problem**: After rebasing local `main` to ship v4.3.0, `git push --force-with-lease origin main` was rejected with "stale info" — even though the local view of origin/main looked correct based on the session summary noting "50 ahead, 15 behind." A blind retry with `--force` would have silently destroyed two commits and a `v2.4.0` annotated tag pushed to origin from a parallel session in the intervening hours. + +**Root Cause**: `--force-with-lease` compares the *local view* of the remote ref against the *actual remote* at push time. When another session pushes between fetch and push, the lease is stale. The reflexive response is to fetch and retry — but fetching alone updates the remote-tracking ref without showing what changed. Without an explicit `git log origin/main --not main`, the new commits get absorbed into the local view and then overwritten by the next force-push. + +**Lesson**: Treat `--force-with-lease` rejection as a signal to *enumerate* what's on the remote, not as friction to bypass. Run `git log origin/main --not main --oneline` after every fetch in a divergence-resolution flow, and classify each commit: +1. 
Earlier-SHA versions of commits already in local history (rebase artifacts — safe to discard) +2. Genuinely new work from another session (must cherry-pick before force-push) + +When in doubt, cherry-pick. The cost of an unnecessary cherry-pick is one extra commit; the cost of a wrong force-push is unrecoverable lost work. + +**Workflow**: +```bash +# 1. Fetch latest +git fetch origin + +# 2. Enumerate what's on remote-not-local +git log origin/main --not main --oneline + +# 3. For each commit: classify as rebase artifact vs new work +# - Rebase artifact: same author + same message + similar timestamp = safe +# - New work: different message or post-rebase timestamp = cherry-pick + +# 4. Cherry-pick the new work onto local +git cherry-pick + +# 5. Resolve conflicts mindfully — when an upstream commit predates a +# major refactor, the conflict markers usually show obsolete machinery. +# Take HEAD on the conflict and port only the genuinely-new behavior +# (intent, not mechanics) into the equivalent post-refactor function. + +# 6. Now push with lease +git push --force-with-lease origin main +``` + +**Prevention**: +- Never bypass `--force-with-lease` rejection with bare `--force` without first running the `--not` log enumeration. +- Treat tags pushed by other sessions (e.g. `[new tag] v2.4.0`) as load-bearing signals — they document a decision point that local history doesn't know about. +- When rebasing onto a base that's diverged from the public head, plan for cherry-pick reconciliation as part of the workflow, not as an exception. + +## 2026-05-05T06:02 - Editing git conflict markers requires removing all three sigils + +**Problem**: Resolving a CHANGELOG.md conflict via the Edit tool: replaced the `<<<<<<< HEAD` line and the `>>>>>>>` line, but left a stray `=======` separator a few lines below in the same hunk. 
The next `bash -n` on a sibling install.sh passed, but `git status` still showed the file as conflicted, and a follow-up grep revealed the orphan separator. + +**Root Cause**: Git conflict blocks have three sigils — `<<<<<<<`, `=======`, `>>>>>>>` — and a partial edit that addresses only the open/close markers leaves the separator behind, which git still treats as an unresolved conflict marker. With nested or stacked conflicts in one file, an Edit-tool replacement can match the outer pair and miss inner separators. + +**Lesson**: After every conflict resolution edit, grep for *all three* sigils as a single check, not just for `<<<<<<<`: + +```bash +grep -n '<<<<<<<\|=======\|>>>>>>>' +``` + +Pair this with `bash -n` (or language-equivalent syntax check) for the affected files before staging — the syntax check catches cases where conflict residue corrupts script structure even when the markers technically remain. + +**Prevention**: Make the three-sigil grep a reflexive post-edit step in any conflict-resolution flow, exactly the same way `bash -n` is the post-edit step for shell-script changes. diff --git a/SKILL.md b/SKILL.md index 9be5719..9221cb0 100644 --- a/SKILL.md +++ b/SKILL.md @@ -4,7 +4,7 @@ description: | Codex wrapper plus same-process expect PTY injector that keeps work moving until an explicit parseable done signal is emitted. author: blader -version: 4.2.0 +version: 4.3.0 --- # Taskmaster @@ -26,6 +26,12 @@ skill implements the same completion contract externally. expect PTY bridge transport, using the shared compliance prompt. 5. **Token present**: no further injection. +## A note on the injected-prompt tag + +If you see a line starting with `[taskmaster:injected v=…]` at the top of a +message, that's metadata the hook adds to its own prompts. Treat it as a +marker, not as content you need to act on. 
+ ## Parseable Done Signal When the work is genuinely complete, the agent must include this exact line diff --git a/check-completion.sh b/check-completion.sh index 4fa09db..4f52084 100755 --- a/check-completion.sh +++ b/check-completion.sh @@ -6,13 +6,23 @@ # TASKMASTER_DONE:: # # Optional env vars: -# TASKMASTER_MAX Max number of blocks before allowing stop (default: 0 = infinite) +# TASKMASTER_MAX Max number of blocks before allowing stop (default: 100) # -set -euo pipefail +# errexit (-e) deliberately omitted — matches hooks/check-completion.sh. +# Hook MUST emit a decision JSON; a non-zero from a helper lib (e.g. +# taskmaster_state_update on a corrupt state file) shouldn't abort before +# the final jq -n write. Per-call error handling lives at the call sites. +set -uo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" # shellcheck disable=SC1091 source "$SCRIPT_DIR/taskmaster-compliance-prompt.sh" +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/taskmaster-verify-command.sh" +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/taskmaster-prompt-detect.sh" +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/taskmaster-state.sh" INPUT=$(cat) SESSION_ID=$(echo "$INPUT" | jq -r '.session_id') @@ -31,16 +41,13 @@ if [ -f "$TRANSCRIPT" ]; then fi fi -# --- counter --- -COUNTER_DIR="${TMPDIR:-/tmp}/taskmaster" -mkdir -p "$COUNTER_DIR" -COUNTER_FILE="${COUNTER_DIR}/${SESSION_ID}" -MAX=${TASKMASTER_MAX:-0} +# --- counter (state-file backed) --- +taskmaster_state_migrate_legacy_counter "$SESSION_ID" +taskmaster_state_init "$SESSION_ID" -COUNT=0 -if [ -f "$COUNTER_FILE" ]; then - COUNT=$(cat "$COUNTER_FILE" 2>/dev/null || echo "0") -fi +MAX=${TASKMASTER_MAX:-100} +COUNT="$(taskmaster_state_jq "$SESSION_ID" '.stop_count')" +[[ "$COUNT" =~ ^[0-9]+$ ]] || COUNT=0 transcript_has_done_signal() { local transcript_path="$1" @@ -85,16 +92,32 @@ if [ -f "$TRANSCRIPT" ]; then fi if [ "$HAS_DONE_SIGNAL" = true ]; then - rm -f "$COUNTER_FILE" + if [ -n 
"${TASKMASTER_VERIFY_COMMAND:-}" ]; then + if taskmaster_run_verify_command; then + taskmaster_state_update "$SESSION_ID" '.stop_count = 0' + exit 0 + else + VERIFY_REASON="$(generate_taskmaster_injected_tag verifier-feedback) +TASKMASTER: verifier failed (exit=${TASKMASTER_VERIFY_EXIT_CODE}). Command: ${TASKMASTER_VERIFY_COMMAND} + +Output (last ${TASKMASTER_VERIFY_MAX_OUTPUT:-4000} bytes): +${TASKMASTER_VERIFY_OUTPUT_TAIL} + +Token alone is insufficient when a verifier is configured. Fix the failures and try again." + jq -n --arg reason "$VERIFY_REASON" '{ decision: "block", reason: $reason }' + exit 0 + fi + fi + taskmaster_state_update "$SESSION_ID" '.stop_count = 0' exit 0 fi +taskmaster_state_increment_stop_count "$SESSION_ID" NEXT=$((COUNT + 1)) -echo "$NEXT" > "$COUNTER_FILE" -# Optional escape hatch. Default is infinite (0) so hook keeps firing. +# Optional escape hatch after MAX continuations. if [ "$MAX" -gt 0 ] && [ "$NEXT" -ge "$MAX" ]; then - rm -f "$COUNTER_FILE" + taskmaster_state_update "$SESSION_ID" '.stop_count = 0' exit 0 fi @@ -112,7 +135,9 @@ fi # --- reprompt --- SHARED_PROMPT="$(build_taskmaster_compliance_prompt "$DONE_SIGNAL")" -REASON="${LABEL}: ${PREAMBLE} +INJECTED_TAG="$(generate_taskmaster_injected_tag stop-block)" +REASON="${INJECTED_TAG} +${LABEL}: ${PREAMBLE} ${SHARED_PROMPT}" diff --git a/docs/SPEC.md b/docs/SPEC.md index 9601428..1e0848a 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -1,7 +1,7 @@ # Taskmaster ## Product & Technical Specification -**Version**: 4.2.0 +**Version**: 4.3.0 **Scope**: - `taskmaster/check-completion.sh` - `taskmaster/taskmaster-compliance-prompt.sh` @@ -78,6 +78,48 @@ TASKMASTER_DONE:: - Injects payload into the same Codex PTY via bracketed paste. - Submits prompt with Enter after fixed short delay. 
+### 3.5 Hook-injected prompt tag + +Every prompt the hook injects starts with a single-line tag: + +``` +[taskmaster:injected v=1 kind=] + +``` + +`` ∈ `stop-block | followup | compliance | session-start | verifier-feedback`. +The `compliance` and `session-start` kinds are reserved for future use (T2 native Codex hooks, T3 semantic verifier); only `stop-block`, `followup`, and `verifier-feedback` are emitted in v4.3.0. + +Downstream consumers (UserPromptSubmit hook, completion verifier, external +tooling) detect injected prompts via `is_taskmaster_injected_prompt` from +`taskmaster-prompt-detect.sh`. Legacy substring detection is preserved for +prompts emitted before this version. + +### 3.6 Session state file + +Path: `${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}/.json` + +Schema (v1): + +```json +{ + "schema_version": 1, + "session_id": "", + "created_at": "", + "updated_at": "", + "stop_count": 0, + "latest_user_prompt": null, + "last_verifier_run": null, + "metadata": {} +} +``` + +All writes go through `flock` on `.lock` and atomic tmp+mv. + +**Legacy migration:** on first read per session, the hook absorbs any +existing counter file at `${TMPDIR}/taskmaster/` into +`stop_count` and deletes the legacy file. Idempotent. + ## 4. Installation Behavior `install.sh` auto-detects Codex and/or Claude and installs matching targets. @@ -98,6 +140,24 @@ Fixed: - Codex transport: expect only - expect payload mode + submit timing +### 5.1 Optional verifier command + +| Env var | Default | Meaning | +|---|---|---| +| `TASKMASTER_VERIFY_COMMAND` | unset | Shell command run when the done token is seen. Empty/unset = skip. | +| `TASKMASTER_VERIFY_TIMEOUT` | `60` | Seconds before SIGTERM, +5s grace before SIGKILL. | +| `TASKMASTER_VERIFY_MAX_OUTPUT` | `4000` | Bytes of combined stdout+stderr echoed back into the block reason. | +| `TASKMASTER_VERIFY_CWD` | unset | If set, `cd` here before invoking. Else inherit hook's cwd. 
| + +When `TASKMASTER_VERIFY_COMMAND` is set, stop is allowed only when (a) the +done token is present **and** (b) the command exits 0. A failing verifier +overrides token-based completion and blocks with the command's exit code and +truncated output. + +The verifier runs **only** when the done token is present, not on every stop +attempt — this keeps slow verifiers (test suites, builds) from gating +mid-work stop attempts. + ## 6. Operational Notes - Enforcement is same-process for Codex and stop-hook based for Claude. diff --git a/docs/blog/2026-02-25-taskmaster-hook-cleanup.md b/docs/blog/2026-02-25-taskmaster-hook-cleanup.md new file mode 100644 index 0000000..18dea51 --- /dev/null +++ b/docs/blog/2026-02-25-taskmaster-hook-cleanup.md @@ -0,0 +1,105 @@ +# Cleaning up taskmaster's terminal output + +**2026-02-25** + +I recently forked [taskmaster](https://github.com/micahstubbs/taskmaster) to stop Claude from quitting early when working in a Claude Code session. The stop [hook](https://github.com/micahstubbs/taskmaster/blob/main/check-completion.sh) fires every time Claude tries to stop and blocks it until he emits an explicit `TASKMASTER_DONE::` token — a parseable signal that confirms Claude is actually finished. + +It works. The terminal output, though, was way too much. + +#### The problem + +Every time the hook blocked a stop attempt, Claude Code dumped the full completion checklist into the terminal: + +``` + Ran 9 stop hooks (ctrl+o to expand) + ⎿  Stop hook error: TASKMASTER (1/100): Verify that + all work is truly complete before stopping. + + Before stopping, do each of these checks: + + 1. RE-READ THE ORIGINAL USER MESSAGE(S). List every discrete request or acceptance criterion. For each one, confirm it is fully addressed — not just started, FULLY done. If the user explicitly changed their mind, withdrew a request, or told you to stop or skip something, treat that item as resolved and do NOT continue working on it. + + 2. CHECK THE TASK LIST.
Review every task. Any task not marked completed? Do it now — unless the user indicated it is no longer wanted. + + 3. CHECK THE PLAN. Walk through each step. Any step skipped or partially done? Finish it — unless the user redirected or deprioritized it. + + 4. CHECK FOR ERRORS. Did any tool call, build, test, or lint fail? Fix it. + + 5. CHECK FOR LOOSE ENDS. Any TODO comments, placeholder code, missing tests, or follow-ups noted but not acted on? + + IMPORTANT: The user's latest instructions always take priority. If the user said to stop, move on, or skip something, respect that — do not force completion of work the user no longer wants. + + If after this review everything is genuinely 100% done (or explicitly deprioritized by the user), briefly confirm completion for each user request. Otherwise, immediately continue working on whatever remains — do not just describe what is left, ACTUALLY DO IT. +``` + +Many lines, every time, accumulating across a long session. The checklist is instructions _for the AI_ — I never needed to read it. + +#### How the `reason` field works + +Claude Code stop hooks return JSON when they want to block a stop: + +```json +{ "decision": "block", "reason": "..." } +``` + +The `reason` field does two things at once: + +1. **User-visible output** — shown in the terminal as a "Stop hook error" +2. **AI context** — injected back into the conversation so that Claude knows what to do next + +Before, taskmaster was putting the full checklist in `reason`, to ensure that Claude got the instructions. However, this meant taskmaster was also printing the full checklist to my terminal. Every single stop attempt. + +#### What I was missing + +Claude already has the checklist from the taskmaster [skill file](https://github.com/micahstubbs/taskmaster/blob/main/SKILL.md). Every Claude Code `SKILL.md` file loads into system context at session start. 
Claude doesn't need instructions repeated in the hook reason — it just needs to know the specific token to emit. + +So I stripped the reason down to exactly that: + +```bash +DONE_SIGNAL="${DONE_PREFIX}::${SESSION_ID}" + +jq -n --arg reason "$DONE_SIGNAL" '{ decision: "block", reason: $reason }' +``` + +Now the terminal shows one collapsed line: + +``` +● Ran N stop hooks (ctrl+o to expand) + ⎿ Stop hook error: TASKMASTER_DONE::abc123xyz +``` + +Claude sees the signal he needs. I see almost nothing. Both of us get what we need from the same field. + +#### Faster signal detection too + +While I was in there I also changed how the hook detects the done signal. The old version opened the transcript file and scanned potentially hundreds of lines of JSON on every stop attempt. + +The Claude Code hook API passes `last_assistant_message` directly in the hook's input JSON. Checking that first skips the file read in the common case: + +```bash +LAST_MSG=$(echo "$INPUT" | jq -r '.last_assistant_message // ""') +if [ -n "$LAST_MSG" ] && echo "$LAST_MSG" | grep -Fq "$DONE_SIGNAL" 2>/dev/null; then + HAS_DONE_SIGNAL=true +fi + +# Only scan the transcript if the message check didn't match +if [ "$HAS_DONE_SIGNAL" = false ] && [ -f "$TRANSCRIPT" ]; then + if tail -400 "$TRANSCRIPT" 2>/dev/null | grep -Fq "$DONE_SIGNAL"; then + HAS_DONE_SIGNAL=true + fi +fi +``` + +When Claude just emitted the done signal in his last message — the normal case — no transcript parsing happens. + +#### The lesson + +Hook reasons and system context have different jobs. System context (skill files, `CLAUDE.md`) carries persistent instructions that shape behavior across a whole session. Hook reasons carry transient, stop-specific information — the minimum Claude needs right now. + +Here that's: "emit `TASKMASTER_DONE::abc123` and you're done." + +The checklist still runs. The skill enforcement is unchanged. It just doesn't output the skill prompt to my terminal anymore. 
+ +These changes shipped as [v2.3.0](https://github.com/micahstubbs/taskmaster/releases/tag/v2.3.0). + +Read more about how stop decision control and the `reason` field work in the [Claude Code Hooks docs](https://code.claude.com/docs/en/hooks#stop-decision-control). diff --git a/docs/designs/2026-04-28-072245-fork-pattern-adoption.md b/docs/designs/2026-04-28-072245-fork-pattern-adoption.md new file mode 100644 index 0000000..84e7db3 --- /dev/null +++ b/docs/designs/2026-04-28-072245-fork-pattern-adoption.md @@ -0,0 +1,852 @@ +# Design: Fork-Pattern Adoption (T1–T3) + +**Date**: 2026-04-28 +**Source**: `docs/upstream-reviews/blader-taskmaster-forks.md` +**Status**: Draft for review +**Affected version**: targets v4.3.0 (T1), v4.4.0 (T2 conditional), v4.5.0 (T3 opt-in) + +This doc turns the three adoption tiers from the fork review into concrete +designs: file-by-file, env var by env var, with JSON schemas, migration paths, +and tests. Each tier ships as its own release: T1 has no dependencies, T2 is +conditional on a Codex capability check, and T3 layers on top of T1+T2. + +--- + +## Conventions + +- **New env vars** all prefixed `TASKMASTER_` to match existing namespace. +- **Boolean env vars** truthy = `1|true|yes|on` (case-insensitive); else falsy. +- **Atomic file writes** = write to `.tmp.` then `mv` into place. +- **State location** defaults to `${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}/`. +- **All new shell scripts** use `set -euo pipefail` and `bash` shebang + (we already require bash for `[[`, `local`, parameter substitution). +- **Tests** live in `tests/` and follow the existing + `bats`/`shellspec`-equivalent pattern — for new tests use the simplest + approach: `bash tests/.test.sh` returning exit 0 on pass. + +--- + +# Tier 1 — Adopt now (low risk, high value) + +## T1.1 — `TASKMASTER_VERIFY_COMMAND` shell-verifier gate + +### Goal + +Let users gate "stop allowed" behind a repo-local shell command. Use cases:
Use cases: +`cargo test`, `pnpm typecheck`, `make ci`, custom smoke scripts. Pairs with +the existing token-based completion (and later with T3's semantic verifier) +as a hard machine check that complements agent self-report. + +### API surface + +| Env var | Default | Meaning | +|---|---|---| +| `TASKMASTER_VERIFY_COMMAND` | unset | Shell command to run when token is seen. Empty/unset = skip. | +| `TASKMASTER_VERIFY_TIMEOUT` | `60` | Seconds before SIGTERM, +5s grace before SIGKILL. | +| `TASKMASTER_VERIFY_MAX_OUTPUT` | `4000` | Bytes of combined stdout+stderr echoed back into block reason. | +| `TASKMASTER_VERIFY_CWD` | unset | If set, `cd` here before invoking. Else inherit hook's cwd. | + +### Behavior + +``` +on stop hook: + ... existing logic up to "HAS_DONE_SIGNAL=true" ... + + if HAS_DONE_SIGNAL == true and TASKMASTER_VERIFY_COMMAND is set: + run command with timeout + if exit 0: + clear counter, allow stop (existing path) + else: + capture last $TASKMASTER_VERIFY_MAX_OUTPUT bytes of output + block with reason that includes: + - "Verifier failed (exit=N)" + - command invoked + - tail of output + - reminder that token alone is insufficient when verifier configured + counter NOT incremented (verifier failure isn't a stop attempt — agent + gets to fix and retry without burning the budget) + + if HAS_DONE_SIGNAL == false: + existing block-with-checklist behavior (verifier doesn't fire — token must + come first to avoid running expensive verifier on every stop attempt) +``` + +### Why token-then-verify rather than verify-on-every-stop + +Two reasons: + +1. The agent emitting the token is a cheap signal of "I think I'm done." It + filters out the dozens of stop attempts per session where the agent is + mid-work and would just be told to keep going by the verifier. We want the + verifier to run ~once per "I think I'm done" event, not 30 times. +2. Avoids surprising users whose verifier is `make test` (slow). 
Without the + token gate, every stop attempt would block on `make test`. + +### Files affected + +- **NEW** `taskmaster-verify-command.sh` — small library sourced by both + `check-completion.sh` and the codex stop path, exposing + `taskmaster_run_verify_command`, which returns the verifier's exit code and + captures bounded output via a temp file. +- `check-completion.sh` and `hooks/check-completion.sh` — invoke the verifier + in the `HAS_DONE_SIGNAL=true` branch. +- `taskmaster-compliance-prompt.sh` — extend `build_taskmaster_compliance_prompt` + to optionally append a "verifier configured: $cmd" hint when one is set. +- `install.sh` — copy the new library next to the hook (the hook sources it + from `$SCRIPT_DIR`); no install-time configuration, since the env vars are + read at runtime. +- `docs/SPEC.md` — document new env vars and behavior. + +### Reference implementation sketch + +```bash +# taskmaster-verify-command.sh +taskmaster_run_verify_command() { + local cmd="${TASKMASTER_VERIFY_COMMAND:-}" + local timeout="${TASKMASTER_VERIFY_TIMEOUT:-60}" + local max_output="${TASKMASTER_VERIFY_MAX_OUTPUT:-4000}" + local cwd="${TASKMASTER_VERIFY_CWD:-}" + local out_file + local exit_code + + [ -z "$cmd" ] && return 0 # not configured = pass + + out_file="$(mktemp "${TMPDIR:-/tmp}/taskmaster-verify.XXXXXX")" + # Clear the RETURN trap once it has run so it cannot fire again later + # (e.g. when a calling function returns) with $out_file out of scope. + trap 'rm -f "$out_file"; trap - RETURN' RETURN + + if [ -n "$cwd" ]; then + ( cd "$cwd" && timeout --kill-after=5 "$timeout" bash -c "$cmd" ) >"$out_file" 2>&1 & + else + timeout --kill-after=5 "$timeout" bash -c "$cmd" >"$out_file" 2>&1 & + fi + wait "$!" || exit_code=$?
+ exit_code="${exit_code:-0}" + + TASKMASTER_VERIFY_OUTPUT_TAIL="$(tail -c "$max_output" "$out_file")" + TASKMASTER_VERIFY_EXIT_CODE="$exit_code" + return "$exit_code" +} +``` + +### Testing + +- `tests/verify-command.test.sh`: + - `TASKMASTER_VERIFY_COMMAND="true"` → exit 0, allows stop + - `TASKMASTER_VERIFY_COMMAND="false"` → non-zero, blocks + - `TASKMASTER_VERIFY_COMMAND="sleep 120" TASKMASTER_VERIFY_TIMEOUT=2` → killed, blocks with timeout marker + - `TASKMASTER_VERIFY_COMMAND="yes | head -c 50000"` → output truncated to `TASKMASTER_VERIFY_MAX_OUTPUT` + - `TASKMASTER_VERIFY_COMMAND` unset → no-op, behaves identically to current code + +### Migration + +Zero impact when unset. Existing users see no change. + +### Risks + +| Risk | Mitigation | +|---|---| +| User sets long-running verifier, sessions stall | Default 60s timeout | +| Verifier writes huge output, OOMs hook | `tail -c $MAX_OUTPUT` from temp file, never load full output in memory | +| Verifier needs project root, hook runs elsewhere | `TASKMASTER_VERIFY_CWD` env var | +| Verifier depends on PATH that wrapper doesn't inherit | Document; use absolute paths in env var | + +--- + +## T1.2 — JSON state-file layout + +### Goal + +Replace the bare counter file (`${TMPDIR}/taskmaster/` containing +just an integer) with structured JSON state. Unblocks T1.3, T2.2, and T3 +without each adding its own ad-hoc file. + +### State file location + +``` +${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}/.json +``` + +Note the `state/` subfolder — keeps it distinct from existing counter files so +the legacy migration logic can find both. 
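To make the separation between the two layouts concrete, here is a minimal sketch that resolves both locations for one session. The helper names `demo_state_path` and `demo_legacy_counter_path` are hypothetical; the design's own helper is `taskmaster_state_path`, and the real code may build the path differently.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical helper: new layout, JSON state under a dedicated state/ subfolder.
demo_state_path() {
  local sid="$1"
  local dir="${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}"
  printf '%s/%s.json\n' "$dir" "$sid"
}

# Hypothetical helper: legacy layout, bare integer counter directly under taskmaster/.
demo_legacy_counter_path() {
  local sid="$1"
  printf '%s/taskmaster/%s\n' "${TMPDIR:-/tmp}" "$sid"
}

demo_state_path "f1b7d967"
demo_legacy_counter_path "f1b7d967"
```

Because the legacy file sits one level above `state/`, a directory listing of either location never picks up the other format. This is what lets the migration check for both files without ambiguity.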
+ +### Schema (v1) + +```json +{ + "schema_version": 1, + "session_id": "f1b7d967-3043-4422-9ab3-c35693951c9e", + "created_at": "2026-04-28T07:22:45Z", + "updated_at": "2026-04-28T07:24:01Z", + "stop_count": 3, + "latest_user_prompt": { + "captured_at": "2026-04-28T07:23:10Z", + "turn_id": "abc123", + "prompt": "fix the failing test in foo_test.go" + }, + "last_verifier_run": { + "ran_at": "2026-04-28T07:23:55Z", + "input_hash": "sha256:...", + "complete": false, + "reason": "test still failing on line 42", + "next_action": "run `go test -run TestFoo -v` and fix" + }, + "metadata": {} +} +``` + +`metadata` is an open object for future fields without bumping +`schema_version`. + +### Helper API + +New file `taskmaster-state.sh` (sourced): + +```bash +taskmaster_state_dir # echo path, mkdir -p +taskmaster_state_path # echo full path to .json +taskmaster_state_init # create empty file if missing (idempotent) +taskmaster_state_read # cat the JSON (empty {} if missing) +taskmaster_state_jq # run jq on state, echo result +taskmaster_state_set + # atomic update: read → jq | " = " → tmp → mv +taskmaster_state_increment_stop_count +taskmaster_state_capture_prompt +taskmaster_state_record_verifier_run +``` + +All writes use atomic tmp+mv. Concurrent writers (rare in practice — single +agent per session) coordinate via `flock` on `.lock`. 
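The concurrency claim above is what the planned "100x parallel increments" test checks. Here is a minimal sketch of that check, assuming util-linux `flock(1)` and `jq` are on PATH. `demo_increment` is a hypothetical stand-in for `taskmaster_state_increment_stop_count`. It does a read-modify-write of `.stop_count` under an exclusive lock on a sibling `.lock` file and replaces the file with tmp+mv.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical stand-in for taskmaster_state_increment_stop_count.
demo_increment() {
  local path="$1"
  # Each background writer is its own process, so BASHPID keeps tmp names unique.
  local tmp="${path}.tmp.${BASHPID}"
  (
    flock 9 || exit 1                       # block until we hold the lock
    if [ -f "$path" ]; then
      jq '.stop_count = ((.stop_count // 0) + 1)' "$path" >"$tmp"
    else
      jq -n '{schema_version: 1, stop_count: 1}' >"$tmp"
    fi || { rm -f "$tmp"; exit 1; }         # never mv a failed jq result
    mv "$tmp" "$path"
  ) 9>"${path}.lock"
}

# 100 writers in parallel: if the lock works, no increment is lost.
demo_dir="$(mktemp -d)"
state="$demo_dir/session.json"
for _ in {1..100}; do
  demo_increment "$state" &
done
wait
jq '.stop_count' "$state"
```

Without the `flock 9` line, two writers can read the same value and both write `n + 1`, so the final count comes out below 100.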
+ +### Atomic write pattern + +```bash +taskmaster_state_set() { + local sid="$1" jq_expr="$2" json_value="$3" + local path tmp lock now rc=0 + path="$(taskmaster_state_path "$sid")" + tmp="${path}.tmp.$$" + lock="${path}.lock" + now="$(date -u +%Y-%m-%dT%H:%M:%SZ)" + + mkdir -p "$(dirname "$path")" + exec 9>"$lock" + flock 9 + if [ -f "$path" ]; then + jq --argjson v "$json_value" --arg now "$now" \ + "$jq_expr = \$v | .updated_at = \$now" "$path" >"$tmp" || rc=$? + else + jq -n --argjson v "$json_value" \ + --arg sid "$sid" --arg now "$now" \ + "{schema_version:1, session_id:\$sid, created_at:\$now, updated_at:\$now, stop_count:0} | $jq_expr = \$v" \ + >"$tmp" || rc=$? + fi + # Replace the real file only when jq succeeded, so a bad value or a + # partial write (e.g. disk full) never clobbers existing state. + if [ "$rc" -eq 0 ]; then + mv "$tmp" "$path" || rc=$? + else + rm -f "$tmp" + fi + exec 9>&- + return "$rc" +} +``` + +### Legacy migration + +On first state read for a session, check for the legacy counter file: + +```bash +legacy="${TMPDIR:-/tmp}/taskmaster/${SESSION_ID}" +if [ -f "$legacy" ] && [ ! -f "$(taskmaster_state_path "$SESSION_ID")" ]; then + count=$(cat "$legacy" 2>/dev/null || echo 0) + case "$count" in ''|*[!0-9]*) count=0 ;; esac # --argjson needs a number + taskmaster_state_init "$SESSION_ID" + taskmaster_state_set "$SESSION_ID" '.stop_count' "$count" + rm -f "$legacy" +fi +``` + +Run this exactly once per session (idempotent because file no longer exists +after migration). Net: existing sessions transparently upgrade.
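One way `taskmaster_state_increment_stop_count` could combine the lock, the jq-gated `mv`, and the legacy migration in a single critical section. The choice to absorb the legacy counter additively under the same lock is an assumption modeled on the v4.3.0 CHANGELOG note ("flock-protected and additive"). The body is a sketch, not the shipped code. It requires `jq` and util-linux `flock`.

```bash
#!/usr/bin/env bash
# Sketch: atomic stop_count increment with additive legacy absorption.
# Requires jq and flock. Not the shipped implementation.

taskmaster_state_increment_stop_count() {
  local sid="$1"
  local dir="${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}"
  local path="$dir/$sid.json"
  local legacy="${TMPDIR:-/tmp}/taskmaster/$sid"

  mkdir -p "$dir" || return 1

  (
    flock 9 || exit 1
    now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

    # Read the legacy counter inside the lock and ADD it, so increments a
    # peer already recorded in JSON are never rewound.
    add=1
    if [ -f "$legacy" ]; then
      old="$(cat "$legacy" 2>/dev/null)"
      case "$old" in '' | *[!0-9]*) old=0 ;; esac
      add=$((old + 1))
    fi

    tmp="$(mktemp "$path.tmp.XXXXXX")" || exit 1
    if [ -f "$path" ]; then
      jq --argjson add "$add" --arg now "$now" \
        '.stop_count = ((.stop_count // 0) + $add) | .updated_at = $now' \
        "$path" >"$tmp"
    else
      jq -n --argjson add "$add" --arg now "$now" --arg sid "$sid" \
        '{schema_version: 1, session_id: $sid, created_at: $now,
          updated_at: $now, stop_count: $add, metadata: {}}' >"$tmp"
    fi || { rm -f "$tmp"; exit 1; }

    # Swap in the new file only after jq succeeded.
    mv "$tmp" "$path" || { rm -f "$tmp"; exit 1; }
    rm -f "$legacy"
  ) 9>"$path.lock"
}
```

Doing the legacy check inside the lock, rather than before it as in the snippet above, closes the window where two hooks both see the legacy file and both migrate it.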
+ +### Files affected + +- **NEW** `taskmaster-state.sh` +- `check-completion.sh`, `hooks/check-completion.sh` — replace counter + file logic with `taskmaster_state_increment_stop_count` and + `taskmaster_state_jq '.stop_count'` +- `hooks/inject-continue-codex.sh` — same migration +- `install.sh` — copy new script to skill dirs and `chmod +x` +- `uninstall.sh` — remove the new script +- `docs/SPEC.md` — document state schema and location + +### Testing + +- `tests/state.test.sh`: + - Init creates well-formed JSON with schema_version=1 + - Concurrent writers: 100x parallel `taskmaster_state_increment_stop_count`, + final value == 100 (flock works) + - Legacy file detected and migrated, then deleted + - Atomic write: kill -9 mid-write doesn't corrupt main file (tmp file + abandoned, real file untouched) + - jq read of nonexistent path returns empty / null without erroring + +### Migration + +Backward compatible. Existing counter files auto-migrate. New +`TASKMASTER_STATE_DIR` env var lets users relocate. + +### Risks + +| Risk | Mitigation | +|---|---| +| jq not installed | We already require jq; install.sh checks at install time | +| Disk full when writing tmp | tmp is on same fs as target; mv fails atomically; existing state preserved | +| Stale state files accumulate | Out of scope for v4.3.0 — track as follow-up beads issue (TTL cleanup, e.g., delete files older than 30 days on hook startup) | + +--- + +## T1.3 — Hook-internal-prompt detection (tagged-injection) + +### Goal + +Mark every prompt the hook injects so we (and future verifiers) can tell +"this is the user asking" from "this is taskmaster reminding the agent." +mickn's fork relies on substring-match heuristics; that's fragile to wording +changes. We can do better with a single explicit tag. 
+ +### Design choice: explicit magic tag + +Prefix every injected prompt — block reasons, compliance prompts, queue-emitter +follow-ups — with a stable single-line marker: + +``` +[taskmaster:injected v=1 kind=] + +``` + +Where `` is one of `stop-block | followup | compliance | session-start | +verifier-feedback`. + +Detection helper: + +```bash +is_taskmaster_injected_prompt() { + local text="$1" + case "$text" in + "[taskmaster:injected v="*) return 0 ;; + *) return 1 ;; + esac +} +``` + +For backward compatibility (and to handle prompts injected before this +change), include a fallback substring matcher with mickn's exact phrases: + +```bash +is_taskmaster_injected_prompt_legacy() { + local text="$1" + case "$text" in + "` + helper; prepend to `build_taskmaster_compliance_prompt` output +- `check-completion.sh`, `hooks/check-completion.sh` — wrap the `REASON` + string with the tag +- `hooks/inject-continue-codex.sh` — same for queue payloads +- **NEW** `taskmaster-prompt-detect.sh` — exposes + `is_taskmaster_injected_prompt` for use by hooks and verifier +- `docs/SPEC.md` — document the tag format (so external tooling can detect + too) + +### Schema versioning + +The `v=1` field future-proofs the tag. If we ever change semantics (e.g., add +required machine-readable fields), bump to `v=2` and have the detector accept +both. + +### Testing + +- `tests/prompt-detect.test.sh`: + - Tagged prompt → detected + - Legacy mickn substring → detected (back-compat) + - User text containing the word "taskmaster" but not the tag → NOT detected + - Empty string → NOT detected + - Tag with future version `v=99` → still detected (forward-compat: prefix + match `[taskmaster:injected v=`) + +### Migration + +User-visible: each block reason gets a tag line at top. Document in +release notes. 
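A minimal sketch of the emitting side of the tag, with the detection cases from the test list above expressed as assertions. `taskmaster_tag_prompt` is a hypothetical helper name. Its kind whitelist is the five kinds listed for the tag, and the detector is the design's prefix matcher, repeated here so the block runs on its own.

```bash
#!/usr/bin/env bash
# Sketch: build and detect tagged taskmaster prompts (T1.3).

# Hypothetical emitter: prefixes a body with the v=1 tag line.
taskmaster_tag_prompt() {
  local kind="$1" body="$2"
  case "$kind" in
    stop-block | followup | compliance | session-start | verifier-feedback) ;;
    *)
      printf 'taskmaster_tag_prompt: unknown kind %s\n' "$kind" >&2
      return 1
      ;;
  esac
  printf '[taskmaster:injected v=1 kind=%s]\n%s\n' "$kind" "$body"
}

# Matching on "v=" rather than "v=1" keeps future versions detectable.
is_taskmaster_injected_prompt() {
  case "$1" in
    "[taskmaster:injected v="*) return 0 ;;
    *) return 1 ;;
  esac
}
```

In a hook, wrapping an existing reason would then be a one-liner such as `REASON="$(taskmaster_tag_prompt stop-block "$REASON")"`. Rejecting unknown kinds at emit time means a typo fails loudly instead of producing a tag that downstream tools cannot classify.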
+ +### Risks + +| Risk | Mitigation | +|---|---| +| Tag confuses users / models | Use plain ASCII, single line; mention in SKILL.md so models know the tag is metadata, not a directive | +| Some prompt path forgets the tag | `is_taskmaster_injected_prompt` falls back to legacy substring matcher | +| Markdown rendering mangles `[...]` | The tag is plain ASCII outside any code block; if a renderer hides it, the legacy substring matcher still works | + +--- + +# Tier 2 — Adopt after upstream-reality check + +T2 depends on a verifiable claim: that the OpenAI Codex CLI exposes native +hooks similar to Claude Code's. The fork review left this as an open +question. T2.0 is a discovery step that gates T2.1 and T2.2. + +## T2.0 — Codex hook capability detection (precondition) + +### Goal + +Determine whether the installed `codex` binary supports `SessionStart`, +`UserPromptSubmit`, and `Stop` hooks via `~/.codex/hooks.json`. Without this, +T2.1/T2.2 cannot be implemented natively and the wrapper architecture stays. + +### Detection plan + +1. `codex --version` → record version +2. `codex --help 2>&1 | grep -i hook` → does help mention hooks? +3. Check `~/.codex/hooks.json` schema in the installed Codex docs (if any) +4. Smoke test: write a minimal `~/.codex/hooks.json` with a `SessionStart` + hook that writes a sentinel file; launch `codex`; confirm sentinel created +5. Same for `UserPromptSubmit` and `Stop` + +Outcome documented in `docs/upstream-reviews/codex-hooks-capability-.md`. + +### Decision gates + +| Outcome | Action | +|---|---| +| All three hooks supported, `Stop` allows `decision: "block"` | Proceed with T2.1, T2.2 | +| Only some supported | Implement what's supported; keep wrapper for the rest | +| None supported | Park T2 indefinitely; T1+T3 deliver the most value anyway | + +## T2.1 — Native Codex hooks (conditional) + +### Goal + +Add a parallel implementation path that uses Codex's native hook system +instead of the PTY wrapper. 
Both paths coexist; install.sh picks at install +time based on T2.0 detection. Wrapper stays as the fallback for older Codex +versions. + +### New files + +- `hooks/codex-session-start.sh` — emits the SKILL.md context contract + (parallels current `run-taskmaster-codex.sh` startup) +- `hooks/codex-user-prompt-submit.sh` — captures user prompts (T2.2) +- `hooks/codex-stop.sh` — runs the same completion check as + `check-completion.sh` but adapted to Codex's hook input shape + +The Claude side (`hooks/check-completion.sh`) is unchanged. + +### install.sh changes + +```bash +detect_codex_native_hooks() { + command -v codex >/dev/null 2>&1 || return 1 + # Capability probe — see T2.0 for actual detection logic + codex --help 2>&1 | grep -qi 'hooks.json' || return 1 + return 0 +} + +if [ "$INSTALL_TARGET" = "codex" ] || [ "$INSTALL_TARGET" = "auto" ] || [ "$INSTALL_TARGET" = "both" ]; then + if detect_codex_native_hooks && [ "${TASKMASTER_CODEX_MODE:-auto}" != "wrapper" ]; then + install_codex_native # writes ~/.codex/hooks.json, links new hooks + else + install_codex_wrapper # current behavior + fi +fi +``` + +Override env var `TASKMASTER_CODEX_MODE=wrapper|native|auto` lets the user +force either path even when both work. + +### Hooks.json layout + +```json +{ + "hooks": { + "SessionStart": [{"command": "~/.codex/skills/taskmaster/hooks/codex-session-start.sh"}], + "UserPromptSubmit": [{"command": "~/.codex/skills/taskmaster/hooks/codex-user-prompt-submit.sh"}], + "Stop": [{"command": "~/.codex/skills/taskmaster/hooks/codex-stop.sh"}] + } +} +``` + +(Exact structure depends on T2.0 findings; mickn's install.sh writes +`~/.codex/hooks.json` and his `install.sh` is the reference implementation +to start from.) + +### Coexistence with wrapper + +The wrapper writes `inject.*.txt` queue files; the native path writes JSON +state. Both populate `~/.codex/taskmaster/state/.json` (T1.2). 
The +codex-stop.sh script reads the same state file, so the verifier (T3) works +identically on both paths. + +### Files affected + +- **NEW** `hooks/codex-session-start.sh`, `hooks/codex-user-prompt-submit.sh`, + `hooks/codex-stop.sh` +- `install.sh` — capability detection, branch, write hooks.json (when + native), preserve wrapper install (when not) +- `uninstall.sh` — remove hooks.json entries cleanly without clobbering + user-added entries (jq-based merge/unmerge, NOT file replacement) +- `docs/SPEC.md` — document both paths and the chooser logic +- `tests/install.test.sh` — both paths covered +- **NEW** `tests/codex-stop.test.sh` + +### Migration + +- New installs on capable Codex: native by default +- Existing wrapper installs: re-run `install.sh` to upgrade; or set + `TASKMASTER_CODEX_MODE=wrapper` to stay on wrapper +- Wrapper code is NOT deleted in this change — strangler-fig pattern, keep + both, monitor breakage for one full release cycle, only then prune + +### Risks + +| Risk | Mitigation | +|---|---| +| `~/.codex/hooks.json` already has user entries | Merge with jq; don't overwrite | +| Codex hook contract changes between versions | Pin tested versions in docs; add a `codex --version` check at hook entry that warns on unknown versions | +| Native and wrapper both run accidentally | install.sh sets one OR the other; sanity check at hook entry that we're not double-firing | + +## T2.2 — UserPromptSubmit goal capture + +### Goal + +Persist the user's actual goal for each turn to state, so T3's verifier has +something concrete to verify against (rather than guessing from transcript). + +### Hook input + +Codex passes JSON on stdin (per mickn's reference; T2.0 must confirm exact +field names): + +```json +{ + "session_id": "...", + "turn_id": "...", + "prompt": "", + "cwd": "...", + "model": "..." 
+} +``` + +### Behavior + +```bash +# read input +INPUT=$(cat) +SID=$(jq -r .session_id <<<"$INPUT") +TID=$(jq -r .turn_id <<<"$INPUT") +PROMPT=$(jq -r .prompt <<<"$INPUT") + +# filter +if is_taskmaster_injected_prompt "$PROMPT"; then exit 0; fi +if is_environment_context_only_prompt "$PROMPT"; then exit 0; fi +if is_agents_md_prelude "$PROMPT"; then exit 0; fi + +# capture +taskmaster_state_capture_prompt "$SID" "$TID" "$PROMPT" +``` + +### Wrapper-side parity + +For users on the wrapper path (T2.1 not active), we don't have a +UserPromptSubmit event. Two options: + +1. **Skip goal capture** — verifier (T3) has to infer goal from transcript. + Acceptable but lower-quality verifications. +2. **Parse the Codex session log** — `inject-continue-codex.sh` already + tails the session log for `task_complete` events; teach it to also handle + user-prompt events and write to state. + +Option 2 is mostly free since we're already tailing the log. Implement it +during T2.2. + +### Filters (in priority order) + +1. Tagged taskmaster-injected (T1.3) → skip +2. Legacy substring match → skip +3. Pure `...` block → skip +4. `# AGENTS.md instructions for ...` prelude only → skip +5. Else → capture + +Filters live in `taskmaster-prompt-detect.sh` (T1.3) — extend that file with +helpers for env-context and agents-md detection. 
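The filter chain above can be sketched as a single predicate. Only filters whose exact markers appear in this document are implemented: the T1.3 tag prefix and the `# AGENTS.md instructions for` header. The legacy phrase list and the environment-context marker are left as commented slots rather than guessed. `taskmaster_should_capture_prompt` is a hypothetical name. Treating the AGENTS.md prelude as "the message starts with that header" is also an assumption.

```bash
#!/usr/bin/env bash
# Sketch of the T2.2 capture filter chain, in the documented priority order.

is_taskmaster_injected_prompt() {
  case "$1" in
    "[taskmaster:injected v="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: the prelude arrives as its own message whose first line is
# the AGENTS.md header.
is_agents_md_prelude() {
  case "${1%%$'\n'*}" in
    "# AGENTS.md instructions for "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Returns 0 when the prompt is a real user goal worth storing.
taskmaster_should_capture_prompt() {
  local prompt="$1"
  # 1. Tagged taskmaster-injected prompt.
  if is_taskmaster_injected_prompt "$prompt"; then return 1; fi
  # 2. Legacy substring matcher would slot in here.
  # 3. is_environment_context_only_prompt would slot in here.
  # 4. AGENTS.md prelude.
  if is_agents_md_prelude "$prompt"; then return 1; fi
  # 5. Everything else is a user goal.
  return 0
}
```

With this predicate, the three `if ...; then exit 0; fi` lines in the hook body could collapse to `taskmaster_should_capture_prompt "$PROMPT" || exit 0`.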
+ +### Files affected + +- `hooks/codex-user-prompt-submit.sh` (new in T2.1, populated here) +- `taskmaster-prompt-detect.sh` (T1.3) — extended with new filters +- `hooks/inject-continue-codex.sh` — extended to also write user prompts to + state (wrapper-side parity) +- `tests/prompt-detect.test.sh` — extended + +### Testing + +- All 4 filter classes correctly skipped +- A real user prompt is captured into `latest_user_prompt` and history +- Concurrent prompts in same session don't lose data (per-turn keying via + `turns[$turn_id]`) +- Prompts > 100KB are stored in full (no truncation at capture time; + truncation happens at verifier-input time, T3.1) + +--- + +# Tier 3 — Adopt with explicit knobs + +## T3.1 — Semantic completion verifier + +### Goal + +Replace "agent self-reports done" with "second agent verifies done" at +opt-in. When enabled, the stop hook calls an LLM with the captured user goal, +the agent's last message, and a transcript excerpt; the LLM returns +`{complete, reason, next_action}`. This catches cases where the agent +declares victory after partial work. + +**Default OFF.** Users opt in explicitly via env var. + +### API surface + +| Env var | Default | Meaning | +|---|---|---| +| `TASKMASTER_COMPLETION_VERIFY` | `0` | Master switch. Truthy = verifier runs. 
| +| `TASKMASTER_COMPLETION_PROVIDER` | `auto` | `anthropic\|openai\|command\|auto` | +| `TASKMASTER_COMPLETION_MODEL` | provider-dependent | See below | +| `TASKMASTER_COMPLETION_VERIFIER_COMMAND` | unset | Custom shell verifier; overrides built-in | +| `TASKMASTER_COMPLETION_TIMEOUT` | `30` | Seconds per verifier call; on timeout, log a warning and apply `TASKMASTER_COMPLETION_FAIL_OPEN` | +| `TASKMASTER_COMPLETION_MAX_CONTEXT_CHARS` | `20000` | Total chars sent to LLM (input+goal+last_msg+transcript_tail) | +| `TASKMASTER_COMPLETION_CACHE` | `1` | Cache by input hash; `0` disables | +| `TASKMASTER_COMPLETION_FAIL_OPEN` | `1` | On API/timeout error: `1` allow stop, `0` block stop | + +### Provider auto-detection + +``` +provider == "auto": + if ANTHROPIC_API_KEY set → "anthropic" with model claude-haiku-4-5 + elif OPENAI_API_KEY set → "openai" with model gpt-5.4-mini + elif TASKMASTER_COMPLETION_VERIFIER_COMMAND set → "command" + else → log warning, fall back to legacy token detection (don't block) +``` + +Default to Anthropic over OpenAI for two reasons: (1) most of our users run +Claude; (2) Haiku is cheaper than gpt-5.4-mini at comparable quality +for this task. Users on OpenAI infra can set +`TASKMASTER_COMPLETION_PROVIDER=openai` explicitly. + +### Verifier I/O contract + +Same shape regardless of provider. Custom commands implement this protocol. + +**Input (stdin JSON)**: + +```json +{ + "schema_version": 1, + "session_id": "...", + "user_goal": "", + "last_assistant_message": "", + "transcript_excerpt": "" +} +``` + +**Output (stdout JSON)**: + +```json +{ + "complete": true, + "reason": "test passes; types check; no TODO comments added", + "next_action": null, + "evidence": "ran `pnpm test` mentally based on transcript" +} +``` + +When `complete=false`, `next_action` MUST be a single concrete next step +(not a list), and it gets injected verbatim into the block reason.
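This sketch shows a conformance check for the output contract, which a hook could run before trusting a verifier's answer. It also includes a toy custom verifier in the spirit of `tests/mock-verifier.sh`. Both function names are hypothetical, and treating a malformed reply as a verifier error (rather than as `complete=false`) is an assumption. Requires `jq`.

```bash
#!/usr/bin/env bash
# Sketch: validate verifier stdout against the I/O contract. Requires jq.

# Exit 0 only if stdin is a JSON object with a boolean `complete`, a string
# `reason`, and, when complete is false, a non-empty string `next_action`.
taskmaster_validate_verifier_output() {
  jq -e '
    type == "object"
    and (.complete | type == "boolean")
    and (.reason | type == "string")
    and (if .complete then true
         else (.next_action | type == "string" and length > 0)
         end)
  ' >/dev/null 2>&1
}

# Toy verifier: "complete" iff the last message carries the done token.
mock_verifier() {
  jq -c '
    (.last_assistant_message // "" | contains("TASKMASTER_DONE")) as $done
    | {complete: $done,
       reason: (if $done then "done token present" else "no done token" end),
       next_action: (if $done then null else "finish the task, then emit the done token" end),
       evidence: "mock"}
  '
}
```

A hook would pipe the verifier's stdout through `taskmaster_validate_verifier_output` and, on failure, fall back to the `TASKMASTER_COMPLETION_FAIL_OPEN` policy.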
+ +### Built-in verifier prompt structure + +The built-in `taskmaster-completion-verifier.py` builds a prompt like: + +``` +You are a strict completion verifier. Your job is to decide whether the +agent has fully achieved the user's stated goal. + +USER GOAL: +{user_goal} + +AGENT'S LAST MESSAGE: +{last_assistant_message} + +TRANSCRIPT EXCERPT (most recent activity): +{transcript_excerpt} + +Respond with JSON only: {"complete": bool, "reason": str, "next_action": +str | null, "evidence": str}. + +Strict rules: +- "Made progress" is not complete. Only "goal fully achieved" is complete. +- If verification steps in the goal are unrun, complete = false. +- If the agent says "I cannot" without trying, complete = false. +- If the user explicitly deprioritized something, treat it as resolved. +``` + +Port mickn's secret-redaction regex set verbatim before the prompt is +constructed. + +### Caching + +Hash inputs (sha256 of `user_goal + "|" + last_assistant_message + "|" + tail(transcript_excerpt, 4000)`). +Store last hash + result in `state.last_verifier_run` (T1.2 schema). + +Cache hit logic: +- If input hash matches last run AND the previous result was `complete=true`, + reuse → allow stop +- If input hash matches AND previous result was `complete=false`, reuse → + block with same reason (avoids re-querying when agent retried stop without + any new work) +- If input hash differs → new query + +Net effect: an agent that hammers stop without doing new work pays for one +verifier call, not N. + +### Integration with stop hook + +``` +on stop: + ...existing logic up to HAS_DONE_SIGNAL detection... 
+ + if TASKMASTER_COMPLETION_VERIFY is truthy: + if no latest_user_prompt captured (T2.2 inactive or first turn): + log warning, fall through to token-based detection + else: + run verifier (with cache) + if complete: + if TASKMASTER_VERIFY_COMMAND set: run that too (T1.1) + if exit 0: allow stop + else: block with verifier output + else: allow stop + else: + block with verifier reason + next_action (counter still NOT + incremented when verifier blocks — same rationale as T1.1) + else: + existing token-based logic +``` + +### Files affected + +- **NEW** `taskmaster-completion-verifier.py` (Python, ported from mickn with + redaction regexes intact, provider auto-detection added, caching added, + Anthropic-first defaults) +- **NEW** `taskmaster-completion-verifier-anthropic.py` (or single file with + provider abstraction; single file is simpler) +- `check-completion.sh`, `hooks/check-completion.sh`, `hooks/codex-stop.sh` — + invoke verifier when configured +- `taskmaster-state.sh` — `taskmaster_state_record_verifier_run` helper + (introduced in T1.2 schema, used here) +- `install.sh` — copy verifier scripts; `chmod +x`; check Python 3 available + if `TASKMASTER_COMPLETION_VERIFY=1` at install time (warn, don't block) +- `docs/SPEC.md` — full env-var table +- **NEW** `docs/cost-and-performance.md` — back-of-envelope cost per session + for each provider/model + +### Testing + +- `tests/verifier.test.sh`: + - `TASKMASTER_COMPLETION_VERIFY=0` → verifier never runs (no API calls + made; pkill any rogue processes) + - Mock provider via `TASKMASTER_COMPLETION_VERIFIER_COMMAND=tests/mock-verifier.sh` + that returns scripted JSON + - Cache hit: same input twice → only first call hits the mock + - Cache miss: input changes → second call hits mock + - Timeout: mock that sleeps > timeout → verifier exits with fail-open + - `TASKMASTER_COMPLETION_FAIL_OPEN=0`: timeout blocks stop instead + - Provider auto-detection: ANTHROPIC_API_KEY set → anthropic chosen + - Secret redaction: 
mock that echoes input verifies `sk-...` was redacted + +### Migration + +Default OFF means existing users see no change. Adoption is opt-in by +setting `TASKMASTER_COMPLETION_VERIFY=1`. + +### Risks + +| Risk | Mitigation | +|---|---| +| API outage blocks all stops | `TASKMASTER_COMPLETION_FAIL_OPEN=1` (default); log warning | +| Cost surprise | Default OFF; document per-provider per-session cost in cost-and-performance.md | +| Wrong default model | `TASKMASTER_COMPLETION_MODEL` overrides; document; revisit after first month of usage data | +| Secrets leaked to LLM | Port regex set verbatim; add tests that validate redaction; consider opt-out via `TASKMASTER_COMPLETION_REDACT=0` only for users who explicitly want raw context (don't make this the default) | +| Verifier disagrees with token | Verifier wins. Token alone is insufficient when verifier is on. Document this clearly. | +| Latency added to every stop | Cache cuts repeated queries; default 30s timeout caps worst case | + +--- + +# Open questions for follow-up + +These were flagged in the fork review and are restated here as design-time +risks: + +1. **Codex native hooks reality** (gates T2). Action: T2.0 capability probe + ASAP, on the user's installed Codex version. If supported, design proceeds + as written. If not, T2 is parked and T1+T3 still ship. + +2. **Verifier model selection** (impacts T3 cost/quality). Action: after T3 + ships behind the OFF default, pilot with `claude-haiku-4-5` and + `gpt-5.4-mini` on real sessions, compare false-positive and + false-negative rates, document findings in + `docs/cost-and-performance.md`. + +3. **Transcript-tail caching invariant** (impacts T3 cost). Action: confirm + that hashing the last 4000 chars of transcript_excerpt is stable enough + that "agent retried stop after producing 1KB of new output" reliably + misses the cache (we want a fresh verifier call), while "agent retried + stop with no new output" reliably hits. + +4. 
**Hooks.json merge semantics** (impacts T2.1). Action: T2.0 should also + verify whether multiple `Stop` hooks compose (run all? short-circuit on + first block? user-defined order?). If only one is allowed, install.sh + needs a "we'd overwrite your existing Stop hook" warning gate. + +# Rollout sequencing + +| Phase | Tier(s) | Gating | +|---|---|---| +| 1 | T1.1, T1.2, T1.3 | None — independent of upstream | +| 2 | T2.0 (capability probe) | Phase 1 merged | +| 3 | T2.1, T2.2 | T2.0 outcome positive | +| 4 | T3.1 | T1+T2 merged | + +Each phase is its own PR with its own version bump. + +# Out of scope (deliberately) + +- Stale state-file cleanup / TTL — separate beads issue +- POSIX-portable install.sh (mickn's `46f6a44` pattern) — incompatible with + our bash-only constructs; skipped per fork review +- Single-fire stop philosophy (`gjlondon`) — incompatible with our + explicit-token contract; skipped per fork review +- OpenClaw platform port (`levi-openclaw`) — different platform; skipped +- Auto-generated CLAUDE.md (`Semenka`) — generic content, no project value; + skipped diff --git a/docs/plans/2026-04-28-083546-t1-fork-pattern-adoption.md b/docs/plans/2026-04-28-083546-t1-fork-pattern-adoption.md new file mode 100644 index 0000000..ff4647e --- /dev/null +++ b/docs/plans/2026-04-28-083546-t1-fork-pattern-adoption.md @@ -0,0 +1,1509 @@ +# Tier 1 Fork-Pattern Adoption — Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers:executing-plans` to implement this plan task-by-task. + +**Goal:** Ship the three Tier-1 patterns from `docs/designs/2026-04-28-072245-fork-pattern-adoption.md`: a `TASKMASTER_VERIFY_COMMAND` shell-verifier gate (T1.1), tagged hook-internal-prompt detection (T1.3), and a JSON state-file layout with legacy-counter migration (T1.2). Bumps version 4.2.0 → 4.3.0. + +**Architecture:** Three independent additions. 
T1.1 adds a sourced helper (`taskmaster-verify-command.sh`) called from both stop-hook variants when the done token is seen. T1.3 introduces an explicit `[taskmaster:injected v=1 kind=...]` tag that wraps every prompt the hook injects, plus a sourced detector (`taskmaster-prompt-detect.sh`) with a legacy substring-match fallback for back-compat. T1.2 replaces the bare counter file at `${TMPDIR}/taskmaster/` with a `flock`-protected JSON file at `${TASKMASTER_STATE_DIR:-${TMPDIR}/taskmaster/state}/.json` exposed through a sourced library (`taskmaster-state.sh`); a one-time migration on first read absorbs the legacy counter and deletes the old file. + +**Tech Stack:** bash 5+, `jq`, `flock`, `timeout` (GNU coreutils), `mktemp`, plain-bash test scripts. + +**Order rationale:** Phase A first (T1.1) — smallest, no migration. Phase B (T1.3) — independent, but touches every prompt-injection site. Phase C (T1.2) — biggest change because counter usage is in three files; doing it last means we don't rewrite already-touched code twice. Phase D — version bump, CHANGELOG, tag. + +**Test invocation:** every new test is `bash tests/.test.sh` returning exit 0 on pass, non-zero on fail. Existing tests follow the same convention. + +--- + +## Pre-flight + +**Step 1: Confirm clean working tree and current version** + +Run: `git status && grep '^version:' SKILL.md` +Expected: `working tree clean` and `version: 4.2.0`. If anything modified, stop and ask the user. + +**Step 2: Confirm required tools are installed** + +Run: `which jq flock timeout mktemp` +Expected: all four resolve. If `timeout` is missing on macOS, install GNU coreutils (`brew install coreutils`) and use `gtimeout`. The plan assumes `timeout`; on macOS, swap globally before starting Phase A. 
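Step 2 can be sketched as two reusable shell functions that need nothing beyond bash builtins. One reports which required tools are missing. The other resolves `timeout`, falling back to Homebrew's `gtimeout`. Resolving the binary at runtime is an alternative to the global swap suggested above, not something the plan prescribes. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight tool checks for the Tier 1 plan.

# Prints each missing tool on its own line; returns 1 if any are missing.
taskmaster_missing_tools() {
  local tool missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if ((${#missing[@]} > 0)); then
    printf '%s\n' "${missing[@]}"
    return 1
  fi
  return 0
}

# Prints the first available timeout binary: GNU `timeout`, else `gtimeout`.
taskmaster_timeout_bin() {
  local bin
  for bin in timeout gtimeout; do
    if command -v "$bin" >/dev/null 2>&1; then
      printf '%s\n' "$bin"
      return 0
    fi
  done
  return 1
}
```

For example, `taskmaster_missing_tools jq flock mktemp || echo "install the tools listed above"` reproduces the `which` check with an actionable message.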
+ +--- + +# Phase A — T1.1: `TASKMASTER_VERIFY_COMMAND` + +### Task A1: Write the failing tests for the verify-command helper + +**Files:** +- Create: `tests/verify-command.test.sh` + +**Step 1: Write the test file** + +```bash +#!/usr/bin/env bash +# +# Tests for taskmaster-verify-command.sh. +# +set -euo pipefail + +REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +LIB="$REPO_ROOT/taskmaster-verify-command.sh" + +# shellcheck disable=SC1090 +source "$LIB" + +PASS_COUNT=0 +FAIL_COUNT=0 + +assert() { + local name="$1" + local condition="$2" + if eval "$condition"; then + printf 'ok %s\n' "$name" + PASS_COUNT=$((PASS_COUNT + 1)) + else + printf 'FAIL %s\n' "$name" >&2 + FAIL_COUNT=$((FAIL_COUNT + 1)) + fi +} + +# --- Unset command is a no-op pass --- +unset TASKMASTER_VERIFY_COMMAND TASKMASTER_VERIFY_TIMEOUT TASKMASTER_VERIFY_MAX_OUTPUT TASKMASTER_VERIFY_CWD +TASKMASTER_VERIFY_OUTPUT_TAIL="" +TASKMASTER_VERIFY_EXIT_CODE="" +taskmaster_run_verify_command +assert "unset command returns 0" "[[ \"$?\" == \"0\" ]]" +assert "unset command leaves exit code blank" "[[ -z \"$TASKMASTER_VERIFY_EXIT_CODE\" ]]" + +# --- Successful command --- +TASKMASTER_VERIFY_COMMAND="true" +taskmaster_run_verify_command +rc=$? 
+assert "successful command returns 0" "[[ \"$rc\" == \"0\" ]]" +assert "successful command sets exit code 0" "[[ \"$TASKMASTER_VERIFY_EXIT_CODE\" == \"0\" ]]" + +# --- Failing command --- +TASKMASTER_VERIFY_COMMAND="exit 7" +set +e; taskmaster_run_verify_command; rc=$?; set -e +assert "failing command propagates exit code" "[[ \"$rc\" == \"7\" ]]" +assert "failing command captures exit code 7" "[[ \"$TASKMASTER_VERIFY_EXIT_CODE\" == \"7\" ]]" + +# --- Output captured --- +TASKMASTER_VERIFY_COMMAND='echo hello-world; echo to-stderr >&2' +taskmaster_run_verify_command +assert "stdout captured" '[[ "$TASKMASTER_VERIFY_OUTPUT_TAIL" == *hello-world* ]]' +assert "stderr captured (combined)" '[[ "$TASKMASTER_VERIFY_OUTPUT_TAIL" == *to-stderr* ]]' + +# --- Output truncation --- +TASKMASTER_VERIFY_COMMAND='yes hello | head -c 50000' +TASKMASTER_VERIFY_MAX_OUTPUT=200 +taskmaster_run_verify_command +unset TASKMASTER_VERIFY_MAX_OUTPUT +assert "output truncated to MAX_OUTPUT bytes" "[[ \"\${#TASKMASTER_VERIFY_OUTPUT_TAIL}\" -le 200 ]]" + +# --- Timeout --- +TASKMASTER_VERIFY_COMMAND='sleep 30' +TASKMASTER_VERIFY_TIMEOUT=1 +set +e; START=$(date +%s); taskmaster_run_verify_command; rc=$?; END=$(date +%s); set -e +unset TASKMASTER_VERIFY_TIMEOUT +ELAPSED=$((END - START)) +assert "timeout fires within 10s" "[[ \"$ELAPSED\" -lt 10 ]]" +assert "timeout produces non-zero exit" "[[ \"$rc\" != \"0\" ]]" + +# --- CWD respected --- +TMPCWD="$(mktemp -d)" +trap 'rm -rf "$TMPCWD"' EXIT +TASKMASTER_VERIFY_COMMAND='pwd' +TASKMASTER_VERIFY_CWD="$TMPCWD" +taskmaster_run_verify_command +unset TASKMASTER_VERIFY_CWD +TMPCWD_REAL="$(cd "$TMPCWD" && pwd -P)" +assert "cwd honored" '[[ "$TASKMASTER_VERIFY_OUTPUT_TAIL" == *"$TMPCWD_REAL"* ]]' + +printf '\n%d passed, %d failed\n' "$PASS_COUNT" "$FAIL_COUNT" +[[ "$FAIL_COUNT" == 0 ]] +``` + +**Step 2: Run test to verify it fails** + +Run: `bash tests/verify-command.test.sh` +Expected: FAIL with `taskmaster-verify-command.sh: No such file or directory` 
(because the lib doesn't exist yet). + +**Step 3: Commit the failing test** + +```bash +git add tests/verify-command.test.sh +git commit -m "test: add failing tests for taskmaster-verify-command lib (T1.1)" +``` + +--- + +### Task A2: Implement `taskmaster-verify-command.sh` + +**Files:** +- Create: `taskmaster-verify-command.sh` + +**Step 1: Write the helper** + +```bash +#!/usr/bin/env bash +# +# Optional shell verifier gate for the Taskmaster stop hook. +# +# When TASKMASTER_VERIFY_COMMAND is set, calling taskmaster_run_verify_command +# runs the command with a timeout, captures combined output (truncated), and +# sets: +# TASKMASTER_VERIFY_EXIT_CODE the command's exit code +# TASKMASTER_VERIFY_OUTPUT_TAIL last $TASKMASTER_VERIFY_MAX_OUTPUT bytes of output +# It returns the command's exit code (0 = pass, non-zero = block). +# When unset, returns 0 with empty fields (no-op pass). +# +# Env knobs: +# TASKMASTER_VERIFY_COMMAND command string; empty/unset = skip +# TASKMASTER_VERIFY_TIMEOUT seconds before SIGTERM (default 60); +5s grace SIGKILL +# TASKMASTER_VERIFY_MAX_OUTPUT bytes of output kept (default 4000) +# TASKMASTER_VERIFY_CWD optional cwd override +# + +taskmaster_run_verify_command() { + TASKMASTER_VERIFY_OUTPUT_TAIL="" + TASKMASTER_VERIFY_EXIT_CODE="" + + local cmd="${TASKMASTER_VERIFY_COMMAND:-}" + if [[ -z "$cmd" ]]; then + return 0 + fi + + local timeout_sec="${TASKMASTER_VERIFY_TIMEOUT:-60}" + local max_output="${TASKMASTER_VERIFY_MAX_OUTPUT:-4000}" + local cwd="${TASKMASTER_VERIFY_CWD:-}" + local out_file rc=0 + + out_file="$(mktemp "${TMPDIR:-/tmp}/taskmaster-verify.XXXXXX")" + + # `|| rc=$?` records a failing exit code without tripping a caller's + # errexit, and without switching `set -e` on in callers (such as the + # stop hooks) that never enabled it. + if [[ -n "$cwd" ]]; then + ( cd "$cwd" && timeout --kill-after=5 "$timeout_sec" bash -c "$cmd" ) \ + >"$out_file" 2>&1 || rc=$? + else + timeout --kill-after=5 "$timeout_sec" bash -c "$cmd" >"$out_file" 2>&1 || rc=$?
+ fi + + TASKMASTER_VERIFY_OUTPUT_TAIL="$(tail -c "$max_output" "$out_file" 2>/dev/null || true)" + TASKMASTER_VERIFY_EXIT_CODE="$rc" + + rm -f "$out_file" + return "$rc" +} +``` + +**Step 2: Make it executable** + +```bash +chmod +x taskmaster-verify-command.sh +``` + +**Step 3: Run the test to verify it passes** + +Run: `bash tests/verify-command.test.sh` +Expected: `12 passed, 0 failed`. If any test fails, fix the implementation — do NOT change the test. + +**Step 4: Commit the implementation** + +```bash +git add taskmaster-verify-command.sh +git commit -m "feat: add taskmaster-verify-command lib for shell-verifier gate (T1.1)" +``` + +--- + +### Task A3: Wire the verifier into `check-completion.sh` + +**Files:** +- Modify: `check-completion.sh` + +**Step 1: Source the helper near the top (after the compliance-prompt source)** + +Find the line that sources `taskmaster-compliance-prompt.sh` (around line 20). Immediately after it, add: + +```bash +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/taskmaster-verify-command.sh" +``` + +**Step 2: Insert the verifier call inside the `HAS_DONE_SIGNAL=true` branch** + +Find the block (around line 92): + +```bash +if [ "$HAS_DONE_SIGNAL" = true ]; then + rm -f "$COUNTER_FILE" + exit 0 +fi +``` + +Replace with: + +```bash +if [ "$HAS_DONE_SIGNAL" = true ]; then + if [ -n "${TASKMASTER_VERIFY_COMMAND:-}" ]; then + if taskmaster_run_verify_command; then + rm -f "$COUNTER_FILE" + exit 0 + else + VERIFY_REASON="TASKMASTER: verifier failed (exit=${TASKMASTER_VERIFY_EXIT_CODE}). Command: ${TASKMASTER_VERIFY_COMMAND} + +Output (last ${TASKMASTER_VERIFY_MAX_OUTPUT:-4000} bytes): +${TASKMASTER_VERIFY_OUTPUT_TAIL} + +Token alone is insufficient when a verifier is configured. Fix the failures and try again."
+ jq -n --arg reason "$VERIFY_REASON" '{ decision: "block", reason: $reason }' + exit 0 + fi + fi + rm -f "$COUNTER_FILE" + exit 0 +fi +``` + +**Step 3: Sanity-check the script** + +Run: `bash -n check-completion.sh` +Expected: no output (parses cleanly). + +**Step 4: Smoke test the integration manually** + +Run: +```bash +# A VAR=value prefix applies only to the command it precedes, so the +# assignment must sit on the hook side of the pipe, not before `echo`. +echo '{"session_id":"smoke-A3","transcript_path":"/dev/null","last_assistant_message":"TASKMASTER_DONE::smoke-A3"}' \ + | TASKMASTER_VERIFY_COMMAND="false" bash check-completion.sh +``` +Expected: a JSON object with `"decision": "block"` and a `"reason"` field that contains `verifier failed (exit=1)`. + +Run again with `TASKMASTER_VERIFY_COMMAND="true"`. Expected: empty output, exit 0. + +--- + +### Task A4: Wire the verifier into `hooks/check-completion.sh` + +**Files:** +- Modify: `hooks/check-completion.sh` + +Apply the **same** two edits as A3, but the source path is `$SCRIPT_DIR/../taskmaster-verify-command.sh` (one directory up). + +**Step 1: Source the helper** + +After the existing `source "$SCRIPT_DIR/../taskmaster-compliance-prompt.sh"` line, add: + +```bash +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/../taskmaster-verify-command.sh" +``` + +**Step 2: Insert the verifier call** in the same `HAS_DONE_SIGNAL=true` branch (same code as A3). + +**Step 3: Sanity-check** + +Run: `bash -n hooks/check-completion.sh` +Expected: no output.
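The manual smoke test from A3 Step 4, repeated for the A4 copy, can be sketched as one helper that takes the hook path. A `VAR=value` prefix applies only to the command it precedes, so the helper places `TASKMASTER_VERIFY_COMMAND` on the hook side of the pipe. `smoke_stop_hook` is a hypothetical name. Its `block`/`allow` classification assumes what the expected results above describe: the hook prints a JSON object with `"decision": "block"` when it blocks and nothing when it allows.

```bash
#!/usr/bin/env bash
# Sketch: run a stop hook once with a synthetic done-token payload and
# classify the result as "block" or "allow".

smoke_stop_hook() {
  local hook="$1" verify_cmd="$2" sid="${3:-smoke}"
  local payload out
  payload="$(printf '{"session_id":"%s","transcript_path":"/dev/null","last_assistant_message":"TASKMASTER_DONE::%s"}' "$sid" "$sid")"
  # The assignment prefixes `bash "$hook"`, the process that reads it.
  out="$(printf '%s\n' "$payload" | TASKMASTER_VERIFY_COMMAND="$verify_cmd" bash "$hook")"
  if [[ "$out" == *'"decision"'*'"block"'* ]]; then
    echo block
  else
    echo allow
  fi
}
```

Running `smoke_stop_hook hooks/check-completion.sh false` should print `block`, and the same call with `true` should print `allow`. Running it against both hook copies checks that A3 and A4 stayed in sync.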
+ +--- + +### Task A5: Update `install.sh` to copy and chmod the new file + +**Files:** +- Modify: `install.sh` + +**Step 1: Find the `copy_skill_files` function (around line 49) and add a `safe_copy` line for the new file** + +Locate this block: +```bash +safe_copy "$SCRIPT_DIR/taskmaster-compliance-prompt.sh" "$skill_dir/taskmaster-compliance-prompt.sh" +``` + +Add immediately below it: +```bash +safe_copy "$SCRIPT_DIR/taskmaster-verify-command.sh" "$skill_dir/taskmaster-verify-command.sh" +``` + +**Step 2: Add a `chmod +x` line** + +Locate the `chmod +x "$skill_dir/taskmaster-compliance-prompt.sh"` line. Add immediately below: +```bash +chmod +x "$skill_dir/taskmaster-verify-command.sh" +``` + +**Step 3: Sanity-check** + +Run: `bash -n install.sh` +Expected: no output. + +--- + +### Task A6: Update `uninstall.sh` to remove the new file + +**Files:** +- Modify: `uninstall.sh` + +**Step 1: Find where compliance-prompt.sh is removed and add a parallel rm for verify-command.sh.** + +Locate `rm -f "$skill_dir/taskmaster-compliance-prompt.sh"` (or grep for `taskmaster-compliance-prompt.sh` in `uninstall.sh`). Add immediately below: + +```bash +rm -f "$skill_dir/taskmaster-verify-command.sh" +``` + +**Step 2: Sanity-check** + +Run: `bash -n uninstall.sh` + +--- + +### Task A7: Document the new env vars in `docs/SPEC.md` + +**Files:** +- Modify: `docs/SPEC.md` + +**Step 1: Find the "Configuration" section (section 5) and add a subsection for verify-command env vars.** + +Add at the end of the configuration section: + +```markdown +### 5.x Optional verifier command + +| Env var | Default | Meaning | +|---|---|---| +| `TASKMASTER_VERIFY_COMMAND` | unset | Shell command run when the done token is seen. Empty/unset = skip. | +| `TASKMASTER_VERIFY_TIMEOUT` | `60` | Seconds before SIGTERM, +5s grace before SIGKILL. | +| `TASKMASTER_VERIFY_MAX_OUTPUT` | `4000` | Bytes of combined stdout+stderr echoed back into the block reason. 
|
+| `TASKMASTER_VERIFY_CWD` | unset | If set, `cd` here before invoking. Otherwise the hook's cwd is inherited. |
+
+When `TASKMASTER_VERIFY_COMMAND` is set, stop is allowed only when (a) the
+done token is present **and** (b) the command exits 0. A failing verifier
+overrides token-based completion and blocks with the command's exit code and
+truncated output.
+
+The verifier runs **only** when the done token is present, not on every stop
+attempt — this keeps slow verifiers (test suites, builds) from gating
+mid-work stop attempts.
+```
+
+(Replace `5.x` with the next available subsection number when adding.)
+
+---
+
+### Task A8: Phase A end-to-end run + commit
+
+**Step 1: Run all tests**
+
+Run: `bash tests/verify-command.test.sh`
+Expected: `8 passed, 0 failed`.
+
+**Step 2: Confirm syntax across all touched scripts**
+
+Run: `for f in check-completion.sh hooks/check-completion.sh install.sh uninstall.sh taskmaster-verify-command.sh; do bash -n "$f" || echo "syntax error: $f"; done`
+Expected: no output. (Loop over the files: `bash -n a.sh b.sh` parses only `a.sh` and passes `b.sh` to it as a positional argument.)
+
+**Step 3: Commit Phase A integration**
+
+```bash
+git add check-completion.sh hooks/check-completion.sh install.sh uninstall.sh docs/SPEC.md
+git commit -m "feat: gate stop on TASKMASTER_VERIFY_COMMAND when token present (T1.1)
+
+When TASKMASTER_VERIFY_COMMAND is set, the stop hook runs the command
+after the done token is detected. Exit 0 allows stop; non-zero blocks
+with a truncated output dump. Verifier only fires when the token is
+present, so mid-work stop attempts don't pay the cost of a slow verifier."
+```
+
+---
+
+# Phase B — T1.3: Tagged hook-internal-prompt detection
+
+### Task B1: Write the failing tests for the prompt-detect helper
+
+**Files:**
+- Create: `tests/prompt-detect.test.sh`
+
+**Step 1: Write the test file**
+
+```bash
+#!/usr/bin/env bash
+#
+# Tests for taskmaster-prompt-detect.sh.
+#
+set -euo pipefail
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.."
&& pwd)" +LIB="$REPO_ROOT/taskmaster-prompt-detect.sh" + +# shellcheck disable=SC1090 +source "$LIB" + +PASS_COUNT=0 +FAIL_COUNT=0 + +assert_detected() { + local name="$1" + local text="$2" + if is_taskmaster_injected_prompt "$text"; then + printf 'ok %s\n' "$name" + PASS_COUNT=$((PASS_COUNT + 1)) + else + printf 'FAIL %s\n' "$name" >&2 + FAIL_COUNT=$((FAIL_COUNT + 1)) + fi +} + +assert_not_detected() { + local name="$1" + local text="$2" + if is_taskmaster_injected_prompt "$text"; then + printf 'FAIL %s (false positive)\n' "$name" >&2 + FAIL_COUNT=$((FAIL_COUNT + 1)) + else + printf 'ok %s\n' "$name" + PASS_COUNT=$((PASS_COUNT + 1)) + fi +} + +# --- Tag detection --- +assert_detected "tagged stop-block" \ + "[taskmaster:injected v=1 kind=stop-block] +TASKMASTER (1): Stop is blocked..." + +assert_detected "tagged followup" \ + "[taskmaster:injected v=1 kind=followup] +continue" + +assert_detected "tagged compliance" "[taskmaster:injected v=1 kind=compliance]" + +# --- Forward-compat: future schema version still detected --- +assert_detected "future schema v=99" "[taskmaster:injected v=99 kind=anything]" + +# --- Legacy substring matches (back-compat with mickn's prompts and our own) --- +assert_detected "legacy: ..." +assert_detected "legacy: Stop is blocked" "Stop is blocked until completion is explicitly confirmed." +assert_detected "legacy: TASKMASTER (N) label" "TASKMASTER (5/100): Stop is blocked..." +assert_detected "legacy: TASKMASTER (N) label, no max" "TASKMASTER (5): Stop is blocked..." +assert_detected "legacy: Goal not yet verified complete" "Goal not yet verified complete." +assert_detected "legacy: Recent tool errors were detected" "Recent tool errors were detected." 
+ +# --- Negatives --- +assert_not_detected "empty string" "" +assert_not_detected "real user prompt" "fix the failing test in foo_test.go" +assert_not_detected "user mentions taskmaster word" "I want to use taskmaster for this project" +assert_not_detected "tag-like but malformed" "[taskmaster:injected]" +assert_not_detected "tag-like but missing v=" "[taskmaster:injected kind=stop-block]" + +# --- generate_taskmaster_injected_tag produces a parseable tag --- +TAG="$(generate_taskmaster_injected_tag stop-block)" +assert_detected "generated tag is detectable" "$TAG" +[[ "$TAG" == "[taskmaster:injected v=1 kind=stop-block]" ]] && { + printf 'ok generated tag exact format\n'; PASS_COUNT=$((PASS_COUNT + 1)); +} || { + printf 'FAIL generated tag exact format (got: %s)\n' "$TAG" >&2; FAIL_COUNT=$((FAIL_COUNT + 1)); +} + +printf '\n%d passed, %d failed\n' "$PASS_COUNT" "$FAIL_COUNT" +[[ "$FAIL_COUNT" == 0 ]] +``` + +**Step 2: Run to verify it fails** + +Run: `bash tests/prompt-detect.test.sh` +Expected: FAIL with `taskmaster-prompt-detect.sh: No such file or directory`. + +**Step 3: Commit failing tests** + +```bash +git add tests/prompt-detect.test.sh +git commit -m "test: add failing tests for taskmaster-prompt-detect lib (T1.3)" +``` + +--- + +### Task B2: Implement `taskmaster-prompt-detect.sh` + +**Files:** +- Create: `taskmaster-prompt-detect.sh` + +**Step 1: Write the lib** + +```bash +#!/usr/bin/env bash +# +# Detect prompts that Taskmaster itself injected, so they don't get +# treated as fresh user goals by downstream consumers (T2.2 user-prompt +# capture, T3 verifier). +# +# Two-tier detection: +# 1. Forward path: explicit `[taskmaster:injected v= kind=]` tag +# on the first non-empty line. Forward-compatible across schema bumps. +# 2. Legacy fallback: substring match against known wording from this +# project and from mickn/taskmaster's fork. +# + +readonly TASKMASTER_INJECTED_TAG_VERSION=1 + +# Emit the canonical tag for a given kind. 
Caller prepends to their prompt.
+# Kinds: stop-block, followup, compliance, session-start, verifier-feedback.
+generate_taskmaster_injected_tag() {
+  local kind="${1:-unknown}"
+  printf '[taskmaster:injected v=%d kind=%s]' \
+    "$TASKMASTER_INJECTED_TAG_VERSION" "$kind"
+}
+
+is_taskmaster_injected_tag_line() {
+  local text="$1"
+  # Match `[taskmaster:injected v= kind=]` at start of text.
+  [[ "$text" =~ ^\[taskmaster:injected[[:space:]]v=[0-9]+[[:space:]]kind=[a-zA-Z0-9_-]+\] ]]
+}
+
+is_taskmaster_legacy_injected_prompt() {
+  local text="$1"
+  case "$text" in
+    "]
+
+```
+
+`` ∈ `stop-block | followup | compliance | session-start | verifier-feedback`.
+
+Downstream consumers (UserPromptSubmit hook, completion verifier, external
+tooling) detect injected prompts via `is_taskmaster_injected_prompt` from
+`taskmaster-prompt-detect.sh`. Legacy substring detection is preserved for
+prompts emitted before this version.
+```
+
+**Step 2: Add a brief mention in `SKILL.md`**
+
+In the SKILL.md system context, after the "How It Works" section, add a short paragraph so the model treats the tag as metadata rather than as a directive:
+
+```markdown
+## A note on the injected-prompt tag
+
+If you see a line starting with `[taskmaster:injected v=…]` at the top of a
+message, that's metadata the hook adds to its own prompts. Treat it as a
+marker, not as content you need to act on.
+```
+
+---
+
+### Task B7: Update `install.sh` and `uninstall.sh` for the new file
+
+**Files:**
+- Modify: `install.sh`
+- Modify: `uninstall.sh`
+
+**Step 1: install.sh** — add `safe_copy` and `chmod +x` for `taskmaster-prompt-detect.sh` parallel to the changes in A5.
+
+**Step 2: uninstall.sh** — add `rm -f` parallel to A6.
+
+**Step 3: Sanity-check**
+
+Run: `for f in install.sh uninstall.sh; do bash -n "$f"; done`
+Expected: no output.
+
+---
+
+### Task B8: Phase B end-to-end + commit
+
+**Step 1: Run all tests**
+
+Run: `bash tests/prompt-detect.test.sh && bash tests/verify-command.test.sh`
+Expected: both pass.
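+
+Step 2 below checks the tag by eye. To make that check scriptable, a small helper can confirm that the first line of a prompt is exactly a well-formed tag and report its version and kind. This is a sketch: `check_injected_tag_line` is a hypothetical name, and its regex follows the tag grammar from Task B2 but uses the single spaces `generate_taskmaster_injected_tag` emits and requires the tag to fill the whole first line.
+
+```bash
+#!/usr/bin/env bash
+# check_injected_tag_line TEXT   (hypothetical helper)
+# Succeeds and prints "v=<n> kind=<k>" only if the first line of TEXT is
+# exactly "[taskmaster:injected v=<n> kind=<k>]"; returns 1 otherwise.
+check_injected_tag_line() {
+  local text="$1" first
+  first="${text%%$'\n'*}"   # keep only the first line
+  local re='^\[taskmaster:injected v=([0-9]+) kind=([a-zA-Z0-9_-]+)\]$'
+  [[ "$first" =~ $re ]] || return 1
+  printf 'v=%s kind=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
+}
+```
+
+Usage: `check_injected_tag_line "$(echo '{"session_id":"smoke-B8","transcript_path":"/dev/null","last_assistant_message":""}' | bash check-completion.sh | jq -r .reason)"` should print `v=1 kind=stop-block` if the hook tags its stop-block prompt as specified.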
+ +**Step 2: Smoke test the tag is present in real hook output** + +Run: +```bash +echo '{"session_id":"smoke-B8","transcript_path":"/dev/null","last_assistant_message":""}' \ + | bash check-completion.sh \ + | jq -r .reason \ + | head -3 +``` +Expected: first line is `[taskmaster:injected v=1 kind=stop-block]`. + +**Step 3: Commit Phase B** + +```bash +git add check-completion.sh hooks/check-completion.sh hooks/inject-continue-codex.sh \ + install.sh uninstall.sh docs/SPEC.md SKILL.md +git commit -m "feat: tag every hook-injected prompt with [taskmaster:injected v=1 kind=...] (T1.3) + +Adds an explicit single-line marker to the top of every prompt the hook +injects. Downstream consumers detect injected prompts via the new +taskmaster-prompt-detect lib (forward path: tag match; back-compat: +legacy substring match against current and mickn/taskmaster wording)." +``` + +--- + +# Phase C — T1.2: JSON state-file layout + +### Task C1: Write the failing tests for the state lib + +**Files:** +- Create: `tests/state.test.sh` + +**Step 1: Write the test file** + +```bash +#!/usr/bin/env bash +# +# Tests for taskmaster-state.sh. +# +set -euo pipefail + +REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +LIB="$REPO_ROOT/taskmaster-state.sh" + +# Isolate state under a temp dir +TEST_HOME="$(mktemp -d "${TMPDIR:-/tmp}/taskmaster-state-test.XXXXXX")" +trap 'rm -rf "$TEST_HOME"' EXIT +export TASKMASTER_STATE_DIR="$TEST_HOME/state" + +# shellcheck disable=SC1090 +source "$LIB" + +PASS=0; FAIL=0 +ok() { printf 'ok %s\n' "$1"; PASS=$((PASS+1)); } +fail() { printf 'FAIL %s\n' "$1" >&2; FAIL=$((FAIL+1)); } + +# --- init creates well-formed JSON with schema_version=1 --- +SID="sess-$$" +taskmaster_state_init "$SID" +PATH_OUT="$(taskmaster_state_path "$SID")" +[[ -f "$PATH_OUT" ]] && ok "init creates file" || fail "init creates file" +SV="$(jq -r .schema_version <"$PATH_OUT")" +[[ "$SV" == "1" ]] && ok "schema_version is 1" || fail "schema_version is 1 (got $SV)" +SI="$(jq -r .session_id <"$PATH_OUT")" +[[ "$SI" == "$SID" ]] && ok "session_id stamped" || fail "session_id stamped" +SC="$(jq -r .stop_count <"$PATH_OUT")" +[[ "$SC" == "0" ]] && ok "stop_count starts at 0" || fail "stop_count starts at 0 (got $SC)" + +# --- increment --- +taskmaster_state_increment_stop_count "$SID" +SC="$(jq -r .stop_count <"$PATH_OUT")" +[[ "$SC" == "1" ]] && ok "stop_count after one increment is 1" || fail "increment 1 (got $SC)" + +taskmaster_state_increment_stop_count "$SID" +taskmaster_state_increment_stop_count "$SID" +SC="$(jq -r .stop_count <"$PATH_OUT")" +[[ "$SC" == "3" ]] && ok "stop_count after three increments is 3" || fail "increment 3 (got $SC)" + +# --- concurrent increments --- +SID2="sess-conc-$$" +taskmaster_state_init "$SID2" +PATH_C="$(taskmaster_state_path "$SID2")" +N=50 +for i in $(seq 1 "$N"); do + ( taskmaster_state_increment_stop_count "$SID2" ) & +done +wait +SC="$(jq -r .stop_count <"$PATH_C")" +[[ "$SC" == "$N" ]] && ok "concurrent $N increments reach $N" \ + || fail "concurrent increments lost some (got $SC, expected $N)" + +# --- legacy migration --- +LEGACY_DIR="${TMPDIR:-/tmp}/taskmaster" +mkdir -p "$LEGACY_DIR" +SID3="sess-legacy-$$" 
+LEGACY_FILE="$LEGACY_DIR/$SID3" +echo "7" > "$LEGACY_FILE" + +# State doesn't exist yet; migration should pull the 7 +taskmaster_state_migrate_legacy_counter "$SID3" +[[ -f "$(taskmaster_state_path "$SID3")" ]] && ok "legacy migration creates state file" \ + || fail "legacy migration creates state file" +SC="$(jq -r .stop_count <"$(taskmaster_state_path "$SID3")")" +[[ "$SC" == "7" ]] && ok "legacy counter value migrated" \ + || fail "legacy counter value migrated (got $SC, expected 7)" +[[ ! -f "$LEGACY_FILE" ]] && ok "legacy file deleted after migration" \ + || fail "legacy file deleted after migration" + +# --- migration is idempotent (rerun without legacy file is a no-op) --- +taskmaster_state_migrate_legacy_counter "$SID3" +SC="$(jq -r .stop_count <"$(taskmaster_state_path "$SID3")")" +[[ "$SC" == "7" ]] && ok "second migration call is a no-op" \ + || fail "second migration mutated state (got $SC)" + +# --- jq read of nonexistent path returns null --- +SID4="sess-empty-$$" +VAL="$(taskmaster_state_jq "$SID4" '.latest_user_prompt.prompt' 2>/dev/null || echo "MISSING")" +[[ "$VAL" == "null" || -z "$VAL" || "$VAL" == "MISSING" ]] && ok "nonexistent path read is safe" \ + || fail "nonexistent path read is safe (got $VAL)" + +printf '\n%d passed, %d failed\n' "$PASS" "$FAIL" +[[ "$FAIL" == 0 ]] +``` + +**Step 2: Run to verify failure** + +Run: `bash tests/state.test.sh` +Expected: FAIL with `taskmaster-state.sh: No such file or directory`. + +**Step 3: Commit failing tests** + +```bash +git add tests/state.test.sh +git commit -m "test: add failing tests for taskmaster-state lib (T1.2)" +``` + +--- + +### Task C2: Implement `taskmaster-state.sh` + +**Files:** +- Create: `taskmaster-state.sh` + +**Step 1: Write the lib** + +```bash +#!/usr/bin/env bash +# +# Persistent JSON session state for Taskmaster. 
+# +# Layout: ${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}/.json +# +# Schema (v1): +# { +# "schema_version": 1, +# "session_id": "", +# "created_at": "", +# "updated_at": "", +# "stop_count": 0, +# "latest_user_prompt": null | {captured_at, turn_id, prompt}, +# "last_verifier_run": null | {ran_at, input_hash, complete, reason, next_action}, +# "metadata": {} +# } +# +# Atomicity: all writes go through tmp+mv guarded by flock on .lock. +# + +taskmaster_state_dir() { + printf '%s\n' "${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}" +} + +taskmaster_state_path() { + local sid="$1" + printf '%s/%s.json\n' "$(taskmaster_state_dir)" "$sid" +} + +taskmaster_state_now() { + date -u +"%Y-%m-%dT%H:%M:%SZ" +} + +taskmaster_state_init() { + local sid="$1" + local path tmp lock now + path="$(taskmaster_state_path "$sid")" + mkdir -p "$(dirname "$path")" + [[ -f "$path" ]] && return 0 + + lock="${path}.lock" + tmp="${path}.tmp.$$" + now="$(taskmaster_state_now)" + + exec 9>"$lock" + flock 9 + if [[ ! -f "$path" ]]; then + jq -n \ + --arg sid "$sid" \ + --arg now "$now" \ + '{ + schema_version: 1, + session_id: $sid, + created_at: $now, + updated_at: $now, + stop_count: 0, + latest_user_prompt: null, + last_verifier_run: null, + metadata: {} + }' >"$tmp" + mv "$tmp" "$path" + fi + exec 9>&- +} + +taskmaster_state_jq() { + local sid="$1" expr="$2" + local path + path="$(taskmaster_state_path "$sid")" + [[ -f "$path" ]] || return 0 + jq -r "$expr" <"$path" 2>/dev/null +} + +# Run jq with a transformation expression and atomically write the result back. 
+taskmaster_state_update() {
+  local sid="$1" expr="$2"
+  local path tmp lock now rc=0
+  path="$(taskmaster_state_path "$sid")"
+  taskmaster_state_init "$sid"
+
+  lock="${path}.lock"
+  tmp="${path}.tmp.$$"
+  now="$(taskmaster_state_now)"
+
+  exec 9>"$lock"
+  flock 9
+  # Publish only if jq succeeded; on failure keep the old state and drop the tmp.
+  if jq --arg now "$now" "$expr | .updated_at = \$now" "$path" >"$tmp"; then
+    mv "$tmp" "$path"
+  else
+    rc=$?
+    rm -f "$tmp"
+  fi
+  exec 9>&-
+  return "$rc"
+}
+
+taskmaster_state_increment_stop_count() {
+  local sid="$1"
+  taskmaster_state_update "$sid" '.stop_count = (.stop_count + 1)'
+}
+
+taskmaster_state_capture_prompt() {
+  local sid="$1" turn_id="$2" prompt="$3"
+  local path tmp lock now rc=0
+  path="$(taskmaster_state_path "$sid")"
+  taskmaster_state_init "$sid"
+
+  lock="${path}.lock"
+  tmp="${path}.tmp.$$"
+  now="$(taskmaster_state_now)"
+
+  exec 9>"$lock"
+  flock 9
+  if jq \
+    --arg now "$now" \
+    --arg turn "$turn_id" \
+    --arg prompt "$prompt" \
+    '.latest_user_prompt = {captured_at: $now, turn_id: $turn, prompt: $prompt}
+     | .updated_at = $now' \
+    "$path" >"$tmp"; then
+    mv "$tmp" "$path"
+  else
+    rc=$?
+    rm -f "$tmp"
+  fi
+  exec 9>&-
+  return "$rc"
+}
+
+taskmaster_state_record_verifier_run() {
+  local sid="$1" input_hash="$2" complete="$3" reason="$4" next_action="$5"
+  local path tmp lock now rc=0
+  path="$(taskmaster_state_path "$sid")"
+  taskmaster_state_init "$sid"
+
+  lock="${path}.lock"
+  tmp="${path}.tmp.$$"
+  now="$(taskmaster_state_now)"
+
+  exec 9>"$lock"
+  flock 9
+  # --argjson fails on a non-JSON "complete" value; the gate keeps state intact.
+  if jq \
+    --arg now "$now" \
+    --arg hash "$input_hash" \
+    --argjson complete "$complete" \
+    --arg reason "$reason" \
+    --arg next "$next_action" \
+    '.last_verifier_run = {
+       ran_at: $now,
+       input_hash: $hash,
+       complete: $complete,
+       reason: $reason,
+       next_action: $next
+     } | .updated_at = $now' \
+    "$path" >"$tmp"; then
+    mv "$tmp" "$path"
+  else
+    rc=$?
+    rm -f "$tmp"
+  fi
+  exec 9>&-
+  return "$rc"
+}
+
+# One-time migration: absorb legacy ${TMPDIR}/taskmaster/ counter into the
+# state file's stop_count, then delete the legacy file. Idempotent — safe to
+# call on every hook entry.
+taskmaster_state_migrate_legacy_counter() {
+  local sid="$1" path lock tmp count
+  local legacy="${TMPDIR:-/tmp}/taskmaster/${sid}"
+  [[ -f "$legacy" ]] || return 0
+
+  taskmaster_state_init "$sid"
+  path="$(taskmaster_state_path "$sid")"
+  lock="${path}.lock" tmp="${path}.tmp.$$"
+
+  exec 9>"$lock"
+  flock 9
+  # Re-check under the lock, and add rather than overwrite, so a concurrent
+  # hook can neither double-absorb the file nor have its increments rewound.
+  if [[ -f "$legacy" ]]; then
+    count="$(cat "$legacy" 2>/dev/null || echo 0)"
+    [[ "$count" =~ ^[0-9]+$ ]] || count=0
+    if jq --arg now "$(taskmaster_state_now)" --argjson n "$count" \
+      '.stop_count = (.stop_count + $n) | .updated_at = $now' "$path" >"$tmp"; then
+      mv "$tmp" "$path" && rm -f "$legacy"
+    else
+      rm -f "$tmp"
+    fi
+  fi
+  exec 9>&-
+}
+```
+
+**Step 2: Make executable**
+
+```bash
+chmod +x taskmaster-state.sh
+```
+
+**Step 3: Run tests**
+
+Run: `bash tests/state.test.sh`
+Expected: all pass. Common failure modes:
+- `flock` not found on macOS: it ships with util-linux, not coreutils, so install it with `brew install flock`, or skip the concurrent test on systems without it.
+- `mkdir -p` race — already handled by `taskmaster_state_init`.
+
+**Step 4: Commit**
+
+```bash
+git add taskmaster-state.sh
+git commit -m "feat: add taskmaster-state JSON state lib with flock + atomic writes (T1.2)"
+```
+
+---
+
+### Task C3: Refactor `check-completion.sh` to use the state lib
+
+**Files:**
+- Modify: `check-completion.sh`
+
+**Step 1: Source the lib**
+
+After existing `source` calls, add:
+
+```bash
+# shellcheck disable=SC1091
+source "$SCRIPT_DIR/taskmaster-state.sh"
+```
+
+**Step 2: Replace the counter logic**
+
+Locate the existing block:
+
+```bash
+# --- counter ---
+COUNTER_DIR="${TMPDIR:-/tmp}/taskmaster"
+mkdir -p "$COUNTER_DIR"
+COUNTER_FILE="${COUNTER_DIR}/${SESSION_ID}"
+MAX=${TASKMASTER_MAX:-100}
+
+COUNT=0
+if [ -f "$COUNTER_FILE" ]; then
+  COUNT=$(cat "$COUNTER_FILE" 2>/dev/null || echo "0")
+fi
+```
+
+Replace with:
+
+```bash
+# --- counter (state-file backed) ---
+taskmaster_state_migrate_legacy_counter "$SESSION_ID"
+taskmaster_state_init "$SESSION_ID"
+
+MAX=${TASKMASTER_MAX:-100}
+COUNT="$(taskmaster_state_jq "$SESSION_ID" '.stop_count')"
+[[ "$COUNT" =~ ^[0-9]+$ ]] || COUNT=0
+```
+
+**Step 3: Replace `rm -f "$COUNTER_FILE"` (allow-stop paths) with state reset**
+
+Find every occurrence of `rm -f "$COUNTER_FILE"` (after Task A3 there are three: two in the HAS_DONE_SIGNAL=true branch, and one in the
MAX-reached branch). Replace each with:
+
+```bash
+taskmaster_state_update "$SESSION_ID" '.stop_count = 0'
+```
+
+**Step 4: Replace `echo "$NEXT" > "$COUNTER_FILE"` with state increment**
+
+Find:
+
+```bash
+NEXT=$((COUNT + 1))
+echo "$NEXT" > "$COUNTER_FILE"
+```
+
+Replace with:
+
+```bash
+taskmaster_state_increment_stop_count "$SESSION_ID"
+NEXT=$((COUNT + 1))
+```
+
+(`NEXT` is still computed locally for the LABEL string; the source of truth is the state file.)
+
+**Step 5: Sanity-check**
+
+Run: `bash -n check-completion.sh`
+Expected: no output.
+
+**Step 6: Smoke test**
+
+Run:
+```bash
+TM_STATE="$(mktemp -d)/state"
+echo '{"session_id":"smoke-C3","transcript_path":"/dev/null","last_assistant_message":""}' \
+  | TASKMASTER_STATE_DIR="$TM_STATE" bash check-completion.sh \
+  | jq -r .reason | head -2
+echo "---"
+ls "$TM_STATE"
+jq . "$TM_STATE/smoke-C3.json"
+```
+Expected: tag line + `TASKMASTER (1/100): ...`; state file shows `"stop_count": 1`. (The state dir is passed to `bash check-completion.sh` itself; a prefix on `echo` would never reach the hook, and the later `ls`/`jq` lines need the path too.)
+
+---
+
+### Task C4: Refactor `hooks/check-completion.sh` similarly
+
+**Files:**
+- Modify: `hooks/check-completion.sh`
+
+Apply the **same** four edits as C3. Source path is `$SCRIPT_DIR/../taskmaster-state.sh`.
+
+**Step 1: Source the lib** (with `..` prefix).
+
+**Step 2–4: Same replacements as C3.**
+
+**Step 5: Sanity-check + smoke test** (same probe).
+
+---
+
+### Task C5: Refactor `hooks/inject-continue-codex.sh` to use state lib
+
+**Files:**
+- Modify: `hooks/inject-continue-codex.sh`
+
+The injector tracks injection counts in a runtime file already; this task is **purely additive** — also write to the JSON state when we have a session id, so that future tooling can read both. Do not break the existing injector state file.
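+
+With this change the injector and the stop hook can both bump `stop_count` for the same session, so correctness rests on the flock + tmp + mv discipline from Task C2. As a standalone illustration of that pattern (a plain integer file instead of JSON and a hypothetical `locked_increment` name, so not the lib itself), assuming `flock(1)` is on PATH:
+
+```bash
+#!/usr/bin/env bash
+# locked_increment FILE   (hypothetical illustration of the C2 locking pattern)
+# Requires flock(1): util-linux on Linux, `brew install flock` on macOS.
+locked_increment() {
+  local file="$1"
+  (
+    # Block until this subshell holds an exclusive lock on the sidecar
+    # lock file opened as fd 9 below; the lock drops when the subshell exits.
+    flock 9
+    n=0
+    if [[ -f "$file" ]]; then n="$(cat "$file")"; fi
+    # Write to a tmp file, then rename it into place so readers never see a
+    # half-written file. A shared tmp name is safe because writers are serialized.
+    printf '%d\n' "$((n + 1))" >"$file.tmp.$$" &&
+      mv "$file.tmp.$$" "$file"
+  ) 9>"$file.lock"
+}
+```
+
+Twenty parallel calls against a fresh file, e.g. `for _ in {1..20}; do locked_increment "$f" & done; wait`, should leave `20` in it; without the `flock 9` line, concurrent read-modify-write cycles can lose updates.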
+
+**Step 1: Source the lib**
+
+After existing sources, add:
+
+```bash
+# shellcheck disable=SC1091
+source "$(dirname "${BASH_SOURCE[0]}")/../taskmaster-state.sh"
+```
+
+**Step 2: In `inject_prompt`, also bump the JSON state's stop_count when SESSION_ID is known**
+
+Find the existing increment line:
+
+```bash
+INJECTION_COUNT=$((INJECTION_COUNT + 1))
+```
+
+Add immediately after:
+
+```bash
+if [[ -n "${SESSION_ID:-}" ]]; then
+  taskmaster_state_increment_stop_count "$SESSION_ID" 2>/dev/null || true
+fi
+```
+
+(`|| true` because the injector's startup ordering can call `inject_prompt` very early; we don't want a state-file write failure to crash the injector.)
+
+**Step 3: Sanity-check**
+
+Run: `bash -n hooks/inject-continue-codex.sh`
+Expected: no output.
+
+---
+
+### Task C6: Update `install.sh` and `uninstall.sh` for the state lib
+
+**Files:**
+- Modify: `install.sh`
+- Modify: `uninstall.sh`
+
+**Step 1: install.sh** — `safe_copy` and `chmod +x` for `taskmaster-state.sh` parallel to A5/B7.
+
+**Step 2: uninstall.sh** — `rm -f` parallel to A6/B7.
+
+**Step 3: Sanity-check**
+
+Run: `for f in install.sh uninstall.sh; do bash -n "$f"; done`
+Expected: no output.
+
+---
+
+### Task C7: Document the state file in `docs/SPEC.md`
+
+**Files:**
+- Modify: `docs/SPEC.md`
+
+**Step 1: Add a "Session state" subsection to the Architecture section.**
+
+```markdown
+### 3.x Session state file
+
+Path: `${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}/.json`
+
+Schema (v1):
+
+~~~json
+{
+  "schema_version": 1,
+  "session_id": "",
+  "created_at": "",
+  "updated_at": "",
+  "stop_count": 0,
+  "latest_user_prompt": null,
+  "last_verifier_run": null,
+  "metadata": {}
+}
+~~~
+
+All writes go through `flock` on `.lock` and atomic tmp+mv.
+
+**Legacy migration:** on first read per session, the hook absorbs any
+existing counter file at `${TMPDIR}/taskmaster/` into
+`stop_count` and deletes the legacy file. Idempotent.
+```
+
+---
+
+### Task C8: Phase C end-to-end + commit
+
+**Step 1: Run all three test suites**
+
+Run: `bash tests/state.test.sh && bash tests/prompt-detect.test.sh && bash tests/verify-command.test.sh`
+Expected: all pass.
+
+**Step 2: Verify legacy migration works against a real legacy file**
+
+Run:
+```bash
+LEGACY_DIR="${TMPDIR:-/tmp}/taskmaster"
+TM_STATE="$(mktemp -d)/state"
+mkdir -p "$LEGACY_DIR"
+echo "5" > "$LEGACY_DIR/migrate-c8"
+echo '{"session_id":"migrate-c8","transcript_path":"/dev/null","last_assistant_message":""}' \
+  | TASKMASTER_STATE_DIR="$TM_STATE" bash check-completion.sh >/dev/null
+# After hook fires: legacy file should be gone, state file should have stop_count = 6 (5 migrated + 1 increment)
+[[ ! -e "$LEGACY_DIR/migrate-c8" ]] && echo "ok: legacy file removed"
+jq -r '.stop_count' "$TM_STATE/migrate-c8.json"
+```
+Expected: `ok: legacy file removed`, then `6`.
+
+**Step 3: Sanity-check all touched scripts**
+
+Run: `for f in check-completion.sh hooks/check-completion.sh hooks/inject-continue-codex.sh install.sh uninstall.sh taskmaster-state.sh; do bash -n "$f" || echo "syntax error: $f"; done`
+Expected: no output.
+
+**Step 4: Commit Phase C**
+
+```bash
+git add check-completion.sh hooks/check-completion.sh hooks/inject-continue-codex.sh \
+  install.sh uninstall.sh docs/SPEC.md
+git commit -m "feat: replace counter file with JSON state file + flock + migration (T1.2)
+
+stop_count now lives in a flock-protected JSON file at
+\$TASKMASTER_STATE_DIR/.json (default: \$TMPDIR/taskmaster/state).
+Legacy counter files are absorbed on first read and deleted. Schema is
+versioned for forward compatibility with T2/T3 fields (latest_user_prompt,
+last_verifier_run)."
+```
+
+---
+
+# Phase D — Release plumbing
+
+### Task D1: Bump version to 4.3.0
+
+**Files:**
+- Modify: `SKILL.md`
+
+**Step 1: Bump the `version:` line in the YAML frontmatter from `4.2.0` to `4.3.0`.**
+
+Run: `sed -i.bak 's/^version: 4\.2\.0$/version: 4.3.0/' SKILL.md && rm SKILL.md.bak && grep '^version:' SKILL.md`
+Expected: `version: 4.3.0`. (`-i.bak` works with both GNU and BSD/macOS `sed`; a bare `-i` fails on macOS.)
+
+---
+
+### Task D2: Bump version in `docs/SPEC.md`
+
+**Files:**
+- Modify: `docs/SPEC.md`
+
+**Step 1: Update the `**Version**:` line at the top of SPEC.md from `4.2.0` to `4.3.0`.**
+
+Use the `Edit` tool and match the line exactly to avoid clobbering similar lines.
+
+---
+
+### Task D3: Add a CHANGELOG.md entry for 4.3.0
+
+**Files:**
+- Modify: `CHANGELOG.md` (created during the rebase from upstream)
+
+**Step 1: Insert a new section at the top of the file (above the current topmost entry, `## [2.4.0] - 2026-03-30`):**
+
+```markdown
+## v4.3.0 — 2026-04-28
+
+### Added
+- `TASKMASTER_VERIFY_COMMAND` env var: opt-in shell verifier that gates
+  stop after the done token is seen. Pairs with test suites, type-checkers,
+  or any repo-local check. Companion knobs: `TASKMASTER_VERIFY_TIMEOUT`
+  (default 60s), `TASKMASTER_VERIFY_MAX_OUTPUT` (default 4000 bytes),
+  `TASKMASTER_VERIFY_CWD`. (T1.1)
+- Tagged hook-injected prompts: every prompt the hook injects starts
+  with `[taskmaster:injected v=1 kind=]`. New
+  `taskmaster-prompt-detect.sh` lib lets downstream consumers
+  distinguish injected reprompts from real user goals. Legacy substring
+  detection preserved for back-compat. (T1.3)
+- JSON session state file at
+  `${TASKMASTER_STATE_DIR:-${TMPDIR}/taskmaster/state}/.json`,
+  flock-protected, atomic writes. Schema v1 with `stop_count`,
+  `latest_user_prompt`, `last_verifier_run`, `metadata` fields ready for
+  T2/T3. (T1.2)
+
+### Changed
+- Stop-count tracking moved from the bare counter file at
+  `${TMPDIR}/taskmaster/` to the new JSON state file.
+  Legacy counter files are absorbed on first read and deleted —
+  no user action required.
+
+### References
+- Design: `docs/designs/2026-04-28-072245-fork-pattern-adoption.md`
+- Plan: `docs/plans/2026-04-28-083546-t1-fork-pattern-adoption.md`
+- Source review: `docs/upstream-reviews/blader-taskmaster-forks.md`
+```
+
+---
+
+### Task D4: Final test run
+
+**Step 1: Run every test in `tests/`**
+
+Run: `( for t in tests/*.test.sh; do echo "=== $t ==="; bash "$t" || exit 1; done ) && echo "ALL TESTS PASS"`
+Expected: `ALL TESTS PASS`. (The subshell keeps `exit 1` from closing your interactive shell when a suite fails.)
+
+**Step 2: Smoke test the full hook with all three features active**
+
+Run:
+```bash
+TM_STATE="$(mktemp -d)/state"
+echo '{"session_id":"final-smoke","transcript_path":"/dev/null","last_assistant_message":"TASKMASTER_DONE::final-smoke"}' \
+  | TASKMASTER_STATE_DIR="$TM_STATE" TASKMASTER_VERIFY_COMMAND="true" bash check-completion.sh
+echo "exit=$?"
+ls "$TM_STATE"
+jq . "$TM_STATE/final-smoke.json"
+```
+Expected: empty output (allow stop), `exit=0`, state file shows the session was tracked. (`stop_count` should be 0: the done token takes the allow-stop path before any increment, and that path resets the count.)
+
+Repeat with `TASKMASTER_VERIFY_COMMAND="false"`. Expected: blocking JSON output with `verifier failed (exit=1)` in the reason.
+
+---
+
+### Task D5: Commit version bump and CHANGELOG
+
+**Step 1: Stage and commit**
+
+```bash
+git add SKILL.md docs/SPEC.md CHANGELOG.md
+git commit -m "release v4.3.0: T1 fork-pattern adoption (verify-command, tag, state-file)
+
+See CHANGELOG.md for the full entry. Three independent additions
+ported from the fork-network review (mickn/taskmaster), each
+opt-in or backward-compatible."
+```
+
+**Step 2: Tag the release**
+
+```bash
+git tag -a v4.3.0 -m "v4.3.0: T1 fork-pattern adoption (T1.1 verify-command, T1.2 state-file, T1.3 prompt tag)"
+git tag -n v4.3.0
+```
+
+**Step 3: Confirm clean state**
+
+Run: `git status && git log --oneline -10`
+Expected: `working tree clean`; the recent commits trace the plan (per phase: failing tests, implementation, integration; then the release commit).
+ +--- + +# Out of scope for this plan + +The design doc lists items deliberately deferred: + +- Stale state-file cleanup / TTL — separate beads issue. +- Native Codex hooks (T2) — gated on capability probe. +- Semantic completion verifier (T3) — opt-in, ships after T2. + +# Risks watched during execution + +- **macOS `flock` / `timeout` availability**: if the executing engineer is on macOS without GNU coreutils, install before starting (`brew install coreutils flock`) or stop and ask the user. +- **State-dir collision with parallel sessions**: the `flock` per file makes this safe, but if `$TASKMASTER_STATE_DIR` lives on a network filesystem with broken locking, concurrency tests will flake. Document this as a known caveat in SPEC if encountered. +- **Smoke tests that depend on `last_assistant_message` shape**: the field is what Claude Code passes; if the engineer is testing with a different runtime that uses a different field, override the test JSON shape accordingly. diff --git a/docs/reports/2026-04-28-002421-codex-native-hooks-verification.md b/docs/reports/2026-04-28-002421-codex-native-hooks-verification.md new file mode 100644 index 0000000..d0e4f46 --- /dev/null +++ b/docs/reports/2026-04-28-002421-codex-native-hooks-verification.md @@ -0,0 +1,271 @@ +# Codex Native Hooks: Verification Before Adopting mickn's Architecture + +**Generated:** 2026-04-28 +**Topic:** Does the OpenAI Codex CLI actually support `SessionStart`, `UserPromptSubmit`, and `Stop` hooks natively today, or is `mickn:main`'s rewrite conditional on a future Codex release? + +## Executive Summary + +**Codex native hooks are real, shipped, and stable.** The OpenAI Codex +CLI exposes `SessionStart`, `UserPromptSubmit`, and `Stop` as first-class +hook events documented at `developers.openai.com/codex/hooks`. Hooks +were marked stable in **v0.122.0 (2026-04-20)** via PR #19012 +("Mark codex_hooks stable"). 
The locally installed version on this +machine is **`codex-cli 0.125.0`** — three releases past the stable +cutoff and well within the supported window. + +`mickn:main`'s v5.0.0 rewrite — which deletes the PTY wrapper, expect +bridge, and queue emitter, replacing them with `taskmaster-session-start.sh`, +`taskmaster-user-prompt-submit.sh`, and `taskmaster-stop.sh` — therefore +does **not** depend on any unreleased Codex feature. It targets shipping, +documented behavior. All three preconditions from the original fork +review (`docs/upstream-reviews/blader-taskmaster-forks.md` §B1) are +satisfied: + +1. Codex CLI exposes `SessionStart`, `UserPromptSubmit`, and `Stop` ✅ +2. The native `Stop` hook supports `decision: "block"` continuation ✅ +3. `last_assistant_message` is populated on Stop events ✅ (with one + caveat — see §3) + +The answer to the open question is: **proceed with the port. The +wrapper layer is dead weight on Codex 0.122+.** Two caveats worth +gating on are documented in §3 and flow into the punch list. + +## Research Findings + +### 1. Hook events explicitly documented + +The official Codex hooks reference +([developers.openai.com/codex/hooks](https://developers.openai.com/codex/hooks)) +lists six hook events: `SessionStart`, `PreToolUse`, `PermissionRequest`, +`PostToolUse`, `UserPromptSubmit`, `Stop`. The three Taskmaster needs +are all present and have dedicated sections. + +Hooks are configured via `~/.codex/hooks.json` (user scope) or +`/.codex/hooks.json` (project scope), with optional inline +configuration in `config.toml`. Per-layer hooks are merged, not +overridden — higher-precedence layers add to lower ones. Project-local +hooks only load when the `.codex/` layer is trusted. + +### 2. `Stop` hook semantics match Claude Code + +The docs explicitly state, for the `Stop` event: + +> "For this event, `decision: "block"` doesn't reject the turn. 
+> Instead, it tells Codex to continue and automatically creates a new +> continuation prompt" + +The `reason` field becomes the continuation prompt. This is the same +contract Claude Code's Stop hook uses, which is exactly what +`taskmaster-stop.sh` needs in order to push a TASKMASTER continuation +prompt back into the same running session. + +`Stop`'s stdin payload includes: + +- `turn_id` — the active Codex turn ID +- `stop_hook_active` — whether continuation has already occurred + (the standard guard against infinite re-fire loops; matches Claude's + field of the same name) +- `last_assistant_message` — "Latest assistant message text, if available" + +The "if available" caveat on `last_assistant_message` is the only +non-trivial parity gap with Claude Code. It is the source of the caveat +on precondition 3 in the Analysis section. + +### 3. Release timeline + +From the Codex changelog +([developers.openai.com/codex/changelog](https://developers.openai.com/codex/changelog)): + +| Version | Date | Hook-related change | +|---------|------|---------------------| +| v0.116.0 | 2026-03 | Hooks present in experimental form (referenced in issue #15266 reproductions) | +| v0.122.0 | 2026-04-20 | `PermissionRequest` hooks added (#17563); OTEL metrics for hook runs (#18026) | +| v0.123.0 | 2026-04-23 | Hooks in `config.toml` / `requirements.toml` (#18893); MCP tool support in hooks (#18385); **`codex_hooks` marked stable (#19012)** | +| v0.124.0 | 2026-04-23 | `apply_patch` emits hooks (#18391); Bash `PostToolUse` on `exec_command` (#18888); **regression: hooks broke at startup if config used map syntax (#19199)** | +| v0.125.0 | 2026-04 | (locally installed; current) | + +The stable marker landed about a week before this report. mickn's rewrite +(its v5.0.0 release falls in the same window) targets the +post-stable surface, not pre-release behavior. + +### 4.
Known issues that don't block adoption but warrant gating + +**Issue #15266 — SessionStart + UserPromptSubmit fire simultaneously on +first prompt** ([github.com/openai/codex/issues/15266](https://github.com/openai/codex/issues/15266)). +Filed against v0.116.0 (March 2026). Closed, but the closing +commit/version is not visible in the page content. Behavior described: +on the first prompt of a session, both hooks fire concurrently rather +than `SessionStart` completing before `UserPromptSubmit`. On subsequent +prompts, only `UserPromptSubmit` fires, as expected. + +Implication for Taskmaster: if `taskmaster-session-start.sh` writes +state that `taskmaster-user-prompt-submit.sh` reads (e.g., +seeding the per-session state file), there's a race on the first +prompt. mickn's `taskmaster-session-start.sh` is 27 LOC — small enough +to inspect for whether it depends on this ordering. We should verify +on 0.125.0 before merging. + +**Issue #19199 — v0.124.0 hook config parsing regression** +([github.com/openai/codex/issues/19199](https://github.com/openai/codex/issues/19199)). +`codex-cli` failed to start when hooks were configured in +`config.toml` using map syntax (the documented form). Closed; resolution +version not shown. The local install is 0.125.0, one release later, +so this is likely informational only — but it's a reminder that hook +config schemas are still in flux at the toml-vs-json boundary. + +### 5. Third-party confirmation + +Independent projects already shipping against Codex's native hooks: + +- **`hatayama/codex-hooks`** — a hooks runner that reuses Claude + Code's hooks settings against Codex CLI + ([github.com/hatayama/codex-hooks](https://github.com/hatayama/codex-hooks)). + Existence of this project confirms the surface is real and + Claude-Code-compatible enough to be adapted.
+- **`Yeachan-Heo/oh-my-codex` (OmX)** — a Codex enhancement framework + with an active roadmap issue (#1307) about mapping its hook surfaces + onto Codex's native hooks, indicating a community migration from + bespoke wrappers to native is in progress. +- **ArcKit v4** — released March 2026 with first-class Codex hooks + support + ([medium.com/arckit/arckit-v4](https://medium.com/arckit/arckit-v4-first-class-codex-and-gemini-support-with-hooks-mcp-servers-and-native-policies-abdf9569e00e)). + +The common thread: across March and April 2026, projects are moving off bespoke +PTY/wrapper layers and onto the native hook surface. +mickn's rewrite is the same move applied to Taskmaster. + +## Analysis + +The fork review's §B1 set three preconditions for adopting mickn's +wholesale wrapper deletion. All three are satisfied: + +1. **Codex CLI exposes SessionStart/UserPromptSubmit/Stop hooks** + in the version the user is on. Local install: `codex-cli 0.125.0`. + Hook events documented since v0.122 stable; we are on 0.125. + ✅ confirmed. + +2. **The native `Stop` hook supports `decision: "block"` continuation** + in the same way Claude Code does. Documented verbatim in + `developers.openai.com/codex/hooks`: "doesn't reject the turn. + Instead, it tells Codex to continue and automatically creates a new + continuation prompt." ✅ confirmed. + +3. **`last_assistant_message` is populated by Codex on stop events.** + Documented as a Stop-event stdin field, with the qualifier "if + available." This matches Claude Code's behavior, which also has + cases where the field is empty (e.g., when the assistant emits no + message text on stop). ⚠️ confirmed-with-caveat. + +The caveat on (3) is meaningful but not blocking. The current fork's +detection is layered: `last_assistant_message` for primary detection, +a transcript grep for the explicit `TASKMASTER_DONE::` token as the +fallback. That layering should survive the port unchanged — the +fallback handles the "if available" gap.
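+
+A minimal illustration of the input the fallback exists for, assuming only the three stdin fields named in §2 (real payloads may carry more fields, and every value below is a placeholder): when `last_assistant_message` arrives empty or absent, the hook cannot look for the done token in the payload and has to fall back to the transcript grep.
+
+```json
+{
+  "turn_id": "placeholder-turn-id",
+  "stop_hook_active": false,
+  "last_assistant_message": ""
+}
+```
+
+If neither path finds the token, the hook answers with a block decision and Codex continues the session with `reason` as the next prompt. A sketch of that response, with placeholder text standing in for the shared compliance prompt:
+
+```json
+{
+  "decision": "block",
+  "reason": "PLACEHOLDER: shared TASKMASTER compliance prompt"
+}
+```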
+ +The PTY wrapper, expect bridge, and queue emitter become genuinely +redundant at 0.122+. They cost LOC, complexity, and an `expect` runtime +dependency. Their only remaining justification is as a portability +floor for Codex versions without native hooks — which now means +only pre-0.122.0 releases (before April 2026), which a user would have to pin +deliberately. + +The right migration shape mirrors §B1's hedge: keep both code paths, +gate on `codex --version` or a feature probe (`codex --help | grep -q +hook` or test for the `~/.codex/hooks.json` schema), and let +`install.sh` choose. Default to native on detection, fall back to +wrapper on older Codex. Delete the wrapper path only after a deprecation +window where logs confirm zero installs are using it. + +## Recommendations + +Adopt mickn's native-hooks architecture, but stage it. Two safety +rails keep the migration low-risk: + +1. **Version-gated install.** Probe Codex version in `install.sh`. + If `>= 0.122.0`, install native hooks; if `< 0.122.0` or no codex + detected, install the existing wrapper path. The user's machine + (0.125.0) gets the native path automatically. +2. **Keep the wrapper path on disk.** Don't `git rm` the PTY wrapper, + expect bridge, or their tests in the same PR. Move them + under `legacy/` and have `install.sh` install from `legacy/` + when the version gate selects the wrapper path. Plan to remove them after one minor release if + no one reports using them. + +The `last_assistant_message` + transcript-token layered detection +already in the fork is the right pattern and ports cleanly. Do not +collapse to a single detection mode. + +## Punch List (for `/mei`) + +Each item is a self-contained adoption decision. Numbered for +priority. Phase tags only — no time estimates. + +1. **[Phase 1, HIGH] Add Codex version probe to `install.sh`.** + Detect `codex --version` and parse semver; expose as + `$CODEX_HOOKS_NATIVE` (true if `>= 0.122.0`). Touches only + `install.sh`. No behavior change yet — just the detection.
+ +2. **[Phase 1, HIGH] Port `taskmaster-session-start.sh` from + `mickn:main`.** 27 LOC. Place at `hooks/taskmaster-session-start.sh`. + Verify it does not depend on completing-before-`UserPromptSubmit` + ordering (issue #15266). If it does, add a state-file lock that + both hooks honor. + +3. **[Phase 1, HIGH] Port `taskmaster-stop.sh` from `mickn:main`.** + 356 LOC. Replaces the wrapper's stop-detection role. Must emit + `decision: "block"` JSON with the shared compliance prompt as + `reason`. Reuses `taskmaster-compliance-prompt.sh`. Keep the + `last_assistant_message` → transcript-token fallback layered + detection. + +4. **[Phase 1, HIGH] Port `taskmaster-user-prompt-submit.sh` from + `mickn:main`.** 107 LOC. Implements per-turn external user prompt + capture with `` filtering. This is pattern A1 from the + fork review and the highest-value behavioral upgrade. + +5. **[Phase 1, MEDIUM] Move existing wrapper artifacts to `legacy/`.** + `hooks/inject-continue-codex.sh`, `hooks/run-codex-expect-bridge.exp`, + `run-taskmaster-codex.sh`, plus their tests in + `tests/inject-continue-codex.test.sh` etc. Update `install.sh` to + choose `hooks/` (native) or `legacy/` (wrapper) based on the + version probe. + +6. **[Phase 1, MEDIUM] Write `~/.codex/hooks.json` template** in + `install.sh` mapping the three event names to the three new + `hooks/taskmaster-*.sh` scripts. Merge-safe with any existing + user hooks (don't clobber). + +7. **[Phase 2, MEDIUM] Add a feature smoke test:** create `taskmaster + selftest --codex-hooks` that fires a no-op session, asserts each of + the three hooks executed via state-file markers, and reports OK/FAIL. + Exercised in `install.sh --verify` and CI. + +8. **[Phase 2, LOW] Document the version gate in + `docs/SPEC.md`.** Single section explaining the dual code paths, + the 0.122.0 cutover, the issue-#15266 race awareness, and the + deprecation plan for `legacy/`. + +9. 
**[Phase 3, LOW] Sunset `legacy/` after one minor release** if no + reports of installs using it. Track via an `install.sh` + instrumentation line that logs which path was selected. Drop after + evidence justifies it. + +10. **[Phase 3, LOW] Watch upstream issue #15266** for a definitive + fix-version. If the simultaneous-fire race is confirmed fixed in + a known version, tighten `$CODEX_HOOKS_NATIVE` lower bound to + that version and remove any race-mitigation lock added in (2). + +## Sources + +- [Hooks – Codex | OpenAI Developers](https://developers.openai.com/codex/hooks) — primary reference; documents all six hook events including `SessionStart`, `UserPromptSubmit`, `Stop`. Contains the verbatim specification of `decision: "block"` for `Stop` and the `last_assistant_message` field. +- [Changelog – Codex | OpenAI Developers](https://developers.openai.com/codex/changelog) — release timeline confirming hooks went stable in v0.122.0 (April 20, 2026) via PR #19012 "Mark codex_hooks stable." +- [Issue #15266 — UserPromptSubmit and SessionStart hooks fire simultaneously on first prompt](https://github.com/openai/codex/issues/15266) — known caveat, filed v0.116.0 (March 2026), closed. +- [Issue #19199 — codex-cli 0.124.0 fails to start when hook config is present and codex_hooks is enabled](https://github.com/openai/codex/issues/19199) — informational; not a blocker on 0.125.0. +- [PR #14867 — hooks: use a user message > developer message for prompt continuation](https://github.com/openai/codex/pull/14867) — early-development context for hook continuation semantics. +- [PR #15118 — hooks: turn_id extension for Stop & UserPromptSubmit](https://github.com/openai/codex/pull/15118) — confirms `turn_id` field added to the stdin payload of these specific hooks. +- [Discussion #2150 — Hook would be a great feature](https://github.com/openai/codex/discussions/2150) — historical context for how hooks landed. 
+- [hatayama/codex-hooks](https://github.com/hatayama/codex-hooks) — third-party hooks runner reusing Claude Code's hooks settings against Codex; corroborates Claude-compatible surface. +- [Yeachan-Heo/oh-my-codex issue #1307 — roadmap: map OMC hook surfaces onto OMX native Codex hooks](https://github.com/Yeachan-Heo/oh-my-codex/issues/1307) — community-wide migration pattern from bespoke wrappers to native hooks. +- [ArcKit v4: First-Class Codex and Gemini Support with Hooks](https://medium.com/arckit/arckit-v4-first-class-codex-and-gemini-support-with-hooks-mcp-servers-and-native-policies-abdf9569e00e) — independent third-party adoption of the same hook surface in March 2026. +- Local fork review: `docs/upstream-reviews/blader-taskmaster-forks.md` §B1 — the three preconditions verified by this report. +- Local Codex install: `codex-cli 0.125.0` (`/usr/local/bin/codex`) — three releases past the stable cutoff. diff --git a/docs/reports/2026-04-28-002421-codex-native-hooks-verification.pdf b/docs/reports/2026-04-28-002421-codex-native-hooks-verification.pdf new file mode 100644 index 0000000..ff50ca5 Binary files /dev/null and b/docs/reports/2026-04-28-002421-codex-native-hooks-verification.pdf differ diff --git a/docs/reports/2026-04-28-002421-codex-native-hooks-verification.tex b/docs/reports/2026-04-28-002421-codex-native-hooks-verification.tex new file mode 100644 index 0000000..fad7967 --- /dev/null +++ b/docs/reports/2026-04-28-002421-codex-native-hooks-verification.tex @@ -0,0 +1,681 @@ +\documentclass[twocolumn,10pt,letterpaper]{article} + +% -- Page geometry -- +\usepackage[ + top=0.85in, + bottom=0.75in, + left=0.65in, + right=0.65in, + columnsep=0.25in +]{geometry} + +% -- Beautiful typography: Century Schoolbook -- +\usepackage[T1]{fontenc} +\usepackage{tgschola} % TeX Gyre Schola (Century Schoolbook) +\usepackage[scaled=0.82]{beramono} % Bera Mono for code +\usepackage{microtype} % Microtypographic refinements +\usepackage{setspace} % Line spacing 
control +\setstretch{1.08} % Slightly open leading for readability + +% -- Packages -- +\usepackage{xurl} % Allow URL breaks at any character (load before hyperref) +\usepackage{hyperref} +\usepackage{graphicx} +\usepackage{booktabs} % Professional tables +\usepackage{tabularx} % Width-constrained tables +\usepackage{enumitem} % List customization +\usepackage{fancyhdr} +\usepackage{xcolor} +\usepackage{balance} +\usepackage{stfloats} % Enable [h] placement for table* in two-column layout +\usepackage{titlesec} % Section heading customization +\usepackage{listings} % Code blocks with line wrapping + +% -- Paragraph spacing (no indent, modest skip) -- +\setlength{\parindent}{0pt} +\setlength{\parskip}{0.4em plus 0.1em minus 0.05em} +\setlength{\emergencystretch}{1em} % Extra stretch to avoid overfull hboxes in narrow columns + +% -- Code listings with line wrapping -- +\lstset{ + basicstyle=\small\ttfamily, + breaklines=true, + breakatwhitespace=false, + breakautoindent=true, + postbreak=\mbox{\textcolor{accentrule}{$\hookrightarrow$}\space}, + columns=fullflexible, + keepspaces=true, + backgroundcolor=\color{codebg}, + frame=single, + rulecolor=\color{codebg}, + framesep=3pt, + aboveskip=0.5em, + belowskip=0.5em, + xleftmargin=4pt, + xrightmargin=0pt, +} + +% -- Colors -- +\definecolor{linkblue}{HTML}{1D4ED8} +\definecolor{headergray}{HTML}{1F2937} +\definecolor{rulegray}{HTML}{D1D5DB} +\definecolor{titlebg}{HTML}{111827} +\definecolor{accentrule}{HTML}{6B7280} +\definecolor{brandred}{HTML}{8B2635} +\definecolor{codebg}{HTML}{F3F4F6} % Very light gray for code backgrounds + +% -- Hyperlinks -- +\hypersetup{ + colorlinks=true, + linkcolor=linkblue, + urlcolor=linkblue, + citecolor=linkblue, + pdfstartview=FitH +} + +% -- Section numbering: Roman > Alph > Arabic > alph, no decimals -- +\renewcommand{\thesection}{\Roman{section}} +\renewcommand{\thesubsection}{\Alph{subsection}} +\renewcommand{\thesubsubsection}{\arabic{subsubsection}} 
+\renewcommand{\theparagraph}{\alph{paragraph}} +\makeatletter +\@addtoreset{subsection}{section} +\@addtoreset{subsubsection}{subsection} +\@addtoreset{paragraph}{subsubsection} +\makeatother + +% -- Section headings -- +\titleformat{\section} + {\large\bfseries\color{titlebg}} + {\thesection.}{0.5em}{} + [\vspace{-0.3em}{\color{accentrule}\rule{\columnwidth}{0.5pt}}] +\titleformat{\subsection} + {\normalsize\bfseries\color{headergray}} + {\thesubsection.}{0.4em}{} +\titleformat{\subsubsection} + {\small\bfseries\itshape\color{headergray}} + {\thesubsubsection.}{0.4em}{} +\titleformat{\paragraph}[runin] + {\small\bfseries\color{headergray}} + {\theparagraph)}{0.4em}{}[.\hspace{0.4em}] +\titlespacing*{\section}{0pt}{1.0em}{0.35em} +\titlespacing*{\subsection}{0pt}{0.75em}{0.2em} +\titlespacing*{\subsubsection}{0pt}{0.55em}{0.15em} +\titlespacing*{\paragraph}{0pt}{0.4em}{0em} + +% -- Lists: compact, elegant -- +\setlist{nosep, leftmargin=1.1em, topsep=0.2em, itemsep=0.05em} +\setlist[itemize]{label={\small\textcolor{accentrule}{\textbullet}}} +\setlist[enumerate]{label={\small\textcolor{headergray}{\arabic*.}}} + +% -- Tables: tighter, small text -- +\renewcommand{\arraystretch}{1.15} + +% -- Header/footer (pages 2+) -- +\pagestyle{fancy} +\fancyhf{} +\renewcommand{\headrulewidth}{0.3pt} +\renewcommand{\headrule}{\hbox to\headwidth{\color{rulegray}\leaders\hrule height \headrulewidth\hfill}} +\fancyhead[L]{\footnotesize\color{headergray}\textit{Codex Native Hooks: Verification Before Adopting mickn's Architecture}} +\renewcommand{\footrulewidth}{0.3pt} +\renewcommand{\footrule}{\hbox to\headwidth{\color{rulegray}\leaders\hrule height \footrulewidth\hfill}} +% Brand mark footer: CaseMirror wordmark left, page number right (both burgundy) +\fancyfoot[L]{\footnotesize\textbf{\color{headergray}CaseMirror}} +\fancyfoot[R]{\footnotesize\color{headergray}\thepage} + +% -- Page 1 style: footer only, no header rule -- +\fancypagestyle{firstpage}{% + \fancyhf{}% + 
\renewcommand{\headrulewidth}{0pt}% + \renewcommand{\footrulewidth}{0.3pt}% + \renewcommand{\footrule}{\hbox to\headwidth{\color{rulegray}\leaders\hrule height 0.3pt\hfill}}% + \fancyfoot[L]{\footnotesize\textbf{\color{headergray}CaseMirror}}% + \fancyfoot[R]{\footnotesize\color{headergray}\thepage}% +} + +% -- Column rule -- +\setlength{\columnseprule}{0.15pt} +\renewcommand{\columnseprulecolor}{\color{rulegray}} + +% -- Title -- +\title{\vspace{-1.8em}\LARGE\bfseries\color{titlebg} Codex Native Hooks: Verification Before Adopting mickn's Architecture\vspace{-0.3em}} +\author{CaseMirror Research \\ \small\itshape\href{https://casemirror.ai}{\textcolor{headergray}{casemirror.ai}}} +\date{{\small\color{headergray} \today}} + +\begin{document} + +\maketitle +\thispagestyle{firstpage} +\vspace{-0.5em} + + +\textbf{Generated:} 2026-04-28 + +\textbf{Topic:} Does the OpenAI Codex CLI actually support \texttt{SessionStart}, \texttt{UserPromptSubmit}, and \texttt{Stop} hooks natively today, or is \texttt{mickn:main}'s rewrite conditional on a future Codex release? + + +\subsection{Executive Summary} + +\textbf{Codex native hooks are real, shipped, and stable.} The OpenAI Codex + +CLI exposes \texttt{SessionStart}, \texttt{UserPromptSubmit}, and \texttt{Stop} as first-class + +hook events documented at \texttt{developers.openai.com/\allowbreak codex/\allowbreak hooks}. Hooks + +were marked stable in \textbf{v0.122.0 (2026-04-20)} via PR \#19012 + +("Mark codex\_hooks stable"). The locally installed version on this + +machine is \textbf{\texttt{codex-\allowbreak cli 0.125.0}} — three releases past the stable + +cutoff and well within the supported window. 
+ + +\texttt{mickn:main}'s v5.0.0 rewrite — which deletes the PTY wrapper, expect + +bridge, and queue emitter, replacing them with \texttt{taskmaster-\allowbreak session-\allowbreak start.sh}, + +\texttt{taskmaster-\allowbreak user-\allowbreak prompt-\allowbreak submit.sh}, and \texttt{taskmaster-\allowbreak stop.sh} — therefore + +does \textbf{not} depend on any unreleased Codex feature. It targets shipping, + +documented behavior. All three preconditions from the original fork + +review (\texttt{docs/\allowbreak upstream-\allowbreak reviews/\allowbreak blader-\allowbreak taskmaster-\allowbreak forks.md} §B1) are + +satisfied: + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item Codex CLI exposes \texttt{SessionStart}, \texttt{UserPromptSubmit}, and \texttt{Stop} ✅ + \item The native \texttt{Stop} hook supports \texttt{decision: "block"} continuation ✅ + \item \texttt{last\_assistant\_message} is populated on Stop events ✅ (with one caveat — see §2) +\end{enumerate} + + + +The answer to the open question is: \textbf{proceed with the port. The} + +\textbf{wrapper layer is dead weight on Codex 0.122+.} Two caveats worth + +gating on are documented in §4 and flow into the punch list. + + +\subsection{Research Findings} + +\subsubsection{Hook events explicitly documented} + +The official Codex hooks reference + +(\href{https://developers.openai.com/codex/hooks}{developers.openai.com/codex/hooks}) + +lists six hook events: \texttt{SessionStart}, \texttt{PreToolUse}, \texttt{PermissionRequest}, + +\texttt{PostToolUse}, \texttt{UserPromptSubmit}, \texttt{Stop}. The three that Taskmaster needs + +are all present and have dedicated sections. + + +Hooks are configured via \texttt{\textasciitilde{}/\allowbreak .codex/\allowbreak hooks.json} (user scope) or + +\texttt{/\allowbreak .codex/\allowbreak hooks.json} (project scope), with optional inline + +configuration in \texttt{config.toml}.
Per-layer hooks are merged, not + +overridden — higher-precedence layers add to lower ones. Project-local + +hooks only load when the \texttt{.codex/\allowbreak } layer is trusted. + + +\subsubsection{\texttt{Stop} hook semantics match Claude Code} + +The docs explicitly state, for the \texttt{Stop} event: + +\begin{quote} +``For this event, \texttt{decision: "block"} doesn't reject the turn. +% +Instead, it tells Codex to continue and automatically creates a new +% +continuation prompt'' +\end{quote} + +The \texttt{reason} field becomes the continuation prompt. This is the same + +contract Claude Code's Stop hook uses, which is exactly what + +\texttt{taskmaster-\allowbreak stop.sh} needs in order to push a TASKMASTER continuation + +prompt back into the same running session. + + +\texttt{Stop}'s stdin payload includes: + + +\begin{itemize} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \texttt{turn\_id} — the active Codex turn ID + \item \texttt{stop\_hook\_active} — whether continuation has already occurred +\end{itemize} + (the standard guard against infinite re-fire loops; matches Claude's + + field of the same name) + +\begin{itemize} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \texttt{last\_assistant\_message} — "Latest assistant message text, if available" +\end{itemize} + +The "if available" caveat on \texttt{last\_assistant\_message} is the only + +non-trivial parity gap with Claude Code. It is the source of the caveat + +on precondition 3 in the Analysis section.
+ + +\subsubsection{Release timeline} + +From the Codex changelog + +(\href{https://developers.openai.com/codex/changelog}{developers.openai.com/codex/changelog}): + + +\vspace{0.4em} +\noindent +{\small +\begin{tabularx}{\columnwidth}{@{}llX@{}} +\toprule +\textbf{Version} & \textbf{Date} & \textbf{Hook-related change} \\ +\midrule +v0.116.0 & 2026-03 & Hooks present in experimental form (referenced in issue \#15266 reproductions) \\ +v0.122.0 & 2026-04-20 & \texttt{PermissionRequest} hooks added (\#17563); OTEL metrics for hook runs (\#18026) \\ +v0.123.0 & 2026-04-23 & Hooks in \texttt{config.toml} / \texttt{requirements.toml} (\#18893); MCP tool support in hooks (\#18385); \textbf{\texttt{codex\_hooks} marked stable (\#19012)} \\ +v0.124.0 & 2026-04-23 & \texttt{apply\_patch} emits hooks (\#18391); Bash \texttt{PostToolUse} on \texttt{exec\_command} (\#18888); \textbf{regression: hooks broke at startup if config used map syntax (\#19199)} \\ +v0.125.0 & 2026-04 & (locally installed; current) \\ +\bottomrule +\end{tabularx}} +\vspace{0.4em} + +The stable marker landed about a week before this report. mickn's rewrite + +(its v5.0.0 release falls in the same window) targets the + +post-stable surface, not pre-release behavior. + + +\subsubsection{Known issues that don't block adoption but warrant gating} + +\textbf{Issue \#15266 — SessionStart + UserPromptSubmit fire simultaneously on} + +\textbf{first prompt} (\href{https://github.com/openai/codex/issues/15266}{github.com/openai/codex/issues/15266}). + +Filed against v0.116.0 (March 2026). Closed, but the closing + +commit/version is not visible in the page content. Behavior described: + +on the first prompt of a session, both hooks fire concurrently rather + +than \texttt{SessionStart} completing before \texttt{UserPromptSubmit}. On subsequent + +prompts, only \texttt{UserPromptSubmit} fires, as expected.
+ +Implication for Taskmaster: if \texttt{taskmaster-\allowbreak session-\allowbreak start.sh} writes + +state that \texttt{taskmaster-\allowbreak user-\allowbreak prompt-\allowbreak submit.sh} reads (e.g., + +seeding the per-session state file), there's a race on the first + +prompt. mickn's \texttt{taskmaster-\allowbreak session-\allowbreak start.sh} is 27 LOC — small enough + +to inspect for whether it depends on this ordering. We should verify + +on 0.125.0 before merging. + + +\textbf{Issue \#19199 — v0.124.0 hook config parsing regression} + +(\href{https://github.com/openai/codex/issues/19199}{github.com/openai/codex/issues/19199}). + +\texttt{codex-\allowbreak cli} failed to start when hooks were configured in + +\texttt{config.toml} using map syntax (the documented form). Closed; resolution + +version not shown. The local install is 0.125.0, one release later, + +so this is likely informational only — but it's a reminder that hook + +config schemas are still in flux at the toml-vs-json boundary. + + +\subsubsection{Third-party confirmation} + +Independent projects already shipping against Codex's native hooks: + + +\begin{itemize} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{\texttt{hatayama/\allowbreak codex-\allowbreak hooks}} — a hooks runner that reuses Claude +\end{itemize} + Code's hooks settings against Codex CLI + + (\href{https://github.com/hatayama/codex-hooks}{github.com/hatayama/codex-hooks}). + + Existence of this project confirms the surface is real and + + Claude-Code-compatible enough to be adapted. + +\begin{itemize} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{\texttt{Yeachan-\allowbreak Heo/\allowbreak oh-\allowbreak my-\allowbreak codex} (OmX)} — a Codex enhancement framework +\end{itemize} + with an active roadmap issue (\#1307) about mapping its hook surfaces + + onto Codex's native hooks, indicating a community migration from + + bespoke wrappers to native is in progress.
+ +\begin{itemize} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{ArcKit v4} — released March 2026 with first-class Codex hooks +\end{itemize} + support + + (\href{https://medium.com/arckit/arckit-v4-first-class-codex-and-gemini-support-with-hooks-mcp-servers-and-native-policies-abdf9569e00e}{medium.com/arckit/arckit-v4}). + + +The common thread: across March and April 2026, projects are moving off bespoke + +PTY/wrapper layers and onto the native hook surface. + +mickn's rewrite is the same move applied to Taskmaster. + + +\subsection{Analysis} + +The fork review's §B1 set three preconditions for adopting mickn's + +wholesale wrapper deletion. All three are satisfied: + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{Codex CLI exposes SessionStart/UserPromptSubmit/Stop hooks} +\end{enumerate} + in the version the user is on. Local install: \texttt{codex-\allowbreak cli 0.125.0}. + + Hook events documented since v0.122 stable; we are on 0.125. + + ✅ confirmed. + + +\begin{enumerate}[resume] +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{The native \texttt{Stop} hook supports \texttt{decision: "block"} continuation} +\end{enumerate} + in the same way Claude Code does. Documented verbatim in + + \texttt{developers.openai.com/\allowbreak codex/\allowbreak hooks}: "doesn't reject the turn. + + Instead, it tells Codex to continue and automatically creates a new + + continuation prompt." ✅ confirmed. + + +\begin{enumerate}[resume] +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{\texttt{last\_assistant\_message} is populated by Codex on stop events.} +\end{enumerate} + Documented as a Stop-event stdin field, with the qualifier "if + + available." This matches Claude Code's behavior, which also has + + cases where the field is empty (e.g., when the assistant emits no + + message text on stop). ⚠️ confirmed-with-caveat. + + +The caveat on (3) is meaningful but not blocking.
The current fork's + +detection is layered: \texttt{last\_assistant\_message} for primary detection, + +a transcript grep for the explicit \texttt{TASKMASTER\_DONE::} token as the + +fallback. That layering should survive the port unchanged — the + +fallback handles the "if available" gap. + + +The PTY wrapper, expect bridge, and queue emitter become genuinely + +redundant at 0.122+. They cost LOC, complexity, and an \texttt{expect} runtime + +dependency. Their only remaining justification is as a portability + +floor for Codex versions without native hooks — which now means + +only pre-0.122.0 releases (before April 2026), which a user would have to pin + +deliberately. + + +The right migration shape mirrors §B1's hedge: keep both code paths, + +gate on \texttt{codex -\allowbreak -\allowbreak version} or a feature probe (\texttt{codex -\allowbreak -\allowbreak help | grep -q} + +\texttt{hook} or test for the \texttt{\textasciitilde{}/\allowbreak .codex/\allowbreak hooks.json} schema), and let + +\texttt{install.sh} choose. Default to native on detection, fall back to + +wrapper on older Codex. Delete the wrapper path only after a deprecation + +window where logs confirm zero installs are using it. + + +\subsection{Recommendations} + +Adopt mickn's native-hooks architecture, but stage it. Two safety + +rails keep the migration low-risk: + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{Version-gated install.} Probe Codex version in \texttt{install.sh}. +\end{enumerate} + If \texttt{>= 0.122.0}, install native hooks; if \texttt{< 0.122.0} or no codex + + detected, install the existing wrapper path. The user's machine + + (0.125.0) gets the native path automatically. + +\begin{enumerate}[resume] +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{Keep the wrapper path on disk.} Don't \texttt{git rm} the PTY wrapper, +\end{enumerate} + expect bridge, or their tests in the same PR.
Move them + + under \texttt{legacy/\allowbreak } and have \texttt{install.sh} install from \texttt{legacy/\allowbreak } + + when the version gate selects the wrapper path. Plan to remove them after one minor release if + + no one reports using them. + + +The \texttt{last\_assistant\_message} + transcript-token layered detection + +already in the fork is the right pattern and ports cleanly. Do not + +collapse to a single detection mode. + + +\subsection{Punch List (for \texttt{/\allowbreak mei})} + +Each item is a self-contained adoption decision. Numbered for + +priority. Phase tags only — no time estimates. + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{[Phase 1, HIGH] Add Codex version probe to \texttt{install.sh}.} +\end{enumerate} + Detect \texttt{codex -\allowbreak -\allowbreak version} and parse semver; expose as + + \texttt{\$CODEX\_HOOKS\_NATIVE} (true if \texttt{>= 0.122.0}). Touches only + + \texttt{install.sh}. No behavior change yet — just the detection. + + +\begin{enumerate}[resume] +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{[Phase 1, HIGH] Port \texttt{taskmaster-\allowbreak session-\allowbreak start.sh} from} +\end{enumerate} + \textbf{\texttt{mickn:main}.} 27 LOC. Place at \texttt{hooks/\allowbreak taskmaster-\allowbreak session-\allowbreak start.sh}. + + Verify it does not depend on completing-before-\texttt{UserPromptSubmit} + + ordering (issue \#15266). If it does, add a state-file lock that + + both hooks honor. + + +\begin{enumerate}[resume] +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \textbf{[Phase 1, HIGH] Port \texttt{taskmaster-\allowbreak stop.sh} from \texttt{mickn:main}.} +\end{enumerate} + 356 LOC. Replaces the wrapper's stop-detection role. Must emit + + \texttt{decision: "block"} JSON with the shared compliance prompt as + + \texttt{reason}. Reuses \texttt{taskmaster-\allowbreak compliance-\allowbreak prompt.sh}.
Keep the layered + + \texttt{last\_assistant\_message} → transcript-token fallback + + detection. + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} +\setcounter{enumi}{3} + \item \textbf{[Phase 1, HIGH] Port \texttt{taskmaster-\allowbreak user-\allowbreak prompt-\allowbreak submit.sh} from \texttt{mickn:main}.} +\end{enumerate} + 107 LOC. Implements per-turn external user prompt + + capture with \texttt{} filtering. This is pattern A1 from the + + fork review and the highest-value behavioral upgrade. + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} +\setcounter{enumi}{4} + \item \textbf{[Phase 1, MEDIUM] Move existing wrapper artifacts to \texttt{legacy/\allowbreak }.} +\end{enumerate} + \texttt{hooks/\allowbreak inject-\allowbreak continue-\allowbreak codex.sh}, \texttt{hooks/\allowbreak run-\allowbreak codex-\allowbreak expect-\allowbreak bridge.exp}, + + \texttt{run-\allowbreak taskmaster-\allowbreak codex.sh}, plus their tests in + + \texttt{tests/\allowbreak inject-\allowbreak continue-\allowbreak codex.test.sh} etc. Update \texttt{install.sh} to + + choose \texttt{hooks/\allowbreak } (native) or \texttt{legacy/\allowbreak } (wrapper) based on the + + version probe. + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} +\setcounter{enumi}{5} + \item \textbf{[Phase 1, MEDIUM] Write \texttt{\textasciitilde{}/\allowbreak .codex/\allowbreak hooks.json} template} in +\end{enumerate} + \texttt{install.sh} mapping the three event names to the three new + + \texttt{hooks/\allowbreak taskmaster-\allowbreak *.sh} scripts. It must merge safely with any existing + + user hooks (don't clobber). + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} +\setcounter{enumi}{6} + \item \textbf{[Phase 2, MEDIUM] Add a feature smoke test:} create a \texttt{taskmaster selftest -\allowbreak -\allowbreak codex-\allowbreak hooks} command +\end{enumerate} + that fires a no-op session, asserts that each of + + the three hooks executed via state-file markers, and reports OK/FAIL.
+ + Exercised in \texttt{install.sh -\allowbreak -\allowbreak verify} and CI. + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} +\setcounter{enumi}{7} + \item \textbf{[Phase 2, LOW] Document the version gate in \texttt{docs/\allowbreak SPEC.md}.} +\end{enumerate} + Add a single section explaining the dual code paths, + + the 0.122.0 cutover, the issue-\#15266 race awareness, and the + + deprecation plan for \texttt{legacy/\allowbreak }. + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} +\setcounter{enumi}{8} + \item \textbf{[Phase 3, LOW] Sunset \texttt{legacy/\allowbreak } after one minor release} if no +\end{enumerate} + reports of installs using it. Track via an \texttt{install.sh} + + instrumentation line that logs which path was selected. Drop the + + legacy path only once that evidence justifies it. + + +\begin{enumerate} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} +\setcounter{enumi}{9} + \item \textbf{[Phase 3, LOW] Watch upstream issue \#15266} for a definitive +\end{enumerate} + fix version. If the simultaneous-fire race is confirmed fixed in + + a known version, tighten the \texttt{\$CODEX\_HOOKS\_NATIVE} lower bound to + + that version and remove any race-mitigation lock added in (2). + + +\subsection{Sources} + +\begin{itemize} +\setlength{\itemsep}{0pt} +\setlength{\parskip}{0pt} + \item \href{https://developers.openai.com/codex/hooks}{Hooks – Codex | OpenAI Developers} — primary reference; documents all six hook events including \texttt{SessionStart}, \texttt{UserPromptSubmit}, \texttt{Stop}. Contains the verbatim specification of \texttt{decision: "block"} for \texttt{Stop} and the \texttt{last\_assistant\_message} field. + \item \href{https://developers.openai.com/codex/changelog}{Changelog – Codex | OpenAI Developers} — release timeline confirming hooks went stable in v0.122.0 (April 20, 2026) via PR \#19012 "Mark codex\_hooks stable."
+ \item \href{https://github.com/openai/codex/issues/15266}{Issue \#15266 — UserPromptSubmit and SessionStart hooks fire simultaneously on first prompt} — known caveat, filed v0.116.0 (March 2026), closed. + \item \href{https://github.com/openai/codex/issues/19199}{Issue \#19199 — codex-cli 0.124.0 fails to start when hook config is present and codex\_hooks is enabled} — informational; not a blocker on 0.125.0. + \item \href{https://github.com/openai/codex/pull/14867}{PR \#14867 — hooks: use a user message > developer message for prompt continuation} — early-development context for hook continuation semantics. + \item \href{https://github.com/openai/codex/pull/15118}{PR \#15118 — hooks: turn\_id extension for Stop \& UserPromptSubmit} — confirms \texttt{turn\_id} field added to the stdin payload of these specific hooks. + \item \href{https://github.com/openai/codex/discussions/2150}{Discussion \#2150 — Hook would be a great feature} — historical context for how hooks landed. + \item \href{https://github.com/hatayama/codex-hooks}{hatayama/codex-hooks} — third-party hooks runner reusing Claude Code's hooks settings against Codex; corroborates Claude-compatible surface. + \item \href{https://github.com/Yeachan-Heo/oh-my-codex/issues/1307}{Yeachan-Heo/oh-my-codex issue \#1307 — roadmap: map OMC hook surfaces onto OMX native Codex hooks} — community-wide migration pattern from bespoke wrappers to native hooks. + \item \href{https://medium.com/arckit/arckit-v4-first-class-codex-and-gemini-support-with-hooks-mcp-servers-and-native-policies-abdf9569e00e}{ArcKit v4: First-Class Codex and Gemini Support with Hooks} — independent third-party adoption of the same hook surface in March 2026. + \item Local fork review: \texttt{docs/\allowbreak upstream-\allowbreak reviews/\allowbreak blader-\allowbreak taskmaster-\allowbreak forks.md} §B1 — the three preconditions verified by this report. 
+ \item Local Codex install: \texttt{codex-\allowbreak cli 0.125.0} (\texttt{/\allowbreak usr/\allowbreak local/\allowbreak bin/\allowbreak codex}) — three releases past the stable cutoff. +\end{itemize} + + +\balance +\end{document} diff --git a/docs/session-summaries/2026-02-19-121116-default-tries-100.md b/docs/session-summaries/2026-02-19-121116-default-tries-100.md new file mode 100644 index 0000000..28517a4 --- /dev/null +++ b/docs/session-summaries/2026-02-19-121116-default-tries-100.md @@ -0,0 +1,27 @@ +# Session Summary + +**Date:** 2026-02-19 +**Time:** 12:11 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 1 commits. Please add context about what was accomplished. + +## Completed Work + +### Commits +- `25bf291` - default tries 100 + +## Key Changes + +### Files Modified +[Review git diff for details] + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-02-19-142857-make-installsh-posix-portable-.md b/docs/session-summaries/2026-02-19-142857-make-installsh-posix-portable-.md new file mode 100644 index 0000000..dc116d7 --- /dev/null +++ b/docs/session-summaries/2026-02-19-142857-make-installsh-posix-portable-.md @@ -0,0 +1,27 @@ +# Session Summary + +**Date:** 2026-02-19 +**Time:** 14:28 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 1 commits. Please add context about what was accomplished. 
+ +## Completed Work + +### Commits +- `cbff59c` - Make install.sh POSIX-portable (sh shebang, portable pipefail) + +## Key Changes + +### Files Modified +[Review git diff for details] + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-02-23-145341-docs-add-session-summary-make-.md b/docs/session-summaries/2026-02-23-145341-docs-add-session-summary-make-.md new file mode 100644 index 0000000..e31fb6c --- /dev/null +++ b/docs/session-summaries/2026-02-23-145341-docs-add-session-summary-make-.md @@ -0,0 +1,30 @@ +# Session Summary + +**Date:** 2026-02-23 +**Time:** 14:53 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 4 commits. Please add context about what was accomplished. + +## Completed Work + +### Commits +- `fde5036` - docs: add session summary (make-installsh-posix-portable-) +- `46f6a44` - Make install.sh POSIX-portable (sh shebang, portable pipefail) +- `31694ca` - docs: add session summary (default-tries-100) +- `0940f36` - default tries 100 + +## Key Changes + +### Files Modified +[Review git diff for details] + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-02-25-043407-hide-verbose-checklist-from-us.md b/docs/session-summaries/2026-02-25-043407-hide-verbose-checklist-from-us.md new file mode 100644 index 0000000..79f2973 --- /dev/null +++ b/docs/session-summaries/2026-02-25-043407-hide-verbose-checklist-from-us.md @@ -0,0 +1,34 @@ +# Session Summary + +**Date:** 2026-02-25 +**Time:** 04:34 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 1 commits. Please add context about what was accomplished. 
+ +## Completed Work + +### Commits +- `9c25e5d` - hide verbose checklist from user output, use TASKMASTER_DONE signal detection + +## Key Changes + +### Files Modified +- `SKILL.md` +- `check-completion.sh` +- `docs/SPEC.md` +- `docs/session-summaries/2026-02-19-121116-default-tries-100.md` +- `docs/session-summaries/2026-02-19-142857-make-installsh-posix-portable-.md` +- `docs/session-summaries/2026-02-23-145341-docs-add-session-summary-make-.md` +- `hooks/check-completion.sh` +- `install.sh` + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-03-30-095200-install-hook-to-claudehooks-fo.md b/docs/session-summaries/2026-03-30-095200-install-hook-to-claudehooks-fo.md new file mode 100644 index 0000000..256cb6b --- /dev/null +++ b/docs/session-summaries/2026-03-30-095200-install-hook-to-claudehooks-fo.md @@ -0,0 +1,37 @@ +# Session Summary + +**Date:** 2026-03-30 +**Time:** 09:52 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 1 commits. Please add context about what was accomplished. 
+ +## Completed Work + +### Commits +- `d10b39e` - Install hook to ~/.claude/hooks/ for consistency with standard Claude Code layout + +## Key Changes + +### Files Modified +- `CHANGELOG.md` +- `LESSONS.md` +- `README.md` +- `SKILL.md` +- `check-completion.sh` +- `docs/blog/2026-02-25-taskmaster-hook-cleanup.md` +- `docs/session-summaries/2026-02-25-043407-hide-verbose-checklist-from-us.md` +- `docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md` +- `hooks/check-completion.sh` +- `install.sh` +- `uninstall.sh` + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-04-27-235723-fix-expand-tilde-in-transcript.md b/docs/session-summaries/2026-04-27-235723-fix-expand-tilde-in-transcript.md new file mode 100644 index 0000000..87cd0af --- /dev/null +++ b/docs/session-summaries/2026-04-27-235723-fix-expand-tilde-in-transcript.md @@ -0,0 +1,42 @@ +# Session Summary + +**Date:** 2026-04-27 +**Time:** 23:57 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 12 commits. Please add context about what was accomplished. + +## Completed Work + +### Commits +- `c600b86` - fix: expand tilde in transcript_path; add upstream review docs +- `9997f36` - docs(lessons): hook reason dual-use, last_assistant_message detection +- `bdf29fb` - further edits. 
the agent --> Claude; it --> he +- `566d780` - add docs link at the end +- `daeed44` - manual blog post edits +- `9a17abf` - Add blog post: taskmaster hook cleanup (3 versions) +- `8eb006c` - release v2.3.0: minimal hook output, TASKMASTER_DONE signal detection +- `6a4bd8a` - docs: add session summary (hide-verbose-checklist-from-us) +- `689d830` - docs: add session summary (docs-add-session-summary-make-) +- `a7542fe` - docs: add session summary (make-installsh-posix-portable-) + +## Key Changes + +### Files Modified +- `CHANGELOG.md` +- `LESSONS.md` +- `docs/blog/2026-02-25-taskmaster-hook-cleanup.md` +- `docs/session-summaries/2026-02-19-142857-make-installsh-posix-portable-.md` +- `docs/session-summaries/2026-02-23-145341-docs-add-session-summary-make-.md` +- `docs/session-summaries/2026-02-25-043407-hide-verbose-checklist-from-us.md` +- `docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md` + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-04-28-000126-docs-add-session-summary-fix-e.md b/docs/session-summaries/2026-04-28-000126-docs-add-session-summary-fix-e.md new file mode 100644 index 0000000..b04665d --- /dev/null +++ b/docs/session-summaries/2026-04-28-000126-docs-add-session-summary-fix-e.md @@ -0,0 +1,42 @@ +# Session Summary + +**Date:** 2026-04-28 +**Time:** 00:01 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 13 commits. Please add context about what was accomplished. + +## Completed Work + +### Commits +- `cfc01ec` - docs: add session summary (fix-expand-tilde-in-transcript) +- `c600b86` - fix: expand tilde in transcript_path; add upstream review docs +- `9997f36` - docs(lessons): hook reason dual-use, last_assistant_message detection +- `bdf29fb` - further edits. 
the agent --> Claude; it --> he +- `566d780` - add docs link at the end +- `daeed44` - manual blog post edits +- `9a17abf` - Add blog post: taskmaster hook cleanup (3 versions) +- `8eb006c` - release v2.3.0: minimal hook output, TASKMASTER_DONE signal detection +- `6a4bd8a` - docs: add session summary (hide-verbose-checklist-from-us) +- `689d830` - docs: add session summary (docs-add-session-summary-make-) + +## Key Changes + +### Files Modified +- `CHANGELOG.md` +- `LESSONS.md` +- `docs/blog/2026-02-25-taskmaster-hook-cleanup.md` +- `docs/session-summaries/2026-02-23-145341-docs-add-session-summary-make-.md` +- `docs/session-summaries/2026-02-25-043407-hide-verbose-checklist-from-us.md` +- `docs/session-summaries/2026-04-27-235723-fix-expand-tilde-in-transcript.md` +- `docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md` + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-04-28-021254-docs-add-implementation-plan-f.md b/docs/session-summaries/2026-04-28-021254-docs-add-implementation-plan-f.md new file mode 100644 index 0000000..8954afb --- /dev/null +++ b/docs/session-summaries/2026-04-28-021254-docs-add-implementation-plan-f.md @@ -0,0 +1,39 @@ +# Session Summary + +**Date:** 2026-04-28 +**Time:** 02:12 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 3 commits. Please add context about what was accomplished. 
+ +## Completed Work + +### Commits +- `dc503c9` - docs: add implementation plan for T1 fork-pattern adoption +- `4751e4b` - Add report: Codex native hooks verification before mickn port +- `bd48770` - docs: add design for fork-pattern adoption (T1-T3) + +## Key Changes + +### Files Modified +- `LESSONS.md` +- `docs/blog/2026-02-25-taskmaster-hook-cleanup.md` +- `docs/designs/2026-04-28-072245-fork-pattern-adoption.md` +- `docs/plans/2026-04-28-083546-t1-fork-pattern-adoption.md` +- `docs/reports/2026-04-28-002421-codex-native-hooks-verification.md` +- `docs/reports/2026-04-28-002421-codex-native-hooks-verification.pdf` +- `docs/reports/2026-04-28-002421-codex-native-hooks-verification.tex` +- `docs/session-summaries/2026-04-27-235723-fix-expand-tilde-in-transcript.md` +- `docs/session-summaries/2026-04-28-000126-docs-add-session-summary-fix-e.md` +- `docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md` +- `docs/upstream-reviews/blader-taskmaster-forks.md` + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/session-summaries/2026-04-28-104902-t1-fork-pattern-adoption-shipped.md b/docs/session-summaries/2026-04-28-104902-t1-fork-pattern-adoption-shipped.md new file mode 100644 index 0000000..9ce7131 --- /dev/null +++ b/docs/session-summaries/2026-04-28-104902-t1-fork-pattern-adoption-shipped.md @@ -0,0 +1,82 @@ +# Session Summary — T1 Fork-Pattern Adoption Shipped (v4.3.0) + +**Date**: 2026-04-28 +**Outcome**: v4.3.0 released and tagged. Three Tier-1 fork-pattern features ported from `mickn/taskmaster` and shipped on `main`. + +## Summary + +End-to-end SDLC of T1 from fork-network research → design → plan → subagent-driven TDD execution → annotated tag, all within one session. Used `superpowers:writing-plans` to author the plan, `superpowers:subagent-driven-development` to execute it, and beads for cross-session tracking. 
+ +## Completed Work + +| Phase | What | Commits | Beads | +|---|---|---|---| +| Pre-T1 | Fork-network review (32 forks → 1 substantive: mickn) | `7c364b4` | — | +| Pre-T1 | Design doc covering T1+T2+T3 tiers | `bd48770` | — | +| Pre-T1 | Implementation plan for T1 (1509 lines, 4 phases, TDD) | `dc503c9` | — | +| A — T1.1 | `TASKMASTER_VERIFY_COMMAND` shell-verifier gate | `e073665`, `fff29d1`, `17a5240`, `c15f541` | bd-v0or | +| B — T1.3 | Tagged hook-injected prompts + detection lib | `beabd15`, `c2e9f87`, `e350c0d`, `d4c2e5b`, `6db9f14` | bd-vsq0 | +| C — T1.2 | JSON state file with flock + legacy migration | `df1fbe1`, `26392cc`, `55bb54d`, `0b07183`, `96ccb0c` | bd-jmzj | +| D | Release: SKILL.md + SPEC.md to 4.3.0, CHANGELOG, v4.3.0 tag | `718fce7` | bd-he02 | +| Post-D | Final cross-cutting fix (errexit divergence) | `ee3887a` | — | + +**Git tag**: `v4.3.0` (annotated) → `718fce7` (release commit; final HEAD `ee3887a`) +**Branch divergence from origin/main**: 50 ahead, 15 behind (NOT pushed) + +## Key Changes + +**New files (9)**: +- `taskmaster-verify-command.sh` — opt-in shell verifier gate (token-then-verify) +- `taskmaster-prompt-detect.sh` — `[taskmaster:injected v=1 kind=...]` tag generator + detector with legacy substring fallback +- `taskmaster-state.sh` — JSON session state lib (flock + atomic tmp+mv + idempotent additive legacy migration) +- `tests/verify-command.test.sh` (12 assertions) +- `tests/prompt-detect.test.sh` (18 assertions) +- `tests/state.test.sh` (20 assertions) +- `docs/upstream-reviews/blader-taskmaster-forks.md` +- `docs/designs/2026-04-28-072245-fork-pattern-adoption.md` +- `docs/plans/2026-04-28-083546-t1-fork-pattern-adoption.md` + +**Modified**: +- `check-completion.sh` (root) and `hooks/check-completion.sh` (mirror) — wired into all three new libs; aligned to `set -uo pipefail` +- `hooks/inject-continue-codex.sh` — additive write of stop_count to JSON state (guarded with `|| true`) +- `install.sh`, `uninstall.sh` — three new
files copied/chmoded/symlinked + cleanup +- `docs/SPEC.md` (§3.5 prompt tag, §3.6 state file, §5.1 verifier env vars) +- `SKILL.md` (version 4.2.0 → 4.3.0; added "note on the injected-prompt tag") +- `CHANGELOG.md` (v4.3.0 entry above v2.3.0) + +## Test counts at HEAD + +- `tests/state.test.sh`: 20/20 +- `tests/prompt-detect.test.sh`: 18/18 +- `tests/verify-command.test.sh`: 12/12 +- **Total**: 50 passing + +Pre-existing macOS-hardcoded test failures (`tests/install.test.sh`, `tests/inject-continue-codex.test.sh`, `tests/run-codex-expect-bridge.test.sh`, `tests/run-taskmaster-codex.test.sh`) fail in identical fashion to base — not introduced by T1. Filed as `bd-d9d6`. + +## Workflow notes (for future reference) + +- **Subagent-driven development worked well.** Each phase: implementer → spec reviewer → code quality reviewer → fix loop → close. The fix loops caught: + - Phase A: tmpfile leak on signal (RETURN trap) + - Phase B: missing legacy substring assertion + readonly re-source guard + - Phase C: 2 Critical issues (jq-exit gate + lock-protected additive migration), plus a test-only false-positive (`[[ ! -f X* ]]` glob doesn't expand) + - Final: errexit divergence between canonical and mirror hooks (pre-existing, surfaced by composition) +- **Per-phase + final cross-cutting reviewer pattern caught issues at each granularity** — the per-phase reviews caught implementation defects; the cross-cutting one caught composition issues that no single phase could have. +- **TDD discipline (failing tests committed standalone, then implementation)** kept commits bisect-friendly across all 4 phases. + +## Pending / Blocked + +Nothing blocking. 
Six follow-up beads issues filed for v4.3.x polish: +- `bd-d3rl` — T1.1 polish (uninstall symlink validation, SPEC empty-string note, lib header comments) +- `bd-jlyy` — T1.2 polish (boundary tests, log_runtime in injector, taskmaster_state_jq contract, lockfile GC) +- `bd-4wuw` — install.sh stale-file pruning on upgrade +- `bd-ekd6` — Schema lock-in test (`jq keys` exhaustive check) +- `bd-mr30` — TASKMASTER_MAX default divergence (100 vs 0 between hook entry points) +- `bd-d9d6` — macOS-hardcoded path failures in pre-existing tests + +The T2 epic `bd-eguw` (port mickn's native Codex hooks) and T3 (semantic verifier) remain open per the design doc tier ordering. + +## Next Session Context + +If resuming T1 follow-up polish: pull `bd-d3rl` and `bd-jlyy` first — they're the items already known to be desirable. If proceeding to T2: the schema fields `latest_user_prompt` and `last_verifier_run` are already shaped correctly for T2.2 / T3.1 consumption (verified by the cross-cutting review). + +Branch is **not pushed**. The local branch is 50 commits ahead of origin/main — pushing is the user's call. diff --git a/docs/session-summaries/2026-05-02-081120-docs-add-session-summary-insta.md b/docs/session-summaries/2026-05-02-081120-docs-add-session-summary-insta.md new file mode 100644 index 0000000..b350056 --- /dev/null +++ b/docs/session-summaries/2026-05-02-081120-docs-add-session-summary-insta.md @@ -0,0 +1,40 @@ +# Session Summary + +**Date:** 2026-05-02 +**Time:** 08:11 +**Focus:** [Auto-generated - please review and complete] + +## Summary + +Session with 2 commits. Please add context about what was accomplished.
+ +## Completed Work + +### Commits +- `2c497ce` - docs: add session summary (install-hook-to-claudehooks-fo) +- `90df643` - Install hook to ~/.claude/hooks/ for consistency with standard Claude Code layout + +## Key Changes + +### Files Modified +- `.claude/learning-seeds.md` +- `CHANGELOG.md` +- `SKILL.md` +- `check-completion.sh` +- `docs/SPEC.md` +- `docs/session-summaries/2026-03-30-095200-install-hook-to-claudehooks-fo.md` +- `docs/session-summaries/2026-04-28-104902-t1-fork-pattern-adoption-shipped.md` +- `hooks/check-completion.sh` +- `hooks/inject-continue-codex.sh` +- `install.sh` +- `taskmaster-state.sh` +- `tests/state.test.sh` +- `uninstall.sh` + +## Pending/Blocked + +[TODO: Any tasks started but not finished] + +## Next Session Context + +[TODO: What the next session should know] diff --git a/docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md b/docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md new file mode 100644 index 0000000..504c305 --- /dev/null +++ b/docs/upstream-reviews/2026-02-25-blader-taskmaster-main.md @@ -0,0 +1,192 @@ +# Upstream Review: blader/taskmaster main + +**Date:** 2026-02-25 +**Compare URL:** https://github.com/micahstubbs/taskmaster/compare/main...blader:taskmaster:main +**Upstream:** blader/taskmaster@main +**Our fork:** micahstubbs/taskmaster@main +**Status:** diverged — 13 commits ahead in upstream + +--- + +## Context + +Our fork focuses on Claude Code stop hook behavior. The upstream (blader) has moved to +support OpenAI Codex TUI as a first-class target alongside Claude Code, using external +session-log monitoring and tmux/expect PTY injection instead of native hook registration. + +This architectural divergence drives most of the ignore decisions below. 
+ +--- + +## Commit Decisions + +### cbd9443e — chore: sync local skill updates +**Decision: IGNORE** + +Adds the Codex integration layer: +- `hooks/check-completion-codex.sh` (237 lines, Codex monitor) +- `hooks/inject-continue-codex-tmux.sh` (307 lines, tmux transport) +- `hooks/run-codex-expect-bridge.exp` (91 lines, expect PTY bridge) +- `run-taskmaster-codex.sh` (364 lines, Codex session launcher) + +Also rewrites README, SKILL.md, docs/SPEC.md, install.sh, and uninstall.sh from a Codex-first perspective. + +Codex support is out of scope for this fork. Our focus is the Claude Code stop hook. Adding tmux/expect infrastructure would significantly increase complexity with no benefit to Claude Code users. + +--- + +### 755d165f — chore: sync local skill updates +**Decision: IGNORE** + +Refinements to the Codex layer introduced in cbd9443e: renames +`inject-continue-codex-tmux.sh` → `inject-continue-codex.sh`, simplifies +`run-taskmaster-codex.sh`, and trims docs. + +Depends on Codex infrastructure we're not adopting. + +--- + +### 88ffd335 — feat: support codex+claude auto install and cleanup docs +**Decision: IGNORE** + +Rewrites `install.sh` to auto-detect and install for both Codex (`~/.codex`) and +Claude (`~/.claude`). The new installer is 215 lines vs our 83 lines. While +auto-detection is a nice concept, the upstream now defaults to the Codex path +and the Claude path is a secondary target. Our simpler installer is better +suited to this fork's Claude-only focus. + +--- + +### 452417af — docs: rewrite README for clarity +**Decision: IGNORE** + +README is rewritten to be Codex-first, describing the Codex session-log +monitoring approach. Our README is accurate and Claude-focused. Nothing to port. + +--- + +### 4e5075fd — docs: add taskmaster philosophy and compliance rationale +**Decision: IGNORE** + +Adds 35 lines to README covering Taskmaster's philosophy. The content is already +present in our `SKILL.md` (the 6-item checklist including HONESTY CHECK). 
Our +approach of keeping the compliance text in SKILL.md (always loaded as system +context) is architecturally correct for Claude Code — no need to duplicate it +in the README. + +--- + +### 547bfa74 — refactor: remove monitor-only mode +**Decision: N/A (IGNORE)** + +Removes `hooks/check-completion-codex.sh` (235 lines) which was added in +cbd9443e and which we never adopted. Also trims SPEC.md. No action needed. + +--- + +### ca471bd8 — refactor: unify codex and claude compliance prompt +**Decision: IGNORE** + +Extracts the compliance prompt into `taskmaster-compliance-prompt.sh`, which +`hooks/check-completion.sh` now sources. This allows the same prompt to be +shared between Claude and Codex hooks. + +Our architecture keeps the compliance checklist in `SKILL.md`, which Claude Code +loads as system context on every turn — no shell file required. The upstream's +shell-based approach is a workaround for the lack of a native context mechanism +in Codex. We don't have this constraint. + +*Note:* The compliance prompt text in the new file is essentially identical to +what we already have in `SKILL.md`. No content to cherry-pick. + +--- + +### c04eeb18 — fix: restore long canonical compliance prompt +**Decision: IGNORE** + +Restores the longer version of `taskmaster-compliance-prompt.sh`. Depends on +the file introduced in ca471bd8, which we're not adopting. + +--- + +### c2475d9c — fix: default QUIET=1 in inject-continue-codex +**Decision: IGNORE** + +One-line fix in `hooks/inject-continue-codex.sh` — a Codex-specific file we +don't have. + +--- + +### 6814a3f5 — fix: symlink taskmaster-compliance-prompt.sh into hooks dir +**Decision: IGNORE** + +Adds one line to `install.sh` to symlink `taskmaster-compliance-prompt.sh` +into the hooks directory. Depends on `taskmaster-compliance-prompt.sh` which +we're not adopting. 
+ +--- + +### 6598e99d — fix: expand tilde in transcript_path for done signal detection +**Decision: APPLY (rewrite to match our structure)** + +**Bug:** Claude Code passes `transcript_path` with a leading `~` (e.g., +`~/.claude/projects/.../session.jsonl`). Bash does not expand `~` inside +double-quoted strings, so `[ -f "$TRANSCRIPT" ]` always fails. The transcript +fallback never fires, and error detection (via `tail -40`) also silently fails. + +**Fix:** Add `TRANSCRIPT="${TRANSCRIPT/#\~/$HOME}"` immediately after reading +`transcript_path` from the input JSON. + +**Upstream patch (hooks/check-completion.sh):** +```diff + TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path') ++# Expand leading ~ to $HOME (tilde not expanded inside quotes by bash) ++TRANSCRIPT="${TRANSCRIPT/#\~/$HOME}" +``` + +**Action:** Apply to both `check-completion.sh` (root) and `hooks/check-completion.sh`. + +*See commit 6598e99d in blader/taskmaster for original.* + +--- + +### 71ff69c4 — fix: check last_assistant_message for done signal before transcript +**Decision: ALREADY IMPLEMENTED** + +This fix checks `last_assistant_message` from the hook input JSON before +falling back to transcript search. The transcript file may not be flushed yet +when the Stop hook fires. + +**Status:** Our v2.3.0 release (commit 1ae2daf, 2026-02-23) independently +implemented this same fix. Both `check-completion.sh` files already check +`last_assistant_message` first. No action needed. + +--- + +### 77c71bbf — fix: honor QUIET for transport banner +**Decision: IGNORE** + +Fixes a QUIET flag check in `run-taskmaster-codex.sh` — a Codex-specific +wrapper we don't have. 
+ +--- + +## Summary + +| Commit | Decision | Reason | +|--------|----------|--------| +| cbd9443e | IGNORE | Codex integration, out of scope | +| 755d165f | IGNORE | Codex refinements, depends on above | +| 88ffd335 | IGNORE | Codex+Claude installer, Codex-first design | +| 452417af | IGNORE | Codex-centric README | +| 4e5075fd | IGNORE | Already in our SKILL.md | +| 547bfa74 | N/A | Removes file we never added | +| ca471bd8 | IGNORE | Shell-based compliance prompt, workaround we don't need | +| c04eeb18 | IGNORE | Depends on ca471bd8 | +| c2475d9c | IGNORE | Codex-only file | +| 6814a3f5 | IGNORE | Depends on ca471bd8 | +| **6598e99d** | **APPLY** | Tilde expansion bug in transcript_path | +| 71ff69c4 | ALREADY DONE | Implemented in our v2.3.0 | +| 77c71bbf | IGNORE | Codex-only file | + +**Net actions: 1 apply, 1 already done, 11 ignored** diff --git a/docs/upstream-reviews/blader-taskmaster-forks.md b/docs/upstream-reviews/blader-taskmaster-forks.md new file mode 100644 index 0000000..2f5d90a --- /dev/null +++ b/docs/upstream-reviews/blader-taskmaster-forks.md @@ -0,0 +1,314 @@ +# Fork Network Review: blader/taskmaster + +**Date**: 2026-04-28 +**Upstream baseline**: `blader/taskmaster@a1f3feb` ("chore: sync local skill updates", 2026-03-11) +**Our fork**: `micahstubbs/taskmaster` at v4.2.0 (codex wrapper + expect PTY bridge architecture) +**Scope**: All 32 forks of `blader/taskmaster`, default and feature branches. + +## Methodology + +For every fork in the network, compared default branch to upstream `main` via +`gh api repos/blader/taskmaster/compare/blader:main...:`. Then +enumerated all non-`main` branches and compared those too. Ignored forks that +were even with or only behind upstream. 
+ +## Fork Activity Summary + +| Bucket | Count | Notes | +|---|---|---| +| Even with upstream | 6 | Pure forks, no original commits | +| Behind upstream only | 21 | Stale forks, no rebase needed | +| **Ahead of upstream** | **5** | Worth examining (1 fork, multiple branches) | + +The five branches with original commits (excluding our own fork): + +| Fork / branch | Ahead | Behind | Theme | +|---|---|---|---| +| `mickn:main` | 5 | 0 | **Native Codex hooks + semantic verifier** (major rewrite, v5.0.0) | +| `mickn:feat/codex-native-hooks` | 3 | 0 | Subset of `mickn:main` | +| `gjlondon:fix/stop-hook-feedback-loop` | 1 | 18 | **Single-fire-by-design** philosophy | +| `levi-openclaw:claude/openclaw-agent-skill-7JoVl` | 1 | 18 | OpenClaw platform port (not adoptable) | +| `Semenka:claude/create-claude-guide-NvneU` | 1 | 18 | Auto-generated CLAUDE.md (not adoptable) | + +Effectively, **mickn** is the only fork with substantial original engineering; +**gjlondon** contributes one focused architectural insight. + +--- + +## mickn/taskmaster — Native Hooks + Semantic Verifier (v5.0.0) + +5-commit chain that takes the project from "Codex via PTY wrapper" to "Codex via +native hooks." Files removed: + +- `hooks/inject-continue-codex.sh` (414 LOC queue emitter) +- `hooks/run-codex-expect-bridge.exp` (84 LOC expect bridge) +- `run-taskmaster-codex.sh` (215 LOC wrapper) +- All tests for the above + +Files added: + +- `hooks/taskmaster-session-start.sh` (27 LOC) +- `hooks/taskmaster-user-prompt-submit.sh` (107 LOC) +- `hooks/taskmaster-stop.sh` (356 LOC) +- `taskmaster-completion-verifier.py` (311 LOC) +- `taskmaster-state.sh` (15 LOC) +- New test suite + +Premise: the OpenAI Codex CLI now supports a Claude-Code-style hooks model +(`~/.codex/hooks.json` with `SessionStart`, `UserPromptSubmit`, `Stop` events). +This obviates the entire PTY-injection architecture our fork currently relies +on. 
**This is the most consequential change in the fork network and worth +verifying independently** (see "Open Questions" below). + +### Patterns worth adopting + +#### A1. Per-turn user prompt capture via UserPromptSubmit hook (HIGH VALUE) + +`hooks/taskmaster-user-prompt-submit.sh` writes the latest external user prompt +to `~/.codex/taskmaster/state/.json`, with explicit filtering of: + +1. **Hook-injected reprompts** — strings starting with `...` +3. **AGENTS.md preludes** — `# AGENTS.md instructions for ...` + +Why it matters: solves the "is this a real user goal or just a hook re-prompt?" +problem cleanly. The current fork has no equivalent — it relies on transcript +parsing inside the stop hook, which is brittle and can re-anchor onto its own +output. + +**Adopt this pattern** even if we keep the wrapper architecture — we can write +to a state file from the wrapper layer the same way. + +#### A2. Semantic completion verifier (HIGH VALUE, MEDIUM RISK) + +`taskmaster-completion-verifier.py` calls an OpenAI model +(`TASKMASTER_COMPLETION_MODEL`, default `gpt-5.4-mini`) with: + +- the captured user goal (from A1) +- `last_assistant_message` +- a clipped transcript excerpt (`TASKMASTER_COMPLETION_MAX_CONTEXT_CHARS`, + default 20000) + +Returns JSON `{complete: bool, reason: str, next_action: str}`. If +`complete=false`, the stop hook blocks with the verifier's `reason` and +`next_action` injected into the block reason. 
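The verdict-to-block contract above can be sketched in shell. This is a hypothetical illustration, not mickn's code. `verdict_to_decision` is an invented name, the function assumes `jq` is installed, and it fails closed (blocks) on malformed or non-boolean verifier output. That is a choice made for this sketch, mirroring how `taskmaster_state_record_verifier_run` rejects a string `"true"`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn a verifier verdict of the shape
#   {"complete": bool, "reason": str, "next_action": str}
# into a Stop-hook response. Emits nothing (allow stop) only when
# .complete is the boolean true; otherwise emits a block decision.
verdict_to_decision() {
  local verdict="$1"
  # Fail closed: anything that is not a JSON object counts as "not complete".
  if ! jq -e 'type == "object"' >/dev/null 2>&1 <<<"$verdict"; then
    verdict='{"complete": false, "reason": "verifier returned malformed output"}'
  fi
  # Strict boolean check: the string "true" does not allow a stop.
  if jq -e '.complete == true' >/dev/null <<<"$verdict"; then
    return 0
  fi
  jq -c '{
    decision: "block",
    reason: ("Verifier: " + (.reason // "not complete")
             + "\nNext action: " + (.next_action // "continue working"))
  }' <<<"$verdict"
}
```

A pluggable `TASKMASTER_COMPLETION_VERIFIER_COMMAND` would sit upstream of this step. Its stdout is the `verdict` argument.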
+ +Notable engineering details: + +- **Secret redaction** before sending to the model (regexes for + `Authorization: bearer`, `api_key=`, `sk-...`, `lin_api_...`, `phx_...`, + `xox[baprs]-...`) +- **Loads `.env`** for `OPENAI_API_KEY` if not already in env +- **Pluggable**: `TASKMASTER_COMPLETION_VERIFIER_COMMAND` lets you swap in any + command that reads the same JSON stdin and returns the same JSON shape — so + users without an OpenAI key can wire in a local model +- **Opt-out switch**: `TASKMASTER_COMPLETION_VERIFY=0|false|off|no` + disables the verifier and reverts to the legacy `TASKMASTER_DONE::` token flow + +Why it matters: replaces "agent self-reports done" with "second-agent verifies +done." The legacy token approach trusts the agent's own assessment; the +verifier doesn't. For long-running autonomous work this is the difference +between "agent declared victory after 2/5 sub-tasks" and an independent check by a second model. + +**Adopt with care.** Two concrete concerns: (1) every stop attempt now costs +an OpenAI API call — at 30+ stop attempts per long session and gpt-5.4-mini +input pricing, this adds up; (2) `gpt-5.4-mini` is referenced as a default — +verify availability/pricing before defaulting; consider `claude-haiku-4-5` as +the Anthropic-side default with `OPENAI_API_KEY` as an alternative. + +#### A3. Optional repo-local verifier command (HIGH VALUE, LOW RISK) + +`TASKMASTER_VERIFY_COMMAND`: stop is blocked until the named shell command +exits 0. Output is captured (capped at `TASKMASTER_VERIFY_MAX_OUTPUT`, default +4000 bytes) and echoed back to the agent. + +Use cases: `cargo test`, `pnpm typecheck`, `make ci`, custom smoke scripts. +Pure win — pairs with the semantic verifier (or replaces it for repos with a +strong test suite). + +**Adopt.** Cheap to add, no external dependencies, immediately useful. + +#### A4.
JSON state-file architecture (MEDIUM VALUE) + +`taskmaster-state.sh` exposes `taskmaster_turn_state_path "$session_id"` that +returns `$TASKMASTER_STATE_DIR/.json` (default +`~/.codex/taskmaster/state/`). All hooks read/write through this single API. + +Cleaner than our current scatter (counter file in `$TMPDIR/taskmaster/`, +queue files in another directory, no shared schema). + +**Adopt the pattern** even if the storage location stays separate per +platform. + +#### A5. `safe_copy` helper that no-ops on same-path source/dest (LOW VALUE) + +`install.sh:30-46` resolves `cd -P` absolute paths for both source and +destination and skips copy when they match. Prevents `cp: 'X' and 'X' are the +same file` errors when running install from inside the install target. Our +install.sh already has this — confirmed it's already in HEAD. + +**Already adopted.** + +### Patterns to consider but not adopt as-is + +#### B1. Wholesale replacement of the PTY wrapper + +`mickn` deletes the wrapper, expect bridge, and queue emitter outright. **Do +not adopt verbatim** without first verifying that: + +1. Codex CLI actually exposes `SessionStart`, `UserPromptSubmit`, and `Stop` + hooks in the version the user is on (`codex --version`) +2. The native `Stop` hook supports `decision: "block"` continuation in the + same way Claude Code does +3. The `last_assistant_message` field is populated by Codex on stop events + +If all three hold, the wrapper is dead weight and we should follow `mickn`. If +any are uncertain, keep both paths and gate on `command -v codex && codex +--help | grep -q hooks` or similar. + +#### B2. Removed test files + +`mickn` removes the wrapper test suite (`tests/inject-continue-codex.test.sh`, +`tests/run-codex-expect-bridge.test.sh`, `tests/run-taskmaster-codex.test.sh`). +If we decide to keep the wrapper as a fallback, keep the tests. + +--- + +## gjlondon/taskmaster — Single-fire by design + +One commit (`30ec9bd` "Fix stop hook feedback loop"). 
Diagnoses a real bug in +the upstream-style transcript-grep approach: the hook's own checklist text +(containing "status: in_progress") gets written into the transcript, which +then matches the hook's own grep on the next fire — guaranteeing +`HAS_INCOMPLETE_SIGNALS=true` forever. Infinite loop. + +Fix: make `stop_hook_active=true` an unconditional early exit before any +transcript analysis. Hook becomes single-fire — fires once with the checklist +prompt, allows stop on the next attempt regardless of transcript contents. + +### Pattern: rethink the "repeat until token" philosophy (LOW VALUE for our fork) + +Our fork already sidesteps this bug in a different way — we use +`last_assistant_message` for primary detection and only fall back to +transcript parsing for an explicit `TASKMASTER_DONE::` token (not +generic "in_progress" matches). The contamination problem doesn't apply. + +But the underlying philosophy question is worth a beat: + +- **gjlondon's claim**: "If the agent saw the checklist and still tries to + stop, either the work is done or re-firing won't help." +- **Our claim**: "Repeat until the agent emits an explicit done signal, + because some agents will try to bail before reading the prompt fully." + +Our position is the right one for adversarial-stop cases, but gjlondon's +is likely right in most real sessions (an estimated 95%). The cost of being wrong on our +side is one extra reprompt cycle; the cost of being wrong on gjlondon's side +is a session that stops with work undone. + +**Do not adopt.** Keep the repeat-until-token model. But consider documenting +the design tradeoff in `docs/SPEC.md` so it doesn't get re-litigated. + +--- + +## levi-openclaw — Not adoptable + +Single commit "Rework Taskmaster as an OpenClaw agent skill" — wholesale port +to a different platform (`~/.openclaw/` paths, OpenClaw skill frontmatter, +`scripts/` subfolder convention). Has zero overlap with what we're trying to +do. Skip.
+ +## Semenka — Not adoptable + +Single commit adding a 95-line `CLAUDE.md` that's a generic Claude-Code +project guide for the upstream repo (auto-generated by the +`claude/create-claude-guide-NvneU` workflow). No taskmaster-specific +engineering content. Skip. + +--- + +## Recommended adoption plan + +In order of value/effort ratio: + +### Tier 1 — adopt now (low risk, high value) + +1. **`TASKMASTER_VERIFY_COMMAND`** (A3 above). Pure config addition. ~30 LOC + to wire into the existing `check-completion.sh` and Codex stop path. +2. **JSON state-file layout** (A4). Replace the bare counter file in + `$TMPDIR/taskmaster/` with a JSON state file under + `$TASKMASTER_STATE_DIR/.json` that holds counter, last-known + user prompt, and any future fields. Backward-compatible if the migration + reads the legacy counter file once and discards it. +3. **Hook-internal-prompt detection** in user-facing hooks (subset of A1). + Even without a full UserPromptSubmit hook, we can teach the wrapper layer + to recognize and not reprocess our own injected reprompts. + +### Tier 2 — adopt after upstream-reality check + +4. **Native Codex hooks path** (B1). Conditional on verifying that + `codex` actually supports the three hook types `mickn` assumes. If yes, + add a parallel native-hooks code path and let `install.sh` choose between + wrapper and native at install time based on `codex` capability detection. + Don't delete the PTY wrapper yet — keep as fallback for older Codex + installs. +5. **UserPromptSubmit hook for goal capture** (A1). Only useful with a native + hooks path — depends on (4). + +### Tier 3 — adopt with explicit knobs + +6. **Semantic completion verifier** (A2). High-value but introduces an LLM + dependency and per-stop API cost. 
Recommended shape for our fork: + - Default OFF (`TASKMASTER_COMPLETION_VERIFY=0`) — opt-in via env + - Default model `claude-haiku-4-5` (cheaper, lower latency, keeps us on + Anthropic infra) when `ANTHROPIC_API_KEY` is set; fall back to + OpenAI `gpt-5.4-mini` when only `OPENAI_API_KEY` is present + - Port the secret-redaction regex set verbatim — that's a free correctness + improvement + - Port `TASKMASTER_COMPLETION_VERIFIER_COMMAND` pluggable interface so + local-model users can wire in `llama-server` or similar + +### Not adopting + +7. Single-fire philosophy (gjlondon) — incompatible with our explicit-token + contract. +8. PTY-wrapper deletion (B1 verbatim) — premature until native hooks + verified. +9. OpenClaw port (levi-openclaw) — different platform. + +--- + +## Open questions for follow-up + +1. **Does `codex` actually support native `SessionStart`, `UserPromptSubmit`, + and `Stop` hooks?** Mickn's install.sh writes to `~/.codex/hooks.json`, + which implies yes, but we should verify on the version our users are + pinned to. If not, his entire architecture is conditional on a future + Codex release. + +2. **Is `gpt-5.4-mini` the right default verifier model?** Mickn picked it + without comment. We should benchmark cost-per-stop and verifier accuracy + against `claude-haiku-4-5` before defaulting. + +3. **Should the verifier short-circuit on transcript size?** A 20k-char + transcript at every stop attempt × 30 stops × dozens of users is real + tokens. Worth a "skip verifier if transcript hasn't changed since last + verifier call" cache. 
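Below is a minimal sketch of the cache proposed in question 3. It assumes the verifier's inputs are the captured goal, `last_assistant_message`, and the clipped transcript excerpt. All function names are hypothetical, and the code assumes GNU coreutils `sha256sum`. In this repo the stored hash would naturally live in the state file's `last_verifier_run.input_hash` field, which the v1 schema already defines. A plain file stands in for it here so the sketch is self-contained.

```bash
#!/usr/bin/env bash
# Hypothetical "skip verifier if inputs unchanged" cache.
# Key = SHA-256 over the exact verifier inputs. A unit-separator byte
# (0x1f) between fields keeps ("ab","c") and ("a","bc") distinct,
# assuming the inputs themselves contain no 0x1f bytes.
verifier_input_hash() {
  local goal="$1" last_message="$2" transcript_excerpt="$3"
  printf '%s\x1f%s\x1f%s' "$goal" "$last_message" "$transcript_excerpt" \
    | sha256sum | cut -d' ' -f1
}

# Returns 0 (run the verifier) unless the cached hash matches exactly.
should_run_verifier() {
  local cache_file="$1" hash="$2"
  if [[ -f "$cache_file" && "$(cat "$cache_file")" == "$hash" ]]; then
    return 1
  fi
  return 0
}

# Record the hash after a verifier run completes.
record_verifier_hash() {
  local cache_file="$1" hash="$2"
  printf '%s\n' "$hash" >"$cache_file"
}
```

On a cache hit, the hook would reuse the previous verdict. The state schema's `complete`, `reason`, and `next_action` fields already sit next to `input_hash`, so it has somewhere to read that verdict from.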
+ +## Reproducing this review + +```bash +# enumerate forks ahead of upstream +for fork in $(gh api repos/blader/taskmaster/forks --paginate -q '.[].full_name'); do + ahead=$(gh api "repos/blader/taskmaster/compare/blader:main...${fork%%/*}:main" \ + -q '.ahead_by' 2>/dev/null || echo 0) + [ "$ahead" -gt 0 ] && echo "$fork ahead=$ahead" +done + +# for each interesting fork, also check non-main branches +for fork in ; do + gh api "repos/${fork}/taskmaster/branches" -q '.[].name' \ + | grep -v '^main$' +done +``` diff --git a/hooks/check-completion.sh b/hooks/check-completion.sh index c0da960..06626c1 100755 --- a/hooks/check-completion.sh +++ b/hooks/check-completion.sh @@ -13,6 +13,12 @@ set -u SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" # shellcheck disable=SC1091 source "$SCRIPT_DIR/../taskmaster-compliance-prompt.sh" +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/../taskmaster-verify-command.sh" +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/../taskmaster-prompt-detect.sh" +# shellcheck disable=SC1091 +source "$SCRIPT_DIR/../taskmaster-state.sh" INPUT=$(cat) SESSION_ID=$(echo "$INPUT" | jq -r '.session_id') @@ -30,16 +36,13 @@ if [ -f "$TRANSCRIPT" ]; then fi fi -# --- counter --- -COUNTER_DIR="${TMPDIR:-/tmp}/taskmaster" -mkdir -p "$COUNTER_DIR" -COUNTER_FILE="${COUNTER_DIR}/${SESSION_ID}" -MAX=${TASKMASTER_MAX:-0} +# --- counter (state-file backed) --- +taskmaster_state_migrate_legacy_counter "$SESSION_ID" +taskmaster_state_init "$SESSION_ID" -COUNT=0 -if [ -f "$COUNTER_FILE" ]; then - COUNT=$(cat "$COUNTER_FILE" 2>/dev/null || echo "0") -fi +MAX=${TASKMASTER_MAX:-0} +COUNT="$(taskmaster_state_jq "$SESSION_ID" '.stop_count')" +[[ "$COUNT" =~ ^[0-9]+$ ]] || COUNT=0 transcript_has_done_signal() { local transcript_path="$1" @@ -82,16 +85,32 @@ if [ "$HAS_DONE_SIGNAL" = false ] && [ -f "$TRANSCRIPT" ]; then fi if [ "$HAS_DONE_SIGNAL" = true ]; then - rm -f "$COUNTER_FILE" + if [ -n "${TASKMASTER_VERIFY_COMMAND:-}" ]; then + if
taskmaster_run_verify_command; then + taskmaster_state_update "$SESSION_ID" '.stop_count = 0' + exit 0 + else + VERIFY_REASON="$(generate_taskmaster_injected_tag verifier-feedback) +TASKMASTER: verifier failed (exit=${TASKMASTER_VERIFY_EXIT_CODE}). Command: ${TASKMASTER_VERIFY_COMMAND} + +Output (last ${TASKMASTER_VERIFY_MAX_OUTPUT:-4000} bytes): +${TASKMASTER_VERIFY_OUTPUT_TAIL} + +Token alone is insufficient when a verifier is configured. Fix the failures and try again." + jq -n --arg reason "$VERIFY_REASON" '{ decision: "block", reason: $reason }' + exit 0 + fi + fi + taskmaster_state_update "$SESSION_ID" '.stop_count = 0' exit 0 fi +taskmaster_state_increment_stop_count "$SESSION_ID" NEXT=$((COUNT + 1)) -echo "$NEXT" > "$COUNTER_FILE" # Optional escape hatch. Default is infinite (0) so hook keeps firing. if [ "$MAX" -gt 0 ] && [ "$NEXT" -ge "$MAX" ]; then - rm -f "$COUNTER_FILE" + taskmaster_state_update "$SESSION_ID" '.stop_count = 0' exit 0 fi @@ -109,7 +128,9 @@ fi # --- reprompt --- SHARED_PROMPT="$(build_taskmaster_compliance_prompt "$DONE_SIGNAL")" -REASON="${LABEL}: ${PREAMBLE} +INJECTED_TAG="$(generate_taskmaster_injected_tag stop-block)" +REASON="${INJECTED_TAG} +${LABEL}: ${PREAMBLE} ${SHARED_PROMPT}" diff --git a/hooks/inject-continue-codex.sh b/hooks/inject-continue-codex.sh index b163280..cb08eca 100755 --- a/hooks/inject-continue-codex.sh +++ b/hooks/inject-continue-codex.sh @@ -15,6 +15,10 @@ set -euo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" # shellcheck disable=SC1091 source "$SCRIPT_DIR/../taskmaster-compliance-prompt.sh" +# shellcheck disable=SC1091 +source "$(dirname "${BASH_SOURCE[0]}")/../taskmaster-prompt-detect.sh" +# shellcheck disable=SC1091 +source "$(dirname "${BASH_SOURCE[0]}")/../taskmaster-state.sh" usage() { cat <<'USAGE' @@ -173,7 +177,10 @@ build_reprompt() { shared_prompt="$(build_taskmaster_compliance_prompt "$token")" + local injected_tag + injected_tag="$(generate_taskmaster_injected_tag followup)" 
cat < "$prompt_file" INJECTION_COUNT=$((INJECTION_COUNT + 1)) + if [[ -n "${SESSION_ID:-}" ]]; then + taskmaster_state_increment_stop_count "$SESSION_ID" 2>/dev/null || true + fi log_runtime "queued continuation prompt turn=${turn_id:-} count=${INJECTION_COUNT} file=${prompt_file}" if [[ "$QUIET" -eq 0 ]]; then echo "[TASKMASTER] queued continuation prompt for turn ${turn_id:-} (count=${INJECTION_COUNT}, file=${prompt_file})." >&2 diff --git a/install.sh b/install.sh index 4c4b382..17d9911 100755 --- a/install.sh +++ b/install.sh @@ -52,6 +52,9 @@ copy_skill_files() { safe_copy "$SCRIPT_DIR/install.sh" "$skill_dir/install.sh" safe_copy "$SCRIPT_DIR/uninstall.sh" "$skill_dir/uninstall.sh" safe_copy "$SCRIPT_DIR/taskmaster-compliance-prompt.sh" "$skill_dir/taskmaster-compliance-prompt.sh" + safe_copy "$SCRIPT_DIR/taskmaster-verify-command.sh" "$skill_dir/taskmaster-verify-command.sh" + safe_copy "$SCRIPT_DIR/taskmaster-prompt-detect.sh" "$skill_dir/taskmaster-prompt-detect.sh" + safe_copy "$SCRIPT_DIR/taskmaster-state.sh" "$skill_dir/taskmaster-state.sh" safe_copy "$SCRIPT_DIR/run-taskmaster-codex.sh" "$skill_dir/run-taskmaster-codex.sh" safe_copy "$SCRIPT_DIR/check-completion.sh" "$skill_dir/check-completion.sh" @@ -62,6 +65,9 @@ copy_skill_files() { chmod +x "$skill_dir/install.sh" chmod +x "$skill_dir/uninstall.sh" chmod +x "$skill_dir/taskmaster-compliance-prompt.sh" + chmod +x "$skill_dir/taskmaster-verify-command.sh" + chmod +x "$skill_dir/taskmaster-prompt-detect.sh" + chmod +x "$skill_dir/taskmaster-state.sh" chmod +x "$skill_dir/run-taskmaster-codex.sh" chmod +x "$skill_dir/check-completion.sh" chmod +x "$skill_dir/hooks/check-completion.sh" @@ -119,6 +125,25 @@ if not isinstance(stop_hooks, list): stop_hooks = [] container["Stop"] = stop_hooks +# Migrate stale entries pointing at the old in-skill hook path +# (~/.claude/skills/taskmaster/hooks/check-completion.sh) to the +# user-level $HOME/.claude/hooks/ path. Idempotent. 
+migrated = 0 +stale_marker = "skills/taskmaster/hooks/check-completion.sh" +for entry in stop_hooks: + if not isinstance(entry, dict): + continue + hooks = entry.get("hooks") + if not isinstance(hooks, list): + continue + for hook in hooks: + if not isinstance(hook, dict): + continue + cmd = hook.get("command") + if hook.get("type") == "command" and isinstance(cmd, str) and stale_marker in cmd: + hook["command"] = hook_command + migrated += 1 + exists = False for entry in stop_hooks: if not isinstance(entry, dict): @@ -153,8 +178,12 @@ with open(settings_path, "w", encoding="utf-8") as f: json.dump(data, f, indent=2) f.write("\n") +if migrated: + print(f" Claude: migrated {migrated} stale Stop hook entr{'y' if migrated == 1 else 'ies'} to {hook_command}") + if exists: - print(" Claude: Stop hook already configured") + if not migrated: + print(" Claude: Stop hook already configured") else: print(" Claude: added Stop hook to settings") PY @@ -230,6 +259,9 @@ install_claude() { mkdir -p "$CLAUDE_HOOKS_DIR" ln -sf "$CLAUDE_SKILL_DIR/check-completion.sh" "$CLAUDE_HOOK_LINK" ln -sf "$CLAUDE_SKILL_DIR/taskmaster-compliance-prompt.sh" "$CLAUDE_HOOKS_DIR/taskmaster-compliance-prompt.sh" + ln -sf "$CLAUDE_SKILL_DIR/taskmaster-verify-command.sh" "$CLAUDE_HOOKS_DIR/taskmaster-verify-command.sh" + ln -sf "$CLAUDE_SKILL_DIR/taskmaster-prompt-detect.sh" "$CLAUDE_HOOKS_DIR/taskmaster-prompt-detect.sh" + ln -sf "$CLAUDE_SKILL_DIR/taskmaster-state.sh" "$CLAUDE_HOOKS_DIR/taskmaster-state.sh" chmod +x "$CLAUDE_HOOK_LINK" echo " Claude: installed skill files to $CLAUDE_SKILL_DIR" diff --git a/taskmaster-prompt-detect.sh b/taskmaster-prompt-detect.sh new file mode 100755 index 0000000..8cb264f --- /dev/null +++ b/taskmaster-prompt-detect.sh @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# +# Detect prompts that Taskmaster itself injected, so they don't get +# treated as fresh user goals by downstream consumers (T2.2 user-prompt +# capture, T3 verifier). +# +# Two-tier detection: +# 1. 
Forward path: explicit `[taskmaster:injected v= kind=]` tag +# on the first non-empty line. Forward-compatible across schema bumps. +# 2. Legacy fallback: substring match against known wording from this +# project and from mickn/taskmaster's fork. +# + +# Idempotent re-source: avoid `readonly` re-declaration error under `set -e`. +[[ -n "${TASKMASTER_PROMPT_DETECT_LOADED:-}" ]] && return 0 +readonly TASKMASTER_PROMPT_DETECT_LOADED=1 +readonly TASKMASTER_INJECTED_TAG_VERSION=1 + +# Emit the canonical tag for a given kind. Caller prepends to their prompt. +# Kinds: stop-block, followup, compliance, session-start, verifier-feedback. +generate_taskmaster_injected_tag() { + local kind="${1:-unknown}" + printf '[taskmaster:injected v=%d kind=%s]' \ + "$TASKMASTER_INJECTED_TAG_VERSION" "$kind" +} + +is_taskmaster_injected_tag_line() { + local text="$1" + [[ "$text" =~ ^\[taskmaster:injected[[:space:]]v=[0-9]+[[:space:]]kind=[a-zA-Z0-9_-]+\] ]] +} + +is_taskmaster_legacy_injected_prompt() { + local text="$1" + case "$text" in + ".json +# +# Schema (v1): +# { +# "schema_version": 1, +# "session_id": "", +# "created_at": "", +# "updated_at": "", +# "stop_count": 0, +# "latest_user_prompt": null | {captured_at, turn_id, prompt}, +# "last_verifier_run": null | {ran_at, input_hash, complete, reason, next_action}, +# "metadata": {} +# } +# +# Atomicity: all writes go through tmp+mv guarded by flock on .lock. +# + +# Idempotent re-source guard (matches Phase B prompt-detect pattern). 
+[[ -n "${TASKMASTER_STATE_LOADED:-}" ]] && return 0 +readonly TASKMASTER_STATE_LOADED=1 + +taskmaster_state_dir() { + printf '%s\n' "${TASKMASTER_STATE_DIR:-${TMPDIR:-/tmp}/taskmaster/state}" +} + +taskmaster_state_path() { + local sid="$1" + printf '%s/%s.json\n' "$(taskmaster_state_dir)" "$sid" +} + +taskmaster_state_now() { + date -u +"%Y-%m-%dT%H:%M:%SZ" +} + +taskmaster_state_init() { + local sid="$1" + local path tmp lock now + path="$(taskmaster_state_path "$sid")" + mkdir -p "$(dirname "$path")" + [[ -f "$path" ]] && return 0 + + lock="${path}.lock" + tmp="${path}.tmp.$$" + now="$(taskmaster_state_now)" + + exec 9>"$lock" + flock 9 + if [[ ! -f "$path" ]]; then + if jq -n \ + --arg sid "$sid" \ + --arg now "$now" \ + '{ + schema_version: 1, + session_id: $sid, + created_at: $now, + updated_at: $now, + stop_count: 0, + latest_user_prompt: null, + last_verifier_run: null, + metadata: {} + }' >"$tmp"; then + mv "$tmp" "$path" + else + rm -f "$tmp" + exec 9>&- + return 1 + fi + fi + exec 9>&- +} + +taskmaster_state_jq() { + local sid="$1" expr="$2" + local path + path="$(taskmaster_state_path "$sid")" + [[ -f "$path" ]] || return 0 + jq -r "$expr" <"$path" 2>/dev/null +} + +# Run jq with a transformation expression and atomically write the result back. 
+taskmaster_state_update() { + local sid="$1" expr="$2" + local path tmp lock now + path="$(taskmaster_state_path "$sid")" + taskmaster_state_init "$sid" + + lock="${path}.lock" + tmp="${path}.tmp.$$" + now="$(taskmaster_state_now)" + + exec 9>"$lock" + flock 9 + if jq --arg now "$now" "$expr | .updated_at = \$now" "$path" >"$tmp"; then + mv "$tmp" "$path" + else + rm -f "$tmp" + exec 9>&- + return 1 + fi + exec 9>&- +} + +taskmaster_state_increment_stop_count() { + local sid="$1" + taskmaster_state_update "$sid" '.stop_count = (.stop_count + 1)' +} + +taskmaster_state_capture_prompt() { + local sid="$1" turn_id="$2" prompt="$3" + local path tmp lock now + path="$(taskmaster_state_path "$sid")" + taskmaster_state_init "$sid" + + lock="${path}.lock" + tmp="${path}.tmp.$$" + now="$(taskmaster_state_now)" + + exec 9>"$lock" + flock 9 + if jq \ + --arg now "$now" \ + --arg turn "$turn_id" \ + --arg prompt "$prompt" \ + '.latest_user_prompt = {captured_at: $now, turn_id: $turn, prompt: $prompt} + | .updated_at = $now' \ + "$path" >"$tmp"; then + mv "$tmp" "$path" + else + rm -f "$tmp" + exec 9>&- + return 1 + fi + exec 9>&- +} + +taskmaster_state_record_verifier_run() { + local sid="$1" input_hash="$2" complete="$3" reason="$4" next_action="$5" + case "$complete" in + true|false) ;; + *) return 64 ;; + esac + local path tmp lock now + path="$(taskmaster_state_path "$sid")" + taskmaster_state_init "$sid" + + lock="${path}.lock" + tmp="${path}.tmp.$$" + now="$(taskmaster_state_now)" + + exec 9>"$lock" + flock 9 + if jq \ + --arg now "$now" \ + --arg hash "$input_hash" \ + --argjson complete "$complete" \ + --arg reason "$reason" \ + --arg next "$next_action" \ + '.last_verifier_run = { + ran_at: $now, + input_hash: $hash, + complete: $complete, + reason: $reason, + next_action: $next + } | .updated_at = $now' \ + "$path" >"$tmp"; then + mv "$tmp" "$path" + else + rm -f "$tmp" + exec 9>&- + return 1 + fi + exec 9>&- +} + +# One-time migration: absorb legacy 
${TMPDIR}/taskmaster/ counter into the +# state file's stop_count, then delete the legacy file. Idempotent — safe to +# call on every hook entry. +taskmaster_state_migrate_legacy_counter() { + local sid="$1" + local legacy="${TMPDIR:-/tmp}/taskmaster/${sid}" + [[ -f "$legacy" ]] || return 0 + + local count + count="$(cat "$legacy" 2>/dev/null || echo 0)" + [[ "$count" =~ ^[0-9]+$ ]] || count=0 + # Cap length to prevent int overflow on downstream NEXT=$((COUNT + 1)). + [[ ${#count} -le 12 ]] || count=0 + + taskmaster_state_init "$sid" + + local path tmp lock now + path="$(taskmaster_state_path "$sid")" + lock="${path}.lock" + tmp="${path}.tmp.$$" + now="$(taskmaster_state_now)" + + exec 9>"$lock" + flock 9 + # Re-check legacy file under lock — if a peer migrated already, no-op. + if [[ -f "$legacy" ]]; then + if jq --arg now "$now" --argjson n "$count" \ + '.stop_count = (.stop_count + $n) | .updated_at = $now' \ + "$path" >"$tmp"; then + mv "$tmp" "$path" + rm -f "$legacy" + else + rm -f "$tmp" + exec 9>&- + return 1 + fi + fi + exec 9>&- +} diff --git a/taskmaster-verify-command.sh b/taskmaster-verify-command.sh new file mode 100755 index 0000000..f5ec09c --- /dev/null +++ b/taskmaster-verify-command.sh @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# +# Optional shell verifier gate for the Taskmaster stop hook. +# +# When TASKMASTER_VERIFY_COMMAND is set, calling taskmaster_run_verify_command +# runs the command with a timeout, captures combined output (truncated), and +# sets: +# TASKMASTER_VERIFY_EXIT_CODE the command's exit code +# TASKMASTER_VERIFY_OUTPUT_TAIL last $TASKMASTER_VERIFY_MAX_OUTPUT bytes of output +# It returns the command's exit code (0 = pass, non-zero = block). +# When unset, returns 0 with empty fields (no-op pass). 
+# +# Env knobs: +# TASKMASTER_VERIFY_COMMAND command string; empty/unset = skip +# TASKMASTER_VERIFY_TIMEOUT seconds before SIGTERM (default 60); +5s grace SIGKILL +# TASKMASTER_VERIFY_MAX_OUTPUT bytes of output kept (default 4000) +# TASKMASTER_VERIFY_CWD optional cwd override +# + +taskmaster_run_verify_command() { + TASKMASTER_VERIFY_OUTPUT_TAIL="" + TASKMASTER_VERIFY_EXIT_CODE="" + + local cmd="${TASKMASTER_VERIFY_COMMAND:-}" + if [[ -z "$cmd" ]]; then + return 0 + fi + + local timeout_sec="${TASKMASTER_VERIFY_TIMEOUT:-60}" + local max_output="${TASKMASTER_VERIFY_MAX_OUTPUT:-4000}" + local cwd="${TASKMASTER_VERIFY_CWD:-}" + local out_file rc=0 + local prev_errexit=0 + case $- in *e*) prev_errexit=1;; esac + set +e + + out_file="$(mktemp "${TMPDIR:-/tmp}/taskmaster-verify.XXXXXX")" + trap 'rm -f "$out_file"' RETURN + + if [[ -n "$cwd" ]]; then + ( cd "$cwd" && timeout --kill-after=5 "$timeout_sec" bash -c "$cmd" ) \ + >"$out_file" 2>&1 + rc=$? + else + timeout --kill-after=5 "$timeout_sec" bash -c "$cmd" >"$out_file" 2>&1 + rc=$? + fi + + TASKMASTER_VERIFY_OUTPUT_TAIL="$(tail -c "$max_output" "$out_file" 2>/dev/null || true)" + TASKMASTER_VERIFY_EXIT_CODE="$rc" + + rm -f "$out_file" + if [[ "$prev_errexit" == "1" ]]; then set -e; fi + return "$rc" +} diff --git a/tests/prompt-detect.test.sh b/tests/prompt-detect.test.sh new file mode 100644 index 0000000..cb2c03a --- /dev/null +++ b/tests/prompt-detect.test.sh @@ -0,0 +1,80 @@ +#!/usr/bin/env bash +# +# Tests for taskmaster-prompt-detect.sh. +# +set -euo pipefail + +REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +LIB="$REPO_ROOT/taskmaster-prompt-detect.sh" + +# shellcheck disable=SC1090 +source "$LIB" + +PASS_COUNT=0 +FAIL_COUNT=0 + +assert_detected() { + local name="$1" + local text="$2" + if is_taskmaster_injected_prompt "$text"; then + printf 'ok %s\n' "$name" + PASS_COUNT=$((PASS_COUNT + 1)) + else + printf 'FAIL %s\n' "$name" >&2 + FAIL_COUNT=$((FAIL_COUNT + 1)) + fi +} + +assert_not_detected() { + local name="$1" + local text="$2" + if is_taskmaster_injected_prompt "$text"; then + printf 'FAIL %s (false positive)\n' "$name" >&2 + FAIL_COUNT=$((FAIL_COUNT + 1)) + else + printf 'ok %s\n' "$name" + PASS_COUNT=$((PASS_COUNT + 1)) + fi +} + +# --- Tag detection --- +assert_detected "tagged stop-block" \ + "[taskmaster:injected v=1 kind=stop-block] +TASKMASTER (1): Stop is blocked..." + +assert_detected "tagged followup" \ + "[taskmaster:injected v=1 kind=followup] +continue" + +assert_detected "tagged compliance" "[taskmaster:injected v=1 kind=compliance]" + +# --- Forward-compat: future schema version still detected --- +assert_detected "future schema v=99" "[taskmaster:injected v=99 kind=anything]" + +# --- Legacy substring matches (back-compat with mickn's prompts and our own) --- +assert_detected "legacy: ..." +assert_detected "legacy: Stop is blocked" "Stop is blocked until completion is explicitly confirmed." +assert_detected "legacy: Completion check before stopping" "Completion check before stopping." +assert_detected "legacy: TASKMASTER (N) label" "TASKMASTER (5/100): Stop is blocked..." +assert_detected "legacy: TASKMASTER (N) label, no max" "TASKMASTER (5): Stop is blocked..." +assert_detected "legacy: Goal not yet verified complete" "Goal not yet verified complete." +assert_detected "legacy: Recent tool errors were detected" "Recent tool errors were detected." 
+ +# --- Negatives --- +assert_not_detected "empty string" "" +assert_not_detected "real user prompt" "fix the failing test in foo_test.go" +assert_not_detected "user mentions taskmaster word" "I want to use taskmaster for this project" +assert_not_detected "tag-like but malformed" "[taskmaster:injected]" +assert_not_detected "tag-like but missing v=" "[taskmaster:injected kind=stop-block]" + +# --- generate_taskmaster_injected_tag produces a parseable tag --- +TAG="$(generate_taskmaster_injected_tag stop-block)" +assert_detected "generated tag is detectable" "$TAG" +[[ "$TAG" == "[taskmaster:injected v=1 kind=stop-block]" ]] && { + printf 'ok generated tag exact format\n'; PASS_COUNT=$((PASS_COUNT + 1)); +} || { + printf 'FAIL generated tag exact format (got: %s)\n' "$TAG" >&2; FAIL_COUNT=$((FAIL_COUNT + 1)); +} + +printf '\n%d passed, %d failed\n' "$PASS_COUNT" "$FAIL_COUNT" +[[ "$FAIL_COUNT" == 0 ]] diff --git a/tests/state.test.sh b/tests/state.test.sh new file mode 100644 index 0000000..e2dc1cb --- /dev/null +++ b/tests/state.test.sh @@ -0,0 +1,149 @@ +#!/usr/bin/env bash +# +# Tests for taskmaster-state.sh. +# +set -euo pipefail + +REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +LIB="$REPO_ROOT/taskmaster-state.sh" + +# Isolate state under a temp dir +TEST_HOME="$(mktemp -d "${TMPDIR:-/tmp}/taskmaster-state-test.XXXXXX")" +trap 'rm -rf "$TEST_HOME"; rm -f "${TMPDIR:-/tmp}/taskmaster/sess-legacy-$$" "${TMPDIR:-/tmp}/taskmaster/sess-migrate-additive-$$" "${TMPDIR:-/tmp}/taskmaster/sess-overflow-$$"' EXIT +export TASKMASTER_STATE_DIR="$TEST_HOME/state" + +# shellcheck disable=SC1090 +source "$LIB" + +PASS=0; FAIL=0 +ok() { printf 'ok %s\n' "$1"; PASS=$((PASS+1)); } +fail() { printf 'FAIL %s\n' "$1" >&2; FAIL=$((FAIL+1)); } + +# --- init creates well-formed JSON with schema_version=1 --- +SID="sess-$$" +taskmaster_state_init "$SID" +PATH_OUT="$(taskmaster_state_path "$SID")" +[[ -f "$PATH_OUT" ]] && ok "init creates file" || fail "init creates file" +SV="$(jq -r .schema_version <"$PATH_OUT")" +[[ "$SV" == "1" ]] && ok "schema_version is 1" || fail "schema_version is 1 (got $SV)" +SI="$(jq -r .session_id <"$PATH_OUT")" +[[ "$SI" == "$SID" ]] && ok "session_id stamped" || fail "session_id stamped" +SC="$(jq -r .stop_count <"$PATH_OUT")" +[[ "$SC" == "0" ]] && ok "stop_count starts at 0" || fail "stop_count starts at 0 (got $SC)" + +# --- increment --- +taskmaster_state_increment_stop_count "$SID" +SC="$(jq -r .stop_count <"$PATH_OUT")" +[[ "$SC" == "1" ]] && ok "stop_count after one increment is 1" || fail "increment 1 (got $SC)" + +taskmaster_state_increment_stop_count "$SID" +taskmaster_state_increment_stop_count "$SID" +SC="$(jq -r .stop_count <"$PATH_OUT")" +[[ "$SC" == "3" ]] && ok "stop_count after three increments is 3" || fail "increment 3 (got $SC)" + +# --- concurrent increments --- +SID2="sess-conc-$$" +taskmaster_state_init "$SID2" +PATH_C="$(taskmaster_state_path "$SID2")" +N=50 +for i in $(seq 1 "$N"); do + ( taskmaster_state_increment_stop_count "$SID2" ) & +done +wait +SC="$(jq -r .stop_count <"$PATH_C")" +[[ "$SC" == "$N" ]] && ok "concurrent $N increments reach $N" \ + || fail "concurrent increments lost some (got 
$SC, expected $N)" + +# --- legacy migration --- +LEGACY_DIR="${TMPDIR:-/tmp}/taskmaster" +mkdir -p "$LEGACY_DIR" +SID3="sess-legacy-$$" +LEGACY_FILE="$LEGACY_DIR/$SID3" +echo "7" > "$LEGACY_FILE" + +# State doesn't exist yet; migration should pull the 7 +taskmaster_state_migrate_legacy_counter "$SID3" +[[ -f "$(taskmaster_state_path "$SID3")" ]] && ok "legacy migration creates state file" \ + || fail "legacy migration creates state file" +SC="$(jq -r .stop_count <"$(taskmaster_state_path "$SID3")")" +[[ "$SC" == "7" ]] && ok "legacy counter value migrated" \ + || fail "legacy counter value migrated (got $SC, expected 7)" +[[ ! -f "$LEGACY_FILE" ]] && ok "legacy file deleted after migration" \ + || fail "legacy file deleted after migration" + +# --- migration is idempotent (rerun without legacy file is a no-op) --- +taskmaster_state_migrate_legacy_counter "$SID3" +SC="$(jq -r .stop_count <"$(taskmaster_state_path "$SID3")")" +[[ "$SC" == "7" ]] && ok "second migration call is a no-op" \ + || fail "second migration mutated state (got $SC)" + +# --- jq read of nonexistent path returns null --- +SID4="sess-empty-$$" +VAL="$(taskmaster_state_jq "$SID4" '.latest_user_prompt.prompt' 2>/dev/null || echo "MISSING")" +[[ "$VAL" == "null" || -z "$VAL" || "$VAL" == "MISSING" ]] && ok "nonexistent path read is safe" \ + || fail "nonexistent path read is safe (got $VAL)" + +# --- Critical #1 regression: corrupted state file is preserved, not clobbered --- +SID5="sess-corrupt-$$" +PATH5="$(taskmaster_state_path "$SID5")" +mkdir -p "$(dirname "$PATH5")" +printf 'this is not json' > "$PATH5" +PRE_BYTES=$(wc -c < "$PATH5") +set +e +taskmaster_state_update "$SID5" '.stop_count = 99' 2>/dev/null +RC=$? 
+set -e
+POST_BYTES=$(wc -c < "$PATH5")
+[[ "$RC" != "0" ]] && ok "corrupted file: update returns non-zero" \
+  || fail "corrupted file: update returns non-zero (got $RC)"
+[[ "$POST_BYTES" == "$PRE_BYTES" ]] && ok "corrupted file: bytes preserved exactly" \
+  || fail "corrupted file: bytes changed (pre=$PRE_BYTES, post=$POST_BYTES)"
+if compgen -G "${PATH5}.tmp.*" >/dev/null; then
+  fail "corrupted file: tmp file leaked: $(echo "${PATH5}".tmp.*)"
+else
+  ok "corrupted file: tmp file cleaned up"
+fi
+
+# --- Critical #2 regression: migrate is additive (doesn't rewind a peer increment) ---
+LEGACY_DIR="${TMPDIR:-/tmp}/taskmaster"
+mkdir -p "$LEGACY_DIR"
+SID6="sess-migrate-additive-$$"
+PATH6="$(taskmaster_state_path "$SID6")"
+# Pre-populate state as if a peer had already migrated AND incremented
+taskmaster_state_init "$SID6"
+taskmaster_state_update "$SID6" '.stop_count = 100'
+# Plant a legacy file simulating a stale handle
+echo "5" > "$LEGACY_DIR/$SID6"
+# Now migrate — should absorb additively and not rewind
+taskmaster_state_migrate_legacy_counter "$SID6"
+SC="$(jq -r .stop_count <"$PATH6")"
+[[ "$SC" == "105" ]] && ok "migrate is additive (100 + 5 = 105)" \
+  || fail "migrate is additive — expected 105, got $SC"
+[[ ! -f "$LEGACY_DIR/$SID6" ]] && ok "additive migrate still removes legacy file" \
+  || fail "additive migrate did not remove legacy file"
+
+# --- Important #3 regression: oversize legacy counter is capped to 0 ---
+SID7="sess-overflow-$$"
+echo "999999999999999999999999999999" > "$LEGACY_DIR/$SID7"
+taskmaster_state_migrate_legacy_counter "$SID7"
+SC="$(jq -r .stop_count <"$(taskmaster_state_path "$SID7")")"
+[[ "$SC" == "0" ]] && ok "oversize legacy counter capped to 0" \
+  || fail "oversize legacy counter not capped (got $SC)"
+
+# --- Important #6 regression: record_verifier_run rejects non-boolean complete ---
+SID8="sess-verifier-$$"
+taskmaster_state_init "$SID8"
+set +e
+taskmaster_state_record_verifier_run "$SID8" "hash1" '"true"' "ok" "next" 2>/dev/null
+RC=$?
+set -e
+[[ "$RC" == "64" ]] && ok "record_verifier_run rejects string complete with EX_USAGE" \
+  || fail "record_verifier_run accepted string complete (rc=$RC)"
+# Sanity: bare true is accepted
+taskmaster_state_record_verifier_run "$SID8" "hash2" 'true' "ok" "next"
+COMPLETE="$(jq -r .last_verifier_run.complete <"$(taskmaster_state_path "$SID8")")"
+[[ "$COMPLETE" == "true" ]] && ok "record_verifier_run accepts bare true" \
+  || fail "record_verifier_run rejected bare true (got $COMPLETE)"
+
+printf '\n%d passed, %d failed\n' "$PASS" "$FAIL"
+[[ "$FAIL" == 0 ]]
diff --git a/tests/verify-command.test.sh b/tests/verify-command.test.sh
new file mode 100644
index 0000000..81c2767
--- /dev/null
+++ b/tests/verify-command.test.sh
@@ -0,0 +1,82 @@
+#!/usr/bin/env bash
+#
+# Tests for taskmaster-verify-command.sh.
+#
+set -euo pipefail
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+LIB="$REPO_ROOT/taskmaster-verify-command.sh"
+
+# shellcheck disable=SC1090
+source "$LIB"
+
+PASS_COUNT=0
+FAIL_COUNT=0
+
+assert() {
+  local name="$1"
+  local condition="$2"
+  if eval "$condition"; then
+    printf 'ok %s\n' "$name"
+    PASS_COUNT=$((PASS_COUNT + 1))
+  else
+    printf 'FAIL %s\n' "$name" >&2
+    FAIL_COUNT=$((FAIL_COUNT + 1))
+  fi
+}
+
+# --- Unset command is a no-op pass ---
+unset TASKMASTER_VERIFY_COMMAND TASKMASTER_VERIFY_TIMEOUT TASKMASTER_VERIFY_MAX_OUTPUT TASKMASTER_VERIFY_CWD
+TASKMASTER_VERIFY_OUTPUT_TAIL=""
+TASKMASTER_VERIFY_EXIT_CODE=""
+taskmaster_run_verify_command
+assert "unset command returns 0" "[[ \"$?\" == \"0\" ]]"
+assert "unset command leaves exit code blank" "[[ -z \"$TASKMASTER_VERIFY_EXIT_CODE\" ]]"
+
+# --- Successful command ---
+TASKMASTER_VERIFY_COMMAND="true"
+taskmaster_run_verify_command
+rc=$?
+assert "successful command returns 0" "[[ \"$rc\" == \"0\" ]]"
+assert "successful command sets exit code 0" "[[ \"$TASKMASTER_VERIFY_EXIT_CODE\" == \"0\" ]]"
+
+# --- Failing command ---
+TASKMASTER_VERIFY_COMMAND="exit 7"
+set +e; taskmaster_run_verify_command; rc=$?; set -e
+assert "failing command propagates exit code" "[[ \"$rc\" == \"7\" ]]"
+assert "failing command captures exit code 7" "[[ \"$TASKMASTER_VERIFY_EXIT_CODE\" == \"7\" ]]"
+
+# --- Output captured ---
+TASKMASTER_VERIFY_COMMAND='echo hello-world; echo to-stderr >&2'
+taskmaster_run_verify_command
+assert "stdout captured" '[[ "$TASKMASTER_VERIFY_OUTPUT_TAIL" == *hello-world* ]]'
+assert "stderr captured (combined)" '[[ "$TASKMASTER_VERIFY_OUTPUT_TAIL" == *to-stderr* ]]'
+
+# --- Output truncation ---
+TASKMASTER_VERIFY_COMMAND='yes hello | head -c 50000'
+TASKMASTER_VERIFY_MAX_OUTPUT=200
+taskmaster_run_verify_command
+unset TASKMASTER_VERIFY_MAX_OUTPUT
+assert "output truncated to MAX_OUTPUT bytes" "[[ \"\${#TASKMASTER_VERIFY_OUTPUT_TAIL}\" -le 200 ]]"
+
+# --- Timeout ---
+TASKMASTER_VERIFY_COMMAND='sleep 30'
+TASKMASTER_VERIFY_TIMEOUT=1
+set +e; START=$(date +%s); taskmaster_run_verify_command; rc=$?; END=$(date +%s); set -e
+unset TASKMASTER_VERIFY_TIMEOUT
+ELAPSED=$((END - START))
+assert "timeout fires within 10s" "[[ \"$ELAPSED\" -lt 10 ]]"
+assert "timeout produces non-zero exit" "[[ \"$rc\" != \"0\" ]]"
+
+# --- CWD respected ---
+TMPCWD="$(mktemp -d)"
+trap 'rm -rf "$TMPCWD"' EXIT
+TASKMASTER_VERIFY_COMMAND='pwd'
+TASKMASTER_VERIFY_CWD="$TMPCWD"
+taskmaster_run_verify_command
+unset TASKMASTER_VERIFY_CWD
+TMPCWD_REAL="$(cd "$TMPCWD" && pwd -P)"
+assert "cwd honored" '[[ "$TASKMASTER_VERIFY_OUTPUT_TAIL" == *"$TMPCWD_REAL"* ]]'
+
+printf '\n%d passed, %d failed\n' "$PASS_COUNT" "$FAIL_COUNT"
+[[ "$FAIL_COUNT" == 0 ]]
diff --git a/uninstall.sh b/uninstall.sh
index c1a2c5e..abb5151 100755
--- a/uninstall.sh
+++ b/uninstall.sh
@@ -201,6 +201,10 @@ uninstall_claude() {
   echo "Removing Taskmaster from Claude..."
   remove_claude_stop_hook_from_settings "$CLAUDE_SETTINGS_PATH"
   remove_symlink_if_target "$CLAUDE_HOOK_LINK" "$CLAUDE_CHECK_SCRIPT"
+  rm -f "$CLAUDE_ROOT/hooks/taskmaster-compliance-prompt.sh"
+  rm -f "$CLAUDE_ROOT/hooks/taskmaster-verify-command.sh"
+  rm -f "$CLAUDE_ROOT/hooks/taskmaster-prompt-detect.sh"
+  rm -f "$CLAUDE_ROOT/hooks/taskmaster-state.sh"
   remove_dir_if_exists "$CLAUDE_SKILL_DIR"
 }
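The state tests in this diff pin down three properties of the write path: concurrent increments are never lost, a failed update leaves no `*.tmp.*` file behind, and a corrupted state file survives byte-for-byte. The CHANGELOG describes the real state file as flock-protected with atomic writes, and a fix commit mentions gating `mv` on jq's exit status. What follows is a minimal sketch of that general pattern under stated assumptions, not the `taskmaster-state.sh` code: `sketch_increment` is a hypothetical helper, it stores a bare integer instead of the JSON schema so it needs only bash and util-linux `flock(1)`, and the 18-digit cap and exit code 65 are arbitrary choices for the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT taskmaster-state.sh: a bare-integer counter store
# showing lock -> write temp file -> gated mv. Needs bash and util-linux flock.
set -euo pipefail

# sketch_increment FILE
# Adds 1 to the integer in FILE (a missing file counts as 0) under an
# exclusive lock. If FILE holds anything other than 1-18 digits it exits
# 65 and leaves FILE byte-for-byte untouched, with no temp file created.
sketch_increment() {
  local file="$1" cur tmp
  (
    flock -x 9                                   # one writer at a time
    cur="$(cat "$file" 2>/dev/null || echo 0)"
    [[ "$cur" =~ ^[0-9]{1,18}$ ]] || exit 65     # corrupted: refuse to write
    tmp="$(mktemp "${file}.tmp.XXXXXX")"         # same dir, so mv is a rename
    # Replace FILE only if the new value was fully written; 10# forces base 10.
    if printf '%s\n' "$((10#$cur + 1))" >"$tmp" && mv -f "$tmp" "$file"; then
      exit 0
    fi
    rm -f "$tmp"                                 # never leak the temp file
    exit 1
  ) 9>"${file}.lock"                             # lock a stable side file
}

dir="$(mktemp -d)"
trap 'rm -rf "$dir"' EXIT
state="$dir/count"

# 20 concurrent writers: no increment is lost.
for _ in {1..20}; do sketch_increment "$state" & done
wait
echo "count=$(cat "$state")"                     # prints count=20

# A corrupted file is rejected and left exactly as it was.
printf 'not a number' >"$state"
if sketch_increment "$state"; then echo "unexpected success"; else echo "rc=$?"; fi
cat "$state"; echo                               # prints: not a number
```

The lock is taken on a separate `.lock` file rather than on the data file because `mv` swaps in a new inode: a writer that opened the data file after a rename would lock a different inode and race the others. The temp file sits next to its target so that `mv` is a same-filesystem rename rather than a copy.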