Skip to content

feat(cursor): operator-driven legacy stream-json -> ACP session migrator (transplant)#34

Open
heavygee wants to merge 29 commits into
mainfrom
spike/cursor-legacy-to-acp-migrator
Open

feat(cursor): operator-driven legacy stream-json -> ACP session migrator (transplant)#34
heavygee wants to merge 29 commits into
mainfrom
spike/cursor-legacy-to-acp-migrator

Conversation

@heavygee
Copy link
Copy Markdown
Owner

@heavygee heavygee commented Jun 6, 2026

Summary

Adds invisible, automatic, per-session ACP migration triggered on first
open of a legacy stream-json Cursor session. The hub transplants the legacy
~/.cursor/chats/<wsh>/<uuid>/store.db into ~/.cursor/acp-sessions/<uuid>/,
verifies it loads via agent acp, flips cursorSessionProtocol = 'acp',
removes the legacy source — all before resumeSession returns its target.
The user sees one upgrade pause; subsequent opens are pure ACP.

Sees tiann/hapi#824 for the design and the spike report at
docs/plans/2026-06-06-cursor-legacy-to-acp-spike.md for the underlying
evidence (identical SQLite schema, only the directory layout differs
between legacy and ACP).

Why this is safety, not feature-parity

This PR's primary justification is remediation for #784 (cursor-agent
fabricates Questions skipped by the user responses in legacy stream-json
mode). Despite #801's mitigation, the bug still fires regularly in
dogfood: the agent ships destructive side effects against fabricated
consent. Migration to ACP closes the protocol-level door — the
AskQuestion tool doesn't exist on the ACP side, so there's nothing to
fabricate.

#799 deliberately left existing sessions on legacy "because they keep
working." That tradeoff was reasonable at the time. The accumulated #784
evidence makes legacy sessions actively unsafe; this PR makes the upgrade
path invisible enough that operators stop avoiding it.

Coordination with upstream ACP mop-up

This PR stacks on top of swear01's three open ACP mop-up PRs:

Local soup is already running this PR's branch stacked on top of tiann#835/tiann#836/tiann#837
so the migrator is exercised against the eventual upstream shape.

Live dogfood on the soup (2026-06-07)

After the original PR draft landed I dogfooded against the live hub
(hapi-hub.service on the operator's box) instead of trusting the unit
suite alone. Four migrations on real sessions:

Time (UTC+1) Cursor session id Path Replay notifications Duration sourceRemoved
18:08:07 5a2fba34-... (jellybot-dev, active operator session) jellybot-dev 8650 19.9s true
18:08:42 6c63c1f4-... androidsideloader 169 15.0s true
18:28:35 3dd30dc4-... sparling 3629 16.2s true

All four triggered via the existing Reopen path in the Web UI (operator
clicks; hub auto-fires the helper inside resumeSession). All four:

  • ended at cursorSessionProtocol=acp in hapi.db
  • created the ACP twin at ~/.cursor/acp-sessions/<curId>/{store.db,meta.json,WAL}
  • removed the legacy ~/.cursor/chats/<wsh>/<curId>/store.db (except where
    an active runner held an open fd — see "Known wart" below)
  • preserved the user's typed-but-unsent draft across the chat re-render

The active-session migration (jellybot-dev: operator's own working
session, 374MB store.db, 8650 messages of history) succeeded mid-flight
— the conversation continued in the upgraded ACP session without the
operator having to do anything beyond clicking Reopen.

Post-mortem: three follow-up commits after the original draft

The first live dogfood surfaced an environmental gap the unit suite did
not catch. Three follow-up commits:

3cc4c037 — observability

The helper's success path and all four early-return guards were silent.
On the first dogfood run, zero log lines meant I could not distinguish
"helper bailed at guard X" from "helper succeeded silently" from "helper
was never called." Added five log points:

  • [auto-migrate] considering (unconditional entry)
  • [auto-migrate] skipped: <reason> per guard
  • [auto-migrate] starting transplant
  • [auto-migrate] success (structured: durationMs, sourceRemoved,
    replayNotifications, lastUsedModelPreserved)
  • [auto-migrate] legacy ... left as stream-json (soft fail; existed)

This is now the source of the receipts in the table above.

5d61dcf9 (attempt 1) → 6c682fac (fix): PATH augmentation for verify-probe

Second dogfood run produced:

[auto-migrate] legacy cursor session left as stream-json
  reason: "verify_load_failed"
  message: "initialize failed: Executable not found in $PATH: \"agent\""

Root cause: hapi-hub.service ships systemd's minimal PATH
(/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) and
never sees Cursor's standard install location at $HOME/.local/bin/agent.
hapi-runner.service hand-fixes this in its drop-in via
Environment=PATH=$HOME/.local/bin:.... The hub had no equivalent,
so every spawn('agent', ...) inside the verify probe returned ENOENT.

5d61dcf9 prepended $HOME/.local/bin and $HOME/.npm-global/bin to
baseEnv.PATH before spawning. Wrong fix: the migrator passes
HOME=<tmpdir> to the probe for HAPI_HOME isolation, so
baseEnv.HOME pointed at a tmpdir that doesn't contain .local/bin.
6c682fac switched the lookup to process.env.HOME (the hub process's
own home, set by systemd User=) and added a regression test that
explicitly distinguishes process.env.HOME from a per-spawn HOME
override. Right fix: the agent binary lives at the operator's real
home regardless of any per-verify HOME override (Cursor installs at
the user's $HOME/.local/bin at install time, not at spawn time).

The safety contract held across all of this: every failed verify left
the legacy session exactly as it was; no rm fired without a verify
success.

In-flight upgrade banner (UX A++, ships in this PR)

The migration takes 15-20 seconds on long sessions (copy a multi-hundred-MB
store, spawn agent acp, replay thousands of notifications, tear down the
probe). The first dogfood pass shipped this window with no progress
indicator — and the operator's lived feedback made clear that was a
non-starter:

"It does take a little bit longer than normal and from user's
perspective it looks like it's broken and nothing is working and then
suddenly it just refreshes the entire session presumably as a new
session and then seamlessly replaces it and then carries on it's not
fantastic but it absolutely works"

Shipping the migrator without the banner means every first-time reviewer
forms a sticky bad first impression — "looks broken." The polish IS the
feature surface here, not an optional follow-up.

Design (intentionally minimal, no SSE event-type expansion):

  • Hub sets metadata.cursorMigrationState='in_progress' BEFORE the
    long-running transplant call. The session-cache refresh emits the
    existing session-updated SSE event (no new event type), so the web
    client picks it up in milliseconds. No client-side polling change
    needed; the SSE channel is already real-time.

  • Hub clears the flag in the SAME metadata write that flips
    cursorSessionProtocol to 'acp' on success. A single SSE event
    carries both transitions, so the banner disappears in the same render
    tick the chat re-renders as ACP — no flicker window between
    "banner gone" and "ACP content rendered".

  • Hub clears the flag explicitly in the auto-migrate helper's finally
    on failure / exception, so the banner never gets stuck if migration
    soft-fails and the legacy launcher takes over.

  • Web renders an accessible (role=status, aria-live=polite) banner
    with an indeterminate spinner and short explanation when the flag is
    set. Deliberately no fake percentage — we don't have phase data and a
    fake progress bar would lie. The minimal-information design beats
    engineering-brain over-design (per-phase SSE events, progress
    percentages) which doesn't actually solve the lived rendering-timing
    pain.

Files added (UX A++ slice on top of the migrator core):

  • shared/src/schemas.ts: add cursorMigrationState: z.enum(['in_progress']) to MetadataSchema
  • hub/src/sync/syncEngine.ts: flipCursorSessionProtocolToAcp drops the flag in the atomic write; setCursorMigrationStateInProgress / clearCursorMigrationState helpers with version-mismatch retry; maybeAutoMigrateLegacyCursorSession sets the flag before transplant and finally-clears on failure/exception only
  • hub/src/sync/syncEngineAutoMigrate.test.ts: 4 new tests pin the flag transitions (failure / exception / success-no-clear / flip-atomic-clears)
  • web/src/components/CursorMigrationBanner.tsx: thin banner component (50 lines)
  • web/src/components/CursorMigrationBanner.test.tsx: 10 tests covering all visibility paths + a11y
  • web/src/components/SessionChat.tsx: render banner between SessionHeader and TeamPanel
  • web/src/lib/locales/{en,zh-CN}.ts: copy strings (English + Simplified Chinese)
  • web/src/types/api.ts: re-export Metadata for component typing

Surfaces (drastically reduced from earlier draft)

Per the cost critique on #824, the bulk-migration surface area is
stripped. No CLI subcommand, no web button, no REST bulk endpoint, no
Store.backupTo, no dryRun, no candidate-listing API.

  • Hub: SyncEngine.resumeSession calls maybeAutoMigrateLegacyCursorSession
    before resolving the target. Migration is transparent.
  • Env kill-switch: HAPI_CURSOR_LEGACY_AUTO_MIGRATE=0|false|no|off
    disables auto-migration (default = ON).
  • Operator escape-hatch: POST /api/sessions/:id/migrate-to-acp for
    manual one-off triggers (used by tests + operator scripts).

Verify-probe isolation

The probe spawn uses an isolated HOME and HAPI_HOME (both pointed at
a per-probe mkdtempSync directory). This means:

  • the probe's agent acp cannot see or mutate the user's real
    ~/.cursor/acp-sessions/
  • the probe's lock acquisition cannot collide with the host's live
    agent-acp-active lock

The real cp of the legacy store into ~/.cursor/acp-sessions/<uuid>/
happens after the probe succeeds in the temp HOME. The legacy source
is rmd only after the hapi.db protocol flip lands cleanly.

Refusal contract (silent — caller falls back to legacy launcher)

  • not a cursor session
  • already on ACP
  • missing cursorSessionId
  • env flag explicitly disabled
  • migrator returns refusal (logged at info, soft fail)
  • unexpected throw (logged at warn, soft fail)

The session is never left in a half-migrated state. Verify failure rolls
back the placement before the protocol flip; WAL checkpoint + size
checks ensure store.db consistency across the copy.

Verification

  • 53 unit tests across cursorLegacyMigrator.test.ts (45 tests covering
    refusals, rollback, telemetry, force-archive, ACP-lock dependency
    injection, fingerprint protection) + syncEngineAutoMigrate.test.ts
    (8 tests covering the guard matrix + soft-fail behaviour).
  • 3 integration tests spawning a real agent acp (opt-in via
    CURSOR_AGENT_INTEGRATION=1):
    • synthetic legacy store from buildSyntheticLegacyStore
    • operator-supplied fodder UUIDs via LEGACY_FODDER_WSH + LEGACY_FODDER_UUID
  • Full hub suite: 296/296 pass + 3 integration-skipped (CI-default).
  • bun run typecheck clean.

Out of scope (mentioned in tiann#824 for future work)

  • hapi cursor import --from-cursor-disk for the ~329 chats in
    ~/.cursor/chats/ that HAPI has no metadata row for. Separate discovery
    • metadata-synthesis concern.
  • Forced migration of running sessions. Current implementation refuses
    cleanly when active=true; the next resume on an inactive row triggers
    the migration invisibly.

Test plan (reviewer)

  • Confirm maybeAutoMigrateLegacyCursorSession runs before the
    legacy launcher resolves its target (otherwise the user gets a stale
    legacy spawn).
  • Confirm refusal paths in cursorLegacyMigrator.test.ts match the
    contract in apiTypes.ts (CursorMigrateRefusalReason).
  • Verify the verify probe's HOME + HAPI_HOME isolation prevents
    pollution of the user's real ~/.cursor/acp-sessions/.
  • Confirm HAPI_CURSOR_LEGACY_AUTO_MIGRATE=0 cleanly disables auto-migration.

heavygee added 2 commits June 6, 2026 21:28
…tor (transplant)

Adds a per-session and bulk migrator that moves pre-ACP Cursor sessions
(left over from tiann#799's cutover) into the on-disk location `agent acp` reads
from, then flips `cursorSessionProtocol = 'acp'` so the existing
cursorAcpRemoteLauncher picks the session up on next resume.

The migration is a transplant, not a replay: the legacy `store.db` is
byte-identical in schema to the ACP one; only the directory layout differs.
The migrator copies the legacy `~/.cursor/chats/<wsh>/<uuid>/store.db` into
`~/.cursor/acp-sessions/<uuid>/`, synthesizes `meta.json`, drives
`initialize + session/load + session/prompt` against a real `agent acp` in
an isolated `$HOME` (staging - no pollution of the operator's real store),
preserves `lastUsedModel` from the legacy meta record onto the HAPI
sessions.model column, then removes the legacy source. Each step has a
rollback path.

Surfaces:
  - POST /api/sessions/:id/migrate-to-acp (web button)
  - POST /api/cursor/migrate-legacy + GET /api/cursor/legacy-candidates
  - Mirror endpoints under /cli/* (Bearer cliApiToken) for the CLI
  - `hapi cursor migrate <id> | --all | --dry-run | --keep-source |
    --force-archive-running | --list`
  - Web UI menu item on legacy session rows
  - Auto hapi.db backup before any bulk run

Closes tiann#824

Refused-by-default cases:
  - non-cursor flavor; already on ACP; missing on-disk store;
    pre-existing ACP target dir; lifecycleState=running (override with
    --force-archive-running, which archives the session first and waits
    for the launcher to release the store.db lock).

Verified end-to-end:
  - 25 unit tests covering every refusal and rollback branch (mocked
    `agent acp` probe + mocked metadata update)
  - 3 integration tests spawning real `agent acp` (opt-in via
    `CURSOR_AGENT_INTEGRATION=1`); one drives a synthetic store, two
    accept operator-supplied fodder uuids via env so real on-disk
    stores can be exercised without committing private fixtures.
…rict TS

CI typecheck (stricter than the local Bun-resolved tsc) flagged accesses
to ok-success fields (acpSessionId, sourceRemoved, etc.) as unsafe
because the success branch only had `dryRun?: false` while the dry-run
branch had `dryRun: true`. Making `dryRun: false` REQUIRED on the
production-success branch turns the union into a hard discriminator and
lets call sites narrow with `if (out.ok && !out.dryRun) {...}`.

No runtime change to the migrator's behavior - production-success
already returned the full payload; the test assertions just couldn't
prove it to tsc.

Also wires the explicit `dryRun: false` into the migrator's success
return and updates the tests to use the now-discriminator-friendly
narrowing.
@heavygee heavygee marked this pull request as ready for review June 6, 2026 20:33
@heavygee heavygee marked this pull request as draft June 6, 2026 20:39
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e3d0a18fc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread web/src/api/client.ts
@heavygee heavygee marked this pull request as ready for review June 6, 2026 20:39
Self-cold-review pass: tryRm(dirname(legacy.storeDbPath)) with
recursive=true would silently destroy any non-store.db artifact a
future cursor-agent version might drop into the chat dir. Use
rmdirSync (only succeeds on empty dirs) so unknown files survive.

In practice today only store.db, store.db-wal, store.db-shm ever live
there (verified against operator's fodder), so behavior is unchanged
on current cursor-agent; this is a defense-in-depth tweak.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a731c1ec55

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
heavygee added 2 commits June 6, 2026 21:50
…, P2 401-refresh)

P1 (WAL preservation) — Without checkpoint, un-checkpointed transactions in
the legacy store.db-wal sibling would either appear stale in the ACP target
or be lost outright when the cleanup step removes the WAL file. Added a
PRAGMA wal_checkpoint(TRUNCATE) flush of the legacy store before any cp,
inside the same pre-flight window where the launcher has been confirmed
quiescent. Idempotent on non-WAL stores (no-op).

P2 (skipVerify must still run session/load) — The schema docstring
described skipVerify as skipping the prompt step. The implementation was
bypassing the entire verify block, which would have let a corrupted
transplant flip HAPI metadata. Split the verify into 'always-load,
optionally-prompt'; skipVerify now skips ONLY the prompt. Tests updated:
the existing skipVerify-happy-path expectation rewritten + a new test
verifies skipVerify still refuses on session/load failure.

P2 (web client 401 refresh) — migrateCursorSessionToAcp now triggers the
same onUnauthorized refresh dance as the shared request() helper instead
of hard-failing on an expired JWT. Matches existing session-action UX.
…ues)

P1 — Lock release must do a real SQLite busy-probe, not just stat-poll.
  killSession kicks off cleanup asynchronously; an idle-but-not-yet-closed
  runner can let the stat-based size-stability heuristic report 'released'
  while still owning the file lock. Added sqliteLockHeldByOtherProcess()
  which opens the legacy store with BEGIN IMMEDIATE; SQLITE_BUSY is the
  authoritative signal that another process holds a write lock. The
  size-poll is retained as a secondary signal for partial writes.

P2 — preflightSession now refuses on session.active=true as well as
  lifecycleState==='running'. Some legacy rows may lack lifecycleState
  but still be active in the hub cache; the CLI/API paths relied on the
  lifecycle-only check, while the web UI already hid the button for
  active sessions. Now uniform across all entry points.

P2 — Atomic ACP target creation. Replaced the existsSync + mkdirSync
  (recursive: true) TOCTOU with an exclusive mkdirSync(recursive: false);
  EEXIST is caught and converted to a target_already_exists refusal. The
  rollback path only fires for the dir we exclusively created.

P2 — cursorSessionId basename validation. Reject anything that isn't
  [A-Za-z0-9_.-]+ before any path join, with explicit rejection of '.'
  and '..'. Prevents path-traversal in either the chats lookup or the
  acp-sessions target.

P2 — hapi.db backup now uses SQLite's serialize() so committed pages in
  hapi.db-wal are included. The raw cp from before could omit recent
  sessions or pre-migration metadata. Added Store.backupTo(destPath)
  exposing the consistent-snapshot path.

Tests:
  - session.active=true refusal
  - cursorSessionId path-traversal refusal (../escape, has/slash, .,
    .., has\backslash)
  - existing tests unchanged; full hub suite 299 pass.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2cbe6edb41

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/acpVerifyProbe.ts
Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread cli/src/commands/cursorMigrate.ts Outdated
P1 — Lock release uses session.active poll as the primary signal.
  An archive() kicks off async cleanup; the closest hub-side proxy for
  "runner process has exited" is sessionCache.active flipping false (set
  when the runner socket disconnects). The migrator now polls that
  before falling back to the SQLite busy-probe + size-poll, then sleeps
  a 500ms grace after all three pass so any in-flight kernel syscall
  on the WAL/SHM can complete. SyncEngine injects the cache-poll fn.

P2 — Abort bulk migration when the hapi.db backup fails. The previous
  code logged + continued; without the backup, operators have no rescue
  handle if a metadata flip lands wrong. Now throws unless the DB is
  in-memory (test fixtures). API handlers wrap in try/catch and surface
  the error as HTTP 500 with the backup-failure message.

P2 — Windows agent-binary spawn. AcpVerifyProbe.start() now matches
  cursorAcpRemoteLauncher's shape with shell:true + windowsHide:true on
  win32, so the `agent.cmd` shim is reachable. Without this the verify
  probe fails with ENOENT even though normal Cursor ACP sessions work.

P2 — Bulk CLI timeout removed. A finite 30-minute axios timeout could
  cut off a slow operator run mid-flight (verifyTimeoutMs is 120s per
  session by default; ~15 sessions hits the cap). The hub-side bulk
  operation is idempotent per session, so the request can safely block
  until completion. Operators that want to cap a run can split it with
  a smaller --lifecycle filter.

Tests: full hub suite 299 pass + integration (real agent acp) 3 pass
on the 34-msg fodder fixture. Typecheck clean across cli/hub/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Operator has topped up Codex credits after hitting the 5h Code Review cap. Current HEAD is 96578a55 (third round of Codex findings addressed) followed by:

  • a731c1ec fix(cursor): only rmdir empty parent on legacy source cleanup
  • 2e3d0a18 fix(cursor): tighten CursorMigrateOutcome dryRun discriminator
  • c0961375 feat(cursor): operator-driven legacy stream-json → ACP session migra…

Last automatic review verdict was on commit 2cbe6edb41 (2026-06-06T21:00:37Z) before the credit-cap kicked in - so the four commits above haven't been reviewed yet. Please pass over the current HEAD.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96578a55b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts
P1 — Wait for real runner exit before transplanting. The previous
  awaitSessionInactive() poll was bogus: archiveSession() synchronously
  calls handleSessionEnd() which sets sessionCache.active=false, so any
  cache-based wait returns immediately. An idle Cursor process with no
  SQLite write lock could also pass the busy-probe instantly while
  still mid-SIGTERM-cleanup. Replaced with a real minimum dwell time
  (2s default) inside awaitLockRelease, gated on SQLite busy-probe +
  size-stability. Removed the SyncEngine-side awaitSessionInactive
  injection (it was the source of the false reassurance).

P1 — Honor the agent ACP active-process lock. Cursor's `agent` binary
  enforces single-instance semantics (see
  cli/src/agent/backends/acp/agentCliGuard.ts); spawning the verify
  probe while another `agent acp` is live can SIGTERM the live one and
  crash the operator's Cursor session. The migrator now reads the
  $HAPI_HOME/locks/agent-acp-active/pid file and refuses with a new
  'acp_transport_active' reason if the lock points to a live PID. The
  check is injectable for tests.

P2 — Exclude active sessions from archived bulk candidates. The default
  archived bulk path filtered only on lifecycleState==='running', but
  preflightSession also treats session.active===true as running for
  legacy rows that lack lifecycle metadata. Without mirroring that
  here, `hapi cursor migrate --all` would include an active session,
  migrateOne would refuse it 'running_refused', and the bulk run would
  exit as failed. Extracted the filter to a pure exported predicate
  `isLegacyCursorMigrationCandidate()` and added unit tests.

Tests: 35 pass (was 28) + integration 3 pass on the 34-msg fodder.
Typecheck clean across cli/hub/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Round 4: addresses the 3 P1/P2 findings from the previous Codex pass on commit 96578a5:

  • P1 syncEngine.ts:572 (real runner exit signal): removed bogus cache-based awaitSessionInactive; replaced with minimum-dwell + SQLite busy-probe + size-stability inside awaitLockRelease.
  • P2 syncEngine.ts:618 (active filter): extracted as exported predicate isLegacyCursorMigrationCandidate, added unit tests.
  • P1 acpVerifyProbe.ts:69 (ACP transport lock): migrator now reads $HAPI_HOME/locks/agent-acp-active/pid and refuses with new reason acp_transport_active when the lock holds a live PID.

HEAD is now eb446b3. 35 unit pass + 3 integration pass on the 34-msg fodder fixture.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb446b391b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread cli/src/commands/cursorMigrate.ts Outdated
Comment thread hub/src/store/index.ts Outdated
heavygee and others added 2 commits June 7, 2026 00:14
Co-authored-by: Cursor <cursoragent@cursor.com>
P1 — Block resumes while mutating the store. Between preflight and
  the destructive flip+rm, a session could be resumed via the existing
  hub paths. The resume reads cursorSessionProtocol from hapi.db (still
  'legacy' until the flip), spawns a stream-json runner against the
  legacy source — which we are about to remove. Added getCurrentSession
  dependency that the migrator calls right before the metadata flip;
  if it sees active=true / lifecycleState=running / a concurrent flip
  to acp, it rolls back the ACP placement and refuses with new reason
  'session_resumed_during_migrate' (or 'already_acp' for the concurrent
  flip case). SyncEngine injects a real impl reading the session cache.

P2 — Register ACP verify probes in the active lock. The probe now
  acquires the same $HAPI_HOME/locks/agent-acp-active/ lock the CLI
  guard uses, just before spawning agent acp, and releases it AFTER
  the SIGTERM in stop(). Without this, a concurrent
  `hapi cursor migrate` or model-list request could see no active
  ACP lock, spawn another agent acp, and the second spawn would
  SIGTERM this verify per Cursor's single-instance enforcement
  (cli/src/agent/backends/acp/agentCliGuard.ts). New `hapiHome`
  option on AcpVerifyProbe lets tests pin the lock to an isolated
  dir so they don't clobber the operator's real lock.

P2 — Override Windows home variables for the verify spawn. HOME alone
  is insufficient on Windows; the agent binary may resolve via
  USERPROFILE / HOMEDRIVE / HOMEPATH and end up reading the operator's
  real .cursor tree. The probe env now sets all three on win32.

P2 — Reject single-session --dry-run. The CLI accepted --dry-run for
  per-session use but the per-session API has no dryRun field, so the
  flag was silently dropped and the real migration ran. Now throws a
  clear error instructing the operator to use --keep-source or the
  bulk path.

P2 — Preserve private permissions on hapi.db backups. Store.backupTo()
  now passes mode 0o600 to writeFileSync and chmods after as
  belt-and-braces. Without this the rescue copy (which contains
  session transcripts) could land world-readable under a permissive
  umask, even though the live hapi.db is locked down to 0600.

Tests: 37 pass (was 35) — added rollback test for resume-mid-migration
and concurrent-flip-to-acp scenarios. Integration 3 pass on the 34-msg
fodder fixture; integration tests now use isolated $HAPI_HOME to avoid
touching the operator's real ACP lock. Typecheck clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Round 5: addresses the 5 P1/P2 findings from the previous Codex pass on commit eb446b3:

  • P1 resume-race: added getCurrentSession recheck before metadata flip; new reason session_resumed_during_migrate.
  • P2 ACP active lock: AcpVerifyProbe now acquires/releases the same agent-acp-active lock the CLI guard uses.
  • P2 Windows home: verifyInTempHome overrides USERPROFILE / HOMEDRIVE / HOMEPATH on win32.
  • P2 single-session dry-run: CLI now throws on --dry-run without --all/--list.
  • P2 backup perms: Store.backupTo() writes with mode 0o600 + chmods after.

HEAD is now be7696a. 37 unit pass (+2 rollback tests) + 3 integration pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be7696a857

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
P2 v2 — Atomic agent-acp-active lock acquisition. The previous
  mkdirSync(recursive:true) + writeFileSync was not race-safe: two
  concurrent migrations could both pass isAgentAcpTransportActive()
  and then BOTH 'acquire' the lock by writing their pid in succession.
  Now uses mkdirSync(recursive:false) which is atomic — exactly one
  caller wins on EEXIST contention. Stale locks (dead pid) are
  cleared and retried once. start() throws a typed error on race
  loss; verifyInTempHome catches it and the migrator surfaces it as
  the acp_transport_active refusal so the operator sees the
  authoritative reason. Added 4 unit tests for the lock primitive
  (acquire / refuse-when-held / stale-cleanup / no-clobber-on-stop).

P1 v2 — Authoritative active-check inside the metadata flip. The
  migrator's getCurrentSession recheck (round 5) was best-effort: a
  resume landing AFTER the recheck but BEFORE the DB update would
  still slip through. Moved the same check INSIDE
  flipCursorSessionProtocolToAcp so the read-active and write-protocol
  pair are within a single synchronous Bun call (no JS interleaving).
  Added new flip result 'session-active' + corresponding
  UpdateAfterMigrateResult variant + corresponding refusal mapping
  back to 'session_resumed_during_migrate'. Added a unit test that
  drives the path with updateOverride returning session_active.

Tests: 313 unit pass (was 308; +4 probe lock, +1 atomic flip) plus 3
integration on the 34-msg fodder fixture. Typecheck clean across
cli/hub/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Round 6: addresses the 2 P1/P2 findings from previous pass on be7696a:

  • P2 atomic lock: mkdirSync(recursive:false) for race-safe acquire; throws on EEXIST + live holder; 4 new unit tests for the lock primitive.
  • P1 atomic flip: active-check moved INSIDE flipCursorSessionProtocolToAcp; new 'session-active' result type; rolled-back via session_resumed_during_migrate refusal.

HEAD is now a6908df. 313 unit pass + 3 integration pass on 34-msg fodder fixture.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6908df576

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
P1 v3 — Detect resume-then-exit during the migration window. The
  active-flag rechecks (both the migrator's getCurrentSession and the
  flip-time atomic check) only catch resumes that are STILL active at
  the moment of the check. A brief resume that came in, wrote turns
  to store.db/store.db-wal, and exited before the check would slip
  through — the store.db would have diverged from the snapshot we
  copied to the ACP target. Added an mtime+size fingerprint of the
  legacy store.db captured right after the WAL checkpoint; compared
  immediately before the metadata flip and rolled back with new
  reason 'legacy_store_modified_during_migrate' on any divergence.

P2 — Respect metadata.homeDir + metadata.host. Hub processes can run
  under a different account than the user who created the Cursor
  session (service-account / multi-machine setups). The legacy chats
  live under the recorded session-owner home, not the hub's. The
  migrator now prefers metadata.homeDir for resolving both the legacy
  source (~/.cursor/chats) and the ACP target (~/.cursor/acp-sessions),
  falling back to homeDir() when the field is absent. Cross-host
  sessions (metadata.host != local hostName) are refused with new
  reason 'cross_host_session' so we never touch a same-id store under
  the wrong user account. hostName is a new injectable dep with default
  process.env.HAPI_HOSTNAME || os.hostname().

P2 v3 — Don't clobber freshly-starting CLI ACP locks. The previous
  retry path treated 'lock dir exists but no pid file' as stale and
  cleared it — but the CLI guard's registerActiveAcpTransport does
  mkdir BEFORE writeFileSync, so a probe landing in that small window
  would delete the live caller's lock. probeLockHolder now returns
  'starting' for the pidless case, which falls through to the refuse-
  path. Only a present-but-dead pid file is treated as stale.

Tests: 317 hub pass (was 313; +cross-host refusal, +fingerprint
divergence rollback, +homeDir resolution, +mid-startup probe lock
refusal). Integration 3 pass on the 34-msg fodder fixture (with
injected hostName).

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Round 7: addresses the 3 P1/P2 findings from pass on a6908df:

  • P1 v3 (resume-then-exit): fingerprint legacy store.db post-checkpoint, compare pre-flip; new reason legacy_store_modified_during_migrate.
  • P2 (homeDir): prefer metadata.homeDir over hub HOME; new cross_host_session refusal via hostName dep.
  • P2 v3 (probe lock): distinguish 'dead' from 'starting' (pidless mid-startup); refuse instead of clearing.

HEAD is now 8145b73. 317 hub pass (+4 new tests) + 3 integration pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8145b73551

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5010bd77b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts
Six P2/P3 findings from chatgpt-codex-connector on commit 5010bd7:

1. P3 — Treat partially-written pid files as active (cli/src/agent/backends/acp/agentCliGuard.ts)
   The previous round's pid-less check only protected the "no file" case.
   An empty/malformed pid file (mid-write window between mkdirSync and
   writeFileSync flush) would fall through to readLockPid() returning
   null and removeAcpLockDir() would clobber the live holder. Add
   probeLockPidFile() that returns missing | starting | live | dead.
   Only the 'dead' state clears the lock.

2. P2 — Clear stale running lifecycle during ACP flip (hub/src/sync/syncEngine.ts)
   When migrating a stale-running row (active=false, lifecycleState=
   running) the flip used to preserve all old metadata and only add
   cursorSessionProtocol:'acp'. Stale lifecycle survived and downstream
   code keyed on metadata.lifecycleState would keep treating the
   archived ACP session as live. Set lifecycleState:'archived' when
   the old value was 'running'.

3. P2 — Kill Windows process tree before releasing ACP lock (hub/src/cursor/acpVerifyProbe.ts)
   On Windows we spawn 'agent acp' through a shell (shell: true), so
   proc.kill only signals the wrapper. The agent child can survive and
   the next verifier acquires the freed lock + spawns a second agent
   while ours is still running. Use 'taskkill /F /T /PID' to kill the
   tree on win32 with SIGTERM fallback.

4. P2 — Reserve ACP lock before archiving (hub/src/cursor/cursorLegacyMigrator.ts)
   isAgentAcpTransportActive() was a point-in-time check; the actual
   lock acquisition didn't happen until verifyInTempHome much later.
   In the gap another agent acp could start and a --force-archive
   migration would kill the legacy session before refusing. Extract
   tryAcquireAcpActiveLock() as a free helper, acquire it in
   migrateOne BEFORE archive, hold across the entire mutation window,
   release in finally. Verify probe gets skipLockAcquire:true so it
   inherits the caller's hold instead of double-acquiring.

5. P2 — Refuse when ACP lock cannot be claimed (hub/src/cursor/acpVerifyProbe.ts)
   Non-EEXIST mkdir errors (permission denied, read-only fs, disk
   full) used to fall through to lockHeld=false and start() spawned
   agent acp UNGUARDED. Throw instead so verifyInTempHome surfaces
   the refusal cleanly.

6. P2 — Remove pid-less locks owned by the probe (hub/src/cursor/acpVerifyProbe.ts)
   If we successfully mkdir'd the lock dir but pid write failed, the
   old releaseLock() skipped removal because the pid file was absent.
   With the CLI guard now treating pid-less dirs as 'starting' (active),
   that left a permanently-stuck lock. We own the mkdir; release()
   must rmdir even when pidless. Only skip removal when the pid file
   contains a DIFFERENT valid pid (another holder somehow took over).

Tests:
- hub/cursor: 50 unit + 2 new lifecycle tests (lock acquired/released
  on success and on mid-flow refusal)
- hub/probe: 12 unit tests including new tryAcquireAcpActiveLock helper,
  skipLockAcquire bypass behavior, pidless-owned release
- cli/guard: 7 tests including empty pid file + malformed pid file
  treated as starting (mid-write race)
- integration: 3 pass with injected acquireAcpActiveLock pointing at
  tmp dir (isolates from operator's live HAPI_HOME lock)

Typecheck clean across hub/cli/web. Total: 335 hub tests + 7 cli +
3 integration pass; 0 unresolved threads.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

HEAD is now 5899a00. Addressed all 6 round-11 findings:

  1. P3 — pid-less / partially-written pid file detection: probeLockPidFile() distinguishes missing | starting | live | dead. Only 'dead' clears the lock.
  2. P2 — Stale lifecycleState='running' is now cleared alongside the ACP flip.
  3. P2 — Windows process tree kill via taskkill /F /T /PID before releasing lock.
  4. P2 — ACP lock is reserved BEFORE archive in migrateOne and held across the entire mutation window. Verify probe inherits via skipLockAcquire:true.
  5. P2 — Non-EEXIST mkdir failures now THROW; verifyInTempHome refuses cleanly instead of running unguarded.
  6. P2 — releaseLock removes the lock dir even when pid file is missing (probe owns the mkdir).

335 hub + 7 cli guard + 3 integration tests pass. Typecheck clean across hub/cli/web. 0 unresolved threads. Looking for a clean pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5899a00870

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc7b2e15af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Two P2 findings from chatgpt-codex-connector on commit fc7b2e1:

1. P2 — Make real ACP spawns honor the migration lock
   (cli/src/agent/backends/acp/AcpStdioTransport.ts + agentCliGuard.ts)

   When the hub-side migrator held the agent-acp-active lock, a normal
   Cursor ACP resume STILL spawned `agent acp`: registerActiveAcpTransport()
   used mkdirSync(recursive:true), which silently succeeds on an existing
   dir and overwrote the migrator's pid file. Worse, AcpStdioTransport
   spawned agent FIRST and called register SECOND, so even an atomic
   register couldn't prevent the racy spawn.

   - registerActiveAcpTransport() now uses mkdirSync(recursive:false)
     and throws AcpLockHeldError when another live (or mid-startup)
     process holds the lock. Stale (dead-pid) locks are still cleared
     and retried once. In-process refcount is preserved so the same
     CLI process can re-enter without re-claiming the on-disk lock.
   - AcpStdioTransport reordered: register FIRST, spawn SECOND. On
     spawn failure the register is released. AcpLockHeldError now
     propagates to the caller so resume/start refuses cleanly instead
     of spawning an agent that will get killed by Cursor's single-
     instance enforcement.
   - unregisterActiveAcpTransport() now only removes the lock dir when
     the pid file confirms we own it (or no pid file exists, which
     means we own the mkdir-but-pid-write-failed case).

2. P2 — Reject WAL reappearance before setting the baseline
   (hub/src/cursor/cursorLegacyMigrator.ts)

   PRAGMA wal_checkpoint(TRUNCATE) zeros store.db-wal. The post-
   checkpoint fingerprint captured whatever was there at the time;
   if a legacy process resumed and wrote a turn between checkpoint
   return and our stat(), the WAL grew above zero and became the
   baseline. Since the verify-copy and ACP-placement copy only
   store.db, those frames would never transplant — and the post-
   fingerprint mtime/size comparison would still match, allowing
   cleanup to delete the legacy WAL with frames lost.

   Added a post-checkpoint stat: if store.db-wal exists with size
   > 0, refuse the migration with legacy_store_modified_during_migrate
   ('between checkpoint and fingerprint capture'). Extracted
   checkpointLegacyStore as an injectable dep so the unit test can
   simulate a writer landing in the gap.

Tests:
- hub/cursor: 53 unit tests, including new 'refuses early when WAL
  has content immediately after the checkpoint' test that injects a
  checkpoint stub which leaves 4 bytes in -wal.
- cli/guard: 11 tests, including 4 new ones for
  registerActiveAcpTransport throws-on-held, refuses-on-pidless,
  in-process-refcount-reuse, and stale-lock-clearing.
- cli/vitest: 944 tests (full suite), all green.
- integration: 3 pass with real agent acp + injected lock target.
Typecheck clean across hub/cli/web. 0 unresolved threads.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

HEAD is now cb614b6. Round 12 addressed both P2 findings:

  1. CLI ACP spawns now honor the migration lock: registerActiveAcpTransport uses atomic mkdirSync(recursive:false) + throws AcpLockHeldError; AcpStdioTransport claims the lock BEFORE spawn; in-process refcount preserved.
  2. Post-checkpoint WAL size check refuses with legacy_store_modified_during_migrate when WAL grows in the checkpoint→fingerprint window.

336 hub + 944 cli (vitest) + 11 cli (bun:test) + 3 integration tests pass. Typecheck clean. 0 unresolved threads. Looking for a clean pass.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Operator feedback: the operator-driven "Migrate to ACP" button and its
acp_transport_active error dialog leak implementation details to users
who never agreed to the cursor-json->ACP protocol switch in the first
place. Migration is the debt WE owe for changing the underlying protocol;
the user must not be made aware of cursorSessionProtocol, single-instance
agent-acp locks, or any of the rest of it.

Pivot to **invisible, automatic, per-session migration on first open** of
a legacy stream-json session, with three coordinated changes:

1. Rip the UI surface. Remove the "Migrate to ACP" menu entry,
   ConfirmDialog flow, MigrateIcon, the matching state + handler in
   SessionHeader, and the en/zh-CN locale strings. The REST endpoint
   POST /api/sessions/:id/migrate-to-acp stays as an operator escape
   hatch; the API client method also stays.

2. Isolate the verify probe's HAPI_HOME (tiann#824 follow-up).
   verifyInTempHome now overrides HAPI_HOME (in addition to HOME) so
   the spawned `agent acp` child registers its agent-acp-active lock
   under the per-verify temp dir, NOT under the host's real HAPI_HOME
   where other live transports (peer agents, IDE chats) already hold
   the lock. Without this isolation the auto-migration path on a busy
   machine would always refuse with acp_transport_active, defeating
   its own purpose. The default `acquireAcpActiveLock` is now a no-op
   grant; tests that need the conservative refusal can still inject
   `() => null`, and operator CLI/REST callers that want the pre-
   isolation behavior can inject `tryAcquireAcpActiveLock(home)`.

3. Wire sync-on-open in `SyncEngine.resumeSession`. New helper
   `maybeAutoMigrateLegacyCursorSession` runs the existing transplant
   migrator synchronously when a user opens an inactive legacy
   stream-json cursor session. The user sees the regular session-
   loading state for ~3-5s longer; the session opens as ACP and the
   next open is instant. Guards: env flag `HAPI_CURSOR_LEGACY_AUTO_
   MIGRATE` defaults on (set to `0`/`false`/`no`/`off` to disable);
   non-cursor + already-ACP + missing-cursorSessionId all bypass; if
   the host's agent-acp-active lock is held by a live PID the migration
   is also skipped (the post-migration ACP launcher would fail to
   spawn anyway, so we spare the user the wasted wait). All failures
   are soft — the legacy launcher continues to handle the open.

Tests:
- 10 new SyncEngine.maybeAutoMigrateLegacyCursorSession tests covering
  the full guard matrix (non-cursor, already-acp, no-id, env-flag-off,
  env-flag=false, live-lock-held, dead-lock, no-lock, soft-fail,
  unexpected-throw).
- 1 new migrator test asserting HOME==HAPI_HOME==fakeHome and
  HAPI_HOME != real HAPI_HOME on the verify probe's spawn env.
- All 54 existing migrator tests still green.
- All 350 hub tests + 944 cli + 841 web + 77 shared still green.

Closes-via-replace: the operator-facing manual button flow from earlier
revisions of this PR. tiann#824 issue body to be rewritten to
match.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0058220731

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
…edback

swear01 pushback on tiann#824 flagged "three surfaces is too much for
a remediation PR". This commit concedes the bulk path entirely so the PR
focuses on the load-bearing core for tiann#784 safety remediation.

Removed:
- CLI: `hapi cursor migrate` subcommand (cursorMigrate.ts + dispatcher entry)
- Hub: `POST /api/cursor/migrate-legacy` (createCursorRoutes) + the CLI-mirror
  `POST /cli/cursor/migrate-legacy` + `GET /cli/cursor/legacy-candidates` routes
- SyncEngine: `bulkMigrateLegacyCursorSessions`, `backupHapiDb`,
  `getLegacyCursorSessionCandidates`, and the now-unused
  `isLegacyCursorMigrationCandidate` export
- Store: `backupTo()` (only used by the bulk pre-flight backup)
- Migrator: `dryRun` option (only the bulk path used it for previews)
- Types: `CursorMigrateBulkRequest{,Schema}`, `CursorMigrateBulkResult`,
  and the `dryRun: true` variant of `CursorMigrateOutcome`

Retained (the safety-critical core):
- Transplant migrator + verify + rollback machinery
- SyncEngine.maybeAutoMigrateLegacyCursorSession sync-on-open trigger
  (invisible auto-migration the first time a user opens a legacy session)
- `POST /api/sessions/:id/migrate-to-acp` (per-session operator escape hatch)
- HAPI_CURSOR_LEGACY_AUTO_MIGRATE env kill-switch

Tests: 338 hub tests pass (3 integration skipped without LEGACY_FODDER_UUID).
Typecheck clean on hub + cli. ~650-line reduction in PR surface.

Hold for swear01 review before pushing.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Scope-strip pass: dropped the bulk migration surface (CLI subcommand, bulk REST endpoints, SyncEngine.bulkMigrateLegacyCursorSessions, backupHapiDb, Store.backupTo, dryRun machinery, CursorMigrateBulk* types). +10/-663 lines. Retained the safety-critical core: transplant migrator + verify + rollback, maybeAutoMigrateLegacyCursorSession sync-on-open trigger, per-session /migrate-to-acp REST endpoint, HAPI_CURSOR_LEGACY_AUTO_MIGRATE kill-switch.

Rationale: upstream maintainer pushback on tiann#824 flagged the 3-surface scope as too costly for a remediation PR. This narrows the diff to the load-bearing path.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d26739a7d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/sync/syncEngine.ts
Comment thread hub/src/sync/syncEngine.ts Outdated
heavygee and others added 3 commits June 7, 2026 16:22
…nn#835

Revert agentCliGuard.ts + AcpStdioTransport.ts + agentCliGuard.test.ts to
upstream/main. Strip the agent-acp-active lock pre-check from
maybeAutoMigrateLegacyCursorSession + simplify its test suite to match.

Per tiann#832 (filed by swear01), the assumption underpinning my
v8 strict-mkdirSync refusal — that Cursor's `agent` binary SIGTERMs a
second `agent acp` process on the same host — is incorrect. Two
`agent acp` processes coexist fine. The lock's actual job is narrower:
prevent `agent --list-models` from killing an active `agent acp` child.

swear01's open PR tiann#835 ("fix(cursor): merge SKU catalog under ACP lock
and refcount agent guard") refactors agent-acp-active from a
single-PID file to a cross-process refcount with live-pid
reconciliation. We expect tiann#835 to land before tiann#824 (this PR) and have
added swear01's three open ACP mop-up PRs (tiann#835/tiann#836/tiann#837) to the
local driver-manifest soup so the migrator is exercised against the
eventual upstream shape.

Net effect on tiann#824:
  * agent-acp-active lock concerns are someone else's problem now
    (the refcount lock in tiann#835 just works for our migrator flow).
  * Verify probe still runs in isolated HAPI_HOME (unchanged) — that
    isolation is independent of the host-side lock semantics and
    remains the correct safety boundary for the probe.
  * Auto-migrate trigger in resumeSession is unchanged behaviourally;
    only the now-obsolete lock pre-check is gone.

53/53 unit tests pass (cursorLegacyMigrator + syncEngineAutoMigrate).

Co-authored-by: Cursor <cursoragent@cursor.com>
Orchestrator caught a gap: the helper had zero observability. Every
early-return path AND the success path emitted nothing. Only soft-fail
and unexpected-throw logged. End result: operator couldn't distinguish
"helper fired and skipped at guard X" from "helper fired and migrated
successfully" from "helper never called at all".

Add five log points per session:
  [auto-migrate] considering       — unconditional entry log
  [auto-migrate] skipped: <reason> — one per guard (env, flavor, proto, cursorId)
  [auto-migrate] starting transplant — all guards passed
  [auto-migrate] success           — migration completed (was previously silent)
  [auto-migrate] legacy ... left as stream-json — soft fail (existed)
  [auto-migrate] unexpected error  — throw (existed)

The 8 syncEngineAutoMigrate tests still pass.

Surfacing this gap is also a process lesson: I declared "soup live, go
dogfood" without confirming the helper actually fires on the live hub
resume path. Receipts now possible the moment the next hub restart
loads this commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
Live dogfood on hapi-hub.service surfaced verify_load_failed soft-fail:
"initialize failed: Executable not found in $PATH: \"agent\"".

hapi-runner.service ships an explicit
Environment=PATH=$HOME/.local/bin:$HOME/.npm-global/bin:/usr/local/bin:
/usr/bin:/bin in its unit drop-in so its spawned cursor-agents find
the Cursor binary at ~/.local/bin/agent (the standard install location).
hapi-hub.service has no such drop-in and inherits systemd's minimal
PATH (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin),
so every hub-side `spawn('agent', ...)` ENOENTs.

Move the PATH augmentation into the verify probe itself: prepend
$HOME/.local/bin and $HOME/.npm-global/bin to baseEnv.PATH before
spawning. Now hub installs work out of the box without the operator
hand-tuning a systemd dropin.

Pin the behaviour with a regression test: stub `agent` binary in a
fake $HOME/.local/bin, baseline PATH=/usr/bin:/bin (no ~/.local/bin),
confirm the probe still finds and runs the stub (exit 99).

The migrator's safety contract was preserved end-to-end during the
dogfood: cp succeeded, verify failed, the soft-fail path fired and
the legacy session stayed exactly where it was. The fix unblocks
the verify step so migration can complete.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d61dcf92f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/web/routes/sessions.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
…nv.HOME

Second dogfood attempt (same hub session, fresh fodder b52e3f51-...)
reproduced the same `Executable not found in $PATH: "agent"` failure
even with the previous PATH-augmentation commit (5d61dcf) loaded.

Root cause: my own fix had a bug. cursorLegacyMigrator passes a
HOME-overridden env to the probe for HAPI_HOME isolation (HOME=fakeHome,
HAPI_HOME=fakeHome, both pointing to /tmp/hapi-acp-verify-<token>).
The first patch derived bin paths from baseEnv.HOME — so on dogfood
runs it augmented PATH with /tmp/.../local/bin and /tmp/.../npm-global/bin,
both nonexistent. The real ~/.local/bin/agent stayed unfindable.

Fix: resolve the binary lookup against process.env.HOME (the hub
process's own home, set by systemd User= to the operator account).
The HOME override is for cache/state isolation — it doesn't move
the agent binary install location.

Test updates: pin both the systemd-PATH dogfood AND the HOME-override
isolation case. The new test spreads them: stub agent at
process.env.HOME/.local/bin/agent, override env HOME to a separate
tmpdir without .local/bin, assert spawn exit code = stub's 99.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Three new commits since the bot's last pass at 2cbe6edb41:

  • 3cc4c037 — observability inside maybeAutoMigrateLegacyCursorSession (5 log points, all auto-migrate failure/success modes now structured)
  • 5d61dcf9 — first attempt at PATH augmentation for the verify probe (used baseEnv.HOME, wrong)
  • 6c682fac — corrected PATH augmentation to use process.env.HOME, with a regression test that explicitly distinguishes the two HOMEs

Live dogfood receipts (4 successful auto-migrations against real archived legacy cursor sessions on the operator's box, including the operator's own 374MB jellybot-dev session) are in the updated PR body under 'Live dogfood on the soup (2026-06-07)'. The post-mortem on the 5d61dcf96c682fac chain is in 'Post-mortem: three follow-up commits after the original draft'.

Please focus the new pass on the verify-probe spawn changes in acpVerifyProbe.ts and the regression test additions in acpVerifyProbe.test.ts — those are the highest-churn surfaces since your last review. The auto-migrate logging is observability only and changes no behaviour.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c682facc1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts
Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
F1 — acpVerifyProbe.ts: use path.delimiter for PATH joining
  Windows PATH entries are `;`-separated. Previous `:` join produced
  invalid combined entries on win32 (`C:\...\.local\bin:C:\Windows\...`)
  that cmd.exe treated as one segment.

F2 — acpVerifyProbe.ts: accept session-owner home via agentLookupHome
  Deployment shape where hub runs as a service account whose
  process.env.HOME differs from the human user who installed Cursor
  (recorded in `metadata.homeDir`). Migrator threads `this.deps.homeDir()`
  into the probe so the binary lookup uses the right home. Falls back
  to process.env.HOME when no explicit value supplied.

F3 — acpVerifyProbe.ts: preserve explicit options.env.PATH precedence
  Previous fix PREPENDED cursor-bin fallback — overrode a caller's
  intentionally pinned `agent`. Switch to APPEND so the existing PATH
  wins; fallback only kicks in when the existing PATH doesn't already
  contain `agent`. Still fixes the minimal-systemd-PATH dogfood case.

F4 — sessions.ts + cli.ts: reject malformed JSON on migrate-to-acp
  `c.req.json().catch(() => ({}))` silently converted parse errors
  into `{}` which passed Zod (all-optional schema) and ran the
  migration with destructive defaults. Distinguish "no body" from
  "malformed JSON" (return 400 on the latter).

F5 — syncEngine.ts auto-migrate helper: pass forceArchiveRunning: true
  Inactive rows with stale `lifecycleState=running` (orphaned by hub
  crashes that didn't finish the lifecycle transition) were being
  refused with `running_refused` and silently fell back to the legacy
  launcher forever. resumeSession's caller-side `if (active) return`
  guard already ensured no live runner — the stale lifecycle is
  exactly the kind of stuck row sync-on-open is meant to clean up.

Tests: 2 new tests pin precedence (F3) + path.delimiter (F1) via stubbed
competing agent binaries in priority vs fallback dirs. Existing live-
dogfood regression test extended to cover both explicit agentLookupHome
AND fallback to process.env.HOME. 339 hub tests pass, typecheck clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Codex review #34 P2 (round 13, finding 3369444056): on a multi-user
host with a default 022 umask and a non-private ~/.cursor tree, the
new ACP session directory was being created with the default 0o755
mode and the transplanted store.db inherited the source file's mode
bits (typically 0o644). Other local users could traverse the session
dir and read the transplanted transcript — even when the source store
and HAPI DB were private.

  - mkdirSync(acpSessionDir, { recursive: false, mode: 0o700 })
  - chmodSync(store.db, 0o600) after copyFileSync (copy inherits source)
  - writeFileSync(meta.json, ..., { mode: 0o600 })

339 hub tests pass, typecheck clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Round 13 (HEAD e6e8b9c3) addresses your 5 round-12 P2 findings + 1 stale permissions finding from round 1:

  • F1 acpVerifyProbe.ts: use path.delimiter for Windows compat
  • F2 acpVerifyProbe.ts: new agentLookupHome option; migrator threads this.deps.homeDir() for service-account-hub deployments
  • F3 acpVerifyProbe.ts: append (not prepend) the cursor-bin fallback so explicit options.env.PATH wins
  • F4 sessions.ts + cli.ts: distinguish empty body from malformed JSON on /migrate-to-acp (no more silent destructive defaults on truncated bodies)
  • F5 syncEngine.ts: pass forceArchiveRunning: true from the auto-migrate helper for stale active=false + lifecycle=running rows
  • Permissions cursorLegacyMigrator.ts: 0o700 on ACP session dir, 0o600 on store.db + meta.json (multi-user host hardening)

Two new tests pin the PATH-precedence (F3) and path.delimiter (F1) behaviour. 339 hub tests pass, typecheck clean. All 12 review threads (5 stale lock-pre-check moot since c1cee22e + your tiann#835 coordination) resolved.

Please review the latest commits:

  • 5c13a2bc round-13 P2 fixes
  • e6e8b9c3 private permissions on transplanted files

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Without an in-flight progress indicator the 15-20s dark wait while the
migrator transplants store.db, spawns agent acp, replays notifications,
and tears down the verify probe reads as "broken / nothing's happening"
to a fresh reviewer. Operator dogfood quote: "it looks like it's broken
and nothing is working and then suddenly it just refreshes the entire
session... it's not fantastic but it absolutely works". Shipping the
migrator without this banner means every first-time reviewer forms a
sticky bad first impression — the polish IS the feature surface.

Design (UX A++):

- Hub adds metadata.cursorMigrationState='in_progress' BEFORE the
  long-running transplant. The session-cache refresh emits the existing
  session-updated SSE event so the web client picks it up in
  milliseconds. No client-side polling needed; SSE is already real-time.

- Hub clears the flag in the SAME metadata write that flips
  cursorSessionProtocol to 'acp' on success. A single SSE event carries
  both transitions, so the banner disappears in the same render tick
  the chat re-renders as ACP — no flicker window.

- Hub clears the flag explicitly in the auto-migrate helper's finally
  on failure/exception so the banner never gets stuck if migration
  falls back to the legacy launcher.

- Web renders a thin accessible (role=status, aria-live=polite) banner
  with an indeterminate spinner and short explanation when the flag is
  set. No fake percentage — we don't have phase data and a fake
  progress bar would lie.

Files:

- shared/src/schemas.ts: add cursorMigrationState to MetadataSchema.
- hub/src/sync/syncEngine.ts:
  + flipCursorSessionProtocolToAcp drops cursorMigrationState in the
    same write that flips protocol to 'acp'.
  + maybeAutoMigrateLegacyCursorSession sets the flag before transplant;
    finally-clears on failure/exception only (success path delegates to
    the atomic flip writer).
  + setCursorMigrationStateInProgress / clearCursorMigrationState
    helpers handle the version-mismatch retry.
- hub/src/sync/syncEngineAutoMigrate.test.ts: 4 new tests pin flag
  transitions across failure / exception / success-no-clear /
  flip-atomic-clears paths.
- web/src/components/CursorMigrationBanner.tsx: new component.
- web/src/components/CursorMigrationBanner.test.tsx: 10 tests covering
  all visibility paths + a11y role/aria-live.
- web/src/components/SessionChat.tsx: render banner between
  SessionHeader and TeamPanel.
- web/src/lib/locales/en.ts + zh-CN.ts: copy strings.
- web/src/types/api.ts: re-export Metadata for component typing.

Hub 343 pass (was 339; +4), web banner 10/10, typecheck clean across
all three workspaces.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Round 14 (HEAD 35daa9f2) folds the in-flight upgrade banner into PR scope (UX A++).

  • Hub: metadata.cursorMigrationState='in_progress' set before transplant, cleared atomically in the same write that flips cursorSessionProtocol to 'acp' on success, finally-cleared on failure/exception. Uses the existing session-updated SSE event — no new event types.
  • Web: CursorMigrationBanner (accessible, indeterminate spinner) renders when the flag is set. Banner disappears in the same render tick the chat re-renders as ACP (no flicker window).
  • No client-side polling added — the SSE channel is already real-time.

New tests: hub 4 new (flag transitions across failure/exception/success-no-clear/flip-atomic-clears), web banner 10 (visibility paths + a11y).

Hub 343 pass (was 339; +4), web banner 10/10, typecheck clean across cli/web/hub. Note: bot may be at quota cap; if so this triggers a queued pass for when credits refill.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 35daa9f2eb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// host's live ACP transports cannot block migration. Operator
// CLI/REST callers that want the pre-isolation conservative
// behavior can inject `() => tryAcquireAcpActiveLock(home)`.
acquireAcpActiveLock: deps.acquireAcpActiveLock ?? (() => ({ release() {} })),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Serialize production verify probes with a real ACP lock

In the SyncEngine path no acquireAcpActiveLock override is passed, so production uses this no-op handle while the default probe is also constructed with skipLockAcquire: true. If two legacy Cursor sessions are opened/migrated at about the same time, both verifiers can spawn agent acp concurrently; the existing ACP guard documents that Cursor's agent CLI is single-process and a second spawn can terminate the first, so one or both migrations can fail and fall back to legacy despite being otherwise migratable. Use a real shared lock here, or only skip the probe's internal lock when the caller actually acquired one.

Useful? React with 👍 / 👎.

Comment on lines +655 to +662
if (latest && !wasActive && latest.lifecycleState === 'running') {
// Lifecycle says running but active=false and we did NOT
// archive this session ourselves. Something external lifted
// it back to running between preflight and now (rare but
// possible if the session was preflight-archived and then
// a peer un-archived it via the lifecycle API).
tryRm(acpSessionDir)
return refusal(session.id, 'session_resumed_during_migrate', `session lifecycle became running during migration (lifecycleState=running, active=false, wasActive=false); rolled back ACP placement`, start, this.deps.now)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Let stale-running inactive rows reach the ACP flip

When an inactive legacy Cursor row still has metadata.lifecycleState === 'running' after a hub crash, the auto-migrate path passes forceArchiveRunning: true and intentionally skips the archive RPC because active is false, but this recheck later treats the same stale lifecycle value as a resume race and rolls back the ACP placement. In that stale-row case the session will silently fall back to the legacy launcher every time instead of being migrated and having the lifecycle cleared by flipCursorSessionProtocolToAcp.

Useful? React with 👍 / 👎.

// PATH. Thread the migrator's homeDir dep through so the
// probe doesn't fall back to process.env.HOME when a more
// specific lookup home is available.
agentLookupHome: this.deps.homeDir()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use the recorded owner home for agent binary lookup

For sessions with metadata.homeDir different from the hub process home, migrateOneWithLock correctly reads the legacy store under the recorded owner home, but the default verifier still augments PATH from this.deps.homeDir() rather than that recorded home. In service-account hub deployments where Cursor's agent binary is installed in the human user's ~/.local/bin, verification will fail with agent not found even though the store was located successfully; thread the chosen source home into the probe's agentLookupHome.

Useful? React with 👍 / 👎.

Comment thread shared/src/apiTypes.ts
export type RenameSessionRequest = z.infer<typeof RenameSessionRequestSchema>

/** Per-session legacy stream-json → ACP migrator request. See tiann/hapi#824. */
export const CursorMigrateToAcpRequestSchema = z.object({
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject unknown migration options before destructive defaults

Because this Zod object is not strict, unknown keys are silently stripped; with all fields optional, an operator typo such as {"keep_source": true} validates as {} and the migration proceeds with the destructive default that removes the legacy source after a successful flip. For this destructive endpoint, reject unrecognized keys so malformed-but-valid JSON cannot accidentally drop the safety option.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(cursor): incomplete cliModelSkus during ACP lock — variant picker shows raw wire params

1 participant