Skip to content

Pre-flight: sibling-host parity hook (link+fresh version+SHA) with auto-install option #183

Description

@NewGraphEnvironment

Context

The link R package (NewGraphEnvironment/link, v0.38.0) runs across multiple hosts: M4 (driver / orchestrator), M1 (sibling MacBook), and N DigitalOcean cypher droplets. The autonomous CLI (data-raw/wsgs_run_pipeline.sh) dispatches work in parallel across these hosts. Per-host work runs Rscript wsgs_run_host.R which depends on link + fresh R packages being installed and at compatible versions on every host.

Pre-flight version check at data-raw/wsgs_dispatch.sh:413-456 queries packageVersion("link") + packageVersion("fresh") across all hosts and fails-loud on mismatch. Operator's fix is always the same: Rscript -e 'pak::local_install(upgrade = FALSE, ask = FALSE)' on the lagging host(s).

Today on each PR merge the operator manually:

  1. pak::local_install on M4 (driver).
  2. ssh m1 'git fetch && git checkout main && git pull && pak::local_install' on M1.
  3. (Cyphers self-install via cypher_prep.sh during Step 4, OK.)

We hit this exact case twice during v0.38.0 work (M4 missed install after merge of v0.38.0 release; M1 was on a feature branch when it should have been on main). Both surfaced via the version-mismatch pre-flight error — good catch, but operator-side fix is friction.

Problem

  1. Frequent active development. Operator works on a branch on M4 → tests dispatch → expects M1 + cyphers to match. Active days have multiple branch checkouts. Each one needs M1 sync + reinstall before the next dispatch.
  2. Version match ≠ code match. packageVersion("link") == "0.38.0" on both M4 and M1 doesn't guarantee the source code matches — both could be at v0.38.0 but on different commits (e.g. M4 on a feature branch with patches not in main; M1 on main). Today's check is version-only, not SHA.
  3. Operator has to remember. The pre-flight error is informative but the fix lives outside the umbrella — operator types the same commands every time.

Goals

  1. Parity hook at pre-flight. Driver (M4) is source of truth. Sibling hosts (M1 + cyphers) must match: same packageVersion + commit SHA for link AND fresh.
  2. Auto-install option. When --auto-install is set (or stdin is non-TTY, i.e. background run), umbrella auto-installs on lagging hosts after detecting mismatch.
  3. TTY prompt. When operator is interactive: pre-flight reports mismatch, prompts [y]es / [n]o / [s]kip for each lagging host. Yes → run the install; no → abort; skip → proceed with the warning (--skip-preflight-like behavior, today's escape hatch).

Approach

Driver's source of truth

# At pre-flight start, capture the driver's authoritative state.
DRIVER_LINK_VER=$(Rscript -e 'cat(as.character(packageVersion("link")))')
DRIVER_FRESH_VER=$(Rscript -e 'cat(as.character(packageVersion("fresh")))')
DRIVER_LINK_SHA=$(Rscript -e 'cat(read.dcf(system.file("DESCRIPTION", package="link"), fields="RemoteSha"))' 2>/dev/null || echo "")
DRIVER_FRESH_SHA=$(Rscript -e 'cat(read.dcf(system.file("DESCRIPTION", package="fresh"), fields="RemoteSha"))' 2>/dev/null || echo "")

RemoteSha is set by pak when installing from a remote. For local checkouts via pak::local_install, RemoteSha may be empty — fall back to comparing git rev-parse HEAD on the package's source repo.

Sibling check

For each sibling host:

SIB_LINK_VER=$(ssh "$host" 'Rscript -e "cat(as.character(packageVersion(\"link\")))"')
SIB_LINK_SHA=$(ssh "$host" 'cd ~/Projects/repo/link && git rev-parse HEAD')

if [ "$SIB_LINK_VER" != "$DRIVER_LINK_VER" ] || [ "$SIB_LINK_SHA" != "$DRIVER_LINK_SHA" ]; then
  echo "  [parity] mismatch on $host: link ver=$SIB_LINK_VER sha=$SIB_LINK_SHA (driver: ver=$DRIVER_LINK_VER sha=$DRIVER_LINK_SHA)"
  parity_fail=1
fi

Auto-install decision

AUTO_INSTALL=0
[ ! -t 0 ] && AUTO_INSTALL=1   # non-TTY: auto-install by default
[ "$AUTO_INSTALL_FLAG" = "1" ] && AUTO_INSTALL=1

if [ "$parity_fail" = "1" ]; then
  if [ "$AUTO_INSTALL" = "1" ]; then
    echo "  [parity] auto-install enabled; running pak::local_install on lagging hosts"
    # for each mismatching host: ssh in, git fetch, checkout driver's branch + SHA, pak::local_install
  else
    # interactive prompt loop
    echo "  [parity] interactive mode; please choose action per host"
    # ...
  fi
fi

Branch sync

For sibling hosts to install from the same commit, they need to be on the same branch. Two scenarios:

  • Branch is on origin (already pushed): ssh $host 'git fetch && git checkout $DRIVER_BRANCH && git pull --ff-only'.
  • Branch is local-only on driver (operator is mid-PR, not pushed): warn loud, suggest git push -u origin $DRIVER_BRANCH or set --allow-unpushed flag (auto-pushes from driver before sibling sync).

Acceptance

  • data-raw/wsgs_run_pipeline.sh — new --auto-install flag; default-on when stdin is non-TTY.
  • Pre-flight parity check moves from data-raw/wsgs_dispatch.sh to umbrella's pre-flight block (or a shared helper script).
  • Compares packageVersion + git rev-parse HEAD for both link and fresh per sibling host.
  • On mismatch + auto-install: ssh's the host, fetches/checkouts driver's branch, runs pak::local_install. Re-checks.
  • On mismatch + interactive: prompt per host with [y]es/[n]o/[s]kip.
  • On mismatch + no auto-install + no TTY (e.g. cron): abort with clear FATAL message + the operator's manual-fix commands.
  • Unpushed-branch safety: detect when driver's branch is local-only; abort with --allow-unpushed opt-out (auto-pushes from driver before sibling sync).
  • Smoke tests:
    • Driver on main + sibling on main, both v0.38.0, same SHA → preflight passes silently.
    • Driver on feature-branch + sibling on main (different SHA) → mismatch detected; auto-install fires (or prompt fires); after install, parity restored.
    • Driver on local-only branch + --auto-install set → warn; --allow-unpushed set → push first; without it → FATAL.
  • NEWS entry under v0.39.0 (next patch).

References

  • v0.38.0 Phase 5 + Cypher integration tests for autonomous wrapper (1-cypher -> 3-cypher tiers) #178 Tier 1 surfaced this empirically (twice). Logs:
    • data-raw/logs/wsgs_run_pipeline/20260515_035053_wsgs_run_pipeline.log — M4 link=0.37.0 (operator forgot install after merge).
    • data-raw/logs/wsgs_run_pipeline/20260515_035809_prep_job1.log — M1 v0.38.0 ok but on wrong branch.
  • data-raw/wsgs_dispatch.sh:413-456 — current preflight version check.
  • data-raw/cypher_prep.sh — does the install for cyphers (the model to mirror for M4/M1).
  • data-raw/snapshot_bcfp.sh — IS baked into the umbrella's Step 1+2 (correct pattern; this issue extends the same approach to package installs).

Out of scope

  • Implementing pak::local_install itself (already works).
  • A general "deploy this branch to all hosts" tool (this is scoped to the dispatch pre-flight).
  • Step 0 / --force semantics — separate sibling issue.
  • state_clean.sh --tables= — separate sibling issue.
  • cypher_prep.sh pipefail masking — separate sibling issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions