fix: SF predictor bootstrap cp --remove-destination — survive root-owned stale predictor.yaml (config#1034)#423

Merged
cipher813 merged 1 commit into main from fix/sf-bootstrap-cp-remove-destination on Jun 12, 2026

@cipher813

Owner

What failed

Friday shell run friday-shell-2026-06-12-eod-… failed at Branch B / PredictorTraining bootstrap: cp: cannot create regular file '/home/ec2-user/alpha-engine-predictor/config/predictor.yaml': Permission denied.

Root cause: an ad-hoc SSM edit on 2026-06-07 left predictor.yaml and predictor.yaml.bak-260607-autopromote owned by root:root with mode 644. SSM runs commands as root by default, and the edit lacked sudo -u ec2-user. The SF bootstrap's sudo -u ec2-user cp cannot open a root-owned 644 file for truncation, even though ec2-user owns the directory.
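For reviewers who want to see the permission split locally, here is a minimal sketch. It assumes a regular (non-root) user and uses a mode-444 file in a temp directory as a stand-in for the root-owned 644 file, since in both cases the writing user has no write permission on the file itself. The temp paths are illustrative, not the real box paths.

```shell
#!/usr/bin/env bash
# Sketch: permission to overwrite a file's contents vs. permission to remove it.
# A mode-444 file we own stands in for the root-owned 644 predictor.yaml.
set -u
workdir=$(mktemp -d)
printf 'stale\n' > "$workdir/predictor.yaml"
chmod 444 "$workdir/predictor.yaml"

# Overwriting in place opens the existing file for truncation, which needs
# write permission on the file. root bypasses this check, so "denied" is only
# expected when this runs as a regular user.
if ( : > "$workdir/predictor.yaml" ) 2>/dev/null; then
  in_place="allowed"
else
  in_place="denied"
fi

# Removing the directory entry only needs write+search permission on the
# directory, which the directory owner has.
if rm -f "$workdir/predictor.yaml" 2>/dev/null; then
  unlink_result="allowed"
else
  unlink_result="denied"
fi

echo "truncate in place: $in_place, unlink: $unlink_result"
rm -rf "$workdir"
```

The same asymmetry is what a plain cp hits on the box: it tries the in-place open first and stops there.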

Fix

cp --remove-destination at both bootstrap cp sites (PredictorTraining + ModelZooRotation): the unlink-then-create sequence succeeds via directory ownership and leaves the new file owned by ec2-user, so a stale root-owned file self-heals on the next run instead of killing the branch.
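A small local sketch of the mechanism, assuming GNU coreutils (cp --remove-destination and stat -c are GNU-specific). The file names under the temp directory are hypothetical stand-ins, not the actual bootstrap source and destination.

```shell
#!/usr/bin/env bash
# Sketch: cp --remove-destination replaces an unwritable destination by
# unlinking it first and creating a new file owned by the copying user.
set -u
umask 022
workdir=$(mktemp -d)
src="$workdir/predictor.yaml.new"   # hypothetical source of the bootstrap copy
dest="$workdir/predictor.yaml"

printf 'fresh\n' > "$src"           # created with mode 644 under umask 022
printf 'stale\n' > "$dest"
chmod 444 "$dest"                   # stand-in for "not writable by the copying user"

# The destination is removed before cp tries to open it, so cp creates a new
# file instead of truncating the old one.
cp --remove-destination "$src" "$dest"
cp_status=$?

dest_content=$(cat "$dest")
dest_mode=$(stat -c '%a' "$dest")   # new file takes the source's mode, minus umask
dest_uid=$(stat -c '%u' "$dest")    # new file is owned by whoever ran cp
echo "status=$cp_status content=$dest_content mode=$dest_mode uid=$dest_uid"
rm -rf "$workdir"
```

The mode changing from 444 to 644 shows that the destination is a newly created file rather than the old one rewritten in place, which is why ownership resets to the copying user as well.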

The frozen byte-identical fixture (tests/fixtures/sf_prekeystone_spot_commands.json) is regenerated per its documented procedure — this is a deliberate, reviewed change to the spot states' absent-path command.

Already done operationally (no PR possible — box state)

  • chown ec2-user:ec2-user on both files (verified).
  • Shell rerun friday-shell-2026-06-12-rerun-postchown launched to validate before the real Saturday 6/13 09:00 UTC run.
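The ownership verification above could be repeated with something like the following. This is a hypothetical helper, not part of this PR or the repo; find_foreign_owned is an illustrative name, and the demo runs against a throwaway directory rather than the box.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this PR): list entries under a
# directory that are NOT owned by the expected numeric uid.
set -u

# find_foreign_owned DIR OWNER_UID
# Prints one path per line; prints nothing if every entry matches.
find_foreign_owned() {
  find "$1" ! -uid "$2" -print
}

# Demo on a throwaway directory owned by the current user.
demo_dir=$(mktemp -d)
touch "$demo_dir/predictor.yaml" "$demo_dir/predictor.yaml.bak"
my_uid=$(id -u)

# Checking against our own uid should flag nothing.
clean=$(find_foreign_owned "$demo_dir" "$my_uid")
# Checking against a different uid flags the directory and both files.
flagged=$(find_foreign_owned "$demo_dir" "$((my_uid + 1))" | wc -l)

echo "flagged for own uid: '${clean}', flagged for other uid: ${flagged}"
rm -rf "$demo_dir"
```

On the box this might be invoked as find_foreign_owned /home/ec2-user/alpha-engine-predictor/config "$(id -u ec2-user)", where any output points at a file left behind by a root-run edit.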

Deploy note

The live SF definition is deployed manually (infrastructure/deploy_step_function.sh), so merging this PR does not update the live state machine. Tomorrow's Saturday run is unblocked by the chown regardless; the guard takes effect at the next SF deploy.

Tracking: cipher813/alpha-engine-config#1034

Tests

Full suite green in worktree: 1977 passed, 2 skipped.

🤖 Generated with Claude Code

…ned stale predictor.yaml (config#1034)

The 2026-06-12 Friday shell run failed at PredictorTraining: an ad-hoc
SSM edit on 06-07 (run as root, no sudo -u ec2-user) left
alpha-engine-predictor/config/predictor.yaml root-owned on the
always-on box, and 'sudo -u ec2-user cp' cannot open a root-owned 644
file for truncation even though ec2-user owns the directory.

cp --remove-destination unlinks the target first (allowed via directory
ownership) and recreates it owned by ec2-user, so a stale root-owned
file self-heals on the next run instead of failing the branch. Applied
to both bootstrap cp sites (PredictorTraining + ModelZooRotation).

Frozen byte-identical fixture regenerated per its documented procedure
(deliberate, reviewed change to a spot state's absent-path command).

Box ownership was also fixed directly (chown); this change is the
structural guard against recurrence.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cipher813 cipher813 merged commit 4ce8383 into main Jun 12, 2026
1 check passed
@cipher813 cipher813 deleted the fix/sf-bootstrap-cp-remove-destination branch June 12, 2026 21:30