fix: spot_train.sh stages predictor.yaml to experiment-package path + pins ALPHA_ENGINE_EXPERIMENT_ID (config#1066)#267
Merged
…th + pins ALPHA_ENGINE_EXPERIMENT_ID (config#1066)

Companion to alpha-engine-data SF cp alignment + the predictor#265 fail-loud guard.

The child spot is a BARE predictor clone with NO alpha-engine-config tree, so config.py's experiment-first search path (~/alpha-engine-config/experiments/$ALPHA_ENGINE_EXPERIMENT_ID/predictor/predictor.yaml) is absent on the spot and resolution silently falls through to config/predictor.yaml. That coincidence is the 6/13 inert-rotation fragility (MODEL_SPECS empty -> select_rotation_specs() -> [] -> 0 challengers trained).

Fix (deterministic):
- Pin ALPHA_ENGINE_EXPERIMENT_ID (default "reference", matching config.py) and EXPORT it into every spot heredoc that imports config (preflight / smoke / model-zoo / full-training), so config.py resolution is explicit, not reliant on the os.environ default + dir-existence coincidence.
- Stage the S3-fetched yaml to BOTH the experiment-package path config.py searches FIRST AND config/predictor.yaml (legacy fallback). Both copies are byte-identical from the same staged source, so MODEL_SPECS populates deterministically and the spot now resolves via the SAME path config.py uses on the always-on box.

bash 3.2 note: the run_ssm heredocs sit inside "$(cat <<'X' ... X)" command substitution and bash 3.2 scans even a quoted heredoc body for the closing paren, so the in-heredoc comments are kept free of parens and apostrophes.

bash -n clean; full predictor suite green (1481 passed) incl. test_model_zoo + the test_spot_train_* battery.

Refs config#1066, config#1051. Re-exam 2026-06-20 (next Saturday rotation must register >=1 spec-* challenger; predictor#265 alert fires until then).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
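The silent fall-through described in this commit message can be illustrated with a minimal Python sketch. It assumes the search order is exactly the two paths named above (experiment package first, `config/predictor.yaml` second). The function names `candidate_paths` and `resolve_predictor_yaml` are hypothetical and are not taken from `config.py`.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def candidate_paths(home: Path, repo_root: Path, experiment_id: str) -> List[Path]:
    """Assumed search order: experiment package first, legacy fallback second."""
    return [
        home / "alpha-engine-config" / "experiments" / experiment_id
        / "predictor" / "predictor.yaml",
        repo_root / "config" / "predictor.yaml",
    ]


def resolve_predictor_yaml(home: Path, repo_root: Path,
                           experiment_id: Optional[str] = None) -> Path:
    """Return the first candidate that exists (hypothetical helper)."""
    # The default "reference" is the value stated in the commit message.
    if experiment_id is None:
        experiment_id = os.environ.get("ALPHA_ENGINE_EXPERIMENT_ID", "reference")
    for path in candidate_paths(home, repo_root, experiment_id):
        if path.is_file():
            return path
    raise FileNotFoundError("no predictor.yaml found on the search path")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home, repo = Path(tmp) / "home", Path(tmp) / "repo"
        # A bare clone: only the legacy fallback exists.
        (repo / "config").mkdir(parents=True)
        (repo / "config" / "predictor.yaml").write_text("model_specs: []\n")
        print(resolve_predictor_yaml(home, repo, "reference"))
```

On a bare clone the first candidate is missing, so the loop quietly returns the fallback. Nothing signals that the experiment-first path was skipped, which is the coincidence the fix removes.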
Companion data PR (SF cp alignment, auto-deploys on merge): nousergon/nousergon-data#427. Both must land for the next Saturday model-zoo rotation to train challengers.
cipher813 added a commit that referenced this pull request Jun 14, 2026
…lse-fails (config#1073) (#269)

The deploy.sh canary invokes the freshly-built image with dry_run=true, which ran the full PredictorPreflight.run() -> check_deploy_drift, comparing the image's baked GIT_SHA against live origin/main HEAD. During a rapid merge burst, main can advance after a deploy's image is built but before its canary runs, so the canary trips the drift RuntimeError on a perfectly good image and the deploy false-fails (+ false flow-doctor ERROR page + false SNS canary-fail).

Observed 2026-06-14: 3 PRs merged in ~70 min; #266's canary false-failed when main had just advanced to #267.

No correctness/safety problem (the canary gate correctly never promoted a bad image), but the false pages erode alert-signal integrity in a fail-loud system. A dry_run canary writes no predictions and emails nothing, so drift-vs-main is the wrong invariant for it.

Gate check_deploy_drift on `not dry_run`. Production (dry_run=false) inference and the SF DeployDriftCheck gate (action=check_deploy_drift -> run_for_drift_gate, runs daily pre-pipeline) are unchanged; real-run drift protection is fully preserved.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
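A minimal sketch of the gating described in this commit, assuming the drift check is a plain SHA comparison that raises `RuntimeError` on mismatch. `run_preflight` and its parameters are hypothetical stand-ins for illustration; they do not reproduce the real `PredictorPreflight.run()` signature.

```python
from typing import List


def check_deploy_drift(image_sha: str, head_sha: str) -> None:
    """Raise if the image's baked SHA differs from the live main HEAD."""
    if image_sha != head_sha:
        raise RuntimeError(
            f"deploy drift: image built at {image_sha}, main is at {head_sha}"
        )


def run_preflight(*, dry_run: bool, image_sha: str, head_sha: str) -> List[str]:
    """Hypothetical preflight: returns the names of the checks that ran."""
    checks_run: List[str] = []
    # A dry-run canary writes no predictions and sends no email, so
    # drift-vs-main is skipped for it; real runs keep the check.
    if not dry_run:
        check_deploy_drift(image_sha, head_sha)
        checks_run.append("deploy_drift")
    checks_run.append("other_checks")  # placeholder for the remaining preflight
    return checks_run


if __name__ == "__main__":
    # Canary on an image built just before main advanced: no false failure.
    print(run_preflight(dry_run=True, image_sha="aaa111", head_sha="bbb222"))
```

With `dry_run=False` and the same mismatched SHAs, `run_preflight` raises `RuntimeError`, so production drift protection is untouched in this sketch.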
What
Companion to alpha-engine-data#427 (SF cp alignment) + the alpha-engine-predictor#265 fail-loud guard. Makes the model-zoo rotation child spot deterministically load a `predictor.yaml` WITH `model_specs` populated.

Diagnosed mismatch
The child spot is a bare predictor clone with NO `alpha-engine-config` tree, so `config.py`'s experiment-first search path (`~/alpha-engine-config/experiments/$ALPHA_ENGINE_EXPERIMENT_ID/predictor/predictor.yaml`) is absent on the spot and resolution silently falls through to `config/predictor.yaml`. That coincidence is the 6/13 inert-rotation fragility (`MODEL_SPECS` empty → `select_rotation_specs()` → `[]` → 0 challengers trained; config#1051).

Fix (deterministic, no `config.py` edit)
- Pin `ALPHA_ENGINE_EXPERIMENT_ID` (default `reference`, matching `config.py:_EXPERIMENT_ID`) and EXPORT it into every spot heredoc that imports config (preflight / smoke / model-zoo / full-training), so resolution is explicit rather than relying on the `os.environ` default + dir-existence coincidence.
- Stage the S3-fetched yaml to BOTH the experiment-package path `config.py` searches FIRST AND `config/predictor.yaml` (legacy fallback). Byte-identical copies from the same staged source → `MODEL_SPECS` populates deterministically, and the spot resolves via the SAME path `config.py` uses on the box.

`bash 3.2` note: the `run_ssm` heredocs sit inside `"$(cat <<'X' ... X)"` command substitution and bash 3.2 scans even a quoted heredoc body for the closing paren, so the in-heredoc comments are kept free of parens and apostrophes (caught by `bash -n`).

Tests
`bash -n` clean; full suite green (1481 passed) incl. `test_model_zoo` (31) + the `test_spot_train_*` battery (37). No test asserts the exact staging paths.

Companion PR
alpha-engine-data#427 (AUTO-DEPLOYS the SF on merge).

Refs config#1066, config#1051. Re-exam 2026-06-20: next Saturday rotation must register ≥1 `spec-*` challenger (the #265 inert-rotation alert fires until then).

🤖 Generated with Claude Code
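The dual-staging invariant from the fix can be sketched as follows. The real change lives in the shell script `spot_train.sh`; this Python version only illustrates the idea of copying one staged source to both search-path locations and confirming the copies match byte for byte. `stage_predictor_yaml` and its arguments are hypothetical names.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import List


def stage_predictor_yaml(staged_source: Path, home: Path, repo_root: Path,
                         experiment_id: str = "reference") -> List[Path]:
    """Copy one staged yaml to the experiment-package path and the fallback."""
    targets = [
        home / "alpha-engine-config" / "experiments" / experiment_id
        / "predictor" / "predictor.yaml",
        repo_root / "config" / "predictor.yaml",
    ]
    for target in targets:
        # The bare clone has no alpha-engine-config tree, so create parents.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(staged_source, target)
    return targets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "fetched.yaml"  # stands in for the S3-fetched file
        source.write_text("model_specs:\n  - name: spec-example\n")
        first, fallback = stage_predictor_yaml(source, root / "home", root / "repo")
        # shallow=False forces a content comparison rather than a stat check.
        print(filecmp.cmp(first, fallback, shallow=False))
```

Because both targets come from the same source file, whichever path resolution picks yields the same `model_specs`.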