Problem
The #175 tunnel-free per-segment mapping_code parity needs a post-consolidate recompute: a WSG's accessibility (hence its mapping_code token1/token2) depends on whether a blocking barrier exists downstream — possibly in a different WSG (the provincial-accumulation property, RUNBOOK.md §5). When WSGs are distributed across machines, each machine only holds its own bucket's barriers while it runs, so a WSG's access is computed against an incomplete barrier set.
Caught 2026-05-25 (study-area run): per-host parity was FINA 75.5% / PARA 68.6%; both reached 99%+ after re-modelling on the full consolidated barrier set. Drainage-closed + DS-first bucketing reduces but does not eliminate this (downstream barriers can be cross-bucket or arrive late in DS-first order). So the correct methodology is: distribute (any bucketing) → consolidate → recompute → compare.
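The failure mode and the fix can be illustrated with a toy model. This is only a sketch under stated assumptions: the three-segment network, the segment ids, and the helper names (accessible, access_table, parity) are hypothetical and are not part of the link package. It also simplifies access to "no barrier on the segment or anywhere downstream of it".

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy network: segment id -> next segment downstream (None = outlet).
# a1 and a2 belong to WSG "A"; b1 belongs to WSG "B", which lies downstream of A.
DOWNSTREAM: Dict[str, Optional[str]] = {"a2": "a1", "a1": "b1", "b1": None}
WSG_OF: Dict[str, str] = {"a1": "A", "a2": "A", "b1": "B"}


def accessible(seg: str, barriers: Set[str]) -> bool:
    """True when no barrier sits on seg or on any segment downstream of it."""
    cur: Optional[str] = seg
    while cur is not None:
        if cur in barriers:
            return False
        cur = DOWNSTREAM[cur]
    return True


def access_table(segs: Iterable[str], barriers: Set[str]) -> Dict[str, bool]:
    """Access flag for each requested segment against the given barrier set."""
    return {s: accessible(s, barriers) for s in segs}


def parity(result: Dict[str, bool], reference: Dict[str, bool]) -> float:
    """Percentage of segments in result whose flag matches the reference."""
    same = sum(result[s] == reference[s] for s in result)
    return 100.0 * same / len(result)


# The only barrier in the whole network sits in WSG B.
ALL_BARRIERS: Set[str] = {"b1"}
SEGS_A = ["a1", "a2"]

# Reference: a single-host run that sees every barrier.
reference = access_table(DOWNSTREAM, ALL_BARRIERS)

# Distribute: the host that runs WSG A only holds the barriers of its own bucket.
host_a_barriers = {b for b in ALL_BARRIERS if WSG_OF[b] == "A"}
per_host = access_table(SEGS_A, host_a_barriers)

# Consolidate, then recompute WSG A against the full barrier set.
recomputed = access_table(SEGS_A, ALL_BARRIERS)

# Compare both results with the reference.
print(f"per-host parity:   {parity(per_host, reference):.1f}%")
print(f"recomputed parity: {parity(recomputed, reference):.1f}%")
```

In this toy case the per-host result marks both WSG A segments as accessible because the host never saw the barrier in WSG B; the recompute after consolidation matches the single-host reference.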
The orchestrator has no recompute today because the provincial run compares habitat-km rollups, not per-segment mapping_code — this is new to #175.
Current state (link#175, study_area_run.sh)
The driver recomputes diverged WSGs via the full pipeline (wsg_run_one.R = lnk_pipeline_run(mapping_code=TRUE)). That works but is slow and architecturally wasteful: re-running the full pipeline on the dispatcher re-derives streams + streams_habitat (which are already correct + persisted) just to redo streams_access + streams_mapping_code. Recompute-ALL via full pipeline would defeat distribution (dispatcher redoes everything ≈ running it all single-host twice).
Ask: a cheap access-only recompute
Add a recompute path that redoes only lnk_pipeline_access + lnk_mapping_code for a WSG, reusing the consolidated persisted streams, streams_habitat_<sp>, and barriers, with no full working-schema rebuild. Post-consolidate recompute-ALL then becomes cheap (~seconds/WSG), so results are correct regardless of machine count or WSG bucketing while the expensive stages stay distributed. That is the durable methodology #175 is after.
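As a rough illustration of what "access-only" means, the sketch below rewrites only the access rows for one WSG from already-persisted tables and leaves streams untouched. Everything in it is an assumption made for the example: it uses an in-memory SQLite database rather than the project's real database, the column names (segment_id, wsg, downstream_id, accessible) are hypothetical, and recompute_access is not an existing function of the link package.

```python
import sqlite3


def recompute_access(conn: sqlite3.Connection, wsg: str) -> None:
    """Rebuild streams_access rows for one WSG from persisted streams + barriers."""
    # Drop only the access rows of the target WSG; streams and barriers are reused.
    conn.execute(
        "DELETE FROM streams_access WHERE segment_id IN "
        "(SELECT segment_id FROM streams WHERE wsg = ?)",
        (wsg,),
    )
    # Walk downstream from every segment of the WSG (crossing into other WSGs),
    # then flag a segment inaccessible if any segment on its path has a barrier.
    conn.execute(
        """
        WITH RECURSIVE path(origin, cur) AS (
            SELECT segment_id, segment_id FROM streams WHERE wsg = ?
            UNION ALL
            SELECT p.origin, s.downstream_id
            FROM path AS p
            JOIN streams AS s ON s.segment_id = p.cur
            WHERE s.downstream_id IS NOT NULL
        )
        INSERT INTO streams_access (segment_id, accessible)
        SELECT p.origin,
               CASE WHEN COUNT(b.segment_id) > 0 THEN 0 ELSE 1 END
        FROM path AS p
        LEFT JOIN barriers AS b ON b.segment_id = p.cur
        GROUP BY p.origin
        """,
        (wsg,),
    )


# Hypothetical consolidated state: WSG A was modelled on a host that never saw
# the barrier on b1 (WSG B), so its persisted access flags are stale.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE streams (segment_id TEXT PRIMARY KEY, wsg TEXT, downstream_id TEXT);
    CREATE TABLE barriers (segment_id TEXT);
    CREATE TABLE streams_access (segment_id TEXT PRIMARY KEY, accessible INTEGER);
    INSERT INTO streams VALUES ('a2', 'A', 'a1'), ('a1', 'A', 'b1'), ('b1', 'B', NULL);
    INSERT INTO barriers VALUES ('b1');
    INSERT INTO streams_access VALUES ('a1', 1), ('a2', 1), ('b1', 0);
    """
)

recompute_access(conn, "A")
access = dict(conn.execute("SELECT segment_id, accessible FROM streams_access"))
print(access)
```

The point of the design is that the expensive derived tables are read, never rebuilt, so running this for every WSG after consolidation costs one downstream trace per WSG rather than a full pipeline run.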
Blocker (per the #175 Plan-agent review): lnk_pipeline_access currently needs observations + crossings working-schema artifacts (not persisted). Options:
[…] barriers_<sp>_access views are already persist-backed; audit what's actually missing), OR
[…] lnk_pipeline_access variant that reads its inputs from persist, OR
[…]
lnk_mapping_code() (portable, schema-aware) already recomputes mapping_code from persist tables; the missing half is the persist-based access recompute.
Refs
[…] research/study_area_run.md)
#196, "lnk_persist_init: persist per-source dnstr flag columns in streams_access (mapping_code second-token NONE bug)" (double-persist / pre-persist barriers)
#204, "Persist+consolidate must be host-/species-count-agnostic: persist_init blind to species-column-set drift" (persist shape-drift)
RUNBOOK.md §5