Cheap access-only post-consolidate recompute (bulletproof cross-WSG mapping_code parity, efficiently) #205

Description

@NewGraphEnvironment

Problem

The #175 tunnel-free per-segment mapping_code parity needs a post-consolidate recompute: a WSG's accessibility (hence its mapping_code token1/token2) depends on whether a blocking barrier exists downstream — possibly in a different WSG (the provincial-accumulation property, RUNBOOK.md §5). When WSGs are distributed across machines, each machine only holds its own bucket's barriers while it runs, so a WSG's access is computed against an incomplete barrier set.
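A minimal sketch of why a per-host barrier set gives the wrong access result. Everything here is hypothetical: the WSG codes `UPST` and `DNST`, the segment ids, and the toy topology are invented for illustration, and the real logic lives in `lnk_pipeline_access`, which this does not reproduce.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy network: segment id -> (WSG code, downstream segment id).
# None marks the outlet. UPST drains into DNST.
NETWORK: Dict[str, Tuple[str, Optional[str]]] = {
    "d1": ("DNST", None),
    "d2": ("DNST", "d1"),
    "u1": ("UPST", "d2"),
    "u2": ("UPST", "u1"),
}


def is_accessible(segment: str, barriers: Set[str]) -> bool:
    """Accessible when neither the segment nor anything downstream has a barrier."""
    current: Optional[str] = segment
    while current is not None:
        if current in barriers:
            return False
        current = NETWORK[current][1]
    return True


def access_for_wsg(wsg: str, barriers: Set[str]) -> Dict[str, bool]:
    """Access flag for every segment of one WSG, given the barriers that are visible."""
    return {
        seg: is_accessible(seg, barriers)
        for seg, (seg_wsg, _) in NETWORK.items()
        if seg_wsg == wsg
    }


# One blocking barrier, located in DNST, downstream of all of UPST.
ALL_BARRIERS: Set[str] = {"d2"}

# The host that runs UPST only holds the barriers of its own bucket.
upst_bucket_barriers = {b for b in ALL_BARRIERS if NETWORK[b][0] == "UPST"}

per_host = access_for_wsg("UPST", upst_bucket_barriers)
consolidated = access_for_wsg("UPST", ALL_BARRIERS)

print(per_host)      # UPST looks fully accessible on its own host
print(consolidated)  # UPST is blocked once the DNST barrier is visible
```

The two results differ only because the barrier set differs, which is the case a post-consolidate recompute is meant to repair.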

Caught 2026-05-25 (study-area run): per-host parity was FINA 75.5% / PARA 68.6%; both rose to 99%+ after re-modelling on the full consolidated barrier set. Drainage-closed + DS-first bucketing reduces this but does not eliminate it (downstream barriers can be cross-bucket or arrive late in DS-first order). So the correct methodology is: distribute (any bucketing) → consolidate → recompute → compare.
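The compare step can be sketched as a per-segment match rate. This assumes parity means the share of reference segments whose mapping_code matches exactly; the function name `parity_pct`, the segment ids, and the placeholder codes are hypothetical and are not the study-area data.

```python
from typing import Dict


def parity_pct(candidate: Dict[str, str], reference: Dict[str, str]) -> float:
    """Percent of reference segments whose mapping_code matches exactly."""
    if not reference:
        raise ValueError("reference is empty")
    matches = sum(
        1 for seg, code in reference.items() if candidate.get(seg) == code
    )
    return 100.0 * matches / len(reference)


# Placeholder codes, not real mapping_code values.
reference = {"s1": "ACCESSIBLE", "s2": "ACCESSIBLE", "s3": "BLOCKED", "s4": "BLOCKED"}

# Per-host result: the host never saw the downstream barrier, so s3 and s4 are wrong.
per_host = {"s1": "ACCESSIBLE", "s2": "ACCESSIBLE", "s3": "ACCESSIBLE", "s4": "ACCESSIBLE"}

# Result after recomputing against the consolidated barrier set.
recomputed = dict(reference)

per_host_parity = parity_pct(per_host, reference)
recomputed_parity = parity_pct(recomputed, reference)
print(per_host_parity, recomputed_parity)
```

A segment missing from the candidate counts as a mismatch here, which is a design choice of this sketch rather than something the issue specifies.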

The orchestrator has no recompute today because the provincial run compares habitat-km rollups, not per-segment mapping_code — this is new to #175.

Current state (link#175, study_area_run.sh)

The driver recomputes diverged WSGs via the full pipeline (wsg_run_one.R = lnk_pipeline_run(mapping_code=TRUE)). That works but is slow and architecturally wasteful: re-running the full pipeline on the dispatcher re-derives streams + streams_habitat (which are already correct + persisted) just to redo streams_access + streams_mapping_code. Recompute-ALL via full pipeline would defeat distribution (dispatcher redoes everything ≈ running it all single-host twice).

Ask: a cheap access-only recompute

Add a recompute path that redoes only lnk_pipeline_access + lnk_mapping_code for a WSG, reusing the consolidated persisted streams, streams_habitat_<sp>, and barriers — no full working-schema rebuild. Then post-consolidate recompute-ALL becomes cheap (~seconds/WSG) → bulletproof correctness regardless of machine count or WSG bucketing, efficiently. That is the durable methodology #175 is after.

Blocker (per the #175 Plan-agent review): lnk_pipeline_access currently needs observations + crossings working-schema artifacts (not persisted). Options:

lnk_mapping_code() (portable, schema-aware) already recomputes mapping_code from persist tables — the missing half is the persist-based access recompute.
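The shape of the requested path can be sketched as follows, with in-memory dicts standing in for the consolidated persisted tables. This is an illustration under stated assumptions, not the `lnk_*` API: `Segment`, `recompute_access`, `toy_mapping_code`, and `recompute_all` are hypothetical names, and the code string format is invented. The point it shows is that habitat values are read, never re-derived, and only access and mapping_code are redone per WSG.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Segment:
    wsg: str
    downstream: Optional[str]  # None at the outlet
    habitat: str               # stand-in for a persisted streams_habitat_<sp> value


def recompute_access(
    streams: Dict[str, Segment], barriers: Set[str], wsg: str
) -> Dict[str, bool]:
    """Redo access for one WSG against the full consolidated barrier set."""
    access: Dict[str, bool] = {}
    for seg_id, seg in streams.items():
        if seg.wsg != wsg:
            continue
        current: Optional[str] = seg_id
        blocked = False
        while current is not None:
            if current in barriers:
                blocked = True
                break
            current = streams[current].downstream
        access[seg_id] = not blocked
    return access


def toy_mapping_code(accessible: bool, habitat: str) -> str:
    """Hypothetical encoding: an access token joined to the persisted habitat value."""
    return ("ACCESS" if accessible else "BLOCKED") + ";" + habitat


def recompute_all(streams: Dict[str, Segment], barriers: Set[str]) -> Dict[str, str]:
    """Post-consolidate recompute over every WSG, reusing persisted habitat as-is."""
    codes: Dict[str, str] = {}
    for wsg in sorted({seg.wsg for seg in streams.values()}):
        for seg_id, ok in recompute_access(streams, barriers, wsg).items():
            codes[seg_id] = toy_mapping_code(ok, streams[seg_id].habitat)
    return codes


streams = {
    "d1": Segment("DNST", None, "rearing"),
    "d2": Segment("DNST", "d1", "spawning"),
    "u1": Segment("UPST", "d2", "spawning"),
}
codes = recompute_all(streams, barriers={"d2"})
print(codes)
```

Because nothing upstream of access is rebuilt, the cost per WSG scales with its segment count and the depth of the downstream walk, not with the full pipeline.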

Refs
