lnk_persist_init: persist per-source dnstr flag columns in streams_access (mapping_code second-token NONE bug) #196

Description

@NewGraphEnvironment

Context

Post-#187 <persist_schema>.streams_access carries per-species access cols (has_barriers_<sp>_dnstr, access_<sp>) but drops the per-barrier-source flag cols that lnk_pipeline_mapping_code needs to classify the second mapping_code token:

  • has_barriers_anthropogenic_dnstr
  • has_barriers_pscis_dnstr
  • has_barriers_dams_dnstr
  • has_barriers_remediations_dnstr
  • dam_dnstr_ind (sequence-aware dam-detection; resident flavor)
  • remediated_dnstr_ind

These columns exist in the working schema's streams_access (lnk_pipeline_access writes them when barrier_sources is passed), but #187 Phase 2 dropped them from cols_streams_access_base on the incorrect reasoning that "they're conditional on remediations/observations". They are actually conditional only on the barrier_sources arg, which lnk_pipeline_run always passes.

Impact

lnk_mapping_code reads SELECT * FROM <persist_schema>.streams_access and finds the per-source columns absent → lnk_pipeline_mapping_code's has() helper returns FALSE → any_anth = any_pscis = any_dam = FALSE → second token defaults to NONE.
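
The failure mode can be reproduced in miniature. This is a minimal sketch, not the package code: `has_flag()` is a hypothetical stand-in for the has() helper, and the flag-to-token mapping with its precedence (DAM, then ASSESSED, then MODELLED) is an assumption made only for illustration.

```r
# Hypothetical stand-in for has(): per-row TRUE only when the column exists
# and the row's value is TRUE. A missing column yields FALSE for every row.
has_flag <- function(df, col) {
  if (col %in% names(df)) df[[col]] %in% TRUE else rep(FALSE, nrow(df))
}

# Hypothetical second-token classifier. The mapping and precedence are
# assumptions, not the real lnk_pipeline_mapping_code logic.
second_token <- function(df) {
  any_dam   <- has_flag(df, "has_barriers_dams_dnstr") | has_flag(df, "dam_dnstr_ind")
  any_pscis <- has_flag(df, "has_barriers_pscis_dnstr")
  any_anth  <- has_flag(df, "has_barriers_anthropogenic_dnstr")
  ifelse(any_dam, "DAM",
         ifelse(any_pscis, "ASSESSED",
                ifelse(any_anth, "MODELLED", "NONE")))
}

# Working-schema shape: per-source flag columns are present.
working <- data.frame(
  access_bt                        = c(TRUE, TRUE, TRUE),
  has_barriers_dams_dnstr          = c(TRUE, FALSE, FALSE),
  has_barriers_pscis_dnstr         = c(FALSE, TRUE, FALSE),
  has_barriers_anthropogenic_dnstr = c(TRUE, TRUE, FALSE)
)

# Persist-schema shape after #187: the per-source columns were dropped.
persisted <- working["access_bt"]

tokens_working   <- second_token(working)
tokens_persisted <- second_token(persisted)
```

With the columns present, the rows classify by source; with them absent, every row falls through to NONE, which matches the link distribution reported below.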

Empirically, on PARS BT (2026-05-19):

  • link: ACCESS;NONE 4412 rows, ACCESS;NONE;INTERMITTENT 7080 rows
  • bcfp: ACCESS;DAM 4411, ACCESS;DAM;INTERMITTENT 5457, ACCESS;MODELLED 1775, ACCESS;MODELLED;INTERMITTENT 3113, ACCESS;ASSESSED 900, ACCESS;ASSESSED;INTERMITTENT 1100

match_pct vs bcfp drops from the historic 98%+ to ~30-50%.

Design choices

Column-naming inconsistency (pre-existing; not introduced by #187):

  • has_barriers_<source>_dnstr (boolean prefix has_, suffix _dnstr)
  • dam_dnstr_ind / remediated_dnstr_ind (no has_ prefix, suffix _ind for "indicator")

Two patterns coexist. Could rename to one convention (e.g., all has_<thing>_dnstr or all <thing>_dnstr_ind), but that's a separate cleanup. Scope of this fix: add the columns with existing names so lnk_pipeline_mapping_code's column probes find them.

Where to put the persist DDL:

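Picking B, the dedicated per-source helper (.lnk_cols_streams_access_source_flags(), see Acceptance): future per-source extensions (new barrier classes) live in one place, and it mirrors the per-species helper pattern. There is less coupling between the "scalar base" and "source-driven" columns.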

Defaults: source classes are hardcoded today (anthropogenic/pscis/dams/remediations). #189-style data-driving is deferred: it would mean a parameters_barrier_sources.csv declaring which sources exist per bundle, which is out of scope for this hotfix.

Acceptance

  • R/lnk_persist_init.R: new .lnk_cols_streams_access_source_flags() helper returns named vector for has_barriers_<source>_dnstr (boolean, 4 sources) + dam_dnstr_ind (boolean) + remediated_dnstr_ind (boolean) — 6 cols total.
  • lnk_persist_init includes those cols in streams_access CREATE TABLE.
  • lnk_pipeline_persist's INSERT projection picks them up automatically (iterates names(access_cols_v)).
  • Live smoke on PARS post-fix: mapping_code_bt distribution shows ACCESS;DAM / ACCESS;MODELLED / ACCESS;ASSESSED tokens (not just ACCESS;NONE). PARS BT match_pct vs bcfp returns to ~98%.
  • v0.40.3 patch release.
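
A rough sketch of what the first three acceptance items could look like together. The helper name and the six column names come from this issue; everything else is assumed for illustration: `build_streams_access_ddl()`, the `base_cols` placeholder, its column types, and the schema name `persist` are hypothetical, and the real lnk_persist_init may assemble its DDL differently.

```r
# Proposed helper: named character vector, column name -> SQL type.
# Source classes are hardcoded, as described under "Design choices".
.lnk_cols_streams_access_source_flags <- function() {
  sources <- c("anthropogenic", "pscis", "dams", "remediations")
  cols <- c(
    paste0("has_barriers_", sources, "_dnstr"),
    "dam_dnstr_ind",
    "remediated_dnstr_ind"
  )
  stats::setNames(rep("boolean", length(cols)), cols)
}

# Hypothetical DDL builder. `base_cols` stands in for the existing base and
# per-species columns; its contents and types here are placeholders.
build_streams_access_ddl <- function(schema, base_cols) {
  access_cols_v <- c(base_cols, .lnk_cols_streams_access_source_flags())
  defs <- paste0("  ", names(access_cols_v), " ", access_cols_v, collapse = ",\n")
  sprintf("CREATE TABLE IF NOT EXISTS %s.streams_access (\n%s\n);", schema, defs)
}

base_cols <- c(has_barriers_bt_dnstr = "boolean", access_bt = "integer")
ddl <- build_streams_access_ddl("persist", base_cols)

# An INSERT projection that iterates names(access_cols_v) picks up the new
# columns without further changes.
access_cols_v <- c(base_cols, .lnk_cols_streams_access_source_flags())
projection <- paste(names(access_cols_v), collapse = ", ")
```

Keeping the source flags in their own helper means a new barrier class only touches the `sources` vector, while the base column list stays unchanged.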
