Context
lnk_pipeline_mapping_code classifies the second token of mapping_code_<sp> via two hardcoded R case-when chains (R/lnk_pipeline_mapping_code.R:196-210), one each for resident and anadromous flavors:
mc_barrier_resident[any_remed] <- "REMEDIATED"
mc_barrier_resident[!set & any_dam_resident] <- "DAM"
mc_barrier_resident[!set & any_anth & any_pscis & !any_dam_resident] <- "ASSESSED"
mc_barrier_resident[!set & any_anth & !any_pscis & !any_dam_resident] <- "MODELLED"
mc_barrier_resident[!set & !any_anth] <- "NONE"
Inputs are also hardcoded — function probes specific column names in the access tibble (has_barriers_anthropogenic_dnstr, has_barriers_pscis_dnstr, has_barriers_dams_dnstr, dam_dnstr_ind, remediated_dnstr_ind).
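To make the precedence in the excerpt above concrete, here is a minimal runnable sketch of the resident chain on toy data. The excerpt does not show how `set` is maintained, so this assumes it is a logical mask of rows that already hold a token, refreshed after every assignment. The flag values are illustrative, not pipeline output.

```r
# Toy flags for five stream segments (illustrative values only).
any_remed        <- c(TRUE,  FALSE, FALSE, FALSE, FALSE)
any_dam_resident <- c(FALSE, TRUE,  FALSE, FALSE, FALSE)
any_anth         <- c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE)
any_pscis        <- c(FALSE, FALSE, TRUE,  FALSE, FALSE)

mc_barrier_resident <- rep(NA_character_, length(any_anth))

# Assumption: `set` marks rows that already received a token, so each
# later assignment can only fill rows that are still open.
mc_barrier_resident[any_remed] <- "REMEDIATED"
set <- !is.na(mc_barrier_resident)
mc_barrier_resident[!set & any_dam_resident] <- "DAM"
set <- !is.na(mc_barrier_resident)
mc_barrier_resident[!set & any_anth & any_pscis & !any_dam_resident] <- "ASSESSED"
set <- !is.na(mc_barrier_resident)
mc_barrier_resident[!set & any_anth & !any_pscis & !any_dam_resident] <- "MODELLED"
set <- !is.na(mc_barrier_resident)
mc_barrier_resident[!set & !any_anth] <- "NONE"

print(mc_barrier_resident)
```

Row 1 shows why the order matters: it has an anthropogenic barrier downstream, but the earlier REMEDIATED rule claims it first.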
What's locked
| Surface | Hardcoded? | Where |
|---|---|---|
| Token vocabulary (DAM, MODELLED, ASSESSED, REMEDIATED, NONE) | Yes | Function body |
| Precedence order | Yes | Function body |
| Condition expressions (any_anth & any_pscis & !any_dam) | Yes | Function body |
| Source-flag column names | Yes | has() probes |
| Residence-flavor chains (resident vs anadromous) | Yes | Two parallel chains in function body |
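What's flex today
- species_resident / species_anadromous args (post-#187 rename).
- has() helper degrades gracefully on missing columns (returns FALSE; doesn't error).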
Why this matters
Adding a new token (e.g., FALLS for natural-barrier cases, CLOSURE for regional-management closures, SUBSURFACE for sub-surface) requires:
- Editing function body
- Coordinating across both residence chains
- Adding column-probe entries for new source flags
- Re-running full pipeline + parity
Same flavor as #189 (species residence hardcoded) — config-driven would let bundles override per-project.
Design
Naming convention
R-side: <noun>_<verb> per NGE convention. Function name stays lnk_pipeline_mapping_code (pre-existing, exported). Internal rule-evaluation helper: .lnk_rules_eval(rules, env) — verb-last, returns the per-row classification vector.
CSV: parameters_mapping_code_rules.csv per existing parameters_<noun>.csv convention (parameters_habitat_dimensions.csv, parameters_habitat_thresholds.csv, parameters_fresh.csv).
Param naming follows <type>_<role>: the rules arg on lnk_pipeline_mapping_code accepts a tibble (loaded from CSV) or a named list (programmatic override).
Rule shape
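CSV columns:
- flavor (char): resident / anadromous (extensible per #189 work)
- precedence (int): order within a flavor; first match wins
- token (char): output token string
- when (char): R expression evaluated against the access tibble columns
Example rows: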
flavor,precedence,token,when
resident,1,REMEDIATED,any_remed
resident,2,DAM,any_dam_resident
resident,3,ASSESSED,any_anth & any_pscis & !any_dam_resident
resident,4,MODELLED,any_anth & !any_pscis & !any_dam_resident
resident,5,NONE,!any_anth
anadromous,1,REMEDIATED,any_remed
anadromous,2,DAM,any_dam_anadr
anadromous,3,ASSESSED,any_pscis
anadromous,4,MODELLED,any_anth
anadromous,5,NONE,!any_anth
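A minimal sketch of loading this rule shape in base R. The rows are inlined through `text =` so the example is self-contained; in the package they would presumably come from parameters_mapping_code_rules.csv. Variable names here are illustrative.

```r
rules_csv <- "flavor,precedence,token,when
resident,1,REMEDIATED,any_remed
resident,2,DAM,any_dam_resident
resident,3,ASSESSED,any_anth & any_pscis & !any_dam_resident
resident,4,MODELLED,any_anth & !any_pscis & !any_dam_resident
resident,5,NONE,!any_anth
anadromous,1,REMEDIATED,any_remed
anadromous,2,DAM,any_dam_anadr
anadromous,3,ASSESSED,any_pscis
anadromous,4,MODELLED,any_anth
anadromous,5,NONE,!any_anth"

# `when` stays a character column; it is only parsed at evaluation time.
rules <- read.csv(text = rules_csv, stringsAsFactors = FALSE, strip.white = TRUE)

# Order by (flavor, precedence) so each flavor's rules can be walked top-down.
rules <- rules[order(rules$flavor, rules$precedence), ]
rownames(rules) <- NULL

resident_rules <- rules[rules$flavor == "resident", ]
print(resident_rules$token)
```

The `when` expressions contain no commas, so they need no quoting in the CSV. An expression with a function call taking several arguments would have to be quoted.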
Abstract framing
Rules table is general-purpose. Same shape could drive future analogous classifications (e.g., habitat-class assignment, edge-type bucketing). The rules engine helper .lnk_rules_eval is the reusable primitive — give it a rules tibble + an environment with the bound variables (any_anth, any_pscis, etc.) → returns a length-N character vector.
Engine responsibilities:
- Sort rules by (flavor, precedence)
- For each row in input data: walk rules in order; the first when that's TRUE assigns its token, and subsequent rules only fill rows that are still NA (don't overwrite)
- Validation: lnk_rules_validate(rules) checks for duplicate (flavor, precedence), unparseable when expressions, missing tokens, unknown column references
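The first-match-wins walk described above can be sketched in a few lines of base R. `rules_eval_sketch` is a hypothetical stand-in for .lnk_rules_eval, not the package's implementation. It handles a single flavor at a time, takes the row count `n` as an extra argument, and uses eval(parse()) purely for brevity, since the backend choice is left open.

```r
# Hypothetical stand-in for .lnk_rules_eval(); one flavor per call.
rules_eval_sketch <- function(rules, env, n) {
  rules <- rules[order(rules$precedence), , drop = FALSE]
  out <- rep(NA_character_, n)
  for (i in seq_len(nrow(rules))) {
    # Evaluate the `when` string against the bound flag vectors.
    hit <- eval(parse(text = rules$when[i]), envir = env)
    # Recycle so a scalar such as TRUE can act as a catch-all rule.
    hit <- rep_len(as.logical(hit), n)
    # First match wins: only rows that are still NA may take this token.
    fill <- is.na(out) & !is.na(hit) & hit
    out[fill] <- rules$token[i]
  }
  out
}

resident_rules <- data.frame(
  precedence = 1:5,
  token = c("REMEDIATED", "DAM", "ASSESSED", "MODELLED", "NONE"),
  when = c(
    "any_remed",
    "any_dam_resident",
    "any_anth & any_pscis & !any_dam_resident",
    "any_anth & !any_pscis & !any_dam_resident",
    "!any_anth"
  ),
  stringsAsFactors = FALSE
)

# Toy flag vectors for five segments (illustrative values only).
flags <- list(
  any_remed        = c(TRUE,  FALSE, FALSE, FALSE, FALSE),
  any_dam_resident = c(FALSE, TRUE,  FALSE, FALSE, FALSE),
  any_anth         = c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
  any_pscis        = c(FALSE, FALSE, TRUE,  FALSE, FALSE)
)

tokens <- rules_eval_sketch(resident_rules, flags, n = 5)
print(tokens)
```

Each rule is evaluated once over the whole vector rather than once per row, which gives the same result as the per-row walk while staying vectorized. A `when` that evaluates to NA never assigns a token in this sketch.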
Source-flag column names
Today's six columns (has_barriers_anthropogenic_dnstr, etc.) become the environment variables the when expressions reference. Rules library defines aliases:
any_anth := has_barriers_anthropogenic_dnstr
any_pscis := has_barriers_pscis_dnstr
any_dam_resident := dam_dnstr_ind OR has_barriers_dams_dnstr (fallback when sequence-aware unavailable)
any_dam_anadr := has_barriers_dams_dnstr
any_remed := remediated_dnstr_ind OR has_remediated_dnstr
Aliases declared in parameters_mapping_code_inputs.csv (or as an inputs: block in the rules file). Decouples user-facing rule expressions from the raw column names — column names can be renamed later (#189-flavor cleanup) without touching every rule's when.
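One way the alias layer could work, as a hedged sketch. `alias_resolve_sketch` and the list-based alias table are hypothetical. The "OR" in the definitions above is read here as "use the first candidate column that exists", which is an assumption. Missing columns resolve to FALSE rather than raising an error, and treating NA as FALSE is a further assumption.

```r
# Hypothetical alias table: each alias lists candidate raw columns in
# preference order, mirroring the ":=" definitions above.
aliases <- list(
  any_anth         = "has_barriers_anthropogenic_dnstr",
  any_pscis        = "has_barriers_pscis_dnstr",
  any_dam_resident = c("dam_dnstr_ind", "has_barriers_dams_dnstr"),
  any_dam_anadr    = "has_barriers_dams_dnstr",
  any_remed        = c("remediated_dnstr_ind", "has_remediated_dnstr")
)

alias_resolve_sketch <- function(access, aliases) {
  n <- nrow(access)
  lapply(aliases, function(candidates) {
    present <- candidates[candidates %in% names(access)]
    # No candidate column available: degrade to all-FALSE, do not error.
    if (length(present) == 0) return(rep(FALSE, n))
    flag <- as.logical(access[[present[1]]])
    flag[is.na(flag)] <- FALSE  # assumption: unknown counts as "no barrier"
    flag
  })
}

# Toy access table without dam_dnstr_ind or any remediation column.
access <- data.frame(
  has_barriers_anthropogenic_dnstr = c(TRUE, TRUE, FALSE),
  has_barriers_pscis_dnstr         = c(TRUE, FALSE, FALSE),
  has_barriers_dams_dnstr          = c(FALSE, TRUE, FALSE)
)

env <- alias_resolve_sketch(access, aliases)
str(env)
```

The returned named list can be handed directly to the rule evaluator as the environment that the when expressions see, so renaming a raw column only touches the alias table.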
Validation
lnk_rules_validate(rules, available_columns) runs before evaluation:
- Each when parses as R (no syntax errors)
- All identifiers in when are declared aliases or known columns
- Precedences within flavor are unique
- All declared flavors have a "fallthrough" rule (e.g. precedence-N when = TRUE), which guarantees no NA tokens leak
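A rough sketch of most of these checks in base R. `rules_validate_sketch` is a hypothetical name and returns a character vector of problems rather than erroring, which is a design assumption. The fallthrough check is left out because the text above does not pin down what counts as a fallthrough beyond the when = TRUE example.

```r
# Hypothetical stand-in for lnk_rules_validate(); returns problem messages.
rules_validate_sketch <- function(rules, known) {
  problems <- character(0)
  for (i in seq_len(nrow(rules))) {
    # Check 1: the `when` string must parse as R.
    expr <- tryCatch(parse(text = rules$when[i]), error = function(e) NULL)
    if (is.null(expr)) {
      problems <- c(problems, sprintf("row %d: `when` does not parse", i))
      next
    }
    # Check 2: every identifier must be a declared alias or known column.
    unknown <- setdiff(all.vars(expr), known)
    if (length(unknown) > 0) {
      problems <- c(problems, sprintf(
        "row %d: unknown identifier(s): %s", i, paste(unknown, collapse = ", ")
      ))
    }
  }
  # Check 3: every rule needs a non-empty token.
  if (any(is.na(rules$token) | rules$token == "")) {
    problems <- c(problems, "missing token")
  }
  # Check 4: precedences must be unique within a flavor.
  if (anyDuplicated(rules[, c("flavor", "precedence")]) > 0) {
    problems <- c(problems, "duplicate (flavor, precedence)")
  }
  problems
}

known <- c("any_anth", "any_pscis", "any_dam_resident", "any_remed")

bad_rules <- data.frame(
  flavor     = c("resident", "resident", "resident"),
  precedence = c(1L, 1L, 2L),
  token      = c("DAM", "FALLS", "NONE"),
  when       = c("any_dam_resident", "any_falls", "any_anth &"),
  stringsAsFactors = FALSE
)

problems_bad <- rules_validate_sketch(bad_rules, known)
print(problems_bad)
```

`all.vars()` only reports variable names, so operators such as `&` and `!` and constants such as TRUE are not flagged as unknown identifiers.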
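Acceptance
- inst/extdata/configs/bcfishpass/parameters_mapping_code_rules.csv ships the current rules verbatim. Loaded into loaded$parameters_mapping_code_rules via lnk_load_overrides.
- inst/extdata/configs/default/parameters_mapping_code_rules.csv same content (extendable per-bundle).
- lnk_pipeline_mapping_code takes a rules arg (default reads from loaded if available, else falls back to hardcoded chains for back-compat one release).
- .lnk_rules_eval(rules, env) helper: exported as @noRd internal initially; could be promoted to user-facing if other classifications adopt it.
- lnk_rules_validate(rules) helper: validation surface.
- [...] FALLS after DAM) without R code change.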
Out of scope
- SQL-side translation: rules could in principle be compiled to SQL CASE WHEN and run as INSERT...SELECT (no R round-trip, ~10× faster). Decision: stay R-side. A 1–2 min provincial wall-clock time is acceptable; R-side keeps multi-row patterns / NULL semantics / unit-testability simpler. SQL is a follow-on if performance ever bites.
- Per-bundle column aliasing: aliases stay in the package's parameters_mapping_code_inputs.csv for now; bundle-level alias override is a follow-up.
- Rule-evaluation backend choice (eval(parse()) vs rlang::eval_tidy): implementation detail; pick at coding time.
References
- R/lnk_pipeline_mapping_code.R:174-210 (current hardcoded logic).
- bcfishpass/sql/...streams_mapping_code.sql. Same precedence logic, expressed in SQL.