
lnk_compare_mapping_code: reference = "link" for link-vs-link comparisons (Comparison B) #213

Description

@NewGraphEnvironment

Problem

lnk_compare_mapping_code() currently hard-codes the comparison reference to the bcfp snapshot view (ref_table = "fresh.streams_vw_bcfp"). That works for Comparison A (link/bcfp config vs bcfp upstream: the v0.40.5 baseline, with a 99.66% median match across the 50 study-area WSGs × 7 species).

It does not support Comparison B (link/bcfp config vs link/default config, both link-persisted, with no bcfp reference involved). That is the comparison that isolates the pure parameter-tuning delta between bcfp-style tuning and newgraph-default tuning on the full species set.

The first Comparison B run (v0.41.3, default config vs bcfp snapshot) showed a clear parameter-tuning signal: a median of 96.45% vs 99.66% for the bcfp config, with BT under the default config taking the biggest hit (−7.4 points). However, that diff was against the bcfp snapshot, not against link/bcfp. To get an apples-to-apples link-vs-link diff (especially across all 11 species once #212 lands the bcfp-style KO/RB/GR rows), the comparator needs to diff two link persist schemas directly.

Proposed

Extend lnk_compare_mapping_code() to accept a second persist schema's streams_mapping_code as the reference:

```r
lnk_compare_mapping_code(
  conn, aoi, cfg,
  reference = c("bcfishpass", "link"),   # "link" is new
  conn_ref  = NULL,
  species   = NULL,
  ref_table = NULL                       # default depends on `reference`
)
```
  • reference = "bcfishpass" (default, unchanged): diff vs fresh.streams_vw_bcfp, as today
  • reference = "link" (new): diff vs another link persist schema's streams_mapping_code (e.g. fresh_default.streams_mapping_code). Same row shape and column conventions, just a different table

When reference = "link", ref_table defaults to "fresh.streams_mapping_code" (the link/bcfp-config persist), and conn_ref defaults to conn (same DB; both schemas live in the local fwapg).
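A minimal base-R sketch of how these reference-dependent defaults could resolve. resolve_compare_args() is a hypothetical helper, not part of the package. The "schema.table" format check is an assumed form of the "arg validation" listed under Acceptance, and in bcfishpass mode conn_ref is left exactly as passed, because the issue only specifies its default for reference = "link".

```r
# Hypothetical helper (not part of the package): resolves the
# reference-dependent defaults proposed for lnk_compare_mapping_code().
resolve_compare_args <- function(conn,
                                 reference = c("bcfishpass", "link"),
                                 conn_ref  = NULL,
                                 ref_table = NULL) {
  # Missing `reference` resolves to "bcfishpass"; any other value errors.
  reference <- match.arg(reference)

  # The ref_table default depends on the reference type.
  if (is.null(ref_table)) {
    ref_table <- switch(reference,
      bcfishpass = "fresh.streams_vw_bcfp",
      link       = "fresh.streams_mapping_code"
    )
  }

  # Assumed validation: a single, unquoted "schema.table" identifier.
  ok <- is.character(ref_table) && length(ref_table) == 1L &&
    grepl("^[A-Za-z_][A-Za-z0-9_]*\\.[A-Za-z_][A-Za-z0-9_]*$", ref_table)
  if (!ok) stop("`ref_table` must be a single 'schema.table' string", call. = FALSE)

  # Link mode: both schemas live in the same local fwapg, so reuse `conn`.
  # bcfishpass mode: conn_ref is left as passed (existing behaviour).
  if (reference == "link" && is.null(conn_ref)) conn_ref <- conn

  list(reference = reference, conn_ref = conn_ref, ref_table = ref_table)
}
```

Keeping the defaulting in one place means the validation can be unit-tested without a database connection, since any placeholder object can stand in for conn.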

Comparison shape: the return tibble is unchanged (wsg, species, total_segs, match_pct, n_diffs, top_pattern, top_pattern_count), but each row's match_pct now means "fraction of segments where the two LINK runs agree on the per-segment mapping_code," not "fraction matching bcfp."
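The per-row statistics could be computed along these lines once both tables have been read into data frames. This is a sketch under assumptions the issue does not state: each table is keyed by wsg, species and a hypothetical segment_id column; segments missing from either side are dropped by the inner join; NA codes on both sides count as agreement; and top_pattern is taken to be the most frequent "current -> reference" code pair among disagreeing segments, with ties broken alphabetically. compare_mapping_frames() is a hypothetical name.

```r
# Hypothetical comparator core: `cur` and `ref` are data frames with
# columns wsg, species, segment_id (assumed key) and mapping_code.
compare_mapping_frames <- function(cur, ref,
                                   key = c("wsg", "species", "segment_id")) {
  # Inner join on the segment key; non-key columns get _cur / _ref suffixes.
  m <- merge(cur, ref, by = key, suffixes = c("_cur", "_ref"))
  if (nrow(m) == 0L) stop("no segments in common between the two tables")

  cur_code <- m$mapping_code_cur
  ref_code <- m$mapping_code_ref
  # Agreement: equal codes, or NA on both sides.
  m$same <- (cur_code == ref_code) %in% TRUE | (is.na(cur_code) & is.na(ref_code))
  m$pattern <- paste(cur_code, ref_code, sep = " -> ")

  groups <- split(m, list(m$wsg, m$species), drop = TRUE)
  rows <- lapply(groups, function(g) {
    diffs <- g$pattern[!g$same]
    if (length(diffs) > 0L) {
      counts <- table(diffs)   # names are sorted, so ties break alphabetically
      top    <- names(counts)[which.max(counts)]
      top_n  <- as.integer(max(counts))
    } else {
      top   <- NA_character_
      top_n <- 0L
    }
    data.frame(wsg = g$wsg[1], species = g$species[1],
               total_segs = nrow(g),
               match_pct = 100 * mean(g$same),
               n_diffs = sum(!g$same),
               top_pattern = top, top_pattern_count = top_n,
               stringsAsFactors = FALSE)
  })

  out <- do.call(rbind, rows)
  out <- out[order(out$wsg, out$species), ]
  rownames(out) <- NULL
  out
}
```

Passing the same data frame as both arguments reproduces the smoke test from the Acceptance list: every row comes back with match_pct = 100 and n_diffs = 0. Because the core works on plain data frames, it also gives the stub-based diff test something to exercise without a live database.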

Driver-side wrapper

data-raw/wsg_compare.R gets a new function:

```r
wsg_compare_link_vs_link <- function(wsg, config, ref_schema, species = NULL)
```

data-raw/study_area_compare.R learns a new mode (--reference=link --ref-schema=fresh) so the driver can produce a CSV against another link schema.
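A sketch of how the new driver mode might turn --reference=link --ref-schema=fresh into a ref_table, assuming the reference table is always <ref_schema>.streams_mapping_code. parse_compare_flags() is hypothetical. Unknown flags are ignored, a missing --ref-schema in link mode falls back to the "fresh" schema to mirror the function-level default, and an empty or non-identifier schema name is rejected.

```r
# Hypothetical flag parser for data-raw/study_area_compare.R.
parse_compare_flags <- function(args = commandArgs(trailingOnly = TRUE)) {
  # Defaults: bcfp reference, as in the existing driver.
  reference  <- "bcfishpass"
  ref_schema <- NULL
  for (a in args) {
    if (startsWith(a, "--reference=")) {
      reference <- sub("^--reference=", "", a)
    } else if (startsWith(a, "--ref-schema=")) {
      ref_schema <- sub("^--ref-schema=", "", a)
    }
    # Any other flag is ignored here (left to the rest of the driver).
  }
  reference <- match.arg(reference, c("bcfishpass", "link"))

  if (reference == "bcfishpass") {
    return(list(reference = reference, ref_table = "fresh.streams_vw_bcfp"))
  }

  # Link mode: mirror the function-level default of the "fresh" schema.
  if (is.null(ref_schema)) ref_schema <- "fresh"
  if (!grepl("^[A-Za-z_][A-Za-z0-9_]*$", ref_schema)) {
    stop("--ref-schema must be a bare schema name", call. = FALSE)
  }
  list(reference = reference,
       ref_table = paste0(ref_schema, ".streams_mapping_code"))
}
```

For example, parse_compare_flags(c("--reference=link", "--ref-schema=fresh_default")) resolves ref_table to "fresh_default.streams_mapping_code", which could then be passed straight through to lnk_compare_mapping_code().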

Acceptance

  • lnk_compare_mapping_code(reference = "link", ref_table = "fresh.streams_mapping_code", ...) returns the same tibble shape as the bcfp-reference variant
  • When the two persist schemas hold identical data (e.g. the same config run twice), match_pct = 100% across all rows (smoke test)
  • Live test: after #212 (link-authored bcfp-style rules for KO, RB, GR in bcfishpass config, for link-vs-link comparison on the full species set) lands and the bcfp config is re-run on the 50 study-area WSGs (full 11 species), wsg_compare_link_vs_link() against fresh_default produces a CSV of parameter-tuning deltas covering 11 species × 50 WSGs
  • Driver: study_area_run.sh --compare-mode=link-vs-link --ref-schema=... or similar flag wiring
  • Tests: arg validation + stub-based diff + (gated) live test
  • Per-row notes updated; pkgdown reference page reflects the new reference = "link" branch

Related

  • link#212 (bcfp-style KO/RB/GR rules in bcfishpass config — the species-set side; this issue is the compare-function side)
  • v0.40.5 (the bcfp baseline / current Comparison A)
  • v0.41.3 (the first Comparison B run, currently limited to vs-bcfp comparison)
