Context
data-raw/schema_consolidate.R (driver-side, runs on M4) iterates per-source per-table:
for (src in sources) {
  wgc_tables <- query_destination(...)  # ← enumerated from DESTINATION
  for (t in wgc_tables) {
    ssh src "psql -c 'COPY (SELECT ... FROM <schema>.<t> WHERE wsg IN bucket) TO STDOUT'" ...
  }
}
wgc_tables is the list of tables on the destination. The destination accumulates tables across runs (e.g. M4's fresh_default carries streams_habitat_ch/sk/st/pk/ko/co residue from prior runs with study-area WSGs whose species presence covered the full set). Source hosts only create habitat tables for species their assigned bucket actually models (lnk_persist_init creates streams_habitat_<species> per species in the WSG's presence list).
Problem
When a source's table set is a strict subset of destination's, the per-table COPY hits the first absent table and breaks the loop:
ERROR: relation "fresh_default.streams_habitat_ch" does not exist
LINE 1: COPY (SELECT * FROM fresh_default.streams_habitat_ch WHERE w...
Effect: tables AFTER the failure point (alphabetically) never get copied even when they exist on the source.
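The failure mode can be reproduced without a database. The following is a minimal sketch, not the code in data-raw/schema_consolidate.R: the table vectors are an illustrative four-table subset, and the `%in%` check is a stand-in for the COPY error path that triggers the break.

```r
# Illustrative subset only; the real destination had 11 tables.
dest_tables <- c("streams_habitat_bt", "streams_habitat_ch",
                 "streams_habitat_gr", "streams_habitat_rb")
src_tables  <- c("streams_habitat_bt", "streams_habitat_gr",
                 "streams_habitat_rb")

copied <- character(0)
for (t in dest_tables) {
  # Stand-in for the failing COPY: the first absent table ends the loop.
  if (!(t %in% src_tables)) break
  copied <- c(copied, t)
}

# Tables that exist on the source but were never attempted.
dropped <- setdiff(src_tables, copied)
```

Here `copied` holds only streams_habitat_bt, while `dropped` holds the _gr and _rb tables, which mirrors the pattern seen in the incident described below.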
Caught 2026-05-15 on branch 180-step0-additive-default (Peace Tier 2 retry, data-raw/logs/wsgs_run_pipeline/20260515_091243_consolidate.log). Cyphers' Peace bucket only built habitat tables for BT/GR/RB; M4 had 11 tables; loop broke at streams_habitat_ch so _gr and _rb data never copied. 7 WSGs (TOOD, NATR, UOMI, PCEA, FINA, MESI, CARP) ended up in M4's streams_habitat_bt only — missing RB/GR (and KO for NATR/CARP). Per-WSG modelling RDS confirmed the source data existed; consolidate just dropped it.
Goals
Per-source enumeration: each source's COPY loop iterates only tables that exist on that source (intersect with destination tables so we don't try to write to a destination-absent table).
next over break: a single per-table failure logs a warning and continues; doesn't poison other tables for the same source.
Return value's per-source result includes the full copied-table set + any errored tables — no silent partial copies.
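One way the three goals could fit together is sketched below. This is a hedged illustration under stated assumptions: `consolidate_source`, its `copy_table` argument (a stand-in for the real ssh/psql COPY step), and the result field names are all hypothetical and do not come from the existing script.

```r
# Hypothetical helper: copy only the tables one source can actually supply.
consolidate_source <- function(src_tables, dest_tables, copy_table) {
  # Per-source enumeration: attempt only tables present on BOTH sides.
  todo <- sort(intersect(src_tables, dest_tables))
  copied <- character(0)
  errored <- character(0)
  for (t in todo) {
    ok <- tryCatch({
      copy_table(t)
      TRUE
    }, error = function(e) {
      warning(sprintf("copy failed for %s: %s", t, conditionMessage(e)),
              call. = FALSE)
      FALSE
    })
    if (!ok) {
      errored <- c(errored, t)
      next  # one bad table must not block the remaining tables
    }
    copied <- c(copied, t)
  }
  list(
    copied = copied,
    errored = errored,
    skipped_absent = sort(setdiff(dest_tables, src_tables)),
    ok = length(errored) == 0
  )
}

# Usage with a stubbed copy step that always succeeds.
dest <- c("streams_habitat_bt", "streams_habitat_ch",
          "streams_habitat_gr", "streams_habitat_rb")
src  <- c("streams_habitat_bt", "streams_habitat_gr", "streams_habitat_rb")
res  <- consolidate_source(src, dest, copy_table = function(t) invisible(TRUE))
```

In this sketch a table that is absent on the source is recorded in `skipped_absent` and never attempted, so it cannot produce a "relation does not exist" error, and `ok` stays TRUE unless a table that was attempted fails.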
Acceptance
Heterogeneous-config dispatch (some sources missing some habitat tables vs destination) succeeds end-to-end: every table that exists on a source lands on destination.
A specific table erroring is logged per-source per-table but does not prevent other tables for the same source from copying.
Result object names every table actually copied + any that errored; per-source ok reflects "all tables either copied or skipped-because-absent".
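The acceptance criteria could be expressed as a small check over a per-source result. This is only a sketch: `check_source_result` and the fields `copied`, `errored`, and `ok` are hypothetical names chosen for illustration, not the script's actual return shape.

```r
# Hypothetical acceptance check for one source's result object.
check_source_result <- function(result, src_tables, dest_tables) {
  # Every table present on both source and destination must be accounted for.
  expected <- intersect(src_tables, dest_tables)
  accounted <- union(result$copied, result$errored)
  all_accounted <- setequal(accounted, expected)
  # A table cannot be both copied and errored.
  no_overlap <- length(intersect(result$copied, result$errored)) == 0
  # ok must mean "nothing errored"; absent tables do not count against it.
  ok_consistent <- identical(result$ok, length(result$errored) == 0)
  all_accounted && no_overlap && ok_consistent
}

dest <- c("streams_habitat_bt", "streams_habitat_ch",
          "streams_habitat_gr", "streams_habitat_rb")
src  <- c("streams_habitat_bt", "streams_habitat_gr", "streams_habitat_rb")

# A complete result passes; a silent partial copy (only _bt) does not.
full    <- list(copied = src, errored = character(0), ok = TRUE)
partial <- list(copied = "streams_habitat_bt", errored = character(0), ok = TRUE)
full_passes    <- check_source_result(full, src, dest)
partial_passes <- check_source_result(partial, src, dest)
```

The `partial` case corresponds to the silent drop this issue describes: it reports success while omitting tables that exist on the source, and the check rejects it.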
References
data-raw/schema_consolidate.R — wgc_tables enumeration around L135; per-table loop break around L210.
Consolidate log evidence: data-raw/logs/wsgs_run_pipeline/20260515_091243_consolidate.log (both cyphers stage=source_copy at streams_habitat_ch).