WRB description sync: resync global snapshots, fix image build + rank determinism #369

Merged: johannesparty merged 4 commits into main from chore/wrb-description-snapshots on Jun 24, 2026

@johannesparty johannesparty commented Jun 23, 2026

Lands the WRB description sync (the <br>-stripped narratives) into the global soil-ID snapshots, and fixes everything that was blocking CI from running against the corrected data.

Commits

  • 26bbe39 test: resync global snapshots to the rebuilt soil-id-db image (no <br> in wrb_fao90_desc). 13/14 files are description-text only; [7.3318,-1.4631] also reflects an HWSD component-data shift in the rebuilt image.
  • 366f76d build: version/platform-robust image build. dump_soil_id_db now dumps from inside the PG16 container (a host pg_dump > 16 writes an archive pg_restore 16 can't read), and build_docker_image targets linux/amd64 (the postgis:16-3.5 base has no arm64 manifest).
  • 4642ea9 ci: pin soil-id-db image to 2026-06-23 — CI's docker compose up -d reads the pinned db.image in docker-compose.yml, not :latest, so pushing the image alone never reached CI. This is the change that actually makes CI use the synced descriptions.
  • c7de3e2 fix: deterministic tiebreaker in drop_cokey_horz — when two instances of the same soil tied on distance, the survivor depended on Postgres/Pandas row order, so the ranking was non-deterministic across environments (CI vs local picked different componentID/share). Break ties by cokey. No snapshot values change; it just makes the choice stable. Also de-flakes the US path, which shares the function.

Notes

  • The soil-id-db:latest + dated :2026-06-23 images were rebuilt and pushed manually (see the localization wiki, updated alongside this).
  • Staging/prod are unaffected here — they read soil-id descriptions from the primary app DB and are synced separately via wrb_descriptions_sync --write.

🤖 Generated with Claude Code

Regenerate the global soil-ID snapshots against the rebuilt
ghcr.io/techmatters/soil-id-db:latest, whose wrb_fao90_desc table no
longer contains <br> tags (WRB description sync). All 14 location
snapshots update; the change is description text (Description_en /
Management_en) only, except location [7.3318,-1.4631], which also
reflects a minor HWSD component-data shift (componentID 133621 -> 133618,
share 10 -> 20) present in the rebuilt image.

Generated on main (no algorithm change) so these are purely data-driven.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@johannesparty johannesparty requested a review from knipec June 23, 2026 22:07
johannesparty and others added 3 commits June 23, 2026 21:15
- dump_soil_id_db now dumps from inside the PG16 soil-id-db container
  (docker exec) instead of the host pg_dump. A host pg_dump newer than 16
  writes an archive the image's pg_restore 16 cannot read ("unsupported
  version in file header"), which silently broke build_docker_image.
  Container name overridable via SOIL_ID_DB_CONTAINER.
- build_docker_image now passes --platform linux/amd64: the postgis:16-3.5
  base has no arm64 manifest, and CI/prod run amd64 anyway.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
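
For readers who want to see the shape of the two build changes described above, here is a minimal sketch of the commands involved. It is not the repository's actual script: the default container name, the database name, and the user are assumptions for illustration; only SOIL_ID_DB_CONTAINER, the in-container pg_dump, and --platform linux/amd64 come from the commit message.

```python
import os

# Hypothetical default; the commit only says the name is overridable
# via the SOIL_ID_DB_CONTAINER environment variable.
DEFAULT_CONTAINER = "soil-id-db"


def dump_command(db_name, db_user, env=None):
    """Build the argv that runs pg_dump inside the PG16 container.

    Running pg_dump via `docker exec` means the dump is written by the same
    major version that will later restore it, whatever the host has installed.
    """
    env = os.environ if env is None else env
    container = env.get("SOIL_ID_DB_CONTAINER", DEFAULT_CONTAINER)
    return [
        "docker", "exec", container,
        "pg_dump",
        "--format=custom",          # archive format read by pg_restore
        f"--username={db_user}",    # assumed user, not from the PR
        f"--dbname={db_name}",      # assumed database name, not from the PR
    ]


def build_command(tag, context="."):
    """Build the argv for an image build pinned to linux/amd64."""
    return ["docker", "build", "--platform", "linux/amd64", "--tag", tag, context]
```

The dump goes to stdout, so a caller would redirect it to a file on the host, for example with `subprocess.run(dump_command("soil_id", "postgres"), stdout=fh, check=True)` where `fh` is a file opened in binary write mode.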
CI's `docker compose up -d` uses docker-compose.yml, which pinned the DB
to :0.3 — the old image still containing <br> in wrb_fao90_desc. That's
why the regenerated (no-<br>) global snapshots kept failing CI regardless
of :latest being rebuilt. Bump the pin to the freshly built/pushed dated
tag so CI runs against the synced descriptions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
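
Since the failure mode here was a stale pin that nobody noticed, a small check like the following could confirm which tag the compose file actually references. This is a sketch under assumptions: it scans for `image:` lines with a regular expression rather than parsing YAML, and the compose layout in the test strings is hypothetical.

```python
import re

# Matches lines such as "    image: ghcr.io/org/name:tag".
IMAGE_LINE = re.compile(r"^[ \t]*image:[ \t]*(\S+)", re.MULTILINE)


def compose_images(compose_text):
    """Return every image reference found in a compose file's text."""
    return [ref.strip("\"'") for ref in IMAGE_LINE.findall(compose_text)]


def uses_tag(compose_text, repository, tag):
    """True if the compose text references repository:tag exactly."""
    return f"{repository}:{tag}" in compose_images(compose_text)


REPO = "ghcr.io/techmatters/soil-id-db"

# Hypothetical compose fragments before and after the pin bump.
OLD = "services:\n  db:\n    image: ghcr.io/techmatters/soil-id-db:0.3\n"
NEW = "services:\n  db:\n    image: ghcr.io/techmatters/soil-id-db:2026-06-23\n"

print(uses_tag(OLD, REPO, "2026-06-23"))
print(uses_tag(NEW, REPO, "2026-06-23"))
```

A check of this kind only reports what the file says; it does not verify that the tagged image was actually pushed.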
…nces

drop_cokey_horz kept the smallest-distance duplicate but, when two
instances of the same soil tied on distance (same location), the survivor
depended on Postgres/Pandas input row order — which differs across
environments. That made the global ranking non-deterministic: the same
soil could be reported under different componentID/share on CI vs local,
breaking the snapshot tests intermittently (e.g. global location
7.3318,-1.4631 alternating component 133618/133621).

Break distance ties by cokey so the same instance is always kept.
No snapshot values change (they already reflect the lower-cokey choice);
this just makes that choice stable everywhere. Shared by the global and
US paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
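
The tie-breaking idea can be illustrated without the real data. The sketch below is plain Python rather than the project's pandas code, and the `compname` and `distance` field names and the soil name are assumptions; only `cokey` and the two component IDs are taken from the commit message, and they are used here purely as illustrative values.

```python
def drop_duplicate_soils(rows):
    """Keep one row per soil: smallest distance, ties broken by smallest cokey.

    Comparing (distance, cokey) tuples makes the survivor independent of the
    order in which the rows arrive.
    """
    best = {}
    for row in rows:
        key = row["compname"]
        rank = (row["distance"], row["cokey"])
        kept = best.get(key)
        if kept is None or rank < (kept["distance"], kept["cokey"]):
            best[key] = row
    return sorted(best.values(), key=lambda r: (r["distance"], r["cokey"]))


# Two instances of the same (hypothetical) soil at the same distance.
a = {"compname": "Soil A", "cokey": 133621, "distance": 0.0, "share": 10}
b = {"compname": "Soil A", "cokey": 133618, "distance": 0.0, "share": 20}

# Either input order keeps the same instance.
print(drop_duplicate_soils([a, b])[0]["cokey"])
print(drop_duplicate_soils([b, a])[0]["cokey"])
```

In pandas, the same effect is commonly obtained by sorting on both columns with `sort_values` and then calling `drop_duplicates` on the soil column, which keeps the first row per group by default.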
@johannesparty johannesparty changed the title from "test: resync global snapshots to rebuilt soil-id-db image" to "WRB description sync: resync global snapshots, fix image build + rank determinism" Jun 24, 2026
@johannesparty johannesparty merged commit f632d79 into main Jun 24, 2026
7 checks passed
@johannesparty johannesparty deleted the chore/wrb-description-snapshots branch June 24, 2026 20:58