WRB description sync: resync global snapshots, fix image build + rank determinism #369
Merged
Regenerate the global soil-ID snapshots against the rebuilt ghcr.io/techmatters/soil-id-db:latest, whose wrb_fao90_desc table no longer contains <br> tags (WRB description sync). All 14 location snapshots are updated; the change is description text (Description_en / Management_en) only, except at location [7.3318,-1.4631], which also reflects a minor HWSD component-data shift (componentID 133621 -> 133618, share 10 -> 20) present in the rebuilt image. Generated on main (no algorithm change), so these updates are purely data-driven.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
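One way to confirm that regenerated snapshots actually carry the synced text is to scan the `Description_en` and `Management_en` fields for leftover `<br>` tags. This is a minimal sketch, assuming the snapshots are JSON files; the helpers `find_br_tags` and `check_snapshot` are hypothetical and not part of the repository.

```python
import json
from typing import Any, Iterator

# Field names come from the commit message; the surrounding snapshot layout is assumed.
DESCRIPTION_FIELDS = ("Description_en", "Management_en")


def find_br_tags(node: Any, path: str = "$") -> Iterator[str]:
    """Yield JSON-style paths of description fields that still contain a <br> tag."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in DESCRIPTION_FIELDS and isinstance(value, str):
                # Case-insensitive match catches <br>, <BR>, <br/>, <br />.
                if "<br" in value.lower():
                    yield child
            else:
                yield from find_br_tags(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_br_tags(item, f"{path}[{i}]")


def check_snapshot(text: str) -> list[str]:
    """Parse one snapshot file's contents and return the offending field paths."""
    return list(find_br_tags(json.loads(text)))
```

An empty result for every snapshot file is a quick signal that the test data came from the synced image rather than the old `:0.3` one.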
- dump_soil_id_db now dumps from inside the PG16 soil-id-db container
(docker exec) instead of the host pg_dump. A host pg_dump newer than 16
writes an archive the image's pg_restore 16 cannot read ("unsupported
version in file header"), which silently broke build_docker_image.
Container name overridable via SOIL_ID_DB_CONTAINER.
- build_docker_image now passes --platform linux/amd64: the postgis:16-3.5
base has no arm64 manifest, and CI/prod run amd64 anyway.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
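Both changes come down to building different Docker command lines. The following is a minimal Python sketch, assuming the scripts shell out to the Docker CLI. The default container name `soil-id-db`, the helper names, and the `pg_dump` options (`-U`, `-Fc`) are illustrative assumptions, not the scripts' actual values; only the `SOIL_ID_DB_CONTAINER` override and the `linux/amd64` target come from the commit.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical defaults; the real scripts' container, user, and DB names may differ.
DEFAULT_CONTAINER = "soil-id-db"
DEFAULT_PLATFORM = "linux/amd64"  # postgis:16-3.5 has no arm64 manifest


def container_name() -> str:
    """Container to dump from, overridable via SOIL_ID_DB_CONTAINER."""
    return os.environ.get("SOIL_ID_DB_CONTAINER", DEFAULT_CONTAINER)


def dump_command(dbname: str, user: str) -> list[str]:
    # Use the container's own pg_dump (PG16) so the archive format matches the
    # pg_restore 16 inside the image. No -t flag: a pseudo-TTY can mangle the
    # binary custom-format archive written to stdout.
    return ["docker", "exec", container_name(), "pg_dump", "-U", user, "-Fc", dbname]


def build_command(tag: str, context: str = ".") -> list[str]:
    # Force amd64 so the build also works on arm64 hosts, matching CI/prod.
    return ["docker", "build", "--platform", DEFAULT_PLATFORM, "-t", tag, context]


def dump_to_file(dbname: str, user: str, out: Path) -> None:
    """Stream the dump from the container straight into a local file."""
    with out.open("wb") as fh:
        subprocess.run(dump_command(dbname, user), stdout=fh, check=True)
```

Running `pg_dump` through `docker exec` ties the archive format to the server's major version instead of whatever client happens to be installed on the host, which is what removes the "unsupported version in file header" failure mode.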
CI's `docker compose up -d` uses docker-compose.yml, which pinned the DB to :0.3, the old image that still contains <br> in wrb_fao90_desc. That's why the regenerated (no-<br>) global snapshots kept failing CI even after :latest was rebuilt. Bump the pin to the freshly built and pushed dated tag so CI runs against the synced descriptions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
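Because compose runs whatever tag is pinned, pushing `:latest` does nothing for CI until the pin moves. A small guard could catch a stale pin before the snapshot tests run. This is a hypothetical check, not part of this PR: the helper names are illustrative, and reading `docker-compose.yml` itself (which would need a YAML parser) is left out.

```python
def split_image_ref(ref: str) -> tuple[str, str]:
    """Split 'repo[:tag]' into (repo, tag); no tag means Docker's implicit 'latest'.

    A registry host may carry a port ('host:5000/name'), so a colon only marks a
    tag when no '/' follows it. Digest references (@sha256:...) are not handled.
    """
    name, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        return ref, "latest"
    return name, tag


def check_pin(compose_image: str, expected_tag: str) -> None:
    """Fail loudly if the compose pin does not match the tag CI should use."""
    _, tag = split_image_ref(compose_image)
    if tag != expected_tag:
        raise ValueError(
            f"docker-compose.yml pins {compose_image!r}; expected tag {expected_tag!r}"
        )
```

Against the old pin, `check_pin("ghcr.io/techmatters/soil-id-db:0.3", "2026-06-23")` raises, which would have pointed at the compose file instead of at the snapshots.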
…nces drop_cokey_horz kept the smallest-distance duplicate, but when two instances of the same soil tied on distance (same location), the survivor depended on Postgres/Pandas input row order, which differs across environments. That made the global ranking non-deterministic: the same soil could be reported under a different componentID/share on CI than locally, intermittently breaking the snapshot tests (e.g. global location 7.3318,-1.4631 alternating between components 133618 and 133621). Break distance ties by cokey so the same instance is always kept. No snapshot values change (they already reflect the lower-cokey choice); this just makes that choice stable everywhere. The function is shared by the global and US paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
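The tiebreak can be illustrated in plain Python. This is a sketch, not the actual `drop_cokey_horz` code (which works on a pandas DataFrame); the grouping column `compname` and the helper name are assumptions, and `cokey` is assumed to be unique per soil instance.

```python
from typing import Any

Row = dict[str, Any]


def drop_duplicate_instances(rows: list[Row], key: str = "compname") -> list[Row]:
    """Keep one row per soil: smallest distance, ties broken by smallest cokey.

    Ranking on (distance, cokey) is a total order, so the survivor no longer
    depends on the order in which rows arrived from Postgres/pandas.
    """
    best: dict[Any, Row] = {}
    for row in rows:
        rank = (row["distance"], row["cokey"])
        current = best.get(row[key])
        if current is None or rank < (current["distance"], current["cokey"]):
            best[row[key]] = row
    # Sort the survivors too, so the output order is input-independent as well.
    return sorted(best.values(), key=lambda r: (r["distance"], r["cokey"]))
```

In pandas the same idea is roughly `df.sort_values(["distance", "cokey"]).drop_duplicates(subset="compname", keep="first")`: once `cokey` is part of the sort key, the first row per soil is the same whatever order the rows came in.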
knipec approved these changes on Jun 24, 2026
Lands the WRB description sync (the `<br>`-stripped narratives) into the global soil-ID snapshots, and fixes everything that was blocking CI from running against the corrected data.

Commits

- 26bbe39 test: resync global snapshots to the rebuilt `soil-id-db` image (no `<br>` in `wrb_fao90_desc`). 13/14 files are description-text only; `[7.3318,-1.4631]` also reflects an HWSD component-data shift in the rebuilt image.
- 366f76d build: version/platform-robust image build. `dump_soil_id_db` now dumps from inside the PG16 container (a host `pg_dump` > 16 writes an archive that `pg_restore` 16 can't read), and `build_docker_image` targets `linux/amd64` (the `postgis:16-3.5` base has no arm64 manifest).
- 4642ea9 ci: pin the `soil-id-db` image to `2026-06-23`. CI's `docker compose up -d` reads the pinned `db.image` in `docker-compose.yml`, not `:latest`, so pushing the image alone never reached CI. This is the change that actually makes CI use the synced descriptions.
- c7de3e2 fix: deterministic tiebreaker in `drop_cokey_horz`. When two instances of the same soil tied on distance, the survivor depended on Postgres/Pandas row order, so the ranking was non-deterministic across environments (CI and local picked different `componentID`/share). Break ties by `cokey`. No snapshot values change; it just makes the choice stable. Also de-flakes the US path, which shares the function.

Notes

- `soil-id-db:latest` and the dated `:2026-06-23` images were rebuilt and pushed manually (see the localization wiki, updated alongside this).
- `wrb_descriptions_sync --write`.

🤖 Generated with Claude Code