Add SourceConceptId uniqueness validation to validateUsagiFile #70
Co-authored-by: javier-gracia-tabuenca-tuni <54809193+javier-gracia-tabuenca-tuni@users.noreply.github.com> Agent-Logs-Url: https://github.com/FinOMOP/ROMOPMappingTools/sessions/7ce96e01-5b9b-4d67-9044-42baa2b415e2
```r
usagiTibble <- result$fileTibble
validationLogR6 <- result$validationLogR6

# Check SourceConceptId is unique (each sourceConceptId should belong to only one sourceCode)
```
Why don't you do this in the upper section using the `validate::` package, just like:

```r
SourceConceptId.SourceCodes.are.unique = is_unique(`ADD_INFO:sourceConceptId`, sourceCode)
```
Refactored in commit 3517032 to use `validate::`. However, `is_unique(sourceConceptId, sourceCode)` applied directly to the full tibble would incorrectly flag valid multi-mapped codes: the valid file has 420 source codes that appear multiple times with the same sourceConceptId but different conceptIds.
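A minimal base-R sketch of that false positive, using hypothetical toy data rather than the PR's test files. It mimics the per-row result a combination-uniqueness rule gives on the full table (TRUE only where the pair occurs once), using `duplicated()` instead of `validate::is_unique()`:

```r
# Hypothetical toy table (not the PR's test data): source code "A01"
# legitimately maps to two target concepts under one sourceConceptId.
usagi <- data.frame(
  sourceCode = c("A01", "A01", "B02"),
  `ADD_INFO:sourceConceptId` = c(2000000001L, 2000000001L, 2000000002L),
  conceptId = c(111L, 222L, 333L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Naive rule on the full table: TRUE only where the
# (sourceConceptId, sourceCode) combination occurs exactly once.
pairs <- usagi[, c("ADD_INFO:sourceConceptId", "sourceCode")]
naiveUnique <- !(duplicated(pairs) | duplicated(pairs, fromLast = TRUE))

# Both "A01" rows fail even though the mapping is valid; only "B02" passes.
print(naiveUnique)
```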
The fix deduplicates to distinct (sourceCode, sourceConceptId) pairs first, then uses is_unique(sourceConceptId) on that deduplicated tibble:
```r
# Keep rows that have a sourceConceptId, then collapse to distinct
# (sourceCode, sourceConceptId) pairs so multi-mapped codes count once
usagiDistinct <- usagiTibble |>
  dplyr::filter(!is.na(`ADD_INFO:sourceConceptId`)) |>
  dplyr::distinct(sourceCode, `ADD_INFO:sourceConceptId`)

# After deduplication, a repeated sourceConceptId means it is shared by different source codes
distinctValidationRules <- validate::validator(
  SourceConceptId.is.not.unique = is_unique(`ADD_INFO:sourceConceptId`)
)
distinctValidations <- validate::confront(usagiDistinct, distinctValidationRules)
```

…icated tibble Co-authored-by: javier-gracia-tabuenca-tuni <54809193+javier-gracia-tabuenca-tuni@users.noreply.github.com> Agent-Logs-Url: https://github.com/FinOMOP/ROMOPMappingTools/sessions/26984a07-d241-421c-8b78-ee643cf070e7
@copilot can you look at the error from the Actions run and fix it?
@javier-gracia-tabuenca-tuni The model You can try again without specifying a model (just If you want to contact GitHub about this error, please mention the following identifier so they can better serve you:
The Actions run returns this error:

```
══ Failed tests ════════════════════════════════════════════════════════════════
```

Fix it.
… valid A18.8+E35.0 row Co-authored-by: javier-gracia-tabuenca-tuni <54809193+javier-gracia-tabuenca-tuni@users.noreply.github.com> Agent-Logs-Url: https://github.com/FinOMOP/ROMOPMappingTools/sessions/4322906e-3901-4f87-aacc-a2d929dab135
Fixed in commit 3df863e. The
Replaced
`validateUsagiFile` did not detect cases where the same `ADD_INFO:sourceConceptId` was assigned to multiple distinct source codes. This is a constraint violation, since each sourceConceptId must belong to exactly one source code.

Changes

- `R/validateUsagiFile.R`: Added a uniqueness check in the C&CR validation block using the `validate::` package. It first deduplicates to distinct `(sourceCode, sourceConceptId)` pairs (to avoid false positives from valid multi-mapped codes, where one source code legitimately maps to multiple target concepts), then uses `is_unique()` to detect sourceConceptIds shared across different source codes. Affected rows are flagged with `"ERROR: SourceConceptId is not unique"`.
- `inst/testdata/VOCABULARIES/ICD10fi/ICD10fi_with_errors.usagi.csv`:
  - `A17.8+G63.1` and `A17.8+G63.2` were unintentionally sharing `sourceConceptId=2000500119` with `A17.8+G63.0`; assigned unique IDs `2000599997`/`2000599998`.
  - Added row `A99.0` (`[SourceConceptId is not unique]Test duplicate sourceConceptId`) that reuses `sourceConceptId=2000500124` (also used by `A18.0+M68.0`) to deliberately trigger the new rule.
- `tests/testthat/test-validateUsagiFile.R`: Added assertions verifying that the new validation step appears in the summary, flags exactly the intended row, sets the correct `ADD_INFO:validationMessages`, and sets `mappingStatus` to `FLAGGED`.
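The dedupe-then-check logic can be sketched in base R as follows. This is a hypothetical illustration, not the code in `R/validateUsagiFile.R`: the toy data are made up, it uses `unique()`/`duplicated()` in place of `dplyr`/`validate`, it overwrites rather than appends `ADD_INFO:validationMessages`, and it flags every row carrying a shared ID, which may differ from the PR's exact row selection.

```r
# Hypothetical toy data; column names follow the PR description, values are made up.
usagi <- data.frame(
  sourceCode = c("A01", "A01", "B02", "C03", "D04"),
  `ADD_INFO:sourceConceptId` = c(2000000001L, 2000000001L, 2000000002L, 2000000002L, NA),
  conceptId = c(111L, 222L, 333L, 444L, 555L),
  mappingStatus = "APPROVED",
  `ADD_INFO:validationMessages` = NA_character_,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
idCol <- "ADD_INFO:sourceConceptId"
msgCol <- "ADD_INFO:validationMessages"

# Step 1: drop missing IDs and keep distinct (sourceCode, sourceConceptId) pairs,
# so "A01", which maps to two targets, counts only once.
withId <- usagi[!is.na(usagi[[idCol]]), c("sourceCode", idCol)]
distinctPairs <- unique(withId)

# Step 2: any ID still repeated after deduplication is shared by different source codes.
sharedIds <- unique(distinctPairs[[idCol]][duplicated(distinctPairs[[idCol]])])

# Step 3 (assumption): flag every row in the full table that carries a shared ID.
isFlagged <- usagi[[idCol]] %in% sharedIds
usagi[[msgCol]][isFlagged] <- "ERROR: SourceConceptId is not unique"
usagi$mappingStatus[isFlagged] <- "FLAGGED"

print(usagi[, c("sourceCode", idCol, "mappingStatus")])
```

Deduplicating before the check is what separates "one code, several targets" (allowed) from "one ID, several codes" (the violation): after `unique()`, a code mapped to many targets contributes a single pair, so only IDs reused by a different code remain duplicated.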