Did my cohort pick the correct number of patients? Am I calculating an intersection in the right way? Is that the expected value for treatment duration? It only takes one incorrect parameter to get incoherent results in a pharmacoepidemiological study, and testing calculations on huge, complex databases is very challenging.
That is why TestGenerator is useful: it lets you load a small sample of patients into an OMOP CDM and unit test a study against it. It includes tools to create a blank CDM with a complete vocabulary and to check whether your code does what you expect in very specific cases.
This package is based on the unit tests written for the Erasmus MC Ranitidine Study.
Install the released version from CRAN:
```r
install.packages("TestGenerator")
```

TestGenerator starts from a small patient dataset. The data can be stored in an Excel workbook, with one sheet per OMOP CDM table, or in a folder of CSV files, with one file per table.
For help creating an Excel input from scratch, see the "Start from a Blank Excel Template" section of the website.
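To make the CSV layout concrete, here is a minimal sketch that writes a two-patient micro population as a folder with one file per table, using only base R. The column names follow the OMOP CDM 5.4 `person` and `observation_period` tables. Which columns TestGenerator actually requires, and the assumption that each file must be named after its table (`person.csv`, `observation_period.csv`), are not confirmed here; compare with the CSV sample shipped in `inst/extdata/icu_sample_population_csv` before relying on it.

```r
# Sketch: a two-patient micro population as a CSV folder, one file per
# OMOP CDM table. File names are ASSUMED to match the table names.
csv_dir <- file.path(tempdir(), "micro_population_csv")
dir.create(csv_dir, recursive = TRUE, showWarnings = FALSE)

person <- data.frame(
  person_id = c(1L, 2L),
  gender_concept_id = c(8507L, 8532L),  # 8507 = MALE, 8532 = FEMALE
  year_of_birth = c(1950L, 1972L),
  race_concept_id = c(0L, 0L),          # 0 = no matching concept
  ethnicity_concept_id = c(0L, 0L)
)

observation_period <- data.frame(
  observation_period_id = c(1L, 2L),
  person_id = c(1L, 2L),
  observation_period_start_date = as.Date(c("2010-01-01", "2015-06-01")),
  observation_period_end_date = as.Date(c("2020-12-31", "2021-06-01")),
  period_type_concept_id = c(32817L, 32817L)  # 32817 = EHR record
)

# Dates are written unquoted in ISO format (YYYY-MM-DD)
write.csv(person, file.path(csv_dir, "person.csv"), row.names = FALSE)
write.csv(
  observation_period,
  file.path(csv_dir, "observation_period.csv"),
  row.names = FALSE
)

list.files(csv_dir)
```

Assuming `readPatients.csv()` accepts this minimal column set, the folder can then be passed as its `filePath`. Keeping each patient's history this short makes the expected results easy to derive by hand.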
The package then converts those files into a Unit Test Definition JSON file. This JSON file is the object you keep in your package tests.
```r
TestGenerator::readPatients(
  filePath = "inst/extdata/icu_sample_population.xlsx",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4"
)
```

If `outputPath = NULL`, the JSON file is written to `tests/testthat/testCases`, which is the usual location for package tests.
You can also call the Excel and CSV readers directly:
```r
# Excel workbook: one sheet per OMOP CDM table
TestGenerator::readPatients.xl(
  filePath = "inst/extdata/icu_sample_population.xlsx",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4"
)

# Folder of CSV files: one file per OMOP CDM table
TestGenerator::readPatients.csv(
  filePath = "inst/extdata/icu_sample_population_csv",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4",
  reduceLargeIds = FALSE
)
```

Use `patientsCDM()` to load one Unit Test Definition into a blank OMOP CDM. By default, this creates a local DuckDB CDM with the small patient population and the vocabulary needed for testing.
```r
cdm <- TestGenerator::patientsCDM(
  pathJson = "tests/testthat/testCases",
  testName = "icu_sample",
  cdmVersion = "5.4"
)
```

If `pathJson = NULL`, TestGenerator looks for the JSON file in `tests/testthat/testCases`.
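When several tests need a CDM, the build-and-disconnect steps can be wrapped in one helper. The sketch below is a hypothetical `local_test_cdm()` function (not part of TestGenerator) that could live in a file such as `tests/testthat/helper-cdm.R`; testthat sources `helper-*.R` files before running the tests. It uses `testthat::test_path()` because tests run with `tests/testthat` as the working directory, and `withr::defer()` with `envir` set to the caller's frame so the DuckDB connection is closed when the calling test ends rather than when the helper returns.

```r
# Hypothetical helper, e.g. tests/testthat/helper-cdm.R.
# Builds a test CDM from one Unit Test Definition and registers the
# disconnect in the caller's frame (usually the test_that() block).
local_test_cdm <- function(testName, cdmVersion = "5.4", env = parent.frame()) {
  cdm <- TestGenerator::patientsCDM(
    # test_path() resolves relative to tests/testthat, both when tests
    # run under testthat and when the file is run interactively
    pathJson = testthat::test_path("testCases"),
    testName = testName,
    cdmVersion = cdmVersion
  )
  # Deferring to `env` keeps the connection open until the caller exits
  withr::defer(
    DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE),
    envir = env
  )
  cdm
}
```

Inside a test, `cdm <- local_test_cdm("icu_sample")` would then replace the separate `patientsCDM()` and cleanup calls.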
The example below uses the sample ICU population included in the package.
```r
# Locate the sample ICU workbook shipped with the package
file_path <- system.file(
  "extdata",
  "icu_sample_population.xlsx",
  package = "TestGenerator"
)

# Write the Unit Test Definition to a temporary folder
output_path <- file.path(tempdir(), "testgenerator-example")
dir.create(output_path, recursive = TRUE, showWarnings = FALSE)

TestGenerator::readPatients(
  filePath = file_path,
  testName = "icu_sample",
  outputPath = output_path,
  cdmVersion = "5.4"
)

# Build a DuckDB CDM from the JSON file
cdm <- TestGenerator::patientsCDM(
  pathJson = output_path,
  testName = "icu_sample",
  cdmVersion = "5.4"
)

# Close the DuckDB connection and remove the temporary files
DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE)
unlink(output_path, recursive = TRUE)
```

The most useful pattern is to keep a small JSON test case in `tests/testthat/testCases`, build a CDM inside a testthat test, run your study code, and assert the expected result.
```r
testthat::test_that("cohort construction returns the expected patients", {
  # testthat runs tests from tests/testthat, so build the path with test_path()
  cdm <- TestGenerator::patientsCDM(
    pathJson = testthat::test_path("testCases"),
    testName = "icu_sample",
    cdmVersion = "5.4"
  )
  # Disconnect from DuckDB when the test finishes, even if it fails
  withr::defer(
    DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE)
  )

  # Generate the cohorts under test
  cohort_set <- CDMConnector::readCohortSet(
    system.file("extdata", "test_cohorts", package = "TestGenerator")
  )
  cdm <- CDMConnector::generateCohortSet(
    cdm = cdm,
    cohortSet = cohort_set,
    name = "test_cohorts"
  )

  # Compare the cohort members with the patients expected by design
  result <- cdm[["test_cohorts"]] |>
    dplyr::collect()

  testthat::expect_equal(
    sort(unique(result$subject_id)),
    c(1, 2, 4, 5, 6, 7)
  )
})
```

The exact expectation should come from the micro population you designed. Good tests usually check subject counts, inclusion or exclusion rules, cohort dates, or treatment durations that are easy to verify by hand.