In this vignette we explore the OmopSketch functions that provide a concise overview of the OMOP person table. Two small utilities make this easy:
summarisePerson(): computes a set of summary statistics and data-quality checks for the person table (total subjects, missing observation-period checks, sex/race/ethnicity distributions, birth-date components, and simple summaries for ID columns such as location_id, provider_id, and care_site_id).
tablePerson(): helps visualise the results in a formatted table.
Let’s load the required packages and create a mock CDM using the R package omock so we can run the functions on a small example.
library(dplyr)
library(OmopSketch)
library(omock)
# Connect to mock database
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#>
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
Run summarisePerson() to compute basic summaries for the person table. The function will return a summarised_result.
result <- summarisePerson(cdm = cdm)
result |>
  glimpse()
#> Rows: 123
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed"…
#> $ group_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "Number subjects", "Number subjects not in observatio…
#> $ variable_level <chr> NA, NA, NA, "Female", "Female", "Male", "Male", "None…
#> $ estimate_name <chr> "count", "count", "percentage", "count", "percentage"…
#> $ estimate_type <chr> "integer", "integer", "numeric", "integer", "numeric"…
#> $ estimate_value <chr> "2694", "0", "0", "1373", "50.9651076466221", "1321",…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
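Because a summarised_result is a long-format table, individual estimates can be pulled out with ordinary dplyr verbs. The sketch below is only an illustration: `toy_result` is a hypothetical stand-in that copies the first six rows visible in the glimpse() output above (the truncated variable names are filled in from the formatted table further down), and keeps only the columns the example needs. In the vignette itself you would apply the same pipeline to `result`.

```r
library(dplyr)

# Hypothetical stand-in for `result`: the first six rows shown by glimpse(),
# restricted to the columns used in this example.
toy_result <- tibble(
  variable_name = c(
    "Number subjects",
    "Number subjects not in observation",
    "Number subjects not in observation",
    "Sex", "Sex", "Sex"
  ),
  variable_level = c(NA, NA, NA, "Female", "Female", "Male"),
  estimate_name  = c("count", "count", "percentage",
                     "count", "percentage", "count"),
  estimate_type  = c("integer", "integer", "numeric",
                     "integer", "numeric", "integer"),
  estimate_value = c("2694", "0", "0", "1373", "50.9651076466221", "1321")
)

# estimate_value is stored as character, so convert it after filtering.
sex_counts <- toy_result |>
  filter(variable_name == "Sex", estimate_name == "count") |>
  mutate(estimate_value = as.integer(estimate_value)) |>
  select(variable_level, estimate_value)

sex_counts
```

Filtering on `variable_name` and `estimate_name` before converting `estimate_value` avoids mixing counts and percentages in the same numeric column.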
summarisePerson() builds a set of common summaries:
Number subjects: total number of rows in person.
Number subjects not in observation: number (and percentage) of persons that do not appear in observation_period (useful to detect missing observation periods). A warning is emitted if any are found.
Sex: counts and percentages for the sex categories (Female, Male, and a missing category, labelled None in the output below). A separate Sex source entry shows the raw gender_source_value distribution.
Race / Race source: distribution of race_concept_id and race_source_value.
Ethnicity / Ethnicity source: distribution of ethnicity_concept_id and ethnicity_source_value.
Year / Month / Day of birth: numeric summaries (missingness, quantiles, min/max) of the birth date components.
Location, Provider, Care site: number of missing values, zeros, and distinct values in location_id, provider_id, and care_site_id.
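To make two of these checks concrete, here is a minimal sketch of how they could be computed by hand with dplyr. It is not OmopSketch's implementation, which may differ; the `person` and `observation_period` tibbles below are hypothetical toy data with made-up values, not the GiBleed tables.

```r
library(dplyr)

# Hypothetical toy tables (made-up values, for illustration only).
person <- tibble(
  person_id   = c(1L, 2L, 3L, 4L),
  location_id = rep(NA_integer_, 4)
)
observation_period <- tibble(person_id = c(1L, 2L, 4L))

# "Number subjects not in observation": persons with no matching
# row in observation_period.
not_in_observation <- person |>
  anti_join(observation_period, by = "person_id")
n_missing_obs   <- nrow(not_in_observation)
pct_missing_obs <- 100 * n_missing_obs / nrow(person)

# ID-column summary: missing values, zeros, and distinct values.
# n_distinct() counts NA as a value unless na.rm = TRUE is set.
id_summary <- person |>
  summarise(
    missing  = sum(is.na(location_id)),
    zeros    = sum(location_id == 0, na.rm = TRUE),
    distinct = n_distinct(location_id)
  )

n_missing_obs
pct_missing_obs
id_summary
```

In this toy example a column that is entirely missing still has one distinct value, because NA is counted as a value. That is one possible explanation for a table reporting 100% missing alongside a distinct-value count of 1, but whether OmopSketch counts NA this way is an assumption.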
tablePerson() tidies the previous results and creates a formatted table of type gt, reactable or datatable. By default it creates a gt table.
tablePerson(result = result, type = "gt")
| Variable name | Variable level | Estimate name | CDM name: GiBleed |
|---|---|---|---|
| Number subjects | – | N | 2,694 |
| Number subjects not in observation | – | N (%) | 0 (0.00%) |
| Sex | Female | N (%) | 1,373 (50.97%) |
| | Male | N (%) | 1,321 (49.03%) |
| | None | N (%) | 0 (0.00%) |
| Sex source | F | N (%) | 1,373 (50.97%) |
| | M | N (%) | 1,321 (49.03%) |
| Race | No matching concept | N (%) | 451 (16.74%) |
| | Missing | N (%) | 2,243 (83.26%) |
| Race source | asian | N (%) | 212 (7.87%) |
| | black | N (%) | 338 (12.55%) |
| | hispanic | N (%) | 435 (16.15%) |
| | native | N (%) | 14 (0.52%) |
| | other | N (%) | 2 (0.07%) |
| | white | N (%) | 1,693 (62.84%) |
| Ethnicity | No matching concept | N (%) | 2,259 (83.85%) |
| | Missing | N (%) | 435 (16.15%) |
| Ethnicity source | african | N (%) | 119 (4.42%) |
| | american | N (%) | 79 (2.93%) |
| | american_indian | N (%) | 14 (0.52%) |
| | arab | N (%) | 2 (0.07%) |
| | asian_indian | N (%) | 81 (3.01%) |
| | central_american | N (%) | 75 (2.78%) |
| | chinese | N (%) | 131 (4.86%) |
| | dominican | N (%) | 105 (3.90%) |
| | english | N (%) | 218 (8.09%) |
| | french | N (%) | 129 (4.79%) |
| | french_canadian | N (%) | 74 (2.75%) |
| | german | N (%) | 130 (4.83%) |
| | greek | N (%) | 19 (0.71%) |
| | irish | N (%) | 438 (16.26%) |
| | italian | N (%) | 295 (10.95%) |
| | mexican | N (%) | 42 (1.56%) |
| | polish | N (%) | 107 (3.97%) |
| | portuguese | N (%) | 93 (3.45%) |
| | puerto_rican | N (%) | 258 (9.58%) |
| | russian | N (%) | 34 (1.26%) |
| | scottish | N (%) | 48 (1.78%) |
| | south_american | N (%) | 60 (2.23%) |
| | swedish | N (%) | 29 (1.08%) |
| | west_indian | N (%) | 114 (4.23%) |
| Year of birth | – | Missing (%) | 0 (0.00%) |
| | | Median [Q25 - Q75] | 1,961 [1,950 - 1,970] |
| | | 90% Range [Q05 to Q95] | 1,922 to 1,979 |
| | | Range [min to max] | 1,908 to 1,986 |
| Month of birth | – | Missing (%) | 0 (0.00%) |
| | | Median [Q25 - Q75] | 7 [4 - 10] |
| | | 90% Range [Q05 to Q95] | 1 to 12 |
| | | Range [min to max] | 1 to 12 |
| Day of birth | – | Missing (%) | 0 (0.00%) |
| | | Median [Q25 - Q75] | 16 [8 - 23] |
| | | 90% Range [Q05 to Q95] | 2 to 29 |
| | | Range [min to max] | 1 to 31 |
| Location | – | Missing (%) | 2,694 (100.00%) |
| | | Zero count (%) | 0 (0.00%) |
| | | Distinct values | 1 |
| Provider | – | Missing (%) | 2,694 (100.00%) |
| | | Zero count (%) | 0 (0.00%) |
| | | Distinct values | 1 |
| Care site | – | Missing (%) | 2,694 (100.00%) |
| | | Zero count (%) | 0 (0.00%) |
| | | Distinct values | 1 |
Finally, disconnect from the mock CDM.
cdmDisconnect(cdm = cdm)