2 Introduction

In this vignette we explore the OmopSketch functions that provide a concise overview of the OMOP person table. Two small utilities make this easy: summarisePerson(), which computes the summaries, and tablePerson(), which formats them as a table.

2.1 Create a mock CDM

Let’s load the required packages and create a mock CDM using the R package omock so we can run the functions on a small example.

library(dplyr)
library(OmopSketch)
library(omock)

# Connect to mock database
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb") 
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise person table

Run summarisePerson() to compute basic summaries for the person table. The function returns a summarised_result object.

result <- summarisePerson(cdm = cdm)

result |> 
  glimpse()
#> Rows: 123
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed"…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Number subjects", "Number subjects not in observatio…
#> $ variable_level   <chr> NA, NA, NA, "Female", "Female", "Male", "Male", "None…
#> $ estimate_name    <chr> "count", "count", "percentage", "count", "percentage"…
#> $ estimate_type    <chr> "integer", "integer", "numeric", "integer", "numeric"…
#> $ estimate_value   <chr> "2694", "0", "0", "1373", "50.9651076466221", "1321",…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
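Every estimate is stored as character in estimate_value, with its intended type recorded in estimate_type, so values usually need converting before any further computation. The sketch below illustrates this on a hand-built data frame that copies a few rows from the output above. The names toy_result, is_sex_count, sex_counts and total are illustrative and not part of OmopSketch; with the real object, the same filtering could be applied to result directly.

```r
# Toy data frame copying four rows visible in the output above.
# `toy_result` is an illustrative stand-in for `result`.
toy_result <- data.frame(
  variable_name  = c("Number subjects", "Sex", "Sex", "Sex"),
  variable_level = c(NA, "Female", "Female", "Male"),
  estimate_name  = c("count", "count", "percentage", "count"),
  estimate_type  = c("integer", "integer", "numeric", "integer"),
  estimate_value = c("2694", "1373", "50.9651076466221", "1321"),
  stringsAsFactors = FALSE
)

# Keep the sex counts and convert the character estimates to integers.
is_sex_count <- toy_result$variable_name == "Sex" &
  toy_result$estimate_name == "count"
sex_counts <- toy_result[is_sex_count, c("variable_level", "estimate_value")]
sex_counts$estimate_value <- as.integer(sex_counts$estimate_value)

# The sex counts add up to the total number of subjects.
total <- as.integer(
  toy_result$estimate_value[toy_result$variable_name == "Number subjects"]
)
sum(sex_counts$estimate_value) == total
#> [1] TRUE
```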

3.1 What the function reports

summarisePerson() builds a set of common summaries:

  • Number subjects: total number of rows in person.

  • Number subjects not in observation: number (and percentage) of persons that do not appear in observation_period (useful to detect missing observation periods). A warning is emitted if any are found.

  • Sex: counts and percentages for the sex categories (Female, Male, None).

  • Sex source: distribution of the raw gender_source_value.

  • Race / Race source: distribution of race_concept_id and race_source_value.

  • Ethnicity / Ethnicity source: distribution of ethnicity_concept_id and ethnicity_source_value.

  • Year / Month / Day of birth: numeric summaries (missingness, quantiles, min/max) of birth date components.

  • Location, Provider, Care site: number (and percentage) of missing values and of zeros, plus the number of distinct values.
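To make the "Number subjects not in observation" check concrete, here is a minimal base R sketch on two toy data frames. It is an assumption about the logic, not the package's implementation (which works against the CDM tables); the data and the names not_in_observation, n_missing and pct_missing are made up for illustration.

```r
# Toy tables; only the person_id columns matter for this check.
person <- data.frame(person_id = c(1L, 2L, 3L, 4L))
observation_period <- data.frame(person_id = c(1L, 2L, 2L, 4L))

# Persons with no row in observation_period.
not_in_observation <- setdiff(person$person_id, observation_period$person_id)

# Count and percentage relative to all rows in person.
n_missing   <- length(not_in_observation)
pct_missing <- 100 * n_missing / nrow(person)

# Mimic the documented behaviour of warning when any are found.
if (n_missing > 0) {
  warning(n_missing, " person(s) have no observation period.")
}
```

In this toy example person 3 has no observation period, so the count is 1 and the percentage is 25.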

4 Tidy the summarised object

tablePerson() tidies the previous results into a formatted table of type gt, reactable or datatable. By default it creates a gt table.

tablePerson(result = result, type = "gt")
Summary of person table

Variable name                       Variable level       Estimate name           CDM name: GiBleed
Number subjects                                          N                       2,694
Number subjects not in observation                       N (%)                   0 (0.00%)
Sex                                 Female               N (%)                   1,373 (50.97%)
                                    Male                 N (%)                   1,321 (49.03%)
                                    None                 N (%)                   0 (0.00%)
Sex source                          F                    N (%)                   1,373 (50.97%)
                                    M                    N (%)                   1,321 (49.03%)
Race                                No matching concept  N (%)                   451 (16.74%)
                                    Missing              N (%)                   2,243 (83.26%)
Race source                         asian                N (%)                   212 (7.87%)
                                    black                N (%)                   338 (12.55%)
                                    hispanic             N (%)                   435 (16.15%)
                                    native               N (%)                   14 (0.52%)
                                    other                N (%)                   2 (0.07%)
                                    white                N (%)                   1,693 (62.84%)
Ethnicity                           No matching concept  N (%)                   2,259 (83.85%)
                                    Missing              N (%)                   435 (16.15%)
Ethnicity source                    african              N (%)                   119 (4.42%)
                                    american             N (%)                   79 (2.93%)
                                    american_indian      N (%)                   14 (0.52%)
                                    arab                 N (%)                   2 (0.07%)
                                    asian_indian         N (%)                   81 (3.01%)
                                    central_american     N (%)                   75 (2.78%)
                                    chinese              N (%)                   131 (4.86%)
                                    dominican            N (%)                   105 (3.90%)
                                    english              N (%)                   218 (8.09%)
                                    french               N (%)                   129 (4.79%)
                                    french_canadian      N (%)                   74 (2.75%)
                                    german               N (%)                   130 (4.83%)
                                    greek                N (%)                   19 (0.71%)
                                    irish                N (%)                   438 (16.26%)
                                    italian              N (%)                   295 (10.95%)
                                    mexican              N (%)                   42 (1.56%)
                                    polish               N (%)                   107 (3.97%)
                                    portuguese           N (%)                   93 (3.45%)
                                    puerto_rican         N (%)                   258 (9.58%)
                                    russian              N (%)                   34 (1.26%)
                                    scottish             N (%)                   48 (1.78%)
                                    south_american       N (%)                   60 (2.23%)
                                    swedish              N (%)                   29 (1.08%)
                                    west_indian          N (%)                   114 (4.23%)
Year of birth                                            Missing (%)             0 (0.00%)
                                                         Median [Q25 - Q75]      1,961 [1,950 - 1,970]
                                                         90% Range [Q05 to Q95]  1,922 to 1,979
                                                         Range [min to max]      1,908 to 1,986
Month of birth                                           Missing (%)             0 (0.00%)
                                                         Median [Q25 - Q75]      7 [4 - 10]
                                                         90% Range [Q05 to Q95]  1 to 12
                                                         Range [min to max]      1 to 12
Day of birth                                             Missing (%)             0 (0.00%)
                                                         Median [Q25 - Q75]      16 [8 - 23]
                                                         90% Range [Q05 to Q95]  2 to 29
                                                         Range [min to max]      1 to 31
Location                                                 Missing (%)             2,694 (100.00%)
                                                         Zero count (%)          0 (0.00%)
                                                         Distinct values         1
Provider                                                 Missing (%)             2,694 (100.00%)
                                                         Zero count (%)          0 (0.00%)
                                                         Distinct values         1
Care site                                                Missing (%)             2,694 (100.00%)
                                                         Zero count (%)          0 (0.00%)
                                                         Distinct values         1
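Each N (%) cell combines two rows of the summarised result: the count and the percentage. The base R sketch below reproduces the Sex = Female cell from the raw estimates shown by glimpse() earlier. It only illustrates the formatting and is not the code tablePerson() uses; the variable names are hypothetical.

```r
# Raw estimates for Sex = Female, as shown by glimpse() earlier.
count      <- as.integer("1373")
percentage <- as.numeric("50.9651076466221")

# Thousands separator for the count, two decimals for the percentage.
cell <- sprintf("%s (%.2f%%)", format(count, big.mark = ","), percentage)
cell
#> [1] "1,373 (50.97%)"
```

A thousands separator applied to every numeric estimate would also explain why years of birth appear as 1,961 rather than 1961 in the table, although that is an inference from the output rather than documented behaviour.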

5 Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)
