2 Introduction

In this vignette we explore the OmopSketch functions that give a concise overview of the OMOP person table. Two functions cover this:

summarisePerson(): computes summary statistics and data-quality checks for the person table (total subjects, subjects without an observation period, sex/race/ethnicity distributions, birth-date components, and simple summaries of id columns such as location_id, provider_id, and care_site_id).
tablePerson(): displays the results in a formatted table.

2.1 Create a mock cdm

Let’s load the required packages and create a mock CDM using the R package omock so we can run the functions on a small example.

library(dplyr)
library(OmopSketch)
library(omock)

# Create a mock CDM from the GiBleed dataset, stored in duckdb
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise person table

Run summarisePerson() to compute basic summaries for the person table. The function returns a summarised_result object.

result <- summarisePerson(cdm = cdm)

result |> 
  glimpse()
#> Rows: 123
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed"…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Number subjects", "Number subjects not in observatio…
#> $ variable_level   <chr> NA, NA, NA, "Female", "Female", "Male", "Male", "None…
#> $ estimate_name    <chr> "count", "count", "percentage", "count", "percentage"…
#> $ estimate_type    <chr> "integer", "integer", "numeric", "integer", "numeric"…
#> $ estimate_value   <chr> "2694", "0", "0", "1373", "50.9651076466221", "1321",…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
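
Every estimate is stored as text in estimate_value, with its intended type recorded in estimate_type, so values need converting before any arithmetic. The sketch below illustrates this on a hand-built toy tibble that copies a few rows from the glimpse() output above. The names toy and sex_counts are illustrative, and the toy table only has the columns used here.

```r
library(dplyr)

# Toy rows copied from the glimpse() output above (not the full result).
toy <- tibble(
  variable_name  = c("Number subjects", "Sex", "Sex", "Sex"),
  variable_level = c(NA, "Female", "Female", "Male"),
  estimate_name  = c("count", "count", "percentage", "count"),
  estimate_type  = c("integer", "integer", "numeric", "integer"),
  estimate_value = c("2694", "1373", "50.9651076466221", "1321")
)

# Keep the Sex counts and convert the character estimates to integers.
sex_counts <- toy |>
  filter(variable_name == "Sex", estimate_name == "count") |>
  mutate(n = as.integer(estimate_value)) |>
  select(variable_level, n)

# The Female and Male counts add up to the "Number subjects" estimate.
sum(sex_counts$n)
#> [1] 2694
```

The same filter() and mutate() calls should apply to result directly, since it carries these columns.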

3.1 What the function reports

summarisePerson() builds a set of common summaries:

Number subjects: total number of rows in person.
Number subjects not in observation: number (and percentage) of persons that do not appear in observation_period (useful to detect missing observation periods). A warning is emitted if any are found.
Sex: counts and percentages for the sex categories (Female, Male, and None, as labelled in the output in this vignette).
Sex source: distribution of the raw gender_source_value.
Race / Race source: distribution of race_concept_id and race_source_value, respectively.
Ethnicity / Ethnicity source: distribution of ethnicity_concept_id and ethnicity_source_value, respectively.
Year / Month / Day of birth: numeric summaries (missingness, quantiles, min/max) of the birth date components.
Location, Provider, Care site: number of missing values, zeros, and distinct values in location_id, provider_id, and care_site_id.
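
To make two of these checks concrete, here is a minimal dplyr sketch on toy data. It is not the code summarisePerson() runs internally. The tables, their values, and the names not_in_observation and care_site_summary are all made up for illustration.

```r
library(dplyr)

# Toy person and observation_period tables (hypothetical values).
person <- tibble(
  person_id    = 1:5,
  care_site_id = c(NA, 0L, 3L, 3L, NA)
)
observation_period <- tibble(person_id = c(1L, 2L, 3L, 5L))

# "Number subjects not in observation": persons with no matching
# row in observation_period. Here only person 4 qualifies.
not_in_observation <- person |>
  anti_join(observation_period, by = "person_id") |>
  nrow()

# Missing, zero, and distinct counts for one id column.
# n_distinct() counts NA as one value unless na.rm = TRUE is set.
care_site_summary <- person |>
  summarise(
    missing  = sum(is.na(care_site_id)),
    zero     = sum(care_site_id == 0L, na.rm = TRUE),
    distinct = n_distinct(care_site_id)
  )
```

In this toy example one person has no observation period, and care_site_id has two missing values, one zero, and three distinct values (NA, 0, and 3).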

4 Tidy the summarised object

tablePerson() tidies the previous results and creates a formatted table of type gt, reactable, or datatable. By default it creates a gt table.

tablePerson(result = result, type = "gt")

Summary of person table
Variable name	Variable level	Estimate name	CDM name: GiBleed
Number subjects	–	N	2,694
Number subjects not in observation	–	N (%)	0 (0.00%)
Sex	Female	N (%)	1,373 (50.97%)
	Male	N (%)	1,321 (49.03%)
	None	N (%)	0 (0.00%)
Sex source	F	N (%)	1,373 (50.97%)
	M	N (%)	1,321 (49.03%)
Race	No matching concept	N (%)	451 (16.74%)
	Missing	N (%)	2,243 (83.26%)
Race source	asian	N (%)	212 (7.87%)
	black	N (%)	338 (12.55%)
	hispanic	N (%)	435 (16.15%)
	native	N (%)	14 (0.52%)
	other	N (%)	2 (0.07%)
	white	N (%)	1,693 (62.84%)
Ethnicity	No matching concept	N (%)	2,259 (83.85%)
	Missing	N (%)	435 (16.15%)
Ethnicity source	african	N (%)	119 (4.42%)
	american	N (%)	79 (2.93%)
	american_indian	N (%)	14 (0.52%)
	arab	N (%)	2 (0.07%)
	asian_indian	N (%)	81 (3.01%)
	central_american	N (%)	75 (2.78%)
	chinese	N (%)	131 (4.86%)
	dominican	N (%)	105 (3.90%)
	english	N (%)	218 (8.09%)
	french	N (%)	129 (4.79%)
	french_canadian	N (%)	74 (2.75%)
	german	N (%)	130 (4.83%)
	greek	N (%)	19 (0.71%)
	irish	N (%)	438 (16.26%)
	italian	N (%)	295 (10.95%)
	mexican	N (%)	42 (1.56%)
	polish	N (%)	107 (3.97%)
	portuguese	N (%)	93 (3.45%)
	puerto_rican	N (%)	258 (9.58%)
	russian	N (%)	34 (1.26%)
	scottish	N (%)	48 (1.78%)
	south_american	N (%)	60 (2.23%)
	swedish	N (%)	29 (1.08%)
	west_indian	N (%)	114 (4.23%)
Year of birth	–	Missing (%)	0 (0.00%)
		Median [Q25 - Q75]	1,961 [1,950 - 1,970]
		90% Range [Q05 to Q95]	1,922 to 1,979
		Range [min to max]	1,908 to 1,986
Month of birth	–	Missing (%)	0 (0.00%)
		Median [Q25 - Q75]	7 [4 - 10]
		90% Range [Q05 to Q95]	1 to 12
		Range [min to max]	1 to 12
Day of birth	–	Missing (%)	0 (0.00%)
		Median [Q25 - Q75]	16 [8 - 23]
		90% Range [Q05 to Q95]	2 to 29
		Range [min to max]	1 to 31
Location	–	Missing (%)	2,694 (100.00%)
		Zero count (%)	0 (0.00%)
		Distinct values	1
Provider	–	Missing (%)	2,694 (100.00%)
		Zero count (%)	0 (0.00%)
		Distinct values	1
Care site	–	Missing (%)	2,694 (100.00%)
		Zero count (%)	0 (0.00%)
		Distinct values	1
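
The N (%) cells combine a count with its share of Number subjects. As a rough check of that arithmetic, the sketch below recomputes the Sex rows from the counts in the table. format_n_pct() is a hypothetical helper written for this illustration, not a function from OmopSketch, and tablePerson() may build its cells differently internally.

```r
# Hypothetical helper: format counts as "N (xx.xx%)" relative to a total.
format_n_pct <- function(n, total) {
  paste0(
    format(n, big.mark = ",", trim = TRUE),  # thousands separator, no padding
    " (",
    sprintf("%.2f", 100 * n / total),        # percentage with two decimals
    "%)"
  )
}

# Female, Male, and None counts out of 2,694 subjects.
format_n_pct(c(1373, 1321, 0), total = 2694)
#> [1] "1,373 (50.97%)" "1,321 (49.03%)" "0 (0.00%)"
```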

5 Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)