Study Diagnostics

2026-06-09

The SelfControlledCohort package includes a suite of diagnostics that evaluate whether the assumptions of the Self-Controlled Cohort (SCC) design hold for a given analysis. These diagnostics run automatically when runDiagnostics = TRUE and determine whether study results should be unblinded (viewed) or kept blinded until issues are resolved.

This vignette describes each diagnostic, the assumption it checks, and how results are interpreted.

1 Overview

Four core diagnostics are available to assess the validity of the SCC analysis.

Diagnostic Name              Assumption Tested           Default Threshold
MDRR                         Adequate statistical power  MDRR <= 10.0
PRE_EXPOSURE                 Correct temporal ordering   Rate Ratio <= 1.0, p > 0.05
EVENT_DEPENDENT_OBSERVATION  Non-informative censoring   Proportion <= 10%
EASE                         Low systematic error        EASE <= 0.25

Default thresholds are available via getDefaultDiagnosticThresholds():

library(SelfControlledCohort)
str(getDefaultDiagnosticThresholds())
#> List of 6
#>  $ mdrrMaxAcceptable         : num 10
#>  $ maxPreExposureProportion  : num 0.05
#>  $ preExposurePThreshold     : num 0.05
#>  $ maxEventDependentCensoring: num 0.25
#>  $ minEventsPerWindow        : num 3
#>  $ easeMaxAcceptable         : num 0.25

2 Minimum Detectable Relative Risk (MDRR)

2.1 What it checks

The MDRR is the smallest rate ratio the study has 80% power to detect at alpha = 0.05. A high MDRR means that only very large effects would be detected, so the study is underpowered.

2.2 Method

The calculation uses the Musonda (2006) Signed Root Likelihood (SRL1) method, which is specifically designed for self-controlled designs. It finds the rate ratio satisfying the target power (80%) given the observed person-time and event counts in exposed and unexposed windows.
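
To make the idea of "solving for the rate ratio at a target power" concrete, here is a minimal base R sketch. It does not reproduce the SRL1 formula. It swaps in a simpler normal approximation to the conditional binomial test and finds the rate ratio with uniroot(). The helpers approxPower() and approxMdrr() and their arguments are hypothetical names for illustration, not package functions, and their results will differ from computeMdrrForRateRatio().

```r
# Approximate power to reject H0: rate ratio = 1, conditioning on n total events.
# r is the exposed share of person-time. Under rate ratio rr, each event falls
# in the exposed window with probability p1 = rr * r / (rr * r + 1 - r).
approxPower <- function(rr, n, r, alpha = 0.05) {
  p0 <- r
  p1 <- rr * r / (rr * r + 1 - r)
  z <- qnorm(1 - alpha / 2)
  # Normal approximation to the binomial test (upper tail only)
  pnorm((sqrt(n) * (p1 - p0) - z * sqrt(p0 * (1 - p0))) / sqrt(p1 * (1 - p1)))
}

# Smallest rate ratio (> 1) reaching the target power, or NA if none is found
# below an arbitrary search limit of 1000.
approxMdrr <- function(exposedPersonTime, unexposedPersonTime, totalEvents,
                       power = 0.8, alpha = 0.05) {
  r <- exposedPersonTime / (exposedPersonTime + unexposedPersonTime)
  gap <- function(logRr) approxPower(exp(logRr), totalEvents, r, alpha) - power
  upper <- log(1000)
  if (gap(upper) < 0) {
    return(NA_real_)
  }
  exp(uniroot(gap, lower = 0, upper = upper, tol = 1e-10)$root)
}

# More events give a smaller detectable rate ratio
manyEvents <- approxMdrr(50000, 150000, totalEvents = 130)
fewEvents <- approxMdrr(500, 1500, totalEvents = 10)
```

Both calls use the same 1:3 split of person-time, so the difference between manyEvents and fewEvents comes only from the number of events.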

2.3 Interpretation

2.4 Example

# Well-powered study
computeMdrrForRateRatio(
  exposedPersonTime = 50000,
  unexposedPersonTime = 150000,
  exposedEvents = 40,
  unexposedEvents = 90
)

# Underpowered study (SRL1 solver returns NA if power cannot be met)
computeMdrrForRateRatio(
  exposedPersonTime = 500,
  unexposedPersonTime = 1500,
  exposedEvents = 3,
  unexposedEvents = 7
)

2.5 Role in blinding

MDRR is the only diagnostic that affects Tier 2 (UNBLIND) but not Tier 1 (UNBLIND_FOR_CALIBRATION). This means a low-powered study can still serve as a negative control for empirical calibration, even if its point estimate should not be viewed directly.
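
The consequence for a single target-outcome pair can be sketched as follows. The combination rule is an assumption inferred from the paragraph above (Tier 1 requires every non-MDRR diagnostic to pass, Tier 2 additionally requires MDRR to pass). The object names are hypothetical and the package's actual logic may differ.

```r
# Hypothetical pass/fail results for one target-outcome pair
diagnosticPass <- c(
  MDRR = FALSE,
  PRE_EXPOSURE = TRUE,
  EVENT_DEPENDENT_OBSERVATION = TRUE,
  EASE = TRUE
)

# Assumed rule: Tier 1 ignores MDRR, Tier 2 also requires MDRR to pass
unblindForCalibration <- all(diagnosticPass[names(diagnosticPass) != "MDRR"])
unblind <- unblindForCalibration && diagnosticPass[["MDRR"]]
```

Under this assumed rule, the underpowered pair above remains usable for calibration (unblindForCalibration is TRUE) while its own estimate stays blinded (unblind is FALSE).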

3 Pre-Exposure Gain

3.1 What it checks

This diagnostic detects whether outcomes occur before the exposure start date at a rate higher than expected. In a properly specified SCC analysis, outcomes should not systematically precede exposure.

3.2 Why it matters

Pre-exposure outcomes suggest one or more of:

3.3 Method

The diagnostic runs as a SQL query that aggregates counts directly in the database. For each target-outcome pair:

  1. Count the number of outcome events occurring in the window before exposure_start_date and the window after.
  2. Calculate the corresponding person-time for both windows across all individuals.
  3. Run a one-sided rate ratio test using rateratio.test::rateratio.test.
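
The final step can be sketched with aggregated counts already in hand. This example uses base R's poisson.test() as a stand-in for rateratio.test::rateratio.test so that it runs without extra packages. The counts are made up, and the direction of the test (pre-exposure rate divided by post-exposure rate, alternative "greater") is an assumption rather than something confirmed by the package.

```r
# Hypothetical aggregated counts for one target-outcome pair
preEvents <- 30
prePersonTime <- 1000
postEvents <- 12
postPersonTime <- 1000

# One-sided comparison of two Poisson rates (pre vs. post exposure start)
test <- poisson.test(
  x = c(preEvents, postEvents),
  T = c(prePersonTime, postPersonTime),
  r = 1,
  alternative = "greater"
)

preExposureRateRatio <- unname(test$estimate)  # (30 / 1000) / (12 / 1000)
preExposurePValue <- test$p.value
```

With these counts the pre-exposure rate is 2.5 times the post-exposure rate, and the one-sided p-value falls below 0.05.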

3.4 Interpretation

The diagnostic emits two rows: PRE_EXPOSURE_RATE_RATIO and PRE_EXPOSURE_P_VALUE.

4 Event-Dependent Observation

4.1 What it checks

This diagnostic identifies whether the observation period ends shortly after an outcome event. If it does, the outcome may be causing censoring (e.g., the outcome leads to death or disenrollment), which biases the rate ratio.

4.2 Why it matters

The SCC design compares rates across exposed and unexposed windows within the same person. If observation tends to end after the outcome, then:

4.3 Method

For each person with an outcome during the risk windows, the diagnostic checks whether their observation_period_end_date falls within 30 days after the outcome.
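
As an illustration, the check can be reproduced on a small in-memory table. The data frame and its column names are hypothetical (only observation_period_end_date comes from the text above), and the package presumably performs the equivalent computation in the database.

```r
# Hypothetical person-level data: one outcome per person
outcomes <- data.frame(
  person_id = 1:5,
  outcome_date = as.Date(c(
    "2020-01-10", "2020-03-01", "2020-05-15", "2020-07-20", "2020-09-05"
  )),
  observation_period_end_date = as.Date(c(
    "2020-01-25", "2021-03-01", "2020-12-31", "2020-08-01", "2022-01-01"
  ))
)

# Days between the outcome and the end of observation
daysToEnd <- as.numeric(
  outcomes$observation_period_end_date - outcomes$outcome_date
)

# Flag persons whose observation ends within 30 days after the outcome
censoredSoonAfter <- daysToEnd >= 0 & daysToEnd <= 30

# Proportion compared against the diagnostic threshold
proportionCensored <- mean(censoredSoonAfter)
```

Here persons 1 and 4 leave observation within 30 days of their outcome, so proportionCensored is 2 out of 5.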

4.4 Interpretation

5 Expected Absolute Systematic Error (EASE)

5.1 What it checks

EASE quantifies the total expected systematic error in study estimates, combining both bias (deviation of the null distribution mean from zero) and imprecision (spread of the null distribution). It is computed from the null distribution fitted on negative control estimates.

5.2 When it runs

Unlike the other diagnostics, EASE requires negative controls and is computed after estimation (during calibration). If no negativeControlPairs are provided, the EASE diagnostic is skipped.

5.3 Method

  1. Fit a null distribution to the negative control log rate ratios using EmpiricalCalibration::fitNull().
  2. Compute EASE using EmpiricalCalibration::computeExpectedAbsoluteSystematicError().

The resulting value represents the expected absolute difference between the estimated and true log rate ratio for a random study estimate drawn from this analysis.
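
For intuition, the quantity can be sketched in base R under the assumption that the fitted null distribution is normal with mean mu and standard deviation sigma. The expected absolute value of such a variable is the mean of a folded normal distribution. The function names and the example values of mu and sigma are hypothetical, and the sketch skips the null-fitting step that EmpiricalCalibration::fitNull() performs.

```r
# Closed-form mean of |X| for X ~ Normal(mu, sigma)
easeFromNull <- function(mu, sigma) {
  sigma * sqrt(2 / pi) * exp(-mu^2 / (2 * sigma^2)) +
    mu * (1 - 2 * pnorm(-mu / sigma))
}

# Same quantity by numerical integration, as a cross-check
easeNumeric <- function(mu, sigma) {
  integrate(
    function(x) abs(x) * dnorm(x, mean = mu, sd = sigma),
    lower = mu - 10 * sigma,
    upper = mu + 10 * sigma
  )$value
}

# Hypothetical null distribution: small bias, moderate spread
easeClosedForm <- easeFromNull(mu = 0.05, sigma = 0.1)
easeIntegrated <- easeNumeric(mu = 0.05, sigma = 0.1)
```

Both bias (mu) and spread (sigma) increase the result, which is why a null distribution centered on zero can still produce a non-zero EASE.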

5.4 Interpretation

5.5 Example

# Compute EASE from negative control estimates
negatives <- data.frame(
  rr = c(1.2, 0.8, 1.0, 1.1, 0.95),
  seLogRr = c(0.2, 0.1, 0.3, 0.15, 0.25)
)
computeEase(negatives)

6 Tiered Blinding

The individual diagnostics feed into a two-tier blinding system:

7 Running Diagnostics

Diagnostics are run automatically when runDiagnostics = TRUE (the default):

runSelfControlledCohort(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "cdm",
  exposureIds = c(1118084),
  outcomeIds = c(313217),
  databaseId = "my_db",
  resultExportPath = "results",
  runDiagnostics = TRUE
)

Results are saved to scc_diagnostics_summary.csv in the export folder.

7.1 Customizing thresholds

thresholds <- getDefaultDiagnosticThresholds()
thresholds$mdrrMaxAcceptable <- 15.0       # Allow higher MDRR
thresholds$maxPreExposureProportion <- 0.10  # Allow up to 10% pre-exposure

runSelfControlledCohort(
  ...,
  runDiagnostics = TRUE,
  diagnosticThresholds = thresholds
)

7.2 Selecting specific diagnostics

runSelfControlledCohort(
  ...,
  runDiagnostics = TRUE,
  diagnostics = c("mdrr", "ease")  # Skip pre-exposure and event-dependent
)

7.3 Inspecting failures

diagnostics <- read.csv("results/scc_diagnostics_summary.csv")

# Which target-outcome pairs had failures?
failures <- diagnostics[diagnostics$pass == 0 &
  !(diagnostics$diagnostic_name %in% c("UNBLIND", "UNBLIND_FOR_CALIBRATION")), ]
print(failures)
