The SelfControlledCohort package includes a suite of
diagnostics that evaluate whether the assumptions of the Self-Controlled
Cohort (SCC) design hold for a given analysis. These diagnostics run
automatically when `runDiagnostics = TRUE` and determine
whether study results should be unblinded (viewed) or
kept blinded until issues are resolved.
This vignette describes each diagnostic, the assumption it checks, and how results are interpreted.
Four core diagnostics are available to assess the validity of the SCC analysis.
| Diagnostic Name | Assumption Tested | Default Threshold |
|---|---|---|
| MDRR | Adequate statistical power | MDRR <= 10.0 |
| PRE_EXPOSURE | Correct temporal ordering | Rate Ratio <= 1.0, p > 0.05 |
| EVENT_DEPENDENT_OBSERVATION | Non-informative censoring | Proportion <= 10% |
| EASE | Low systematic error | EASE <= 0.25 |
The default thresholds are returned by
`getDefaultDiagnosticThresholds()`.
The minimum detectable relative risk (MDRR) is the smallest rate ratio that the study has 80% power to detect at alpha = 0.05. A high MDRR means that only very large effects could be detected, i.e., the study is underpowered.
The calculation uses the signed root likelihood (SRL1) method of Musonda et al. (2006), which was developed for self-controlled designs. It solves for the rate ratio at which power reaches the 80% target, given the observed person-time and event counts in the exposed and unexposed windows.
```r
# Well-powered study
computeMdrrForRateRatio(
  exposedPersonTime = 50000,
  unexposedPersonTime = 150000,
  exposedEvents = 40,
  unexposedEvents = 90
)

# Underpowered study (SRL1 solver returns NA if power cannot be met)
computeMdrrForRateRatio(
  exposedPersonTime = 500,
  unexposedPersonTime = 1500,
  exposedEvents = 3,
  unexposedEvents = 7
)
```

MDRR is the only diagnostic that affects Tier 2 (`UNBLIND`) but not Tier 1 (`UNBLIND_FOR_CALIBRATION`). This means a low-powered study can still serve as a negative control for empirical calibration, even if its point estimate should not be viewed directly.
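To make the definition concrete, the sketch below approximates an MDRR with a simpler, swapped-in method: an exact conditional binomial test, not the SRL1 method used by `computeMdrrForRateRatio()`. Conditional on the total number of events, each event falls in the exposed window with probability equal to the exposed share of person-time when the true rate ratio is 1, and the sketch searches for the rate ratio at which the test reaches 80% power. `approxMdrr()` is a hypothetical helper, not part of the package, and its values (including when it returns `NA`) will differ from the package's.

```r
# Hypothetical helper (not part of SelfControlledCohort): approximate MDRR
# using an exact conditional binomial test instead of the SRL1 method.
approxMdrr <- function(exposedPersonTime, unexposedPersonTime,
                       exposedEvents, unexposedEvents,
                       alpha = 0.05, power = 0.80) {
  n <- exposedEvents + unexposedEvents
  # Under RR = 1, an event lands in the exposed window with probability r.
  r <- exposedPersonTime / (exposedPersonTime + unexposedPersonTime)

  # Smallest count k whose upper tail P(X >= k | r) is within alpha / 2
  # (the upper half of a two-sided test).
  k <- 0:n
  upperTail <- pbinom(k - 1, n, r, lower.tail = FALSE)
  critical <- k[upperTail <= alpha / 2]
  if (length(critical) == 0) {
    return(NA_real_)  # no count is extreme enough to reject the null
  }
  kCrit <- min(critical)

  # Power to reject when the true rate ratio is exp(logRr).
  powerAt <- function(logRr) {
    rr <- exp(logRr)
    p1 <- r * rr / (r * rr + (1 - r))
    pbinom(kCrit - 1, n, p1, lower.tail = FALSE)
  }

  # Power increases monotonically with the rate ratio, so a root search works.
  upper <- log(1e6)
  if (powerAt(upper) < power) {
    return(NA_real_)
  }
  exp(uniroot(function(x) powerAt(x) - power, c(0, upper))$root)
}

approxMdrr(50000, 150000, 40, 90)  # well-powered example from above
approxMdrr(500, 1500, 3, 7)        # underpowered example: much larger MDRR
```

Solving on the log scale keeps the root search well conditioned across the very wide range of rate ratios an underpowered study can produce.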
This diagnostic detects whether outcomes occur before the exposure start date at a rate higher than expected. In a properly specified SCC analysis, outcomes should not systematically precede exposure.
Pre-exposure outcomes suggest one or more of:
The diagnostic is computed with a SQL query that aggregates counts directly in the database. For each target-outcome pair:

1. Outcomes and person-time are counted in the window before `exposure_start_date` and in the window after.
2. The two rates are compared with `rateratio.test::rateratio.test()`.

The diagnostic emits two rows: `PRE_EXPOSURE_RATE_RATIO`
and `PRE_EXPOSURE_P_VALUE`.
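The comparison step can be sketched in base R. This sketch swaps in `stats::poisson.test()`, which performs an exact conditional test of two Poisson rates, for `rateratio.test::rateratio.test()`; the rate ratio is the same, but two-sided p-values can differ slightly between the two functions. The helper `preExposureCheck()`, its arguments, and the pass rule are assumptions for illustration: the table's `Rate Ratio <= 1.0, p > 0.05` threshold is read here as "fail only when the pre-exposure rate is both higher and significantly so".

```r
# Hypothetical helper: compare the outcome rate before exposure start with
# the rate after it. stats::poisson.test() stands in for
# rateratio.test::rateratio.test().
preExposureCheck <- function(preEvents, prePersonTime,
                             postEvents, postPersonTime,
                             maxRateRatio = 1.0, alpha = 0.05) {
  test <- stats::poisson.test(
    x = c(preEvents, postEvents),
    T = c(prePersonTime, postPersonTime)
  )
  rateRatio <- unname(test$estimate)  # (pre rate) / (post rate)
  pValue <- test$p.value
  # Assumed pass rule: fail only if the pre-exposure rate is both higher
  # and significantly so.
  pass <- rateRatio <= maxRateRatio || pValue > alpha
  data.frame(
    diagnostic = c("PRE_EXPOSURE_RATE_RATIO", "PRE_EXPOSURE_P_VALUE"),
    value = c(rateRatio, pValue),
    pass = pass
  )
}

# Made-up counts: 12 events in 4,000 person-days before, 30 in 6,000 after
preExposureCheck(12, 4000, 30, 6000)
```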
This diagnostic identifies whether the observation period ends shortly after an outcome event. If it does, the outcome may be causing censoring (e.g., the outcome leads to death or disenrollment), which biases the rate ratio.
The SCC design compares rates across exposed and unexposed windows within the same person. If observation tends to end after the outcome, then:
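For each person with an outcome during the risk windows, the
diagnostic checks whether their `observation_period_end_date`
falls within 30 days after the outcome. The diagnostic value is the
proportion of such persons, compared against the default 10% threshold.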
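The check reduces to a proportion. The sketch below assumes one row per person (for example, each person's first outcome in the risk windows) with the outcome date and `observation_period_end_date` already extracted; the helper name, its arguments, and the example dates are hypothetical. The 10% cutoff is the default from the table above.

```r
# Hypothetical helper: share of persons whose observation period ends
# within `windowDays` days after their outcome.
eventDependentObservation <- function(outcomeDate, observationEndDate,
                                      windowDays = 30, threshold = 0.10) {
  daysToEnd <- as.numeric(observationEndDate - outcomeDate, units = "days")
  endsSoonAfter <- daysToEnd >= 0 & daysToEnd <= windowDays
  proportion <- mean(endsSoonAfter)
  list(proportion = proportion, pass = proportion <= threshold)
}

outcomeDate <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"))
observationEndDate <- as.Date(c("2020-01-15", "2021-01-01", "2021-01-01", "2021-01-01"))
eventDependentObservation(outcomeDate, observationEndDate)
# proportion = 0.25 (1 of 4 persons), so pass = FALSE
```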
EASE (expected absolute systematic error) quantifies the systematic error expected in study estimates, combining both bias (the deviation of the null distribution's mean from zero) and the spread of the null distribution. It is computed from the null distribution fitted to the negative control estimates.
Unlike the other diagnostics, EASE requires negative
controls and is computed after estimation
(during calibration). If no `negativeControlPairs` are
provided, the EASE diagnostic is skipped.
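EASE is computed in two steps:

1. A null distribution is fitted to the negative control estimates with `EmpiricalCalibration::fitNull()`.
2. EASE is computed from that null distribution with `EmpiricalCalibration::computeExpectedAbsoluteSystematicError()`.

The resulting value is the expected absolute difference, attributable to systematic error, between the estimated and true log rate ratio for a random study estimate drawn from this analysis.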
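If the fitted null distribution is taken to be normal on the log rate ratio scale with mean $\mu$ and standard deviation $\sigma$, EASE is the mean of the absolute value of that distribution (a folded normal): $E|X| = \sigma\sqrt{2/\pi}\,e^{-\mu^2/(2\sigma^2)} + \mu\left(1 - 2\Phi(-\mu/\sigma)\right)$. The sketch below evaluates this closed form with a hypothetical helper. It is an illustration of the quantity under that normality assumption, not the `EmpiricalCalibration` source; in practice use `computeExpectedAbsoluteSystematicError()`.

```r
# Expected absolute value of X ~ Normal(mu, sigma), i.e. the folded-normal
# mean. Here mu and sigma describe the systematic error distribution
# (log rate ratio scale) estimated from negative controls.
expectedAbsoluteError <- function(mu, sigma) {
  if (sigma == 0) {
    return(abs(mu))  # no spread: the error is always mu
  }
  sigma * sqrt(2 / pi) * exp(-mu^2 / (2 * sigma^2)) +
    mu * (1 - 2 * pnorm(-mu / sigma))
}

expectedAbsoluteError(mu = 0, sigma = 0.2)    # spread only: about 0.160
expectedAbsoluteError(mu = 0.1, sigma = 0.2)  # bias plus spread: about 0.179

# Monte Carlo check of the closed form
set.seed(1)
mean(abs(rnorm(1e6, mean = 0.1, sd = 0.2)))
```

Both example values fall below the default EASE threshold of 0.25, showing how bias and spread each push EASE upward.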
The individual diagnostics feed into a two-tier blinding system: Tier 1 (`UNBLIND_FOR_CALIBRATION`) and Tier 2 (`UNBLIND`).
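The sketch below shows one way the tiers could combine per-diagnostic results. It follows the rule stated for MDRR (it gates Tier 2 only) and additionally assumes that every other diagnostic gates both tiers and that a skipped diagnostic, recorded as `NA` (e.g. EASE without negative controls), blocks neither tier. `assignBlindingTier()` is hypothetical, not a package function.

```r
# Hypothetical helper: map per-diagnostic pass/fail flags to the two tiers.
# `pass` is a named logical vector; NA marks a skipped diagnostic.
assignBlindingTier <- function(pass) {
  others <- pass[names(pass) != "MDRR"]
  # Tier 1: every non-MDRR diagnostic that ran must pass.
  unblindForCalibration <- all(others, na.rm = TRUE)
  # Tier 2: Tier 1 plus an adequately powered study (MDRR passes).
  unblind <- unblindForCalibration && isTRUE(unname(pass["MDRR"]))
  c(UNBLIND_FOR_CALIBRATION = unblindForCalibration, UNBLIND = unblind)
}

# Underpowered but otherwise clean: usable for calibration only
assignBlindingTier(c(MDRR = FALSE, PRE_EXPOSURE = TRUE,
                     EVENT_DEPENDENT_OBSERVATION = TRUE, EASE = NA))
```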
Diagnostics are run automatically when
`runDiagnostics = TRUE` (the default):

```{r eval=FALSE}
runSelfControlledCohort(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "cdm",
  exposureIds = c(1118084),
  outcomeIds = c(313217),
  databaseId = "my_db",
  resultExportPath = "results",
  runDiagnostics = TRUE
)
```
Results are saved to `scc_diagnostics_summary.csv` in the
export folder.

To run only a subset of the diagnostics, pass their names in `diagnostics`:

```{r eval=FALSE}
runSelfControlledCohort(
  ...,
  runDiagnostics = TRUE,
  diagnostics = c("mdrr", "ease") # Skip pre-exposure and event-dependent
)
```