This vignette provides comprehensive guidance on power analysis and
sample size determination for method comparison and agreement studies
using the SimplyAgree package.
SimplyAgree implements four approaches to power/sample size calculations:

- power_agreement_exact() - Exact agreement test (Shieh 2019)
- blandPowerCurve() - Bland-Altman power curves (Lu et al. 2016)
- agree_expected_half() - Expected half-width criterion (Jan and Shieh 2018)
- agree_assurance() - Assurance probability criterion (Jan and Shieh 2018)

The methods divide into two categories:
Hypothesis Testing (binary decision):

- power_agreement_exact() - Tests whether the central proportion of differences (essentially a tolerance interval) lies within the maximal allowable difference
- blandPowerCurve() - Tests whether the confidence intervals of the limits of agreement fall within the maximal allowable difference

Estimation (quantifying precision):

- agree_expected_half() - Controls the average CI half-width of the limits of agreement
- agree_assurance() - Controls the probability of achieving a target CI half-width of the limits of agreement

power_agreement_exact() tests whether the central P* proportion of paired differences falls within the maximal allowable difference [-delta, delta].
Hypotheses:

- H0: The central P* proportion of paired differences is not contained within [-delta, delta]
- H1: The central P* proportion of paired differences is contained within [-delta, delta]
power_agreement_exact(
n = NULL, # Sample size
delta = NULL, # Tolerance bound
mu = 0, # Mean of differences
sigma = NULL, # SD of differences
p0_star = 0.95, # Central proportion (tolerance coverage)
power = NULL, # Target power
alpha = 0.05 # Significance level
)

Specify exactly three of: n, delta, power, sigma.
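Because exactly three of the four are supplied, the remaining one is solved for. For instance, leaving power unspecified solves for power at a fixed sample size (a sketch with illustrative values; output not shown):

# Solve for power instead of n by supplying n, delta, and sigma
power_agreement_exact(
  n = 40,
  delta = 7,
  mu = 0.5,
  sigma = 2.5,
  p0_star = 0.95,
  alpha = 0.05
)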
# Blood pressure device comparison
result <- power_agreement_exact(
delta = 7, # +/-7 mmHg tolerance
mu = 0.5, # Expected bias
sigma = 2.5, # Expected SD
p0_star = 0.95, # 95% must be within bounds
power = 0.80, # 80% power
alpha = 0.05
)
#> Maximum iterations reached in gamma computation
print(result)
#>
#> Power for Exact Method for Assessing Agreement Between Two Methods
#>
#> n = 34
#> delta = 7
#> mu = 0.5
#> sigma = 2.5
#> p0_star = 0.95
#> p1_star = 0.9939889
#> alpha = 0.05
#> power = 0.8018321
#> critical_value = 13.57044
#>
#> NOTE: H0: Central 95% of differences not within [-delta, delta]
#> H1: Central 99.4% of differences within [-delta, delta]
#> n is number pairs. Two measurements per unit; one for each method.

blandPowerCurve() calculates power curves based on the approximate Bland-Altman confidence intervals of Lu et al. (2016). It is useful for exploring power across a range of sample sizes.
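As a sketch of how this might look for the blood pressure scenario above (the argument names samplesizes, mu, SD, delta, conf.level, and agree.level, and the companion helper find_n(), follow the SimplyAgree documentation; check ?blandPowerCurve in your installed version):

# Power across a range of sample sizes for the CI-within-delta criterion
power_res <- blandPowerCurve(
  samplesizes = seq(10, 200, by = 5),
  mu = 0.5, # expected bias
  SD = 2.5, # expected SD of differences
  delta = 7, # maximal allowable difference
  conf.level = 0.95,
  agree.level = 0.95
)
# Smallest sample size reaching at least 80% power
find_n(power_res, power = 0.80)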
agree_expected_half() determines the sample size needed to ensure that the average CI half-width is <= delta across hypothetical repeated studies.
agree_expected_half(
conf.level = 0.95, # CI confidence level
delta = NULL, # Target expected half-width
pstar = 0.95, # Central proportion
sigma = 1, # SD of differences
n = NULL # Sample size
)

Specify either n OR delta.
# Want E[H] <= 2.5*sigma
result <- agree_expected_half(
conf.level = 0.95,
delta = 2.5, # As multiple of sigma
pstar = 0.95,
sigma = 1 # Standardized
)
print(result)
#>
#> Expected half-width and sample size for limits of agreement
#>
#> n = 52
#> conf.level = 0.95
#> target.delta = 2.5
#> actual.delta = 2.49677
#> pstar = 0.95
#> sigma = 1
#> g = 2.509039
#> c = 1.004914
#> zp = 1.959964

agree_assurance() determines the sample size needed to ensure that the probability that the CI half-width is <= omega is at least the target assurance level (1 - gamma).
This is a stronger guarantee than the expected half-width criterion: it ensures a specified probability of achieving the target precision.
agree_assurance(
conf.level = 0.95, # CI confidence level
assurance = 0.90, # Target assurance probability
omega = NULL, # Target half-width bound
pstar = 0.95, # Central proportion
sigma = 1, # SD of differences
n = NULL # Sample size
)

Specify either n OR omega.
# Want 90% probability that H <= 2.5*sigma
result <- agree_assurance(
conf.level = 0.95,
assurance = 0.90, # 90% probability
omega = 2.5, # Target bound
pstar = 0.95,
sigma = 1
)
print(result)
#>
#> Assurance probability & sample size for Limits of Agreement
#>
#> n = 115
#> conf.level = 0.95
#> assurance = 0.9
#> actual.assurance = 0.9024848
#> omega = 2.5
#> pstar = 0.95
#> sigma = 1
#> g = 2.306167
#> zp = 1.959964

Research Goal?
|
|- Hypothesis Testing ->
|    |- Need exact Type I error control -> Power for Agreement
|    \- Approximate CIs acceptable -> Bland-Altman Power Curve
|
\- Precision Estimation ->
     |- Average precision sufficient -> Expected Half-Width
     \- Need probabilistic guarantee -> Assurance Probability
Many studies have clustered data, with multiple measurements per subject or other natural groupings (e.g., repeated measures, multi-center studies). Note that the advice here applies only to clustering, not to situations where replicate measures are taken within a single measurement occasion (e.g., multiple measures at the same time point, where any variation represents only measurement error).
Standard formulas assume independence. Ignoring clustering can lead to studies that lack precision. To my knowledge, there are no well-developed methods for accounting for clustering in sample size calculations for agreement studies, so we use a common approximation from survey sampling and multilevel modeling: the design effect.
The design effect (DEFF) quantifies loss of efficiency due to clustering:
\[\text{DEFF} = 1 + (m - 1) \times \text{ICC}\]
where m is the number of measurements per cluster (e.g., per participant) and ICC is the intraclass correlation coefficient.
Effect on sample size: \[n_{\text{ESS}} = n_{\text{independent}} \times \text{DEFF}\]
ICC = proportion of variance between clusters:
\[\text{ICC} = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}\]
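If pilot data with multiple measurements per participant are available, the ICC of the differences can be estimated with a simple random-intercept model. A minimal sketch using lme4 (not part of SimplyAgree); pilot_df, diff, and id are hypothetical names for your pilot data frame, the column of paired differences, and the participant identifier:

library(lme4)

# Random intercept per participant partitions the variance of the differences
fit <- lmer(diff ~ 1 + (1 | id), data = pilot_df)
vc <- as.data.frame(VarCorr(fit))
sigma2_between <- vc$vcov[vc$grp == "id"]       # between-participant variance
sigma2_within  <- vc$vcov[vc$grp == "Residual"] # within-participant variance
ICC <- sigma2_between / (sigma2_between + sigma2_within)
ICC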
# Step 1: Independent sample size
result <- power_agreement_exact(
delta = 7, mu = 0.5, sigma = 2.5,
p0_star = 0.95, power = 0.80, alpha = 0.05
)
#> Maximum iterations reached in gamma computation
n_indep <- result$n
cat("Independent pairs needed:", n_indep, "\n")
#> Independent pairs needed: 34
# Step 2: Apply design effect
m <- 3 # 3 measurements per participant
ICC <- 0.15 # from pilot or literature
DEFF <- 1 + (m - 1) * ICC
cat("Design effect:", round(DEFF, 3), "\n")
#> Design effect: 1.3
# Step 3: Calculate participants needed
n_ess <- ceiling(n_indep * DEFF)
K <- ceiling(n_ess / m)
cat("Total observations:", n_ess, "\n")
#> Total observations: 45
cat("Participants needed:", K, "\n")
#> Participants needed: 15

Result: Instead of 34 independent pairs, need ~15 participants (45 total observations).
# Compare different ICC values
n_indep <- 50
m <- 4
ICC_values <- c(0, 0.05, 0.10, 0.15, 0.20)
for (ICC in ICC_values) {
DEFF <- 1 + (m - 1) * ICC
K <- ceiling(ceiling(n_indep * DEFF) / m)
cat(sprintf("ICC = %.2f: Need %d participants\n", ICC, K))
}
#> ICC = 0.00: Need 13 participants
#> ICC = 0.05: Need 15 participants
#> ICC = 0.10: Need 17 participants
#> ICC = 0.15: Need 19 participants
#> ICC = 0.20: Need 20 participants

Good situations:
Problematic:
For complex designs, consider simulation-based power analysis and consult a statistician.
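As one possibility, the sketch below (hypothetical code, not a SimplyAgree function) estimates the probability that the confidence intervals for both limits of agreement fall within [-delta, delta] (the same success criterion described for blandPowerCurve() above) when the data are clustered. It assumes a random-intercept model for the differences, uses naive limits of agreement computed from the overall mean and SD, and obtains CIs from a participant-level percentile bootstrap; it is slow and purely illustrative:

sim_power_clustered <- function(K, m, mu, sigma_b, sigma_w, delta,
                                agree.level = 0.95, conf.level = 0.95,
                                nsim = 200, nboot = 200) {
  zp <- qnorm(1 - (1 - agree.level) / 2)
  alpha <- 1 - conf.level
  # naive limits of agreement from the overall mean and SD of the differences
  loa <- function(d) c(lower = mean(d) - zp * sd(d),
                       upper = mean(d) + zp * sd(d))
  successes <- replicate(nsim, {
    # simulate K participants with m differences each (random-intercept model)
    id <- rep(seq_len(K), each = m)
    d <- mu + rep(rnorm(K, 0, sigma_b), each = m) + rnorm(K * m, 0, sigma_w)
    # cluster bootstrap: resample participants with all of their measurements
    boot_loa <- replicate(nboot, {
      ids <- sample(seq_len(K), K, replace = TRUE)
      loa(unlist(lapply(ids, function(i) d[id == i])))
    })
    lo_ci <- quantile(boot_loa["lower", ], c(alpha / 2, 1 - alpha / 2))
    up_ci <- quantile(boot_loa["upper", ], c(alpha / 2, 1 - alpha / 2))
    lo_ci[1] > -delta && up_ci[2] < delta
  })
  mean(successes)
}

# Illustrative call mirroring the worked example below: sigma = 3.3 and
# ICC = 0.15 imply sigma_b = sqrt(0.15) * 3.3 and sigma_w = sqrt(0.85) * 3.3
# sim_power_clustered(K = 206, m = 4, mu = 0, sigma_b = sqrt(0.15) * 3.3,
#                     sigma_w = sqrt(0.85) * 3.3, delta = 7, nsim = 100)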
# Study parameters
sigma <- 3.3
delta <- 7
m <- 4 # measurements per participant
ICC <- 0.15
dropout <- 0.20
# Step 1: Independent sample size
result <- power_agreement_exact(
delta = delta, mu = 0, sigma = sigma,
p0_star = 0.95, power = 0.80, alpha = 0.05
)
#> Maximum iterations reached in gamma computation
# Step 2: Account for clustering
DEFF <- 1 + (m - 1) * ICC
n_total <- ceiling(result$n * DEFF)
K_pre <- ceiling(n_total / m)
# Step 3: Account for dropout
K_final <- ceiling(K_pre / (1 - dropout))
# Summary
cat("Independent pairs:", result$n, "\n")
#> Independent pairs: 566
cat("Design effect:", round(DEFF, 3), "\n")
#> Design effect: 1.45
cat("Participants (no dropout):", K_pre, "\n")
#> Participants (no dropout): 206
cat("Participants to recruit:", K_final, "\n")
#> Participants to recruit: 258
cat("Total measurements:", K_final * m, "\n")
#> Total measurements: 1032

When uncertain: