Futility and Harm Bounds for Overall Survival Monitoring

gsDesign 3.10.0 — test.type = 7 and test.type = 8

Keaven Anderson

Motivation

  • Oncology trials often use accelerated approval based on surrogate endpoints, with overall survival (OS) as a confirmatory endpoint.
  • OS monitoring requires pre-specified boundaries for efficacy, futility, and harm.
  • FDA draft guidance (Assessment of Overall Survival Evidence in Support of Accelerated Approval, 2024) expects:
    • Pre-specified statistical analysis plan for interim OS
    • Monitoring for OS harm (detrimental survival trend)
    • Separate futility boundary
    • Spending functions for each boundary
  • The harm bound in gsDesign is a new method that extends widely used group sequential spending function methods in a principled, straightforward way, which makes it easy to implement.
  • While we believe this approach is understandable, useful, and flexible, other methods for harm monitoring may also be considered.

Three-Boundary Design Framework

Standard (test.type = 3 / 4):

| Region | Action |
| --- | --- |
| \(Z >\) Efficacy bound | Stop for efficacy |
| Futility \(< Z \leq\) Efficacy | Continue |
| \(Z \leq\) Futility bound | Stop for futility |

Harm bound (test.type = 7 / 8):

| Region | Action |
| --- | --- |
| \(Z >\) Efficacy bound | Stop for efficacy |
| Futility \(< Z \leq\) Efficacy | Continue |
| Harm \(< Z \leq\) Futility | Stop for futility |
| \(Z \leq\) Harm bound | Stop for harm |
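
The four regions above can be written as a small decision rule. This is a minimal sketch in base R; `classify_z` is a hypothetical helper for illustration and is not part of gsDesign. The bounds used are the rounded 24-month values from the boundary table in this deck.

```r
# Hypothetical helper (not a gsDesign function): map an observed Z statistic
# to the action implied by the four regions of a harm-bound design.
classify_z <- function(z, harm, futility, efficacy) {
  # The regions are only well defined when the bounds are ordered
  stopifnot(harm <= futility, futility <= efficacy)
  if (z > efficacy) {
    "Stop for efficacy"
  } else if (z > futility) {
    "Continue"
  } else if (z > harm) {
    "Stop for futility"
  } else {
    "Stop for harm"
  }
}

# Rounded bounds at the 24-month analysis
classify_z(z = 1.50, harm = -1.77, futility = 0.12, efficacy = 3.86)
classify_z(z = -0.40, harm = -1.77, futility = 0.12, efficacy = 3.86)
classify_z(z = -2.00, harm = -1.77, futility = 0.12, efficacy = 3.86)
```

With non-binding lower bounds (test.type = 8), the two lower labels are guidance for the DMC rather than mandatory actions.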

Binding vs Non-Binding

  • test.type = 8 (non-binding): Type I error is controlled regardless of whether stopping rules are followed. Preferred in most regulatory settings.

  • test.type = 7 (binding): Assumes the trial will stop whenever a lower bound is crossed. Yields slightly lower efficacy bounds (easier to cross) and fewer required events, but Type I error may be inflated if the stopping rules are not followed.

In practice, test.type = 8 is almost always preferred because DMCs retain discretion to continue or stop based on the totality of evidence.

Example Design Setup

library(gsDesign)

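x8 <- gsSurvCalendar(
  test.type = 8,                          # non-binding futility and harm bounds
  alpha = 0.0125, beta = 0.1,             # one-sided Type I error; 90% power
  astar = 0.1,                            # total harm-bound spending under H0
  calendarTime = c(12, 24, 36, 48, 60),   # analysis times (months)
  sfu = sfLDOF,                           # efficacy: Lan-DeMets O'Brien-Fleming
  sfl = sfHSD, sflpar = -2,               # futility: Hwang-Shih-DeCani, gamma = -2
  sfharm = sfLDPocock,                    # harm: Lan-DeMets Pocock
  lambdaC = log(2) / 36,                  # control hazard rate (median 36 months)
  hr = 0.75,                              # hazard ratio under H1
  R = 18, minfup = 42                     # 18 months enrollment, 42 months minimum follow-up
)

  • 1:1 randomized, median control survival 36 months, target HR = 0.75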
  • 90% power, one-sided \(\alpha = 0.0125\)
  • Analyses at 12, 24, 36, 48, 60 months
  • astar = 0.1: total harm-bound spending, i.e., a 10% cumulative probability of crossing the harm bound under \(H_0\) across all five analyses
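
As a rough cross-check of the setup above, the design inputs can be turned into medians and a fixed-design event count. This is a sketch under stated assumptions: exponential survival and the Schoenfeld approximation with 1:1 randomization; it does not reproduce the group sequential calculation that gsDesign performs.

```r
# Design inputs from the gsSurvCalendar() call
alpha <- 0.0125           # one-sided Type I error
beta <- 0.1               # Type II error (90% power)
hr <- 0.75                # hazard ratio under H1
lambda_c <- log(2) / 36   # control hazard rate per month

# Exponential survival: median = log(2) / hazard
median_control <- log(2) / lambda_c               # 36 months
median_experimental <- log(2) / (lambda_c * hr)   # 48 months

# Schoenfeld approximation: events for a fixed (single-analysis) design, 1:1 randomization
events_fixed <- 4 * (qnorm(1 - alpha) + qnorm(1 - beta))^2 / log(hr)^2
ceiling(events_fixed)     # about 600
```

The group sequential design reports 657 events, somewhat above this fixed-design figure, which is consistent with the usual inflation from adding interim analyses with futility bounds.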

Spending Functions

| Boundary | Function | Strategy |
| --- | --- | --- |
| Efficacy | Lan-DeMets O’Brien-Fleming (sfLDOF) | Conservative; spends little \(\alpha\) early |
| Futility | Hwang-Shih-DeCani (\(\gamma = -2\)) | Moderate \(\beta\)-spending under \(H_1\) |
| Harm | Lan-DeMets Pocock (sfLDPocock) | Aggressive early spending for safety |
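
To make the three strategies concrete, the cumulative spending of each function can be computed directly from its standard closed form. This is a base R sketch: the function names `ld_of_spend`, `hsd_spend`, and `ld_pocock_spend` are hypothetical, and the information fractions are approximated from the rounded event counts in this deck, so small differences from the gsDesign output are expected.

```r
# Information fractions approximated from the planned (rounded) event counts
events <- c(73, 253, 416, 548, 657)
info_frac <- events / max(events)

# Lan-DeMets O'Brien-Fleming approximation (one-sided alpha)
ld_of_spend <- function(alpha, t) 2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))

# Hwang-Shih-DeCani with parameter gamma (gamma != 0)
hsd_spend <- function(total, t, gamma) {
  total * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
}

# Lan-DeMets Pocock approximation
ld_pocock_spend <- function(total, t) total * log(1 + (exp(1) - 1) * t)

spend <- data.frame(
  analysis = seq_along(info_frac),
  efficacy = ld_of_spend(0.0125, info_frac),        # alpha-spending
  futility = hsd_spend(0.1, info_frac, gamma = -2), # beta-spending under H1
  harm = ld_pocock_spend(0.1, info_frac)            # astar-spending under H0
)
print(round(spend, 4))
```

The harm column should land close to the cumulative P(Cross) if HR=1 values reported for the harm bound, and the futility column close to the P(Cross) if HR=0.75 values for the futility bound.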

Design Summary

Asymmetric two-sided group sequential design with non-binding futility
and harm bounds, 5 analyses, time-to-event outcome with sample size
1148 and 657 events required, 90 percent power, 1.25 percent (1-sided)
Type I error to detect a hazard ratio of 0.75. Enrollment and total
study durations are assumed to be 18 and 60 months, respectively.
Efficacy bounds derived using a Lan-DeMets O'Brien-Fleming
approximation spending function (no parameters). Futility bounds
derived using a Hwang-Shih-DeCani spending function with gamma = -2.
Harm bounds derived using a Lan-DeMets Pocock approximation spending
function.

Detailed Boundary Summary

| Analysis | Value | Harm | Futility | Efficacy |
| --- | --- | --- | --- | --- |
| IA 1: 11% | Z | -2.1121 | -1.4408 | 7.4336 |
| N: 766 | p (1-sided) | 0.9827 | 0.9252 | 0.0000 |
| Events: 73 | ~HR at bound | 1.6434 | 1.4034 | 0.1740 |
| Month: 12 | P(Cross) if HR=1 | 0.0173 | 0.0748 | 0.0000 |
| | P(Cross) if HR=0.75 | 0.0004 | 0.0039 | 0.0000 |
| IA 2: 38% | Z | -1.7667 | 0.1212 | 3.8622 |
| N: 1148 | p (1-sided) | 0.9614 | 0.4518 | 0.0001 |
| Events: 253 | ~HR at bound | 1.2491 | 0.9849 | 0.6149 |
| Month: 24 | P(Cross) if HR=1 | 0.0507 | 0.5554 | 0.0001 |
| | P(Cross) if HR=0.75 | 0.0004 | 0.0181 | 0.0574 |
| IA 3: 63% | Z | -1.7256 | 1.0566 | 2.9347 |
| N: 1148 | p (1-sided) | 0.9578 | 0.1454 | 0.0017 |
| Events: 416 | ~HR at bound | 1.1846 | 0.9015 | 0.7497 |
| Month: 36 | P(Cross) if HR=1 | 0.0736 | 0.8641 | 0.0017 |
| | P(Cross) if HR=0.75 | 0.0004 | 0.0398 | 0.4990 |
| IA 4: 83% | Z | -1.7170 | 1.7357 | 2.5278 |
| N: 1148 | p (1-sided) | 0.9570 | 0.0413 | 0.0057 |
| Events: 548 | ~HR at bound | 1.1580 | 0.8622 | 0.8057 |
| Month: 48 | P(Cross) if HR=1 | 0.0890 | 0.9631 | 0.0062 |
| | P(Cross) if HR=0.75 | 0.0004 | 0.0675 | 0.7996 |
| Final | Z | -1.7149 | 2.3072 | 2.3072 |
| N: 1148 | p (1-sided) | 0.9568 | 0.0105 | 0.0105 |
| Events: 657 | ~HR at bound | 1.1433 | 0.8352 | 0.8352 |
| Month: 60 | P(Cross) if HR=1 | 0.1000 | 0.9888 | 0.0112 |
| | P(Cross) if HR=0.75 | 0.0004 | 0.1000 | 0.9000 |
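
The Z, p (1-sided), and ~HR at bound rows of the table above are linked by simple transformations. The sketch below recomputes the final-analysis values in base R. The HR formula is the Schoenfeld approximation for 1:1 randomization, which is an assumption here; gsDesign's own calculation uses unrounded event counts, so agreement is only approximate.

```r
# Final-analysis Z-values and event count copied from the summary table
z <- c(harm = -1.7149, futility = 2.3072, efficacy = 2.3072)
events <- 657

# Nominal one-sided p-value: upper tail of the standard normal
p_one_sided <- 1 - pnorm(z)

# Approximate HR at the bound; positive Z favors the experimental arm,
# so the HR is below 1 when Z > 0 and above 1 when Z < 0
hr_at_bound <- exp(-2 * z / sqrt(events))

print(round(rbind(p_one_sided, hr_at_bound), 4))
```

Read this way, the final harm bound corresponds to an observed HR of roughly 1.14, and the final efficacy bound to an observed HR of roughly 0.84.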

Conditional power (CP) and predictive power (PP) can also be computed and included via gsBoundSummary(x8, exclude = c()).

Boundary Table

Z-value boundaries at each analysis
| Analysis | Month | Events | Harm | Futility | Efficacy |
| --- | --- | --- | --- | --- | --- |
| 1 | 12 | 73 | -2.11 | -1.44 | 7.43 |
| 2 | 24 | 253 | -1.77 | 0.12 | 3.86 |
| 3 | 36 | 416 | -1.73 | 1.06 | 2.93 |
| 4 | 48 | 548 | -1.72 | 1.74 | 2.53 |
| 5 | 60 | 657 | -1.71 | 2.31 | 2.31 |

  • Harm bound \(\leq\) Futility bound \(\leq\) Efficacy bound at every analysis
  • Early efficacy bound is extreme (very unlikely to cross)
  • Harm and futility bounds allow early stopping

Z-Value Boundaries

Z-value boundaries: efficacy (upper), futility (lower), harm (lowest)

Power Plot

Boundary crossing probabilities as a function of treatment effect

Treatment Effect at Boundaries

Approximate hazard ratio at each boundary

Bounds adjustable by:

  • Alternate astar (Type I error for excess OS)
  • Alternate spending function
  • Alternate timing of analyses

Bounds must be clinically, ethically, and statistically sound.

B-Values at Boundaries

B-values at boundaries (Proschan, Lan, Wittes 2006)

B-values \(= Z \times \sqrt{t}\). Under proportional hazards, expected B-values increase linearly with information fraction.
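
The conversion from Z-values to B-values is a one-line rescaling. The base R sketch below applies it to the rounded Z-value bounds from the boundary table, with information fractions approximated from the rounded event counts. The expected B-value line under \(H_1\) uses the Schoenfeld approximation with 1:1 randomization, which is an assumption of this sketch.

```r
# Rounded Z-value bounds and event counts from the boundary table
events <- c(73, 253, 416, 548, 657)
info_frac <- events / max(events)
z_bounds <- cbind(
  harm = c(-2.11, -1.77, -1.73, -1.72, -1.71),
  futility = c(-1.44, 0.12, 1.06, 1.74, 2.31),
  efficacy = c(7.43, 3.86, 2.93, 2.53, 2.31)
)

# B = Z * sqrt(t): each column is scaled by sqrt(information fraction)
b_bounds <- z_bounds * sqrt(info_frac)
print(round(b_bounds, 3))

# Expected B-value under H1 (HR = 0.75) is linear in t: E[B(t)] = theta * t,
# with theta the expected final Z under the Schoenfeld approximation
theta <- -log(0.75) * sqrt(max(events)) / 2
expected_b <- theta * info_frac
print(round(expected_b, 3))
```

On the B-value scale the extreme first efficacy bound (Z = 7.43) shrinks to about 2.48, which makes the bounds easier to compare across analyses.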

Boundary Crossing Probabilities

| Scenario | Analysis | P(Efficacy) | P(Futility) | P(Harm) |
| --- | --- | --- | --- | --- |
| Under H0 (HR=1) | 1 | 0.0000 | 0.0748 | 0.0173 |
| Under H0 (HR=1) | 2 | 0.0001 | 0.5554 | 0.0507 |
| Under H0 (HR=1) | 3 | 0.0017 | 0.8641 | 0.0736 |
| Under H0 (HR=1) | 4 | 0.0062 | 0.9631 | 0.0890 |
| Under H0 (HR=1) | 5 | 0.0112 | 0.9888 | 0.1000 |
| Under H1 (HR=0.75) | 1 | 0.0000 | 0.0039 | 0.0004 |
| Under H1 (HR=0.75) | 2 | 0.0574 | 0.0181 | 0.0004 |
| Under H1 (HR=0.75) | 3 | 0.4990 | 0.0398 | 0.0004 |
| Under H1 (HR=0.75) | 4 | 0.7996 | 0.0675 | 0.0004 |
| Under H1 (HR=0.75) | 5 | 0.9000 | 0.1000 | 0.0004 |

Under \(H_0\): cumulative harm probability \(\approx\) 0.1. Under \(H_1\): harm crossing is negligible.
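
Because the tabulated probabilities are cumulative, per-analysis increments come from differencing. The base R sketch below does this for the harm bound under \(H_0\), using values copied from the table. It also checks that efficacy and futility crossing probabilities sum to 1 at the final analysis, which suggests that the futility column counts every path with \(Z \leq\) futility bound, including those that also cross the harm bound.

```r
# Cumulative crossing probabilities under H0 (HR = 1), copied from the table
h0 <- data.frame(
  efficacy = c(0.0000, 0.0001, 0.0017, 0.0062, 0.0112),
  futility = c(0.0748, 0.5554, 0.8641, 0.9631, 0.9888),
  harm = c(0.0173, 0.0507, 0.0736, 0.0890, 0.1000)
)

# Probability of first crossing the harm bound at each analysis
harm_increment <- diff(c(0, h0$harm))
print(round(harm_increment, 4))

# Efficacy and futility crossing exhaust all outcomes at the final analysis,
# so harm crossings must already be included in the futility column
total_final <- h0$efficacy[5] + h0$futility[5]
print(total_final)
```

Under this reading, P(Harm) is a subset of P(Futility) rather than an additional probability.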

Binding vs Non-Binding Comparison

Z-value boundaries
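| Bound | Binding (type 7) | Non-binding (type 8) |
| --- | --- | --- |
| Efficacy | 7.434, 3.862, 2.934, 2.523, 2.248 | 7.434, 3.862, 2.935, 2.528, 2.307 |
| Futility | -1.458, 0.090, 1.016, 1.689, 2.248 | -1.441, 0.121, 1.057, 1.736, 2.307 |
| Harm | -2.112, -1.767, -1.726, -1.717, -1.715 | -2.112, -1.767, -1.726, -1.717, -1.715 |

  • Type 7 efficacy bounds are equal or slightly lower (easier to cross); the difference appears from analysis 3 onward and is largest at the final analysis
  • Harm bounds are identical under both types at the precision shown
  • Max events: type 7 = 639 vs type 8 = 657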
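
The practical size of the difference can be read off the final efficacy bounds and the maximum event counts. This base R sketch uses only the values tabulated above.

```r
# Final efficacy Z-bounds from the comparison table
z_final <- c(binding = 2.248, non_binding = 2.307)

# Nominal one-sided p-value needed at the final analysis
p_final <- 1 - pnorm(z_final)
print(round(p_final, 4))

# Relative reduction in maximum events from using binding bounds
max_events <- c(binding = 639, non_binding = 657)
event_saving <- 1 - max_events[["binding"]] / max_events[["non_binding"]]
print(round(100 * event_saving, 1))  # percent
```

The binding design allows a final nominal p-value of roughly 0.0123 instead of 0.0105 and needs about 2.7% fewer events, which is the price paid for the flexibility of non-binding bounds.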

Alternate Alpha Levels

Calling gsBoundSummary() with alpha = 0.025 shows efficacy bounds at both the design level (\(\alpha = 0.0125\)) and the alternate level (\(\alpha = 0.025\)). This is useful when the OS endpoint may later receive a larger share of \(\alpha\).

| Analysis | Value | Efficacy (α=0.0125) | Efficacy (α=0.025) | Futility | Harm |
| --- | --- | --- | --- | --- | --- |
| IA 1: 11% | Z | 7.4336 | 6.6513 | -1.4408 | -2.1121 |
| N: 766 | p (1-sided) | 0.0000 | 0.0000 | 0.9252 | 0.9827 |
| Events: 73 | ~HR at bound | 0.1740 | 0.2092 | 1.4034 | 1.6434 |
| Month: 12 | P(Cross) if HR=1 | 0.0000 | 0.0000 | 0.0748 | 0.0173 |
| | P(Cross) if HR=0.75 | 0.0000 | 0.0000 | 0.0039 | 0.0004 |
| IA 2: 38% | Z | 3.8622 | 3.4312 | 0.1212 | -1.7667 |
| N: 1148 | p (1-sided) | 0.0001 | 0.0003 | 0.4518 | 0.9614 |
| Events: 253 | ~HR at bound | 0.6149 | 0.6492 | 0.9849 | 1.2491 |
| Month: 24 | P(Cross) if HR=1 | 0.0001 | 0.0003 | 0.5554 | 0.0507 |
| | P(Cross) if HR=0.75 | 0.0574 | 0.1259 | 0.0181 | 0.0004 |
| IA 3: 63% | Z | 2.9347 | 2.5948 | 1.0566 | -1.7256 |
| N: 1148 | p (1-sided) | 0.0017 | 0.0047 | 0.1454 | 0.9578 |
| Events: 416 | ~HR at bound | 0.7497 | 0.7751 | 0.9015 | 1.1846 |
| Month: 36 | P(Cross) if HR=1 | 0.0017 | 0.0048 | 0.8641 | 0.0736 |
| | P(Cross) if HR=0.75 | 0.4990 | 0.6323 | 0.0398 | 0.0004 |
| IA 4: 83% | Z | 2.5278 | 2.2359 | 1.7357 | -1.7170 |
| N: 1148 | p (1-sided) | 0.0057 | 0.0127 | 0.0413 | 0.9570 |
| Events: 548 | ~HR at bound | 0.8057 | 0.8261 | 0.8622 | 1.1580 |
| Month: 48 | P(Cross) if HR=1 | 0.0062 | 0.0138 | 0.9631 | 0.0890 |
| | P(Cross) if HR=0.75 | 0.7996 | 0.8684 | 0.0675 | 0.0004 |
| Final | Z | 2.3072 | 2.0432 | 2.3072 | -1.7149 |
| N: 1148 | p (1-sided) | 0.0105 | 0.0205 | 0.0105 | 0.9568 |
| Events: 657 | ~HR at bound | 0.8352 | 0.8526 | 0.8352 | 1.1433 |
| Month: 60 | P(Cross) if HR=1 | 0.0112 | 0.0201 | 0.9888 | 0.1000 |
| | P(Cross) if HR=0.75 | 0.9000 | 0.9218 | 0.1000 | 0.0004 |

Harm Bound Mechanics

  • Harm spending is computed under \(H_0\) (no treatment effect)
    • Controls the probability of a false harm signal
  • Harm bound is automatically capped at the futility bound
    • Ensures ordering: harm \(\leq\) futility \(\leq\) efficacy
  • The astar parameter controls total harm boundary spending
    • Default astar = 0 auto-converts to 1 - alpha
    • Set explicitly (e.g., astar = 0.1) for practical designs
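
The two rules above (the astar default and the cap at the futility bound) can be illustrated in a few lines of base R. This is a sketch of the behavior as described in this deck, not gsDesign's internal code: `resolve_astar` is a hypothetical helper, and the uncapped harm values are made up purely to show the capping step.

```r
# Hypothetical helper mirroring the described default: astar = 0 means 1 - alpha
resolve_astar <- function(astar, alpha) if (astar == 0) 1 - alpha else astar

resolve_astar(0, alpha = 0.0125)    # default: nearly all of the null probability
resolve_astar(0.1, alpha = 0.0125)  # explicit choice used in the example design

# Capping step: a harm bound may never sit above the futility bound.
# raw_harm holds made-up illustrative values; the second exceeds its futility bound.
raw_harm <- c(-2.11, -1.00, -1.73)
futility <- c(-1.44, -1.44, 1.06)
capped_harm <- pmin(raw_harm, futility)
print(capped_harm)

# After capping, the ordering harm <= futility holds at every analysis
stopifnot(all(capped_harm <= futility))
```

An astar of 0.9875 would place the harm bound close to the futility bound throughout, which is why the deck recommends setting astar explicitly.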

Key API Parameters

| Parameter | Description |
| --- | --- |
| test.type | 7 (binding) or 8 (non-binding) |
| astar | Total harm boundary spending under \(H_0\) |
| sfharm | Spending function for harm bound |
| sfharmpar | Parameters for harm spending function |
| alpha | One-sided Type I error for efficacy |
| sfu / sfl | Spending functions for efficacy / futility |

Works with gsDesign(), gsSurv() and gsSurvCalendar().

Summary

  1. Three boundaries — efficacy, futility, harm — address FDA guidance on OS monitoring
  2. test.type = 8 (non-binding) is preferred for regulatory submissions
  3. Spending functions independently control each boundary’s aggressiveness
  4. astar parameter gives direct control over harm boundary spending
  5. All 6 plot types supported for visualization
  6. gsBoundSummary() supports alternate \(\alpha\) levels and conditional/predictive power for test.type 7/8

This is a new method, made easy to implement in gsDesign as a principled, straightforward extension of frequently used group sequential methods. While we believe it is understandable, useful, and flexible, other approaches to harm monitoring may also be considered.

Available in gsDesign ≥ 3.10.0.