Futility and Harm Bounds for Overall Survival Monitoring
gsDesign 3.10.0 — test.type = 7 and test.type = 8
Motivation
- Oncology drugs often receive accelerated approval based on surrogate endpoints, with overall survival (OS) as a confirmatory endpoint.
- OS monitoring requires pre-specified boundaries for efficacy, futility, and harm.
- FDA draft guidance (Assessment of Overall Survival Evidence in Support of Accelerated Approval, 2024) expects:
  - Pre-specified statistical analysis plan for interim OS
  - Monitoring for OS harm (detrimental survival trend)
  - Separate futility boundary
  - Spending functions for each boundary
- The harm bound in gsDesign is a new method that is easy to implement: a principled, straightforward extension of widely used group sequential spending function methods.
- While we believe this approach is understandable, useful, and flexible, other methods for harm monitoring may also be considered.
Three-Boundary Design Framework
Standard (test.type = 3 / 4):
| Test statistic | Action |
|---|---|
| \(Z >\) Efficacy bound | Stop for efficacy |
| Futility \(< Z \leq\) Efficacy | Continue |
| \(Z \leq\) Futility bound | Stop for futility |
Harm bound (test.type = 7 / 8):
| Test statistic | Action |
|---|---|
| \(Z >\) Efficacy bound | Stop for efficacy |
| Futility \(< Z \leq\) Efficacy | Continue |
| Harm \(< Z \leq\) Futility | Stop for futility |
| \(Z \leq\) Harm bound | Stop for harm |
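The four-region rule for test.type = 7 / 8 can be sketched as a small function. This is an illustration of the table only, not gsDesign code; the function name `classify` is hypothetical, and the bounds in the usage lines are taken from analysis 2 of the Boundary Table.

```python
def classify(z, harm, futility, efficacy):
    """Map an interim Z-statistic to an action under the four-region rule."""
    if not harm <= futility <= efficacy:
        raise ValueError("bounds must satisfy harm <= futility <= efficacy")
    if z > efficacy:
        return "stop for efficacy"
    if z > futility:
        return "continue"
    if z > harm:
        return "stop for futility"
    return "stop for harm"


# Bounds at analysis 2 of the Boundary Table: harm -1.77, futility 0.12, efficacy 3.86
for z in (4.0, 1.5, -0.5, -2.0):
    print(z, classify(z, harm=-1.77, futility=0.12, efficacy=3.86))
```

With non-binding bounds the futility and harm outcomes are guidance for the monitoring committee rather than mandatory stopping rules.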
Binding vs Non-Binding
test.type = 8 (non-binding): Type I error is controlled whether or not the futility and harm stopping rules are followed. Preferred in most regulatory settings.
test.type = 7 (binding): Assumes the trial stops whenever a lower (futility or harm) bound is crossed. This yields slightly less stringent efficacy bounds and fewer required events, but Type I error may be inflated if the rules are not followed.
In practice, test.type = 8 is almost always preferred because data monitoring committees (DMCs) retain discretion to continue or stop based on the totality of evidence.
Spending Functions
| Bound | Spending function | Rationale |
|---|---|---|
| Efficacy | Lan-DeMets O’Brien-Fleming (sfLDOF) | Conservative; spends little \(\alpha\) early |
| Futility | Hwang-Shih-DeCani (\(\gamma = -2\)) | Moderate \(\beta\)-spending under \(H_1\) |
| Harm | Lan-DeMets Pocock (sfLDPocock) | Aggressive early spending for safety |
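To make the "conservative / moderate / aggressive" contrast concrete, here is a minimal sketch of the three spending functions using their standard closed forms. The Python function names are hypothetical and are not the gsDesign functions. The totals are assumptions: 0.0125 for efficacy and 0.10 for futility follow from the Design Summary (1.25 percent one-sided Type I error, 90 percent power), while 0.1 for harm is only the illustrative astar value mentioned under Harm Bound Mechanics. Spending time is assumed to be the event fraction from the Boundary Table.

```python
from math import e, erfc, exp, log, sqrt
from statistics import NormalDist


def sf_ld_of(t, total):
    """Lan-DeMets O'Brien-Fleming approximation: 2 - 2*Phi(z_{total/2} / sqrt(t))."""
    z = NormalDist().inv_cdf(1 - total / 2)
    # 2 * (1 - Phi(x)) equals erfc(x / sqrt(2)); erfc avoids cancellation in the tail
    return erfc(z / sqrt(2 * t))


def sf_hsd(t, total, gamma):
    """Hwang-Shih-DeCani: total * (1 - exp(-gamma*t)) / (1 - exp(-gamma)), gamma != 0."""
    return total * (1 - exp(-gamma * t)) / (1 - exp(-gamma))


def sf_ld_pocock(t, total):
    """Lan-DeMets Pocock approximation: total * ln(1 + (e - 1) * t)."""
    return total * log(1 + (e - 1) * t)


ALPHA, BETA, ASTAR = 0.0125, 0.10, 0.10  # ASTAR is an illustrative assumption
events = [73, 253, 416, 548, 657]        # planned events from the Boundary Table

for d in events:
    t = d / events[-1]  # assumed spending time: event fraction
    print(
        f"t={t:.3f}  efficacy={sf_ld_of(t, ALPHA):.2e}  "
        f"futility={sf_hsd(t, BETA, -2):.4f}  harm={sf_ld_pocock(t, ASTAR):.4f}"
    )
```

Each function returns cumulative spending, so all three reach their total at \(t = 1\); they differ only in how quickly they get there.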
Design Summary
Asymmetric two-sided group sequential design with non-binding futility
and harm bounds, 5 analyses, time-to-event outcome with sample size
1148 and 657 events required, 90 percent power, 1.25 percent (1-sided)
Type I error to detect a hazard ratio of 0.75. Enrollment and total
study durations are assumed to be 18 and 60 months, respectively.
Efficacy bounds derived using a Lan-DeMets O'Brien-Fleming
approximation spending function (no parameters). Futility bounds
derived using a Hwang-Shih-DeCani spending function with gamma = -2.
Harm bounds derived using a Lan-DeMets Pocock approximation spending
function.
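As a rough cross-check of the 657 events, the fixed-design event count can be approximated with the Schoenfeld formula. This is a sketch under stated assumptions: 1:1 randomization is assumed (the summary does not state the ratio), the function name `schoenfeld_events` is hypothetical, and gsDesign may use a different sample size method, so only approximate agreement should be expected.

```python
from math import log
from statistics import NormalDist


def schoenfeld_events(hr, alpha, power, ratio=1.0):
    """Fixed-design events: (z_alpha + z_beta)^2 * (1 + r)^2 / (r * ln(hr)^2)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(power)) ** 2 * (1 + ratio) ** 2 / (ratio * log(hr) ** 2)


# Inputs from the Design Summary: HR 0.75, one-sided alpha 0.0125, 90 percent power
fixed = schoenfeld_events(hr=0.75, alpha=0.0125, power=0.90)
inflation = 657 / fixed  # planned group sequential events relative to fixed design

print(f"fixed-design events: {fixed:.0f}")
print(f"event inflation from interim monitoring: {inflation:.3f}")
```

The ratio of planned to fixed-design events indicates how much the interim analyses and the futility bound cost in additional events.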
Boundary Table
Z-value boundaries at each analysis
| Analysis | Month | Events | Harm | Futility | Efficacy |
|---|---|---|---|---|---|
| 1 | 12 | 73 | -2.11 | -1.44 | 7.43 |
| 2 | 24 | 253 | -1.77 | 0.12 | 3.86 |
| 3 | 36 | 416 | -1.73 | 1.06 | 2.93 |
| 4 | 48 | 548 | -1.72 | 1.74 | 2.53 |
| 5 | 60 | 657 | -1.71 | 2.31 | 2.31 |
- Harm bound \(\leq\) Futility bound \(\leq\) Efficacy bound at every analysis
- The early efficacy bound is extreme (Z = 7.43 at analysis 1) and very unlikely to be crossed
- Harm and futility bounds allow early stopping
Z-Value Boundaries
Figure: Z-value boundaries for efficacy (upper), futility (lower), and harm (lowest).
Power Plot
Figure: Boundary crossing probabilities as a function of treatment effect.
Treatment Effect at Boundaries
Bounds adjustable by:
- Alternate astar (Type I error for excess OS)
- Alternate spending function
- Alternate timing of analyses
Bounds must be clinically, ethically, and statistically sound.
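One way to judge whether bounds are clinically sensible is to translate each Z-value into an approximate observed hazard ratio. The sketch below uses the approximation \(Z \approx -\ln(\mathrm{HR})\sqrt{rD}/(1+r)\) for \(D\) events and randomization ratio \(r\). The assumptions are 1:1 randomization and a sign convention in which positive Z favors the experimental arm; the function name `hr_at_bound` is hypothetical and gsDesign's own reporting may differ slightly.

```python
from math import exp, sqrt


def hr_at_bound(z, events, ratio=1.0):
    """Approximate HR at a Z bound, inverting Z = -ln(HR) * sqrt(r * D) / (1 + r)."""
    return exp(-z * (1 + ratio) / sqrt(ratio * events))


# Z-value bounds copied from the Boundary Table
events = [73, 253, 416, 548, 657]
harm = [-2.11, -1.77, -1.73, -1.72, -1.71]
futility = [-1.44, 0.12, 1.06, 1.74, 2.31]
efficacy = [7.43, 3.86, 2.93, 2.53, 2.31]

for d, h, f, u in zip(events, harm, futility, efficacy):
    print(
        f"events={d:3d}  harm HR~{hr_at_bound(h, d):.2f}  "
        f"futility HR~{hr_at_bound(f, d):.2f}  efficacy HR~{hr_at_bound(u, d):.2f}"
    )
```

Harm bounds map to hazard ratios above 1, which lets clinicians discuss them as an observed excess hazard rather than as a Z-value.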
B-Values at Boundaries
Figure: B-values at boundaries (Proschan, Lan, and Wittes, 2006).
The B-value is \(B(t) = Z(t)\sqrt{t}\), where \(t\) is the information fraction. Under proportional hazards, the expected B-value increases linearly in \(t\).
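The conversion from Z-values to B-values can be sketched directly from the Boundary Table. The information fraction is assumed here to be the planned event fraction (events at the analysis divided by final events); the function name `b_values` is hypothetical.

```python
from math import sqrt


def b_values(z_bounds, events):
    """Convert Z bounds to B-values, B = Z * sqrt(t), with t = events / final events."""
    final = events[-1]
    return [z * sqrt(d / final) for z, d in zip(z_bounds, events)]


# Z-value bounds copied from the Boundary Table
events = [73, 253, 416, 548, 657]
harm_z = [-2.11, -1.77, -1.73, -1.72, -1.71]
futility_z = [-1.44, 0.12, 1.06, 1.74, 2.31]
efficacy_z = [7.43, 3.86, 2.93, 2.53, 2.31]

for name, zs in (("harm", harm_z), ("futility", futility_z), ("efficacy", efficacy_z)):
    print(name, [round(b, 2) for b in b_values(zs, events)])
```

At the final analysis \(t = 1\), so the B-value and the Z-value coincide; on the B-value scale the extreme first efficacy bound becomes far less dramatic.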
Harm Bound Mechanics
- Harm spending is computed under \(H_0\) (no treatment effect)
  - Controls the probability of a false harm signal
- Harm bound is automatically capped at the futility bound
  - Ensures ordering: harm \(\leq\) futility \(\leq\) efficacy
- The astar parameter controls total harm boundary spending
  - Default astar = 0 auto-converts to 1 - alpha
  - Set explicitly (e.g., astar = 0.1) for practical designs
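The two rules above (the astar default and the cap at the futility bound) can be sketched as follows. This mirrors only the behavior described in the bullets; it is not the gsDesign implementation, and the function names `effective_astar` and `cap_harm` are hypothetical.

```python
def effective_astar(astar, alpha):
    """Total harm spending: the default astar = 0 is treated as 1 - alpha."""
    return 1 - alpha if astar == 0 else astar


def cap_harm(harm, futility):
    """Cap each harm bound at the futility bound so that harm <= futility."""
    return [min(h, f) for h, f in zip(harm, futility)]


print(effective_astar(0, alpha=0.0125))    # default: spends 1 - alpha in total
print(effective_astar(0.1, alpha=0.0125))  # explicit value is used unchanged

# Hypothetical uncapped harm bounds; the second exceeds its futility bound
print(cap_harm([-2.11, 0.50], [-1.44, 0.12]))
```

A large default such as 1 - alpha would make the harm bound much easier to cross than a small explicit value, which is why an explicit astar is suggested for practical designs.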
Key API Parameters
| Parameter | Description |
|---|---|
| test.type | 7 (binding) or 8 (non-binding) |
| astar | Total harm boundary spending under \(H_0\) |
| sfharm | Spending function for harm bound |
| sfharmpar | Parameters for harm spending function |
| alpha | One-sided Type I error for efficacy |
| sfu / sfl | Spending functions for efficacy / futility |
Works with gsDesign(), gsSurv() and gsSurvCalendar().