Enhanced Futility Bounds Specification

Gernot Wassmer

October 6, 2025

Introduction

Current situation:

getDesignGroupSequential(
  kMax = 3,
  alpha = 0.025,
  sided = 1,
  typeOfDesign = "OF",
  futilityBounds = c(0, -Inf)  
) |> summary()

Sequential analysis with a maximum of 3 looks (group sequential design)

O’Brien & Fleming design, non-binding futility, one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0628, ASN H1 0.8528, ASN H01 0.8821, ASN H0 0.7059.

Stage                                     1        2        3
Planned information rate              33.3%    66.7%     100%
Cumulative alpha spent               0.0003   0.0072   0.0250
Stage levels (one-sided)             0.0003   0.0071   0.0225
Efficacy boundary (z-value scale)     3.471    2.454    2.004
Futility boundary (z-value scale)         0     -Inf
Cumulative power                     0.0356   0.4617   0.8000
Futility probabilities under H1       0.048        0

Or derivation of futility bounds through beta spending approach

x <- getDesignGroupSequential(
  kMax = 3,
  alpha = 0.025,
  sided = 1,
  typeOfDesign = "asOF",
  typeBetaSpending = "bsKD",
  gammaB = 1.3,
  futilityStops = c(TRUE, FALSE)
) 
summary(x)

Sequential analysis with a maximum of 3 looks (group sequential design)

O’Brien & Fleming type alpha spending design and Kim & DeMets beta spending (gammaB = 1.3), non-binding futility, futility stops c(TRUE, FALSE), one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0586, ASN H1 0.8634, ASN H01 0.8829, ASN H0 0.7038.

Stage                                     1        2        3
Planned information rate              33.3%    66.7%     100%
Cumulative alpha spent               0.0001   0.0060   0.0250
Cumulative beta spent                0.0479   0.0479   0.2000
Stage levels (one-sided)             0.0001   0.0060   0.0231
Efficacy boundary (z-value scale)     3.710    2.511    1.993
Futility boundary (z-value scale)    -0.001     -Inf
Cumulative power                     0.0204   0.4370   0.8000
Futility probabilities under H1       0.048        0

plot(x)
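
The first-stage value of the cumulative beta spent can be checked by hand. The sketch below assumes that the Kim & DeMets spending function has the power form \(\beta \cdot t^{\gamma}\); `kimDeMetsSpending()` is a hypothetical helper written for this illustration, not an rpact function.

```r
# Hypothetical helper (not an rpact function): power-family spending function
kimDeMetsSpending <- function(informationRate, total, gamma) {
    total * informationRate^gamma
}

# beta = 0.2 (power 80%) and gammaB = 1.3 as in the design above,
# evaluated at the first and the final information rate
betaSpent <- kimDeMetsSpending(c(1 / 3, 1), total = 0.2, gamma = 1.3)
round(betaSpent, 4)
```

The first value rounds to 0.0479, in line with the stage 1 entry of the summary. Stage 2 repeats that value in the summary, presumably because futilityStops = c(TRUE, FALSE) switches off the futility stop there.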

Fisher’s Combination Test

getDesignFisher(
  kMax = 3,
  alpha = 0.025,
  sided = 1,
  method = "fullAlpha",
  alpha0Vec = c(0.5, 1)
) |> summary()

Sequential analysis with a maximum of 3 looks (Fisher’s combination test design)

Full last stage level design, binding futility, one-sided overall significance level 2.5%, undefined endpoint.

Stage                                               1          2          3
Fixed weight                                        1          1          1
Cumulative alpha spent                         0.0084     0.0128     0.0250
Stage levels (one-sided)                       0.0084     0.0084     0.0250
Efficacy boundary (p product scale)         0.0084123  0.0010734  0.0007284
Futility boundary (separate p-value scale)     0.5000     1.0000

Problem

  • For group sequential designs, futility bounds have to be specified on the \(z\)-value scale. For Fisher’s combination test, they are on the separate \(p\)-value scale.

  • It is desirable, however, to be able to define them on other scales as well, e.g., the conditional power scale.

  • On the effect size scale, futility bounds are already part of the output of the getSampleSize...() and getPower...() functions. For example:

getSampleSizeMeans(
  design = getDesignGroupSequential(
    kMax = 3,
    alpha = 0.025,
    sided = 1,
    typeOfDesign = "OF",
    futilityBounds = c(0, 0.5)
  ),
  normalApproximation = TRUE
) |> summary()

Sample size calculation for a continuous endpoint

Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 2.5%, power 80%. The results were calculated for a two-sample t-test (normal approximation), H0: mu(1) - mu(2) = 0, H1: effect as specified, standard deviation = 1.

Stage                                                    1       2       3
Planned information rate                             33.3%   66.7%    100%
Cumulative alpha spent                              0.0003  0.0072  0.0250
Stage levels (one-sided)                            0.0003  0.0071  0.0225
Efficacy boundary (z-value scale)                    3.471   2.454   2.004
Futility boundary (z-value scale)                        0   0.500
Efficacy boundary (t), alt. = 0.2                    0.416   0.208   0.139
Efficacy boundary (t), alt. = 0.4                    0.831   0.416   0.277
Efficacy boundary (t), alt. = 0.6                    1.247   0.623   0.416
Efficacy boundary (t), alt. = 0.8                    1.663   0.831   0.554
Efficacy boundary (t), alt. = 1                      2.078   1.039   0.693
Futility boundary (t), alt. = 0.2                        0   0.042
Futility boundary (t), alt. = 0.4                        0   0.085
Futility boundary (t), alt. = 0.6                        0   0.127
Futility boundary (t), alt. = 0.8                        0   0.169
Futility boundary (t), alt. = 1                          0   0.212
Cumulative power                                    0.0359  0.4633  0.8000
Number of subjects, alt. = 0.2                       279.0   557.9   836.9
Number of subjects, alt. = 0.4                        69.7   139.5   209.2
Number of subjects, alt. = 0.6                        31.0    62.0    93.0
Number of subjects, alt. = 0.8                        17.4    34.9    52.3
Number of subjects, alt. = 1                          11.2    22.3    33.5
Expected number of subjects under H1, alt. = 0.2     666.5
Expected number of subjects under H1, alt. = 0.4     166.6
Expected number of subjects under H1, alt. = 0.6      74.1
Expected number of subjects under H1, alt. = 0.8      41.7
Expected number of subjects under H1, alt. = 1        26.7
Overall exit probability (under H0)                 0.5003  0.2459
Overall exit probability (under H1)                 0.0833  0.4444
Exit probability for efficacy (under H0)            0.0003  0.0069
Exit probability for efficacy (under H1)            0.0359  0.4274
Exit probability for futility (under H0)            0.5000  0.2391
Exit probability for futility (under H1)            0.0474  0.0171

Legend:

  • (t): treatment effect scale
  • alt.: alternative

The function getFutilityBounds()

  • This new function converts futility bounds between different scales

  • For one-sided two-stage designs, futility bounds can be specified for different scales which are

    • the \(z\)-value or \(p\)-value scale

    • the effect size scale

    • the reverse conditional power scale

    • the conditional power scale

      Here one can select between:

      • the conditional power at some specified effect size
      • the conditional power at observed effect
      • the Bayesian predictive power
  • This can also be applied to inverse normal or Fisher combination tests

The function getFutilityBounds() (cont’d)

Examples

getFutilityBounds(
    sourceValue = 0,
    sourceScale = "zValue",
    targetScale = "pValue"
)
[1] 0.5


getFutilityBounds(
    sourceValue = c(0.5, 0.3),
    sourceScale = "pValue",
    targetScale = "zValue"
)
[1] 0.0000000 0.5244005

The function getFutilityBounds() (cont’d)

Examples

getDesignGroupSequential(
  futilityBounds = getFutilityBounds(
    sourceValue = c(0.5, 0.3),
    sourceScale = "pValue",
    targetScale = "zValue"
  )
) |> summary()

Sequential analysis with a maximum of 3 looks (group sequential design)

O’Brien & Fleming design, non-binding futility, one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0668, ASN H1 0.849, ASN H01 0.842, ASN H0 0.6214.

Stage                                     1        2        3
Planned information rate              33.3%    66.7%     100%
Cumulative alpha spent               0.0003   0.0072   0.0250
Stage levels (one-sided)             0.0003   0.0071   0.0225
Efficacy boundary (z-value scale)     3.471    2.454    2.004
Futility boundary (z-value scale)         0    0.524
Cumulative power                     0.0359   0.4635   0.8000
Futility probabilities under H1       0.047    0.018

The function getFutilityBounds() (cont’d)

Examples

getDesignGroupSequential(
    kMax = 2, 
    typeOfDesign = "noEarlyEfficacy", 
    alpha = 0.05
) |> 
getFutilityBounds(
  information1 = 10,
  information2 = 10,
  sourceValue = 0.5,
  sourceScale = "condPowerAtObserved",
  targetScale = "pValue"
)
[1] 0.1223971


Some explanation is needed here

z-value and p-value scale

A futility bound \(u_1^0\) on the \(z\,\)-value scale is transformed to the \(p\,\)-value scale and vice versa via \[\begin{equation} \alpha_0 = 1 - \Phi(u_1^0) \;\hbox{ and }\; u_1^0 = \Phi^{-1}(1 - \alpha_0), \hbox{ respectively}. \end{equation}\]
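
In base R the two identities are one-liners. The following is a minimal sketch; `zToP()` and `pToZ()` are hypothetical helper names chosen for this illustration and are not part of rpact.

```r
# z-value scale -> p-value scale: alpha_0 = 1 - Phi(u)
zToP <- function(z) 1 - pnorm(z)

# p-value scale -> z-value scale: u = Phi^{-1}(1 - alpha_0)
pToZ <- function(p) qnorm(1 - p)

zToP(0)
pToZ(c(0.5, 0.3))
```

The calls return 0.5 and, approximately, 0 and 0.5244, which agrees with the getFutilityBounds() examples shown earlier.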

Effect size scale

A futility bound \(u_1^0\) on the \(z\,\)-value scale is transformed to the effect size scale and vice versa via

\[\begin{equation} \hat\delta_0 = \frac{u_1^0}{\sqrt{I_1}} \;\hbox{ and }\; u_1^0 = \hat\delta_0 \sqrt{I_1}, \hbox{ respectively}, \end{equation}\]

where \(I_1\) is the first stage information.

For example, for a one-sample test with continuous endpoint and known variance \(\sigma^2\),

\[\begin{equation} I_1 = \frac{n_1}{\sigma^2}\;. \end{equation}\]

For other testing situations, this needs to be derived accordingly.
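
As a hedged illustration of one such derivation, the sketch below assumes a two-sample comparison of means with 1:1 allocation and known \(\sigma\), for which the information with \(n\) subjects in total is \(n / (4\sigma^2)\). This formula and the helper names are assumptions made for the example; they are not taken from rpact.

```r
# Hypothetical helpers (not rpact functions)
zToEffect <- function(z, information) z / sqrt(information)
effectToZ <- function(delta, information) delta * sqrt(information)

# Assumption: two-sample test, 1:1 allocation, known sigma,
# so the information for n subjects in total is n / (4 * sigma^2)
sigma <- 1
nStage2 <- c(557.9, 22.3) # cumulative subjects at stage 2 for alt. = 0.2 and alt. = 1
information <- nStage2 / (4 * sigma^2)

# z-value futility bound 0.5 at stage 2, expressed on the effect size scale
effectBounds <- zToEffect(0.5, information)
round(effectBounds, 3)
```

Under these assumptions the result is 0.042 and 0.212, the stage 2 values reported as "Futility boundary (t)" in the getSampleSizeMeans() summary above.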

Reverse conditional power scale

According to Tan, Xiong, and Kutner (1998), the reverse conditional power, RCP, is an alternative tool for assessing futility of a trial.

For a two-stage trial using test statistics \(Z_1\) and \(Z_2^*\) at interim and at the final stage, respectively, the RCP is the conditional probability of obtaining results at least as disappointing as the current results given that a significant result will be obtained at the end of the trial.

“Reverse stochastic curtailment”

Let \(t_1\) be the information rate at interim. The formula for RCP is

\[\begin{equation} RCP = P(Z_1 \leq z_1 | Z_2^* = u_2) = \Phi\left(\frac{z_1 - \sqrt{t_1} u_2}{\sqrt{1 - t_1}} \right) \end{equation}\]

which is independent of the alternative because \(Z_2^*\) is a sufficient statistic (cf. Ortega-Villa et al. (2025)).

Notes:

  • A one-sided two-stage design with rejection boundary for the second stage has to be defined

  • No need to specify the (absolute) information \(I_1\) at interim, only information rate \(t_1\).

  • One attractive choice is stopping for futility if RCP \(\leq 0.025\) (\(\widehat = \;\; z_1 \leq 0\) for a two-stage design at level \(\alpha = 0.025\) with no early stopping and information rate \(t_1 = 0.5\))


Specifying a threshold \(cp_0\) for the RCP, below which the trial is stopped for futility, yields

\[\begin{equation} \begin{split} RCP &\geq cp_0 \\[3mm] z_1 &\geq \sqrt{1 - t_1}\;\Phi^{-1}(cp_0) + \sqrt{t_1} u_2 =: u_1^0 \end{split} \end{equation}\]

as a lower bound for continuing the trial (without stopping for futility).
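
The RCP formula and its inversion can be sketched in base R as follows. The helper names are hypothetical, and the design is assumed to be a two-stage design without early efficacy stop at \(\alpha = 0.025\) and \(t_1 = 0.5\), so that \(u_2 = \Phi^{-1}(0.975)\).

```r
# Reverse conditional power for an interim z-value z1 (hypothetical helper)
reverseCondPower <- function(z1, u2, t1) {
    pnorm((z1 - sqrt(t1) * u2) / sqrt(1 - t1))
}

# Futility bound on the z-value scale for an RCP threshold cp0 (hypothetical helper)
rcpToZ <- function(cp0, u2, t1) {
    sqrt(1 - t1) * qnorm(cp0) + sqrt(t1) * u2
}

u2 <- qnorm(1 - 0.025) # assumed final critical value, no early efficacy stop
t1 <- 0.5

# RCP for first-stage p-values 0.2, 0.4, and 0.5
rcp <- reverseCondPower(qnorm(1 - c(0.2, 0.4, 0.5)), u2, t1)
round(rcp, 4)

# An RCP threshold of 0.025 maps back to a z-value bound of (numerically) zero
zBound <- rcpToZ(0.025, u2, t1)
```

The rounded RCP values are 0.2207, 0.0546, and 0.0250, which agrees with the getFutilityBounds() output for targetScale = "reverseCondPower" in the Examples section.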

Conditional power at specified effect size for the group sequential and inverse normal combination case

At interim, the conditional power is given by

\[\begin{equation} \begin{split} CP_{H_1} &= P_{H_1}(Z_2^* \geq u_2 \mid z_1) \\ &= P_{H_1}\left(Z_2 \geq \frac{u_2 - w_1 z_1}{w_2}\right) \\ &= 1 - \Phi\left(\frac{u_2 - w_1 z_1}{w_2} - \delta \sqrt{I_2}\right)\;, \end{split} \end{equation}\]

where \(w_1 = \sqrt{t_1}\), \(w_2 = \sqrt{1 - t_1}\), \(\delta\) is the treatment effect, and \(I_2\) is the second stage information.

Specifying a threshold \(cp_0\) for the conditional power, below which the trial is stopped for futility, yields

\[\begin{equation} \begin{split} CP_{H_1} &\geq cp_0 \\[3mm] \frac{u_2 - w_1 z_1}{w_2} - \delta \sqrt{I_2} &\leq \Phi^{-1}(1 - cp_0) \\ z_1 &\geq \frac{u_2 - w_2\big(\Phi^{-1}(1 - cp_0) + \delta \sqrt{I_2}\big)}{w_1} =: u_1^0 \end{split} \end{equation}\]

as a lower bound for continuing the trial (without stopping for futility).


Depends on the effect size \(\delta\) and \(I_2\).
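
A minimal base R sketch of the conditional power at a specified effect size and of the resulting bound is given below. The function names and the numerical inputs (\(\delta = 0.5\), \(I_2 = 10\), \(cp_0 = 0.3\)) are illustrative assumptions only.

```r
# Conditional power at a specified effect size delta (hypothetical helper)
condPower <- function(z1, u2, t1, delta, information2) {
    w1 <- sqrt(t1)
    w2 <- sqrt(1 - t1)
    1 - pnorm((u2 - w1 * z1) / w2 - delta * sqrt(information2))
}

# Futility bound on the z-value scale for a conditional power threshold cp0
condPowerToZ <- function(cp0, u2, t1, delta, information2) {
    w1 <- sqrt(t1)
    w2 <- sqrt(1 - t1)
    (u2 - w2 * (qnorm(1 - cp0) + delta * sqrt(information2))) / w1
}

u2 <- qnorm(1 - 0.025) # assumed: two-stage design without early efficacy stop
bound <- condPowerToZ(0.3, u2, t1 = 0.5, delta = 0.5, information2 = 10)

# Reverse check: the conditional power at the bound equals the threshold
cpAtBound <- condPower(bound, u2, t1 = 0.5, delta = 0.5, information2 = 10)
```

Plugging the bound back into the conditional power formula returns \(cp_0\), which is the kind of reverse check that can be used to validate a conversion.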

Conditional power at observed effect for the group sequential and inverse normal combination case

If the observed treatment effect estimate, \(\hat\delta\), is used to calculate the conditional power,

\[\begin{equation} \hat\delta \sqrt{I_2} = z_1\sqrt{\frac{I_2}{I_1}} \;, \end{equation}\]

and therefore

\[\begin{equation} CP_{\hat H_1} = 1- \Phi\left(\frac{u_2 - w_1 z_1}{w_2} - z_1\sqrt{\frac{I_2}{I_1}}\right) \;. \end{equation}\]

Depends on \(I_1\) and \(I_2\) only through \(I_2 / I_1\), so absolute values are irrelevant.

The futility bound is then

\[\begin{equation} CP_{\hat H_1} \geq cp_0 \; \Leftrightarrow \; z_1 \geq \left(\frac{u_2}{w_2} - \Phi^{-1}(1 - cp_0)\right) / \left(\frac{w_1}{w_2} + \sqrt{\frac{I_2}{I_1}}\right)=: u_1^0 \end{equation}\]
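
The bound can be evaluated directly in base R. The sketch below uses a hypothetical helper name and assumes a two-stage design without early efficacy stop at one-sided \(\alpha = 0.05\) and \(t_1 = 0.5\), so that \(u_2 = \Phi^{-1}(0.95)\).

```r
# Futility bound on the z-value scale for conditional power at the observed effect
# (hypothetical helper, not an rpact function)
condPowerObservedToZ <- function(cp0, u2, t1, information1, information2) {
    w1 <- sqrt(t1)
    w2 <- sqrt(1 - t1)
    (u2 / w2 - qnorm(1 - cp0)) / (w1 / w2 + sqrt(information2 / information1))
}

u2 <- qnorm(1 - 0.05) # assumed final critical value, no early efficacy stop
z1 <- condPowerObservedToZ(0.5, u2, t1 = 0.5, information1 = 10, information2 = 10)

# Same bound on the p-value scale
pBound <- 1 - pnorm(z1)
round(pBound, 4)
```

The result, 0.1224, matches the getFutilityBounds() example with sourceScale = "condPowerAtObserved" shown earlier. Replacing information1 = information2 = 10 by any other common value leaves it unchanged because only the ratio enters.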

Bayesian predictive power for the group sequential and inverse normal combination case

The Bayesian predictive power using a normal prior \(\pi_0\) with mean \(\delta_0\) and variance \(1 / I_0\) can be shown to be (cf., Wassmer and Brannath (2025), Sect. 7.4)

\[\begin{equation} PP_{\pi_0} = 1 - \Phi\left(\sqrt{\frac{I_0 + I_1}{I_0 + I_1 + I_2}}\left(\frac{u_2 - w_1 z_1}{w_2} - \hat\delta_{\pi_0}\sqrt{I_2}\right)\right) \end{equation}\]

where

\[\begin{equation*} \hat\delta_{\pi_0} = \delta_0 \frac{I_0}{I_0 + I_1} + \hat\delta \frac{I_1}{I_0 + I_1}\;. \end{equation*}\]

The Bayesian predictive power using a flat (improper) prior distribution \(\pi_0\) is then

\[\begin{equation} PP_{\pi_0} = 1 - \Phi\left(\sqrt{\frac{I_1}{I_1 + I_2}}\left(\frac{u_2 - w_1 z_1}{w_2} - z_1\sqrt{\frac{I_2}{I_1}}\right)\right) \end{equation}\]
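
Both versions of the predictive power can be covered by one function, with the flat prior as the special case \(I_0 = 0\). This is a sketch with a hypothetical function name and illustrative inputs, not the rpact implementation.

```r
# Bayesian predictive power with a normal prior N(delta0, 1 / information0);
# information0 = 0 corresponds to the flat (improper) prior
predictivePower <- function(z1, u2, t1, information1, information2,
                            information0 = 0, delta0 = 0) {
    w1 <- sqrt(t1)
    w2 <- sqrt(1 - t1)
    deltaHat <- z1 / sqrt(information1)
    # Posterior mean: information-weighted average of prior mean and estimate
    deltaPost <- (delta0 * information0 + deltaHat * information1) /
        (information0 + information1)
    shrink <- sqrt((information0 + information1) /
        (information0 + information1 + information2))
    1 - pnorm(shrink * ((u2 - w1 * z1) / w2 - deltaPost * sqrt(information2)))
}

u2 <- qnorm(1 - 0.025) # assumed: two-stage design without early efficacy stop
flat <- predictivePower(1, u2, t1 = 0.5, information1 = 10, information2 = 10)
informative <- predictivePower(1, u2, t1 = 0.5, information1 = 10,
    information2 = 10, information0 = 5, delta0 = 0)
c(flat = flat, informative = informative)
```

In this example a prior centred at zero pulls the posterior mean towards zero and therefore gives a smaller predictive power than the flat prior.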

From this, \[\begin{equation} \begin{split} PP_{\pi_0} &\geq cp_0 \\[6mm] \Leftrightarrow \quad z_1 &\geq \left(\frac{u_2}{w_2} - \sqrt{1 + \frac{I_2}{I_1}}\;\Phi^{-1}(1 - cp_0)\right)/ \left(\frac{w_1}{w_2} + \sqrt{\frac{I_2}{I_1}}\right)=: u_1^0 \;. \end{split} \end{equation}\]


Depends on \(I_1\) and \(I_2\) only through \(I_2 / I_1\), so absolute values are irrelevant.
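
The closed-form bound for the flat prior can be sketched as follows; the helper name and the inputs are assumptions made for illustration.

```r
# Futility bound on the z-value scale for a predictive power threshold cp0
# under a flat prior (hypothetical helper, not an rpact function)
predictivePowerToZ <- function(cp0, u2, t1, information1, information2) {
    w1 <- sqrt(t1)
    w2 <- sqrt(1 - t1)
    ratio <- information2 / information1
    (u2 / w2 - sqrt(1 + ratio) * qnorm(1 - cp0)) / (w1 / w2 + sqrt(ratio))
}

u2 <- qnorm(1 - 0.025) # assumed: two-stage design without early efficacy stop

# Only the ratio information2 / information1 matters
boundA <- predictivePowerToZ(0.2, u2, t1 = 0.4, information1 = 10, information2 = 10)
boundB <- predictivePowerToZ(0.2, u2, t1 = 0.4, information1 = 1, information2 = 1)

# At cp0 = 0.5 the quantile term vanishes and the bound reduces to
# (u2 / w2) / (w1 / w2 + sqrt(I2 / I1))
boundHalf <- predictivePowerToZ(0.5, u2, t1 = 0.4, information1 = 10, information2 = 10)
```

At \(cp_0 = 0.5\) the bound coincides with the bound for the conditional power at observed effect, since \(\Phi^{-1}(0.5) = 0\) in both formulas.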

Conditional power at specified effect size for Fisher’s combination test

If \(p_1>u_2\), at interim, the conditional power is given by

\[\begin{equation} \begin{split} CP_{H_1} &= P_{H_1}(p_1 p_2^{w_2} \leq u_2 \mid p_1) \\[3mm] &\vdots \\[3mm] &= \Phi\left(\Phi^{-1}\left(\left(\frac{u_2}{1 - \Phi(z_1)}\right)^{1/w_2}\right) + \delta \sqrt{I_2}\right)\;, \end{split} \end{equation}\]

where \(w_2 = \sqrt{\frac{1 - t_1}{t_1}}\), \(\delta\) is the treatment effect, and \(I_2\) is the second stage information.

If \(p_1\leq u_2\), due to stochastic curtailment, \(CP_{H_1} = 1\).

Specifying a threshold \(cp_0\) for the conditional power, below which the trial is stopped for futility, yields

\[\begin{equation} \begin{split} CP_{H_1} &\geq cp_0 \\[6mm] \Leftrightarrow \quad z_1 &\geq \Phi^{-1}\left(1 - \frac{u_2}{\left(\Phi(\Phi^{-1}(cp_0) - \delta \sqrt{I_2})\right)^{w_2}}\right) =: u_1^0 \end{split} \end{equation}\]

as a lower bound for continuing the trial (without stopping for futility).
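
A base R sketch of the Fisher case is given below. The helper names and inputs are hypothetical. For the critical value, the sketch assumes equal weights and no early stopping, in which case the level-\(\alpha\) bound for the product \(p_1 p_2\) is \(\exp(-\chi^2_{4, 1-\alpha}/2)\); a design with early stopping would have a different \(u_2\).

```r
# Conditional power at a specified effect size for Fisher's combination test
# (hypothetical helper, not an rpact function)
fisherCondPower <- function(z1, u2, t1, delta, information2) {
    w2 <- sqrt((1 - t1) / t1)
    p1 <- pnorm(z1, lower.tail = FALSE)
    # pmin() guards against arguments slightly above 1 when p1 <= u2
    cp <- pnorm(qnorm(pmin(1, (u2 / p1)^(1 / w2))) + delta * sqrt(information2))
    # Stochastic curtailment: rejection is certain once p1 <= u2
    ifelse(p1 <= u2, 1, cp)
}

# Futility bound on the z-value scale for a conditional power threshold cp0
fisherCondPowerToZ <- function(cp0, u2, t1, delta, information2) {
    w2 <- sqrt((1 - t1) / t1)
    qnorm(1 - u2 / pnorm(qnorm(cp0) - delta * sqrt(information2))^w2)
}

# Assumption: equal weights, no early stopping, one-sided alpha = 0.025
u2 <- exp(-qchisq(1 - 0.025, df = 4) / 2)
bound <- fisherCondPowerToZ(0.3, u2, t1 = 0.5, delta = 0.5, information2 = 10)

# Reverse check: the conditional power at the bound equals the threshold
cpAtBound <- fisherCondPower(bound, u2, t1 = 0.5, delta = 0.5, information2 = 10)
```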

Conditional power at observed effect for Fisher’s combination test

If \(p_1>u_2\) and if the observed treatment effect estimate, \(\hat\delta\), is used to calculate the conditional power,

\[\begin{equation} CP_{\hat H_1} = \Phi\left(\Phi^{-1}\left(\left(\frac{u_2}{1 - \Phi(z_1)}\right)^{1/w_2}\right) + z_1\sqrt{\frac{I_2}{I_1}}\right)\;. \end{equation}\]

If \(p_1\leq u_2\), \(CP_{\hat H_1} = 1\).

The futility bound is then found numerically by finding the minimum \(z_1\) value fulfilling

\[\begin{equation} CP_{\hat H_1} \geq cp_0 \;. \end{equation}\]

Bayesian predictive power for Fisher’s combination test

If \(p_1>u_2\), the Bayesian predictive power using a flat (improper) prior distribution \(\pi_0\) is

\[\begin{equation} PP_{\pi_0} = \Phi\left(\sqrt{\frac{I_1}{I_1 + I_2}}\left( \Phi^{-1}\left(\left(\frac{u_2}{1 - \Phi(z_1)}\right)^{1/w_2}\right) + z_1\sqrt{\frac{I_2}{I_1}}\right)\right)\;. \end{equation}\]

If \(p_1\leq u_2\), \(PP_{\pi_0} = 1\).

The futility bound is found numerically by finding the minimum \(z_1\) value fulfilling

\[\begin{equation} PP_{\pi_0} \geq cp_0 \;. \end{equation}\]
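
The two numerical searches for Fisher's combination test (conditional power at observed effect and predictive power) can be sketched with uniroot() from base R. Function names, the search interval, and the choice of \(u_2\) (equal weights, no early stopping) are assumptions made for this illustration; the approach relies on both quantities being increasing in \(z_1\).

```r
# predictive = FALSE: conditional power at the observed effect
# predictive = TRUE:  predictive power under a flat prior
fisherPowerAtObserved <- function(z1, u2, t1, information1, information2,
                                  predictive = FALSE) {
    w2 <- sqrt((1 - t1) / t1)
    p1 <- pnorm(z1, lower.tail = FALSE)
    shrink <- if (predictive) sqrt(information1 / (information1 + information2)) else 1
    inner <- qnorm(pmin(1, (u2 / p1)^(1 / w2))) + z1 * sqrt(information2 / information1)
    # Stochastic curtailment: the value is 1 once p1 <= u2
    ifelse(p1 <= u2, 1, pnorm(shrink * inner))
}

# Smallest z1 with power >= cp0, found as the root of power(z1) - cp0
fisherBoundAtObserved <- function(cp0, u2, ...) {
    uniroot(
        function(z1) fisherPowerAtObserved(z1, u2, ...) - cp0,
        lower = -10, upper = qnorm(u2, lower.tail = FALSE),
        tol = 1e-10
    )$root
}

# Assumption: equal weights, no early stopping, one-sided alpha = 0.025
u2 <- exp(-qchisq(1 - 0.025, df = 4) / 2)
zCond <- fisherBoundAtObserved(0.5, u2, t1 = 0.5, information1 = 1, information2 = 1)
zPred <- fisherBoundAtObserved(0.5, u2, t1 = 0.5, information1 = 1, information2 = 1,
    predictive = TRUE)
```

For \(cp_0 = 0.5\) the two bounds agree up to the numerical tolerance, because the shrinkage factor does not change where the argument of \(\Phi\) crosses zero; for other thresholds they differ.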

Examples

p-value vs. conditional power at observed effect

Promising zone approach (Mehta and Pocock (2011)): Increase the sample size if the conditional power at the observed effect exceeds 50% (refined values exist). Then the traditional test statistic can be used.

library(rpact)
library(ggplot2)

design <- getDesignGroupSequential(
    kMax = 2, 
    typeOfDesign = "OF", 
    alpha = 0.025
)

# Grid of first-stage futility bounds on the p-value scale
futilityBounds <- seq(0.01, 0.5, by = 0.01)
y <- getFutilityBounds(
    sourceValue = futilityBounds,
    sourceScale = "pValue",
    targetScale = "condPowerAtObserved",
    information1 = 1,
    information2 = 1,
    design = design
)
dat <- data.frame(pValue = futilityBounds, condPower = y)

# Dashed lines mark conditional power 35% and 50% and the corresponding p-values
ggplot(dat, aes(pValue, condPower)) +
    geom_line(lwd = 0.75) +
    geom_vline(xintercept = c(0.081, 0.114), linetype = "dashed", color = "green", lwd = 0.5) +
    geom_hline(yintercept = c(0.35, 0.5), linetype = "dashed", color = "red", lwd = 0.5) +
    scale_x_continuous(breaks = c(0.081, 0.114, 0.2, 0.3, 0.4, 0.5)) +
    scale_y_continuous(breaks = seq(0, 1, 0.1)) +
    theme_classic()

p-value vs. conditional power at observed effect

#  Note: Arbitrary absolute values of information1 = information2 can be specified
getFutilityBounds(
    sourceValue = c(0.35, 0.5),
    sourceScale = "condPowerAtObserved",
    targetScale = "pValue",
    information1 = 1,
    information2 = 1,
    design = design
)
[1] 0.11398692 0.08101828

Predictive power vs. conditional power at observed effect

design <- getDesignGroupSequential(
    typeOfDesign = "noEarlyEfficacy", 
    alpha = 0.025, 
    informationRates = c(0.4, 1))

futilityBounds <- seq(-0.5, 2.5, by = 0.025)
y <- getFutilityBounds(
    sourceValue = futilityBounds,
    sourceScale = "zValue",
    targetScale = "predictivePower",
    information1 = 10,
    information2 = 10,
    design = design
)
dat <- data.frame(zValue = futilityBounds, CP = y, out = "pred")
y <- getFutilityBounds(
    sourceValue = futilityBounds,
    sourceScale = "zValue",
    targetScale = "condPowerAtObserved",
    information1 = 10,
    information2 = 10,
    design = design
)
dat <- rbind(dat, data.frame(zValue = futilityBounds, CP = y, out = "cond"))

ggplot(dat, aes(zValue, CP)) +
    geom_line(aes(linetype = out), lwd = 0.75) +
    geom_hline(yintercept = 0.5, color = "red", lwd = 0.55) +
    theme_classic()

p-value vs. reverse conditional power

design <- getDesignGroupSequential(
    kMax = 2, 
    typeOfDesign = "noEarlyEfficacy", 
    alpha = 0.025)

futilityBounds <- seq(0.1, 0.9, by = 0.01)
y <- getFutilityBounds(
    sourceValue = futilityBounds,
    sourceScale = "pValue",
    targetScale = "reverseCondPower",
    design = design
)
dat <- data.frame(pValue = futilityBounds, reverseCondPow = y)

ggplot(dat, aes(pValue, reverseCondPow)) +
    geom_line(lwd = 0.75) +
    geom_vline(xintercept = c(0.2, 0.4, 0.5), linetype = "dashed", color = "green", lwd = 0.5) +
    geom_hline(yintercept = c(0.025, 0.055, 0.221), linetype = "dashed", color = "red", lwd = 0.5) +
    scale_x_continuous(breaks = seq(0,1,0.1)) +
    scale_y_continuous(breaks = c(0, 0.025, 0.055, 0.1, 0.2, 0.221, 0.3, 0.4)) +
    theme_classic()

p-value vs. reverse conditional power

getFutilityBounds(
    sourceValue = c(0.2, 0.4, 0.5),
    sourceScale = "pValue",
    targetScale = "reverseCondPower",
    design = design
)
[1] 0.22072949 0.05461352 0.02500000

Summary

  • There are several possibilities to define futility stopping
  • Predictive interval plots might be another alternative (cf. Ortega-Villa et al. (2025))
  • Beta spending functions might help to construct futility bounds
  • All boundaries should be considered as guidelines rather than strict rules, i.e., as non-binding.
  • getFutilityBounds() is provided as a separate tool
  • Extensively tested, e.g., through reverse checks
  • Will be included in the sample size and power calculation features, especially to obtain the information and the effect size automatically.

References

Mehta, Cyrus R, and Stuart J Pocock. 2011. “Adaptive Increase in Sample Size When Interim Results Are Promising: A Practical Guide with Examples.” Statistics in Medicine 30 (28): 3267–84.
Ortega-Villa, Ana M, Megan C Grieco, Kevin Rubenstein, Jing Wang, and Michael A Proschan. 2025. “Futility Monitoring in Clinical Trials.” Statistics in Medicine 44 (13-14): e70157.
Tan, Ming, Xiaoping Xiong, and Michael H Kutner. 1998. “Clinical Trial Designs Based on Sequential Conditional Probability Ratio Tests and Reverse Stochastic Curtailing.” Biometrics, 682–95.
Wassmer, Gernot, and Werner Brannath. 2025. Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. 2nd ed. Springer. https://link.springer.com/book/10.1007/978-3-031-89669-9.