Leapfrog Exercises

Exercises and scripts to help introduce the ‘leapfrog’ trial design

These are adapted from the workshop “Innovative approaches to clinical trial design and analysis”, given at Ruhr-Universität Bochum on 19.11.2022. This document is published at https://www.psych.uni-goettingen.de/en/trace/team/blackwell-simon under a CC BY 4.0 license. Please contact Simon Blackwell () if you have any questions, and visit the same page for further information.


Packages needed

You will need to install and load the following packages for these exercises:

install.packages("BayesFactor")
install.packages("MASS")
install.packages("devtools") # if not yet installed
library(devtools)
install_github("nicebread/BFDA", subdir="package")

library(BFDA)
library(BayesFactor)
library(MASS)

Exercise 2: Simulating data using BFDA and selecting trial parameters

The Bayes factor design analysis (BFDA) package provides a simple way to simulate and analyse sequential Bayesian analyses for trial planning. For more complex analyses you can create your own custom simulations (e.g. see the scripts at https://osf.io/8mxda/ or https://osf.io/b397u/ for examples).

Please see https://rawgit.com/nicebread/BFDA/master/package/doc/BFDA_manual.html for the BFDA manual and full instructions

1. Simulate sequential analyses for planning

A first step is to simulate the results of sequential analyses under i) the alternative hypothesis for the target effect size of interest (to find parameters that give you sufficient power), and ii) the null hypothesis (to find parameters that give a suitable false-positive (Type 1) error rate). Once you have suitable parameters for these, you can then simulate a range of other effect sizes to get a better picture of your power at different effect sizes.

As the simulations can take some time, for demonstration purposes in this exercise we will i) target a medium effect size (d = 0.5), as this requires smaller maximum sample sizes, and ii) run only a smallish number of simulations (1000), as we are not too worried about accuracy/precision here. For planning an actual trial you would want a larger number (e.g. 5000).

Please run the following simulations as preparation:

set.seed(19112022)

#for the following I use the naming convention sim.Hd followed by the simulated effect size, e.g. sim.Hd0.5 is for simulating the hypothesis of a between-group effect size of d = 0.5

sim.Hd0.5 <- BFDA.sim(expected.ES=0.5, type="t.between", prior=list("Cauchy", list(prior.location=0, prior.scale=sqrt(2)/2)), n.min=10, n.max=80, alternative="greater", boundary=Inf, B=1000, verbose=TRUE, stepsize = 1)

sim.Hd0.0 <- BFDA.sim(expected.ES=0, type="t.between", prior=list("Cauchy", list(prior.location=0, prior.scale=sqrt(2)/2)), n.min=10, n.max=80, alternative="greater", boundary=Inf, B=1000, verbose=TRUE, stepsize = 1)

#If you like (for later) you can now also run some in-between or larger values, e.g.

sim.Hd0.2 <- BFDA.sim(expected.ES=0.2, type="t.between", prior=list("Cauchy", list(prior.location=0, prior.scale=sqrt(2)/2)), n.min=10, n.max=80, alternative="greater", boundary=Inf, B=1000, verbose=TRUE, stepsize = 1)

sim.Hd0.4 <- BFDA.sim(expected.ES=0.4, type="t.between", prior=list("Cauchy", list(prior.location=0, prior.scale=sqrt(2)/2)), n.min=10, n.max=80, alternative="greater", boundary=Inf, B=1000, verbose=TRUE, stepsize = 1)

sim.Hd0.6 <- BFDA.sim(expected.ES=0.6, type="t.between", prior=list("Cauchy", list(prior.location=0, prior.scale=sqrt(2)/2)), n.min=10, n.max=80, alternative="greater", boundary=Inf, B=1000, verbose=TRUE, stepsize = 1)

#etc...

To explain, these simulate sequential Bayes-factor-based analyses with the following settings:

  1. an effect size of d = 0.5 (expected.ES=0.5) for sim.Hd0.5, or d = 0 for sim.Hd0.0

  2. a between-group t-test, e.g. the difference between two groups in change in symptoms (type="t.between")

  3. a default Cauchy prior with an rscale value of 0.707 (prior=list("Cauchy", list(prior.location=0, prior.scale=sqrt(2)/2))) [we won't go into priors here, but default ones like this work fine! See Stefan et al. (2019) for discussion of using informed priors with BFDA, https://link.springer.com/article/10.3758/s13428-018-01189-8]

  4. a minimum sample size (per arm) of 10 (n.min = 10) - note that when trying to find parameters later you can choose higher minimum sample sizes, but not lower

  5. a maximum sample size (per arm) of 80 (n.max = 80) - note that when trying out parameters later on you can try lower, but not higher maximum sample sizes

  6. a directional (i.e. one-tailed) Bayes factor (alternative = "greater"), as we are interested in whether one treatment is superior to another, not just whether they differ. Setting boundary = Inf means each simulated trajectory runs all the way to n.max without stopping, so that different stopping boundaries can be applied afterwards

  7. we use 1000 simulations (B = 1000) - fine for getting a feel, but for finalising your study design you would want a larger number, e.g. 5000

  8. we carry out the analysis after every participant (stepsize = 1). If you are looking at larger samples you could instead repeat the analysis after every 5 or 10 participants.
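
To make point 6 concrete, here is a toy example (hypothetical data, not part of the exercise) showing the kind of directional Bayesian t-test being simulated:

set.seed(1)
treatment <- rnorm(40, mean = 0.5) # hypothetical treatment arm, true d = 0.5
control <- rnorm(40, mean = 0)     # hypothetical control arm
# Directional test: is the standardised effect in the interval (0, Inf)?
BF <- BayesFactor::ttestBF(x = treatment, y = control,
                           nullinterval = c(0, Inf), rscale = sqrt(2)/2)
exp(BF@bayesFactor$bf[1]) # first element = BF for the directional hypothesis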

If you were carrying out large important simulations that took a long time you would probably want to save them so that you can load them again later without having to re-run the simulations:

#Saving a BFDA simulation object:
saveRDS(sim.Hd0.5,"sim.Hd0.5.dd.mm.yy.RDS")
saveRDS(sim.Hd0.0,"sim.Hd0.0.dd.mm.yy.RDS")
#to load again
sim.Hd0.5<-readRDS("sim.Hd0.5.dd.mm.yy.RDS")
sim.Hd0.0<-readRDS("sim.Hd0.0.dd.mm.yy.RDS")

2. Try out different analysis parameters to find a set that fits your requirements

Overview

Now that you have a simulated set of sequential BFs you can test out what would happen if you applied different sets of analysis parameters to them.

These are:

  • Nmin: The minimum sample size (per arm) at which you start the sequential analyses

  • Nmax: The maximum sample size (per arm) at which you drop an arm (if it has not already hit a BF boundary)

  • BFfail: A BF threshold for failure (i.e. sufficient evidence for the null hypothesis of non-superiority to the control condition vs. the alternative hypothesis of superiority). This will be a value less than 1 (e.g. 1/3, 1/5, 1/10, etc.)

  • BFsuccess: A BF threshold for success (i.e. sufficient evidence for the alternative hypothesis of superiority over the control condition vs. the null hypothesis of non-superiority to the control condition). This will be a value greater than 1 (e.g. 3, 5, 10)

We are interested in:

  1. power: what proportion of arms hit the BFsuccess threshold when d > 0

  2. false-positive / Type 1 error rate: what proportion of arms hit the BFsuccess threshold when d = 0 (or d < 0)

  3. keeping the average sample sizes as low as possible

There are several functions in BFDA to analyse simulation outcomes, but here we will just use the plot function. We will start with the limits used in the simulations (i.e. n.min = 10 and n.max = 80) and a default set of starting boundaries of BFfail = 1/5 and BFsuccess = 5 (boundary=c(1/5, 5)). The parameter n.trajectories simply sets how many individual BF trajectories are drawn on the graph.

#First plot for H1
dev.new() #sometimes the plot doesn't work if you don't make a new window first
plot(sim.Hd0.5, n.min=10, n.max=80, boundary=c(1/5, 5), n.trajectories = 60)

#Then plot for H0
dev.new() #sometimes the plot doesn't work if you don't make a new window first
plot(sim.Hd0.0, n.min=10, n.max=80, boundary=c(1/5, 5), n.trajectories = 60)

#There are other functions in BFDA to explore the simulations/sequential analyses but I find the plots the most informative. However, feel free to explore.
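
For example, BFDA.analyze() prints a text summary of stopping behaviour for a given design. A minimal sketch based on the BFDA manual; this assumes your BFDA version accepts the same boundary specification as plot() (if it only takes a single symmetric value, use e.g. boundary=5):

#Summarise stopping behaviour under d = 0.5 and d = 0 for the chosen design
BFDA.analyze(sim.Hd0.5, design="sequential", n.min=10, n.max=80, boundary=c(1/5, 5))
BFDA.analyze(sim.Hd0.0, design="sequential", n.min=10, n.max=80, boundary=c(1/5, 5))
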
Exercise steps
  1. Play around with the parameters (n.min, n.max, the BF boundaries) and see what happens: e.g. increasing n.min will tend to reduce error rates, but means you lose the chance to make decisions as early. Remember that the BF boundaries don't have to be symmetrical (e.g. you could use boundary=c(1/3, 10) if you weren't so worried about false negatives but were very concerned about potential false positives); a variation along these lines is sketched after this list.

  2. Try to come up with a set of parameters that gives you 80% power and a false-positive (Type 1) error rate of < 5%: i.e. for the d = 0.5 simulation, ≥ 80% of arms stopping at the H1 boundary, and for the d = 0 simulation, < 5% (i.e. at most 4%) stopping at the H1 boundary.

  3. Once you find this you can see if you can improve on these, to reduce the potential sample sizes needed.

  4. See what proportion of arms hit BFsuccess (the H1 boundary) for sim.Hd0.5 at different sample sizes (by adjusting n.max): the n.max at which 50% of arms stop at the n.max boundary (rather than at a BF boundary) tells you the typical (median) sample size you might expect if d = 0.5 with these parameters. You can then do the same for sim.Hd0.0. It might be that you can find a set of parameters that gives you smaller expected sample sizes.

  5. If you have simulated other effect sizes you can see what happens for these, e.g. what happens if you power for d = 0.5 but the true effect is d = 0.4, d = 0.3, etc.?
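
As a concrete starting point for step 1, the variation below raises n.min and uses asymmetric boundaries on the same simulations (the values are illustrative, not a recommendation):

#Stricter evidence required for success (BF = 10) than for failure (BF = 1/3)
dev.new()
plot(sim.Hd0.5, n.min=20, n.max=80, boundary=c(1/3, 10), n.trajectories = 60)
dev.new()
plot(sim.Hd0.0, n.min=20, n.max=80, boundary=c(1/3, 10), n.trajectories = 60)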

Summarising the simulations for planning

One way to collect this information is in a table, like the examples in Table 2 and Table 3 of Blackwell et al. (2019): https://journals.sagepub.com/doi/full/10.1177/2167702619858071

Table 2 from that paper is reproduced below. It illustrates a particular set of parameters chosen for a small-to-medium between-group effect size equivalent to Cohen’s d = 0.4: Nmin = 35 (per arm), Nmax = 125 (per arm), BFfail = 1/4, and BFsuccess = 5 (with a directional default Cauchy prior, rscale parameter = √2/2).

Here we can see that we have a false-positive rate of < 5% (top row) and 81% power to detect d = 0.4. We can also see that 54% of the time we would hit BFfail at Nmin when d = 0, and 54% of the time we would stop the trial by n = 50 per arm when d = 0.4 (8% at BFfail, 46% at BFsuccess):


“True” effect size   Probability of reaching threshold at each participant number (per group)
(Cohen’s d)          Discontinuation threshold                 Replacement threshold
                     n=35   n=50   n=75   n=100  n=125         n=35   n=50   n=75   n=100  n=125
0 (null)             54     70     81     86     89            1      3      3      4      4
0.1                  37     52     62     68     71            4      7      10     12     13
0.2                  22     33     41     45     47            8      15     23     28     32
0.3                  11     18     22     24     25            16     29     43     52     58
0.4                  5      8      10     10     11            28     46     65     75     81
0.5                  2      3      4      4      4             43     64     82     91     94
0.6                  1      1      1      1      1             60     80     94     97     98
0.7                  0      0      0      0      0             75     91     98     100    100
0.8                  0      0      0      0      0             88     97     100    100    100



Producing these plots is quite a long-winded way to arrive at such a table, but it at least gives you a good feel for what happens when you change the parameters. If you were doing your own simulations you could write a script to output such a table automatically (e.g. see the scripts at https://osf.io/8mxda/ and https://osf.io/b397u/ for examples).
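
As a minimal sketch of such a script (not taken from the linked examples): it assumes the object returned by BFDA.sim stores its raw trajectories in $sim as a data frame with columns id, n and logBF - check str(sim.Hd0.5) against your BFDA version before relying on this:

#Proportion (%) of simulated arms that have hit the fail/success boundary
#by each per-arm sample size, computed from the raw trajectories
stop.props <- function(sim, n.min, n.max, bf.fail, bf.success, ns) {
  traj <- sim$sim # assumed layout: one row per (simulation id, n), with logBF
  traj <- traj[traj$n >= n.min & traj$n <= n.max, ]
  first.hit <- function(one) {
    one <- one[order(one$n), ]
    hit <- which(one$logBF >= log(bf.success) | one$logBF <= log(bf.fail))
    if (length(hit) == 0) return(c(n = NA, success = NA))
    c(n = one$n[hit[1]], success = as.numeric(one$logBF[hit[1]] >= log(bf.success)))
  }
  hits <- do.call(rbind, lapply(split(traj, traj$id), first.hit))
  stopped <- !is.na(hits[, "n"])
  sapply(ns, function(N) c(
    fail    = 100 * mean(stopped & hits[, "n"] <= N & hits[, "success"] == 0),
    success = 100 * mean(stopped & hits[, "n"] <= N & hits[, "success"] == 1)))
}

#e.g. one row of a table like the one above:
#stop.props(sim.Hd0.5, n.min=10, n.max=80, bf.fail=1/5, bf.success=5, ns=c(20, 40, 60, 80))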

Exercise 3: Carry out sequential analysis of (simulated) trial data

Overview and setup

In this exercise you will analyse some (simulated) data using the trial parameters you selected, and find out when you hit the BF boundaries.

First you will need to simulate the trial data - please run the code below (which I’m happy to explain but there’s no need to understand it for the exercise - it’s just a means to provide you with the data to analyse):

#Simulates 100 participants in each arm of a 3-arm trial where:
#there is a correlation of r = 0.35 between pre- and post-treatment scores
#one treatment arm is not superior to control, the other is
set.seed(19112022)
samples <- 100
r <- 0.35
#Control arm (C): draw correlated pre/post scores, then rescale to
#integer questionnaire-style scores (roughly mean 20-25, SD 5)
data <- mvrnorm(n=samples, mu=c(1,0.8), Sigma=matrix(c(1,r,r,1), nrow=2), empirical=FALSE)
X <- data[,1]
Y <- data[,2]
X <- as.integer(X*5+20)
Y <- as.integer(Y*5+20)
#Treatment arm 1 (Tx1)
data <- mvrnorm(n=samples, mu=c(1,0.15), Sigma=matrix(c(1,r,r,1), nrow=2), empirical=FALSE)
X1 <- data[,1]
Y1 <- data[,2]
X1 <- as.integer(X1*5+20)
Y1 <- as.integer(Y1*5+20)
#Treatment arm 2 (Tx2)
data <- mvrnorm(n=samples, mu=c(1,0.9), Sigma=matrix(c(1,r,r,1), nrow=2), empirical=FALSE)
X2 <- data[,1]
Y2 <- data[,2]
X2 <- as.integer(X2*5+20)
Y2 <- as.integer(Y2*5+20)
#Per-arm change scores (not used below, but handy for checking)
cdiff <- Y-X
T1diff <- Y1-X1
T2diff <- Y2-X2
#Combine the arms into one dataset and shuffle the row order to mimic
#participants entering the trial in a random sequence
predata <- c(X,X1,X2)
postdata <- c(Y,Y1,Y2)
diffdata <- postdata-predata
group <- c(rep("C",samples), rep("Tx1",samples), rep("Tx2",samples))
dseq <- seq(1,samples*3)
pid <- as.character(seq(1,samples*3))
mydata <- cbind(predata,postdata,diffdata,group)
mydata <- mydata[sample(1:nrow(mydata)), ]
mydata <- data.frame(cbind(pid,dseq,mydata))
mydata$dseq <- as.integer(mydata$dseq)
mydata$predata <- as.integer(mydata$predata)
mydata$postdata <- as.integer(mydata$postdata)
mydata$diffdata <- as.integer(mydata$diffdata)

You now have a dataframe (mydata) with pre- and post-treatment outcome data for a pretend 3-arm trial of treatments for depression (i.e. a decrease in score on the outcome measure is good). There are three arms: “C” (control condition, e.g. treatment as usual), “Tx1” (new treatment 1) and “Tx2” (new treatment 2). One treatment arm is superior to control; one isn’t. Each arm includes 100 participants.

Dataframe columns are:

  • pid = participant id (a string of a number from 1 to 300)
  • dseq = sequence in which they provided outcome data for the trial (numerical, here the same as pid for convenience)
  • predata = score on the depression outcome measure at pre-treatment
  • postdata = score on the depression outcome measure at post-treatment
  • diffdata = post-treatment score minus pre-treatment score
  • group = which group (C, Tx1, Tx2)

Exploring the data

Take a look at the data (e.g. with View(mydata)) to get a sense of it. You can then use the functions in the code chunk below to explore the potential outcome of applying sequential Bayesian analyses.
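
For instance, some quick base-R checks (note that summarising the change scores per arm like this is already “peeking” at the trial outcome; the sequential analyses below simply formalise that):

head(mydata)            #first few rows, in trial-entry order
table(mydata$group)     #n per arm (100 each)
aggregate(diffdata ~ group, data = mydata, FUN = mean) #mean change score per arm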

First, run the code below to load the two functions:

BFsnapshot<-function(BFdata,n=300,rs=(sqrt(2)/2)){

  if (length(BFdata$pid)<n){
    return("Error: n larger than number of participants")
  }

  tryCatch(
    {
      tdata<-BFdata[1:n,]
      cat("\n\n Total N = ",n,"\n\n")
      cat("Control: n = ",length(tdata[tdata$group=="C",]$pid),", mean change (post minus pre-treatment) = ", mean(tdata[tdata$group=="C",]$diffdata,na.rm=TRUE)," (SD = ", sd(tdata[tdata$group=="C",]$diffdata,na.rm=TRUE),")\n\n",sep="")

      cat("Tx1: n = ",length(tdata[tdata$group=="Tx1",]$pid),", mean change (post minus pre-treatment) = ", mean(tdata[tdata$group=="Tx1",]$diffdata,na.rm=TRUE)," (SD = ", sd(tdata[tdata$group=="Tx1",]$diffdata,na.rm=TRUE),")\n\n",sep="")
      #effect size (Cohen's d) for Tx1 vs. control
      d<-effectsize::cohens_d(tdata[tdata$group=="C",]$diffdata,tdata[tdata$group=="Tx1",]$diffdata)
      cat("Effect size vs. control: d=",d$Cohens_d," 95% CIs [",d$CI_low,",",d$CI_high,"]\n\n",sep="")
      #directional Bayesian t-test (nullinterval=c(0,Inf)); with a nullinterval,
      #ttestBF returns two BFs (the interval and its complement), so take the first
      BF<-BayesFactor::ttestBF(x=tdata[tdata$group=="C",]$diffdata,y=tdata[tdata$group=="Tx1",]$diffdata,nullinterval=c(0,Inf),rscale=rs)
      cat("BF vs. control: BF=",exp(BF@bayesFactor$bf[1]),"\n\n",sep="")

      cat("Tx2: n = ",length(tdata[tdata$group=="Tx2",]$pid),", mean change (post minus pre-treatment) = ", mean(tdata[tdata$group=="Tx2",]$diffdata,na.rm=TRUE)," (SD = ", sd(tdata[tdata$group=="Tx2",]$diffdata,na.rm=TRUE),")\n\n",sep="")
      #effect size (Cohen's d) for Tx2 vs. control
      d<-effectsize::cohens_d(tdata[tdata$group=="C",]$diffdata,tdata[tdata$group=="Tx2",]$diffdata)
      cat("Effect size vs. control: d=",d$Cohens_d," 95% CIs [",d$CI_low,",",d$CI_high,"]\n\n",sep="")
      #directional Bayesian t-test for Tx2, again taking the first of the two BFs
      BF<-BayesFactor::ttestBF(x=tdata[tdata$group=="C",]$diffdata,y=tdata[tdata$group=="Tx2",]$diffdata,nullinterval=c(0,Inf),rscale=rs)
      cat("BF vs. control: BF=",exp(BF@bayesFactor$bf[1]),"\n\n",sep="")
    },
    error=function(cond) {
      return(paste0("Something went wrong! ",cond))
    })
}

seqBFs<-function(BFdata,rs=(sqrt(2)/2)){
  tryCatch(
    {
      alldata<-BFdata[1:length(BFdata$pid),]
      dsTx1<-vector()
      dsTx2<-vector()
      BFsTx1<-vector()
      BFsTx2<-vector()
      nsTx1<-vector()
      nsTx2<-vector()
      is<-vector()
      #start at participant 10 so that each group has some data
      #(the tests can fail with very small samples)
      for (i in 10:length(alldata$pid)){
        tdata<-alldata[1:i,]
        Ns<-as.data.frame(cbind(table(tdata[1:i,]$group)))
        dsTx1<-c(dsTx1,(effectsize::cohens_d(tdata[tdata$group=="C",]$diffdata,tdata[tdata$group=="Tx1",]$diffdata))$Cohens_d)
        #directional BF; ttestBF returns two BFs (interval and complement), so take the first
        BFsTx1<-c(BFsTx1,exp((BayesFactor::ttestBF(x=tdata[tdata$group=="C",]$diffdata,y=tdata[tdata$group=="Tx1",]$diffdata,nullinterval=c(0,Inf),rscale=rs)@bayesFactor$bf)[1]))
        nsTx1<-c(nsTx1,Ns["Tx1",])

        dsTx2<-c(dsTx2,(effectsize::cohens_d(tdata[tdata$group=="C",]$diffdata,tdata[tdata$group=="Tx2",]$diffdata))$Cohens_d)
        BFsTx2<-c(BFsTx2,exp((BayesFactor::ttestBF(x=tdata[tdata$group=="C",]$diffdata,y=tdata[tdata$group=="Tx2",]$diffdata,nullinterval=c(0,Inf),rscale=rs)@bayesFactor$bf)[1]))
        nsTx2<-c(nsTx2,Ns["Tx2",])
        is<-c(is,i)
      }
      allresults<-data.frame(cbind(is,nsTx1,dsTx1,BFsTx1,nsTx2,dsTx2,BFsTx2))
      return(allresults)
    },
    error=function(cond) {
      return(paste0("Something went wrong! ",cond))
    })
}

You can now use these two functions to explore the effect of carrying out sequential Bayesian analyses (here, for simplicity, t-tests on change scores).

Function: BFsnapshot()

BFsnapshot() can be used to give you a snapshot of the data at a particular point in time. Arguments are:

  • BFdata = the dataframe with the data (i.e. here it is mydata)
  • n = the total sample size / sequence number (up to a max of 300) you want to take the snapshot at (e.g. if you put 100 in here, it will give you a snapshot of the analysis outcomes at the point there were 100 total participants in the trial)
  • rs = you can ignore this unless you want to change the rscale parameter (defaults to √2/2)

e.g. BFsnapshot(mydata,100) would give you the analysis output at the point when there were 100 people in the trial.
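
If you want snapshots at several points in one go, you could loop over sample sizes (the values here are arbitrary):

for (n in c(60, 120, 180, 240, 300)) BFsnapshot(mydata, n)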

Function: seqBFs()

seqBFs() outputs a dataframe with sequential BFs over the course of the simulated trial. You just pass it the dataframe (and can specify an rscale parameter if you want to change this):

trialresults<-seqBFs(mydata)

will give you a dataframe (trialresults), which you can then investigate (e.g. via View(trialresults), or you can try making some plots) to see how the BFs develop over time.

Columns in the dataframe are:

  • is = index / total N participants in the trial
  • nsTx1 = n in Tx1 arm
  • dsTx1 = effect size (Cohen’s d) for Tx1 vs control
  • BFsTx1 = Bayes Factor for Tx1 vs control
  • nsTx2 = n in Tx2 arm
  • dsTx2 = effect size (Cohen’s d) for Tx2 vs control
  • BFsTx2 = Bayes Factor for Tx2 vs control
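
For example, a minimal base-R plot of both BF trajectories on a log scale, with dashed lines at illustrative boundaries of 1/5 and 5 (substitute your chosen values):

plot(trialresults$is, trialresults$BFsTx1, type = "l", log = "y", col = "blue",
     xlab = "Total N in trial", ylab = "BF vs. control (log scale)",
     ylim = range(c(trialresults$BFsTx1, trialresults$BFsTx2, 1/5, 5)))
lines(trialresults$is, trialresults$BFsTx2, col = "red")
abline(h = c(1/5, 1, 5), lty = c(2, 3, 2)) #fail boundary, BF = 1, success boundary
legend("topleft", legend = c("Tx1", "Tx2"), col = c("blue", "red"), lty = 1)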

Exercise task:

See if you can work out what would have happened if you used your chosen trial parameters on this data set. When would you hit a BF boundary for each treatment arm (if at all)?
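
One way to check this programmatically, using illustrative boundaries of BFfail = 1/5 and BFsuccess = 5 (substitute your own values); note that is counts the total N across all three arms, so look up nsTx1/nsTx2 for the per-arm n at that point:

#First total N at which each arm crosses a boundary (NA if it never does)
trialresults$is[which(trialresults$BFsTx1 >= 5)[1]]   #Tx1 hits BFsuccess
trialresults$is[which(trialresults$BFsTx1 <= 1/5)[1]] #Tx1 hits BFfail
trialresults$is[which(trialresults$BFsTx2 >= 5)[1]]   #Tx2 hits BFsuccess
trialresults$is[which(trialresults$BFsTx2 <= 1/5)[1]] #Tx2 hits BFfail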

Next steps:

If you want to take this further you might want to simulate more complex analyses (e.g. mixed models) and multi-arm trials - see the examples at https://osf.io/8mxda/ and https://osf.io/b397u/ for ideas.