The CRTanalysis()
function is a wrapper for different statistical analysis packages that
can be used to analyse either simulated or real trial datasets. It is
designed for use in simulation studies of different analytical methods
for spatial CRTs by automating the data processing and selecting some
appropriate analysis options. It does not replace conventional use of
these packages. Real field trials very often entail complications that
are not catered for by any of the analysis options in
CRTanalysis()
and it does not aspire to carry out the full
analytical workflow for a trial. It can be used as part of a wider
workflow. In particular the usual object output by the statistical
analysis package constitutes the model_object
element
within the CRTanalysis
object generated by
CRTanalysis()
. This can be accessed by the usual methods
(e.g. predict()
, summary()
,
plot()
) which may be needed for diagnosing errors,
assessing goodness of fit, and for identifying needs for additional
analyses.
The options that can be specified using the method
parameter in the function call are:
method = "T"
summarises the outcome at the level of the
cluster, and uses 2-sample t-tests to carry out statistical significance
tests of the effect, and to compute confidence intervals for the effect
size. The t.test
function in the stats
package is used.
method = "GEE"
uses Generalised Estimating Equations to
estimate the efficacy in a model with iid random effects for the
clusters. An estimate of the intracluster correlation (ICC) is also
provided. This uses calls to the geepack
package.
method = "LME4"
fits linear (for continuous data) or
generalized linear (for counts and proportions) mixed models with iid
random effects for clusters in lme4.
method = "MCMC"
uses Markov chain Monte Carlo
simulation in package jagsUI, which calls
r-JAGS.
method = "INLA"
uses approximate Bayesian inference via
the R-INLA package. This provides
functionality for geostatistical analysis, which can be used for
geographical mapping of model outputs. INLA spatial
analysis requires a prediction mesh. This can be generated using CRTspat::new_mesh()
.
Generating the mesh can be computationally expensive, so it is recommended to compute
it just once for each dataset.
All these analysis methods can be used to carry out a simple
comparison of outcomes between trial arms. Each offers different
additional functionality, and has its own limitations (see Table 5.1).
Some of these limitations are specific to the options offered within
CRTanalysis()
, which does not embrace the full range of
options of the packages that are ‘wrapped’. The analysis method is selected using
the method
argument of the function.
Table 5.1. Available statistical methods
method | Package | What the CRTanalysis() implementation offers | Limitations (as implemented) |
---|---|---|---|
T | t.test | P-values and confidence intervals for efficacy based on comparison of cluster means | No analysis of contamination or degree of clustering |
GEE | geepack | Interval estimates for efficacy and intra-cluster correlations | No analysis of contamination or degree of clustering |
LME4 | lme4 | Analysis of contamination | No geostatistical analysis |
INLA | INLA | Analysis of contamination, geostatistical analysis and spatially structured outputs | Computationally intensive |
MCMC | jagsUI | Interval estimates for contamination parameters | Identifiability issues and slow convergence are possible |
For the analysis of proportions, the outcome in the control arm is estimated as \(\hat{p}_{C} = \frac{1}{1 + \exp(-\beta_1)}\), in the intervention arm as \(\hat{p}_{I} = \frac{1}{1 + \exp(-\beta_1-\beta_2)}\), and the efficacy is estimated as \(\tilde{E}_{s} = 1- \frac{\hat{p}_{I}}{\hat{p}_{C}}\), where \(\beta_1\) is the intercept term and \(\beta_2\) the incremental effect associated with the intervention.
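As a worked illustration of these formulas, the following base R sketch back-transforms the two group means on the logit scale that are reported in the t-test output of this vignette (-0.5561662 for the control arm and -1.3222694 for the intervention arm). Treating the control mean as \(\beta_1\) and the difference between the two means as \(\beta_2\) is an assumption made for illustration, and the variable names are not part of CRTspat.

```r
# Group means on the logit scale, copied from the t-test output in this vignette
beta1 <- -0.5561662               # intercept: control arm (assumed mapping)
beta2 <- -1.3222694 - beta1       # incremental effect of the intervention

# plogis(x) is the inverse logit, 1 / (1 + exp(-x))
p_control      <- plogis(beta1)
p_intervention <- plogis(beta1 + beta2)

# Efficacy as defined above: 1 - p_I / p_C
efficacy <- 1 - p_intervention / p_control

round(c(control = p_control,
        intervention = p_intervention,
        efficacy = efficacy), 3)
```

The rounded values agree with the point estimates in the summary for method = "T" (0.364, 0.21 and 0.423).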
summary(<analysis>)
is used to view the key
results of the trial. To display the output from the statistical
procedure that is called, try <analysis>$model_object
or summary(<analysis>$model_object)
.
library(CRTspat)
example <- readdata("exampleCRT.txt")
analysisT <- CRTanalysis(example, method = "T")
summary(analysisT)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: T
## Link function: logit
## Model formula: arm + (1 | cluster)
## No modelling of contamination
## Estimates: Control: 0.364 (95% CL: 0.286 0.451 )
## Intervention: 0.21 (95% CL: 0.147 0.292 )
## Efficacy: 0.423 (95% CL: 0.208 0.727 )
##
## P-value (2-sided): 0.006879064
##
## Two Sample t-test
##
## data: lp by arm
## t = 2.9818, df = 22, p-value = 0.006879
## alternative hypothesis: true difference in means between group control and group intervention is not equal to 0
## 95 percent confidence interval:
## 0.2332638 1.2989425
## sample estimates:
## mean in group control mean in group intervention
## -0.5561662 -1.3222694
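The t-test output above (data "lp by arm", 22 degrees of freedom) is what an equal-variance two-sample test on cluster-level values of a linear predictor would produce. A rough base R sketch of that kind of analysis, run on simulated data, is given below. The simulated clusters, the 0.5 continuity correction and all variable names are assumptions for illustration only; this is not the internal code of CRTanalysis().

```r
set.seed(42)

# Hypothetical trial: 24 clusters of 50 individuals, 12 clusters per arm
clusters <- data.frame(
  cluster = 1:24,
  arm     = rep(c("control", "intervention"), each = 12),
  denom   = 50
)
true_p <- ifelse(clusters$arm == "control", 0.36, 0.21)
clusters$num <- rbinom(nrow(clusters), clusters$denom, true_p)

# Cluster-level summary on the logit scale
# (the 0.5 correction avoids infinite logits and is an assumption)
clusters$lp <- qlogis((clusters$num + 0.5) / (clusters$denom + 1))

# Equal-variance two-sample t-test on the cluster summaries
tt <- t.test(lp ~ arm, data = clusters, var.equal = TRUE)

# Back-transform the group means to proportions and compute efficacy
p_hat    <- plogis(tt$estimate)
efficacy <- 1 - p_hat[2] / p_hat[1]

tt$parameter   # degrees of freedom: number of clusters minus 2
tt$p.value
unname(efficacy)
```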
The method = "LME4"
option outputs the deviance of the
model and the Akaike information criterion (AIC), which can be used to
select the best fitting model. The deviance information criterion (DIC)
and Bayesian information criterion (BIC) perform the same role for the
Bayesian methods ("INLA"
and "MCMC"
). The
comparison of results with cfunc = "X"
and
cfunc = "Z"
is used to assess whether the intervention
effect is likely to be due to chance. With method = "T"
,
cfunc = "X"
provides a significance test of the
intervention effect directly. The models with contamination (see below)
can be compared with the model with cfunc = "X"
to evaluate
whether contamination has led to an important bias.
CRTanalysis()
provides options for analysing
contamination between arms, or spillover effects, either as a function of
Euclidean distance or as a function of a surround measure.
Models that do not consider contamination can be fitted using options
Z
and X
. These are included both to allow
conventional analyses (see above), and also to enable model selection
using likelihood ratio tests, the Akaike information criterion
(AIC), deviance information criterion (DIC) or Bayesian information
criterion (BIC).
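To make the model selection step concrete, the sketch below uses the deviances printed in the LME4 summaries of this vignette for cfunc = "Z", "X" and "P". The parameter counts are an assumption inferred from the difference between the reported AIC and deviance (AIC = deviance + 2k), and the likelihood ratio test assumes that the Z model is nested in the X model with one additional parameter.

```r
# Deviances copied from the LME4 summaries in this vignette
dev   <- c(Z = 1387.609, X = 1379.898, P = 1374.215)

# Number of estimated parameters (assumed; inferred from AIC - deviance)
n_par <- c(Z = 2, X = 3, P = 4)

# AIC = deviance + 2 * number of parameters
aic <- dev + 2 * n_par
aic

# Likelihood ratio test of X against Z: one extra parameter (the arm effect)
lrt_stat <- unname(dev["Z"] - dev["X"])
p_value  <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
c(statistic = lrt_stat, p_value = p_value)
```

The recomputed AIC values match those printed in the summaries, and the model with the smallest AIC among these three is the one with cfunc = "P".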
The distance-based contamination models require a measure of distance from the boundary between
the trial arms, with locations in the control arm assigned negative
values, and those in the intervention arm assigned positive values. The
functional form of the dependence on this distance is specified by the value of
cfunc
(Table 5.2).
Table 5.2. Available contamination functions
cfunc | Description | Formula for \(P\left( d \right)\) | Compatible method(s) |
---|---|---|---|
Z | No intervention effect | \(P\left( d \right) = 0\) | GEE LME4 INLA MCMC |
X | Simple intervention effect | \(\begin{matrix} P\left( d \right) = 0\ \text{for}\ d < 0 \\ P\left( d \right) = 1\ \text{for}\ d > 0 \\ \end{matrix}\) | T GEE LME4 INLA MCMC |
L | inverse logistic (sigmoid) | \(P\left( d \right) = \frac{1}{1 + \exp\left( - d/S \right)}\) | LME4 INLA MCMC |
P | inverse probit (error function) | \(P\left( d \right) = \frac{1}{2}\left\lbrack 1 + erf\left(\frac{d}{S\sqrt{2}}\right) \right\rbrack\) | LME4 INLA MCMC |
S | piecewise linear | \(\begin{matrix} P\left( d \right) = 0\ \text{for}\ d < - S/2 \\ P\left( d \right) = \left(S/2 + d \right)/S\ \text{for}\ - S/2 < d < S/2 \\ P\left( d \right) = 1\ \text{for}\ d > S/2 \\ \end{matrix}\) | LME4 INLA MCMC |
R | rescaled linear | \(P\left( d \right) =\frac{d - min(d)}{max(d) - min(d)}\) | LME4 INLA MCMC |
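The functional forms in Table 5.2 can be written down directly in base R. The helper below is a hypothetical illustration (contamination_fn is not a CRTspat function) that evaluates \(P(d)\) for a vector of signed distances d and a scale parameter S. For the inverse probit it uses the standard normal distribution function, pnorm(d / S), which equals 0.5 * (1 + erf(d / (S * sqrt(2)))).

```r
# Hypothetical helper: evaluate P(d) for the cfunc options of Table 5.2
# d: signed distance (negative in control arm, positive in intervention arm)
# S: contamination scale parameter (ignored by Z, X and R)
contamination_fn <- function(d, cfunc, S = 1) {
  switch(cfunc,
    Z = rep(0, length(d)),                      # no intervention effect
    X = as.numeric(d > 0),                      # simple step at the boundary
    L = plogis(d / S),                          # inverse logistic (sigmoid)
    P = pnorm(d / S),                           # inverse probit
    S = pmin(pmax((S / 2 + d) / S, 0), 1),      # piecewise linear
    R = (d - min(d)) / (max(d) - min(d)),       # rescaled linear
    stop("unknown cfunc: ", cfunc)
  )
}

d <- seq(-2, 2, by = 0.5)
sapply(c("X", "L", "P", "S", "R"),
       function(f) round(contamination_fn(d, f, S = 1), 3))
```

Each column of the resulting matrix rises from the control side (negative d) to the intervention side (positive d), with the steepness of L, P and S governed by S.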
cfunc
options P
, L
and
S
lead to non-linear models in which the contamination
scale parameter (S
) must be estimated. This is done by
selecting scale_par
using a one-dimensional optimisation of
the goodness of fit of the model with stats::optimize()
.
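The idea of choosing the scale parameter by one-dimensional optimisation can be sketched in base R as follows. This is a simplified illustration under stated assumptions: the data are simulated, an ordinary logistic regression (glm) without cluster random effects stands in for the mixed model, the inverse probit form is used for \(P(d)\), and the search interval is arbitrary. It is not the code used inside CRTanalysis().

```r
set.seed(1)

# Simulated data (hypothetical): signed distance d in km and a binary outcome
n      <- 2000
d      <- runif(n, -3, 3)
S_true <- 0.5
y      <- rbinom(n, 1, plogis(-0.5 - 1.0 * pnorm(d / S_true)))

# Goodness of fit (deviance) of the model for a given value of the scale parameter
deviance_for_scale <- function(S) {
  pvar <- pnorm(d / S)                       # inverse probit contamination function
  deviance(glm(y ~ pvar, family = binomial))
}

# One-dimensional search for the scale parameter that minimises the deviance
fit <- optimize(deviance_for_scale, interval = c(0.05, 5))
fit$minimum     # estimated scale parameter
fit$objective   # deviance at that value
```

Because the scale parameter is estimated from the data in this way, it counts as an additional parameter when models are compared, which is consistent with the note "including penalty for the contamination scale parameter" in the AIC lines of the summaries.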
The different values for cfunc
lead to the fitted curves
shown in Figure 5.1. The light blue shaded part of the plot corresponds
to the contamination interval in those cases where this is
estimated.
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Model formula: (1 | cluster)
## No comparison of arms
## Estimates: Control: 0.285 (95% CL: NA )
## deviance: 1387.609
## AIC : 1391.609
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Model formula: arm + (1 | cluster)
## No modelling of contamination
## Estimates: Control: 0.366 (95% CL: 0.292 0.449 )
## Intervention: 0.216 (95% CL: 0.162 0.281 )
## Efficacy: 0.41 (95% CL: 0.165 0.584 )
## deviance: 1379.898
## AIC : 1385.898
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.45
## Model formula: pvar + (1 | cluster)
## Error function model for contamination
## Estimates: Control: 0.418 (95% CL: 0.331 0.509 )
## Intervention: 0.186 (95% CL: 0.136 0.25 )
## Efficacy: 0.553 (95% CL: 0.327 0.703 )
## Contamination range(km): 4.22 (95% CL: 4.2 4.23 )
## % locations contaminated: 91.6 (95% CL: 90.6 92 %)
## Total effect : 0.23 (95% CL: 0.114 0.344 )
## Ipsilateral Spillover : 0.0233 (95% CL: 0.0127 0.0323 )
## Contralateral Spillover : 0.0417 (95% CL: 0.0192 0.0651 )
## deviance: 1374.215
## AIC : 1382.215 including penalty for the contamination scale parameter
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.249
## Model formula: pvar + (1 | cluster)
## Sigmoid (logistic) function for contamination
## Estimates: Control: 0.417 (95% CL: 0.332 0.51 )
## Intervention: 0.186 (95% CL: 0.136 0.249 )
## Efficacy: 0.552 (95% CL: 0.329 0.7 )
## Contamination range(km): 4.26 (95% CL: 4.24 4.28 )
## % locations contaminated: 92.7 (95% CL: 92.2 93.1 %)
## Total effect : 0.229 (95% CL: 0.115 0.342 )
## Ipsilateral Spillover : 0.0219 (95% CL: 0.0121 0.0304 )
## Contralateral Spillover : 0.0388 (95% CL: 0.0183 0.0604 )
## deviance: 1374.201
## AIC : 1382.201 including penalty for the contamination scale parameter
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 1.674
## Model formula: pvar + (1 | cluster)
## Piecewise linear function for contamination
## Estimates: Control: 0.423 (95% CL: 0.334 0.516 )
## Intervention: 0.185 (95% CL: 0.135 0.247 )
## Efficacy: 0.561 (95% CL: 0.341 0.711 )
## Contamination range(km): 4.1 (95% CL: 4.1 4.11 )
## % locations contaminated: 86.6 (95% CL: 86.6 87.1 %)
## Total effect : 0.237 (95% CL: 0.12 0.356 )
## Ipsilateral Spillover : 0.029 (95% CL: 0.016 0.0403 )
## Contralateral Spillover : 0.0522 (95% CL: 0.0248 0.0818 )
## deviance: 1374.094
## AIC : 1382.094 including penalty for the contamination scale parameter
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for contamination
## Estimates: Control: 0.584 (95% CL: 0.381 0.758 )
## Intervention: 0.116 (95% CL: 0.0587 0.216 )
## Efficacy: 0.801 (95% CL: 0.465 0.92 )
## Contamination range(km): 6.64 (95% CL: 6.61 6.65 )
## % locations contaminated: 99.8 (95% CL: 99.8 99.8 %)
## Total effect : 0.468 (95% CL: 0.181 0.694 )
## Ipsilateral Spillover : 0.117 (95% CL: 0.0564 0.157 )
## Contralateral Spillover : 0.238 (95% CL: 0.0831 0.368 )
## deviance: 1378.711
## AIC : 1384.711
p0 <- plotCRT(analysisLME4_Z, map = FALSE)
p1 <- plotCRT(analysisLME4_X, map = FALSE)
p2 <- plotCRT(analysisLME4_P, map = FALSE)
p3 <- plotCRT(analysisLME4_L, map = FALSE)
p4 <- plotCRT(analysisLME4_S, map = FALSE)
p5 <- plotCRT(analysisLME4_R, map = FALSE)
library(cowplot)
plot_grid(p0, p1, p2, p3, p4, p5, labels = c('Z', 'X', 'P', 'L', 'S', 'R'), label_size = 10, ncol = 2)
Fig 5.1 Fitted curves for the
example dataset with different options for cfunc
The piecewise linear contamination function,
cfunc = "S"
, is only linear on the scale of the linear
predictor. When used in a logistic model, as here, the transformation
via the inverse of the link function leads to a slightly curved plot
(Figure 5.1S). The rescaled linear function, cfunc = "R"
,
is provided as a comparator and for use with distance
values other than distance = "nearestDiscord"
(see below). It
should not be used to estimate the contamination range.
The full set of cfunc
options is available
for each of the method options "LME4"
, "INLA"
, and
"MCMC"
. The performance of all these different models has
not yet been thoroughly investigated. The analyses of Multerer
et al. (2021b) found that a model equivalent to
method = "MCMC"
, cfunc = "L"
gave estimates of
efficacy with low bias, even in simulations with considerable
contamination.
Contamination can also be analysed by assuming the effect size to be
a function of the number of intervened locations in the surroundings of
the location (Anaya-Izquierdo
& Alexander, 2021). Several different surround functions are
available. These are specified by the distance
parameter
(Table 5.3).
Table 5.3. Available surround functions
distance | Description | Details |
---|---|---|
nearestDiscord | Distance to nearest discordant location | The default. This is used for analyses by distance (see above) |
hdep | Tukey half-depth | Algorithm of Rousseeuw & Ruts (1996) |
sdep | Simplicial depth | Algorithm of Rousseeuw & Ruts (1996) |
disc | Disc | The number of intervened locations within the specified radius (excluding the location itself) as described by Anaya-Izquierdo & Alexander (2021) |
kern | Sum of kernels | The sum of normal kernels |
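To illustrate what the disc and kern measures in Table 5.3 represent, the base R sketch below computes unscaled versions for five hypothetical locations. The coordinates, radius and kernel standard deviation are made up, and it is an assumption that the kernels are centred on intervened locations and that a location does not contribute to its own kern value. The values returned by CRTspat are scaled differently, so this sketch does not reproduce them.

```r
# Five hypothetical locations on a line (coordinates in km)
x <- c(0, 0.1, 0.3, 1.0, 1.1)
y <- rep(0, 5)
intervened <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Pairwise Euclidean distances between locations
dist_km <- as.matrix(dist(cbind(x, y)))

# disc: number of intervened locations within the radius, excluding the location itself
radius <- 0.25
within <- dist_km <= radius
diag(within) <- FALSE
disc <- rowSums(within[, intervened, drop = FALSE])

# kern: sum of normal kernels centred on the intervened locations
# (excluding the location itself; an assumption)
sigma <- 0.15
k <- dnorm(dist_km, sd = sigma)
diag(k) <- 0
kern <- rowSums(k[, intervened, drop = FALSE])

data.frame(x, intervened, disc = unname(disc), kern = round(unname(kern), 3))
```

Both measures are larger for locations with many intervened neighbours nearby, but kern weights neighbours smoothly by distance rather than counting them within a fixed radius.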
The compute_distance()
function is provided to compute these quantities, so that they can be
described, compared, and analysed independently of
CRTanalysis()
. Note that the values of the surround
calculated by compute_distance()
are scaled to avoid
correlation with the spatial density of the points (see documentation) and so are
not equivalent to the quantities reported in the original
publications.
Users can also devise other measures of surround or distance, add
them to a trial
data frame and specify them using
distance
. CRTanalysis()
computes the minimum
value for the specified field.
examples <- compute_distance(example, distance = "hdep")
ps1 <- plotCRT(examples, distance = "hdep", legend.position = c(0.6, 0.8))
ps2 <- plotCRT(examples, distance = "sdep")
examples <- compute_distance(examples, distance = "disc", scale_par = 0.5)
ps3 <- plotCRT(examples, distance = "disc")
examples <- compute_distance(examples, distance = "kern", scale_par = 0.5)
ps4 <- plotCRT(examples, distance = "kern")
plot_grid(ps1, ps2, ps3, ps4, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)
Fig 5.2 Stacked bar plots for
different surrounds
If distance
is assigned a value of either
hdep
or sdep
, then cfunc = "R"
is
used by default and the overall effect size is computed by comparing the
fitted values of the model for a surround value of zero with those for the
maximum of the surround in the data. If distance = "disc"
or distance = "kern"
and scale_par
is assigned
a value, then cfunc = "R"
is also used. If
cfunc = "E"
is specified then an escape function is fitted,
with the scale parameter estimated in the same way as the scale
parameter in other models (see Table 5.2 above).
examples_hdep <- CRTanalysis(examples, method = "LME4", distance = "hdep", cfunc = 'R')
summary(examples_hdep)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Tukey half-depth
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for contamination
## Estimates: Control: 0.381 (95% CL: 0.292 0.478 )
## Intervention: 0.209 (95% CL: 0.15 0.282 )
## Efficacy: 0.452 (95% CL: 0.167 0.639 )
## Contamination range(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 55 (95% CL: 55 55 %)
## Total effect : 0.172 (95% CL: 0.0524 0.292 )
## Ipsilateral Spillover : 0.0313 (95% CL: 0.01 0.0512 )
## Contralateral Spillover : 0.0444 (95% CL: 0.0128 0.0785 )
## deviance: 1379.89
## AIC : 1385.89
ps4 <- plotCRT(examples_hdep, legend.position = c(0.8, 0.8))
examples_sdep <- CRTanalysis(examples, method = "LME4", distance = "sdep", cfunc = 'R')
summary(examples_sdep)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Simplicial depth
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for contamination
## Estimates: Control: 0.393 (95% CL: 0.307 0.485 )
## Intervention: 0.199 (95% CL: 0.145 0.268 )
## Efficacy: 0.493 (95% CL: 0.243 0.66 )
## Contamination range(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 52.4 (95% CL: 52.2 52.4 %)
## Total effect : 0.193 (95% CL: 0.0802 0.306 )
## Ipsilateral Spillover : 0.0299 (95% CL: 0.013 0.0456 )
## Contralateral Spillover : 0.0431 (95% CL: 0.0169 0.0704 )
## deviance: 1376.417
## AIC : 1382.417
ps5 <- plotCRT(examples_sdep)
examples_disc <- CRTanalysis(examples, method = "LME4", distance = "disc", cfunc = 'R', scale_par = 0.15)
summary(examples_disc)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: disc of radius 0.15 km
## Precalculated scale parameter: 0.15
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for contamination
## Estimates: Control: 0.387 (95% CL: 0.312 0.467 )
## Intervention: 0.2 (95% CL: 0.149 0.26 )
## Efficacy: 0.482 (95% CL: 0.273 0.634 )
## Contamination range(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 8.89 (95% CL: 8.89 8.89 %)
## Total effect : 0.186 (95% CL: 0.0912 0.282 )
## Ipsilateral Spillover : 0.00458 (95% CL: 0.00239 0.00656 )
## Contralateral Spillover : 0.00576 (95% CL: 0.00271 0.00905 )
## deviance: 1374.274
## AIC : 1380.274
ps6 <- plotCRT(examples_disc)
examples_kern <- CRTanalysis(examples, method = "LME4", distance = "kern", cfunc = 'R', scale_par = 0.15)
summary(examples_kern)
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: kern with kernel s.d. 0.15 km
## Precalculated scale parameter: 0.15
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for contamination
## Estimates: Control: 0.406 (95% CL: 0.327 0.491 )
## Intervention: 0.185 (95% CL: 0.136 0.245 )
## Efficacy: 0.542 (95% CL: 0.349 0.684 )
## Contamination range(km): 0.979 (95% CL: 0.977 0.98 )
## % locations contaminated: 50.8 (95% CL: 50.6 50.9 %)
## Total effect : 0.22 (95% CL: 0.122 0.32 )
## Ipsilateral Spillover : 0.011 (95% CL: 0.00661 0.0152 )
## Contralateral Spillover : 0.0134 (95% CL: 0.00707 0.0203 )
## deviance: 1369.677
## AIC : 1375.677
ps7 <- plotCRT(examples_kern)
plot_grid(ps4, ps5, ps6, ps7, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)
Fig 5.3 Fitted curves for the
example dataset with different surrounds
To carry out a geostatistical analysis with
method = "INLA"
a prediction mesh is needed. By default a
very low resolution mesh is created (creating a high resolution mesh is
computationally expensive). To create a 100 m INLA mesh for
<MyTrial>
, use:
mesh <- new_mesh(trial = <MyTrial>, pixel = 0.1)
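Because creating a high resolution mesh is computationally expensive, one possible pattern is to store the mesh on disk and reload it in later sessions. The helper below is a hypothetical base R sketch (cache_rds is not part of CRTspat); the file name in the commented usage line is also an assumption.

```r
# Hypothetical helper: compute an object once, then reuse the copy saved on disk
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))     # reuse the stored object
  }
  obj <- compute()            # expensive step, run only when no file exists
  saveRDS(obj, path)
  obj
}

# Possible usage with the mesh call shown above (not run here):
# mesh <- cache_rds("mesh_100m.rds",
#                   function() new_mesh(trial = <MyTrial>, pixel = 0.1))
```

A separate file name should be used for each dataset and pixel size, since a stored mesh is only valid for the trial it was computed from.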