The psc.R package implements the methods for applying Personalised Synthetic Controls, which allow patients receiving some experimental treatment to be compared against a model which predicts their response to some control. This is a form of causal inference which differs from other approaches in that:
Data are only required on a single treatment - all counterfactual evidence is supplied by a parametric model.
Causal inference, in theory at least, is estimated at a patient level - as opposed to estimating average effects over a population.
The causal estimand obtained is the Average Treatment Effect on the Treated (ATT), which differs from the Average Treatment Effect (ATE) obtained in other settings and addresses the question of whether treatments are effective in the population of patients who are treated. This estimand therefore targets efficacy over effectiveness.
In its basic form, this method creates a likelihood to compare a cohort of data to a parametric model. See (X) for discussion on its use as a causal inference tool. To use this package, two basic pieces of information are required: a dataset and a model against which it can be compared.
In this vignette, we will detail how the psc.r package is constructed and give some examples of its application in practice.
The pscfit function compares a dataset (‘DC’) against a parametric model. This is done by selecting a likelihood which is identified by the type of counter-factual model (CFM) that is supplied. At present, two types of model are supported: a flexible parametric survival model of type ‘flexsurvreg’ and a generalised linear model of type ‘glm’.
Where the CFM is of type ‘flexsurvreg’ the likelihood supplied is of the form:
\[L(D \mid \Lambda,\Gamma_i)=\prod_{i=1}^{n} f(t_i \mid \Lambda,\Gamma_i)^{c_i} S(t_i \mid \Lambda,\Gamma_i)^{(1-c_i)}\]
Where \(\Lambda\) defines the cumulative baseline hazard function, \(\Gamma_i\) is the linear predictor and \(t_i\) and \(c_i\) are the event time and event indicator variables.
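As a rough illustration of the censored-data likelihood above, the following base R sketch evaluates its logarithm. It assumes a Weibull proportional-hazards model as a simple stand-in for the spline-based baseline of a ‘flexsurvreg’ model; the function name `surv_loglik` and the toy data are hypothetical and are not part of the package.

```r
# Log-likelihood for right-censored survival data:
#   sum_i [ c_i * log f(t_i) + (1 - c_i) * log S(t_i) ]
# Assumption: Weibull proportional hazards with cumulative hazard
#   H(t) = exp(lp) * (t / scale)^shape
# used here in place of the spline-based baseline.
surv_loglik <- function(time, cens, lp, shape, scale) {
  cum_haz <- exp(lp) * (time / scale)^shape                       # H(t_i)
  log_haz <- lp + log(shape / scale) +
    (shape - 1) * log(time / scale)                               # log h(t_i)
  log_S <- -cum_haz          # log S(t) = -H(t)
  log_f <- log_haz + log_S   # log f(t) = log h(t) + log S(t)
  sum(cens * log_f + (1 - cens) * log_S)
}

time <- c(5, 12, 20, 31)   # hypothetical event or censoring times
cens <- c(1, 0, 1, 1)      # 1 = event observed, 0 = censored
ll <- surv_loglik(time, cens, lp = rep(0.2, 4), shape = 1.3, scale = 25)
```

Observed events contribute through the density and censored observations through the survival function, which is the structure of the product in the likelihood.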
Where the CFM is of the type ‘glm’ the likelihood supplied is of the form:
\[L(x \mid \Gamma_i) = \prod_{i=1}^{n} b(x \mid \Gamma_i)\exp\{\Gamma_i t(x) - c(\Gamma_i)\}\]
Where \(b(.)\), \(t(.)\) and \(c(.)\) represent the functions of the exponential family. In both cases, \(\Gamma\) is defined as:
\[ \Gamma_i = \gamma x_i+\beta \]
Where \(\gamma\) are the model coefficients supplied by the CFM and \(\beta\) is the parameter set to measure the difference between the CFM and the DC.
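To make the role of \(\beta\) concrete, the sketch below builds \(\Gamma_i = \gamma x_i + \beta\) and evaluates a Bernoulli (logistic) log-likelihood, one member of the exponential family. The function names, coefficients and data are hypothetical and chosen only for illustration; this is not the package's internal code.

```r
# Linear predictor: CFM coefficients (gamma) applied to the data-cohort
# covariates, shifted by the efficacy parameter beta.
linear_predictor <- function(X, gamma, beta) {
  as.vector(X %*% gamma) + beta
}

# Bernoulli log-likelihood on the logit scale, i.e. an exponential family
# with t(y) = y and c(Gamma) = log(1 + exp(Gamma)).
glm_loglik <- function(y, X, gamma, beta) {
  Gamma <- linear_predictor(X, gamma, beta)
  sum(y * Gamma - log1p(exp(Gamma)))
}

# Hypothetical design matrix (intercept plus two covariates) and outcomes
X <- cbind(1, c(0, 1, 0, 1), c(0.2, -1.1, 0.5, 1.3))
gamma <- c(-0.5, 0.8, 0.3)   # assumed CFM coefficients
y <- c(0, 1, 0, 1)

ll_null  <- glm_loglik(y, X, gamma, beta = 0)    # data behave as the CFM predicts
ll_shift <- glm_loglik(y, X, gamma, beta = 0.5)  # data shifted away from the CFM
```

Setting \(\beta = 0\) corresponds to the data cohort behaving exactly as the CFM predicts, so comparing the likelihood across values of \(\beta\) measures the departure from the counterfactual.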
Estimation is performed using a Bayesian MCMC procedure. Prior distributions for \(\Gamma\) (and \(\Lambda\)) are derived directly from the model coefficients (mean and variance-covariance matrix) of the CFM. A bespoke MCMC routine is performed to estimate \(\beta\). Please see ‘?mcmc’ for more details.
For the standard example where the DC contains information from only a single treatment, ‘trt’ need not be specified. Where comparisons between the CFM and multiple treatments are required, a covariate of treatment allocations must be specified separately (using the ‘trt’ option).
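The general idea of sampling \(\beta\) by MCMC can be sketched with a generic random-walk Metropolis sampler. This is not the package's bespoke routine: it assumes a logistic outcome model, holds the CFM coefficients fixed rather than drawing them from their prior, and places a vague normal prior on \(\beta\). All names and data are hypothetical.

```r
set.seed(1)

# Hypothetical data cohort simulated with a known shift (true_beta)
n <- 200
x <- rnorm(n)
gamma <- c(-0.2, 0.7)   # assumed CFM coefficients, held fixed in this sketch
true_beta <- 0.5
y <- rbinom(n, 1, plogis(gamma[1] + gamma[2] * x + true_beta))

# Log-posterior of beta: Bernoulli log-likelihood plus a N(0, 10^2) prior
log_post <- function(beta) {
  Gamma <- gamma[1] + gamma[2] * x + beta
  sum(y * Gamma - log1p(exp(Gamma))) + dnorm(beta, 0, 10, log = TRUE)
}

n_iter <- 5000
draws <- numeric(n_iter)
current <- 0
current_lp <- log_post(current)

for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, sd = 0.3)   # symmetric random-walk proposal
  proposal_lp <- log_post(proposal)
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < proposal_lp - current_lp) {
    current <- proposal
    current_lp <- proposal_lp
  }
  draws[i] <- current
}

post <- draws[-(1:1000)]   # discard the first 1000 draws as burn-in
post_summary <- quantile(post, c(0.025, 0.5, 0.975))
```

The retained draws play the same role as the ‘beta’ column of the posterior returned by the package: their quantiles summarise the difference between the data cohort and the counterfactual model.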
The main function for applying Personalised Synthetic Controls is the pscfit() function, which has two inputs: a Counter-Factual Model (CFM) and a data cohort (DC). Further arguments include the ‘trt’ option described above.
The output of the “pscfit()” function is an object of class ‘psc’. This class contains the following attributes
Basic post-estimation functions have been developed to work with the psc object, namely “print()”, “coef()”, “summary()” and “plot()”. The first three of these provide basic summaries of the efficacy parameter obtained from the posterior distribution.
The psc.r package includes an example dataset derived from patients with pancreatic ductal adenocarcinoma (PDAC) who have all received some experimental treatment, in this case GemCap. The dataset is named ‘e4_data’ and is loaded into the environment using the “data()” function
#install.packages("psc")
library(psc)
library(ggpubr)
#> Loading required package: ggplot2
e4_data <- psc::e4_data
Included is a list of prognostic covariates:
Also included are the following structures
We give examples of how the ‘pscfit()’ function can be used to compare data against models with survival outcomes (with a ‘flexsurvreg’ model). Examples of how to perform analyses using GLM model objects are available from the GitHub repo https://github.com/richJJackson/psc
For an example with a survival outcome, a model must be supplied which is constructed on the basis of flexible parametric splines. This is constructed using the “flexsurvreg” function within the “flexsurv” package. An example is included within the ‘psc.r’ package, named ‘surv.mod’, and is loaded using the “data()” function:
The ‘gemCFM’ is an object of class ‘pscCFM’, which means it contains all of the structures required for analysis but has been stripped of any patient-level data. Included instead are a summary table:
…and a set of visualisations, which we arrange using the ‘ggarrange()’ function:
In this example you can see that this is a model constructed with 3 internal knots and hence 5 parameters to describe the baseline cumulative hazard function. There are also prognostic covariates which match with the prognostic covariates in the data cohort.
Comparing the dataset to the model is then performed using
and we can view the attributes of the psc object that is created
For example, to view the matrix containing the draws of the posterior distribution we use
surv.post <- surv.psc$posterior
head(surv.post)
#> gamma0 gamma1 gamma2 gamma3 gamma4 t3 t4
#> 1 -10.183475 2.886891 0.32844796 -0.5849253 0.5525292 0.5605011 0.4197932
#> 2 -10.969265 3.301359 0.58198030 -1.0237175 0.8195741 0.5566349 0.4100658
#> 3 -10.587853 3.167029 0.66580286 -1.1318098 0.8270476 0.7798882 0.7911586
#> 4 -8.798829 2.430359 0.19109386 -0.4354553 0.4860852 0.4867984 0.2684409
#> 5 -9.450566 2.525952 -0.08427386 -0.1078601 0.4413689 0.7747423 0.6481159
#> 6 -10.064442 2.659429 0.70571749 -1.5411143 1.1810407 0.8033240 0.7750495
#> grade2 grade3 nodes2 lca199 beta DIC
#> 1 0.4117679 0.7335891 0.48618366 0.2057172 -0.24299183 NA
#> 2 0.4668254 0.9665844 0.54071154 0.1540281 -0.26320056 843.1062
#> 3 0.1574126 0.5174706 0.36121804 0.1877874 -0.26994414 840.9129
#> 4 0.5180700 0.7743066 0.20808802 0.2000951 -0.41976725 846.8173
#> 5 0.2112535 0.3699374 0.09043326 0.2502472 -0.47843710 854.6727
#> 6 0.3945218 0.6149910 0.68924241 0.2097002 -0.03326903 846.6461
Inspection will show that there is a column for each parameter in the original model as well as ‘beta’ and ‘DIC’ columns, which give the posterior estimates for \(\beta\) and the Deviance Information Criterion respectively.
We can inspect the posterior distribution using the autocorrelation function, trace and standard summary statistics:
The standard ‘summary()’ function will summarise the model fit
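As a small illustration of how the ‘beta’ column can be summarised, the sketch below uses only the six draws printed above; a real analysis would use the full set of posterior draws. The object names are hypothetical.

```r
# beta draws copied from the six rows of head(surv.post) shown above
beta <- c(-0.24299183, -0.26320056, -0.26994414,
          -0.41976725, -0.47843710, -0.03326903)

# Median, 95% credible interval and tail probabilities of the draws
beta_summary <- c(
  median = median(beta),
  quantile(beta, c(0.025, 0.975)),
  "Pr(x<0)" = mean(beta < 0),
  "Pr(x>0)" = mean(beta > 0)
)
```

The tail probabilities are simply the proportions of draws falling below or above zero.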
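A minimal base R sketch of these diagnostics is given here. Because it is meant to run without a fitted object, it assumes a simulated autocorrelated chain as a stand-in for the ‘beta’ column of the posterior; the object names are hypothetical.

```r
set.seed(42)

# Hypothetical stand-in chain: an AR(1) series mimicking correlated MCMC draws
n_draws <- 2000
chain <- numeric(n_draws)
for (i in 2:n_draws) {
  chain[i] <- 0.6 * chain[i - 1] + rnorm(1, sd = 0.1)
}

# Autocorrelation at lags 0 to 20, computed without drawing the plot
acf_vals <- acf(chain, lag.max = 20, plot = FALSE)
lag1 <- acf_vals$acf[2]   # element 1 is lag 0, element 2 is lag 1

# Standard summary statistics of the draws
chain_summary <- c(mean = mean(chain), sd = sd(chain),
                   quantile(chain, c(0.025, 0.5, 0.975)))
```

Calling `plot(chain, type = "l")` would draw the trace and `plot(acf_vals)` the autocorrelation plot. A trace that wanders slowly, or autocorrelation that decays slowly with lag, suggests that more iterations or thinning may be needed.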
summary(surv.psc)
#> Summary:
#>
#> 311 observations selected from the data cohort for comparison
#> CFM of type flexsurvreg identified
#> linear predictor succesfully obtained with median:
#> trt: 1.786
#> Average expected response:
#> trt: 30.077
#> Average observed response: 26.327
#>
#> Counterfactual Model (CFM):
#> A model of class 'flexsurvreg'
#> Fit with 3 internal knots
#>
#> Formula:
#> s.ob ~ t + grade + nodes + lca199
#> <environment: 0x119ea0d88>
#>
#> Call:
#> CFM model + beta
#>
#> Coefficients:
#> median 2.5% 97.5% Pr(x<0) Pr(x>0)
#> beta -0.25000 -0.43277 -0.07186 0.99540 0.00460
#> DIC 844.82218 837.91883 857.34798 NA NA
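One way to read the ‘beta’ row is on the ratio scale. The sketch below assumes that \(\beta\) acts as a log hazard ratio, which holds only if the CFM is parameterised on the hazard scale; this is an assumption made for illustration and is not stated in the output.

```r
# Posterior summary of beta copied from the summary() output above
beta_q <- c(median = -0.25000, lower = -0.43277, upper = -0.07186)

# Assumption: beta is a log hazard ratio, so exp() gives the hazard ratio
# of the data cohort relative to the counterfactual model
hr <- exp(beta_q)
round(hr, 2)
```

Under that assumption, a hazard ratio below 1 across the whole interval agrees with the reported Pr(x<0) of 0.9954 in favouring the treatment received by the data cohort.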
To visualise the original model and the fit of the data, the plot function has been developed