Guide to glmmsel

Ryan Thompson

Introduction

glmmsel is an R package for generalised linear mixed model (GLMM) selection. Given observations on \(m\) clusters \((\mathbf{y}_i,\mathbf{X}_i)_{i=1}^m\), where \(\mathbf{y}_i\) and \(\mathbf{X}_i\) represent the response vector and predictor matrix for cluster \(i\), glmmsel can fit a GLMM of the form

\[ \operatorname{E}\left[\eta(\mathbf{y}_i)\right]=\mathbf{X}_i(\boldsymbol{\beta}+\mathbf{u}_i), \] where \(\boldsymbol{\beta}\) is a sparse vector of fixed effects (i.e., predictor effects that are the same across clusters), \(\mathbf{u}_i\) is a sparse vector of random effects (i.e., predictor effects that differ across clusters), and \(\eta\) is a link function. glmmsel fits this model by solving the optimisation problem \[ \underset{\boldsymbol{\beta},\boldsymbol{\gamma}}{\min}\;l(\mathbf{y},\mathbf{X};\boldsymbol{\beta},\boldsymbol{\gamma})+\lambda\alpha\|\boldsymbol{\beta}\|_0+\lambda(1-\alpha)\|\boldsymbol{\gamma}\|_0\quad\operatorname{s.t.}\;\beta_k=0\Rightarrow\gamma_k=0, \] where \(l\) is a negative log-likelihood, \(\|\cdot\|_0\) is the \(\ell_0\)-norm (i.e., a count of the number of nonzeros), and \(\lambda\geq0\) and \(\alpha\in(0,1]\) are tuning parameters. Here, \(\boldsymbol{\gamma}\) characterises the variance of the random effects \(\mathbf{u}_i\), which we assume follow a \(N(\mathbf{0},\operatorname{diag}(\boldsymbol{\gamma}))\) distribution. Observe that if \(\gamma_k=0\) then \(u_{ik}\) is zero.

glmmsel operates on the hierarchy principle that a random effect can only be selected if its corresponding fixed effect is also selected; see the constraint \(\beta_k=0\Rightarrow\gamma_k=0\). Setting \(\alpha=1\) means there is no penalty for selecting a random effect if its fixed effect is also selected. Smaller values of \(\alpha\) encourage the random effect to be selected only if it substantially improves the fit. The default value of \(\alpha=0.8\) works well in practice.

Main functions

The two main functions provided by the package are glmmsel() and cv.glmmsel(), responsible for model fitting and cross-validation, respectively.

The glmmsel() function provides a convenient way of fitting the model for a path of \(\lambda\) values. To demonstrate this functionality, let’s simulate some clustered data.

set.seed(1234)
n <- 100 # Number of observations
m <- 4 # Number of clusters
p <- 5 # Number of predictors
s.fix <- 2 # Number of nonzero fixed effects
s.rand <- 1 # Number of nonzero random effects
x <- matrix(rnorm(n * p), n, p) # Predictor matrix
beta <- c(rep(1, s.fix), rep(0, p - s.fix)) # True fixed effects
u <- cbind(matrix(rnorm(m * s.rand), m, s.rand), matrix(0, m, p - s.rand)) # True random effects
cluster <- sample(1:m, n, replace = TRUE) # Cluster labels
xb <- rowSums(x * sweep(u, 2, beta, '+')[cluster, ]) # x %*% (beta + u) matrix
y <- rnorm(n, xb) # Response vector

Of the five candidate predictors, the first two have nonzero fixed effects. Only the first predictor has a nonzero random effect.

library(glmmsel)
fit <- glmmsel(x, y, cluster)

The values of \(\lambda\) are automatically computed from the data, providing a path of solutions from the null model (intercept only) to the full model (all predictors included). The fixed effects and random effects from the path of fits can be extracted using the fixef() and ranef() functions.

fixef(fit)
#>            [,1]       [,2]        [,3]        [,4]        [,5]
#> [1,] -0.2922322 -0.1167263 -0.11498018 -0.11484951 -0.10708102
#> [2,]  0.0000000  1.1104188  1.11240541  1.12407682  1.12770012
#> [3,]  0.0000000  1.1609182  1.16109489  1.17013772  1.17421123
#> [4,]  0.0000000  0.0000000  0.00000000  0.00000000 -0.04771683
#> [5,]  0.0000000  0.0000000  0.00000000 -0.05988147 -0.05850389
#> [6,]  0.0000000  0.0000000  0.04929141  0.04396767  0.04550589
ranef(fit)
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    0    0    0    0    0    0
#> [2,]    0    0    0    0    0    0
#> [3,]    0    0    0    0    0    0
#> [4,]    0    0    0    0    0    0
#> 
#> , , 2
#> 
#>      [,1]        [,2] [,3] [,4] [,5] [,6]
#> [1,]    0 -1.14669569    0    0    0    0
#> [2,]    0  0.03435146    0    0    0    0
#> [3,]    0  0.75530136    0    0    0    0
#> [4,]    0  0.35703619    0    0    0    0
#> 
#> , , 3
#> 
#>      [,1]        [,2] [,3] [,4] [,5] [,6]
#> [1,]    0 -1.15635158    0    0    0    0
#> [2,]    0  0.03829802    0    0    0    0
#> [3,]    0  0.75971247    0    0    0    0
#> [4,]    0  0.35850014    0    0    0    0
#> 
#> , , 4
#> 
#>      [,1]       [,2] [,3] [,4] [,5] [,6]
#> [1,]    0 -1.1514971    0    0    0    0
#> [2,]    0  0.0327921    0    0    0    0
#> [3,]    0  0.7502685    0    0    0    0
#> [4,]    0  0.3687197    0    0    0    0
#> 
#> , , 5
#> 
#>      [,1]        [,2] [,3] [,4] [,5] [,6]
#> [1,]    0 -1.14421868    0    0    0    0
#> [2,]    0  0.03769651    0    0    0    0
#> [3,]    0  0.74853993    0    0    0    0
#> [4,]    0  0.35795599    0    0    0    0

Each column in the output of fixef() corresponds to a set of fixed effects for a particular value of \(\lambda\), with the first row containing intercept terms. In the output of ranef(), each slice corresponds to a set of random effects for a particular value of \(\lambda\), with each row containing the random effects for a given cluster.

When making predictions, it is often useful to add the fixed and random effects to get the cluster-specific coefficients. The coef() function provides this functionality.

coef(fit)
#> , , 1
#> 
#>            [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.2922322    0    0    0    0    0
#> [2,] -0.2922322    0    0    0    0    0
#> [3,] -0.2922322    0    0    0    0    0
#> [4,] -0.2922322    0    0    0    0    0
#> 
#> , , 2
#> 
#>            [,1]        [,2]     [,3] [,4] [,5] [,6]
#> [1,] -0.1167263 -0.03627693 1.160918    0    0    0
#> [2,] -0.1167263  1.14477022 1.160918    0    0    0
#> [3,] -0.1167263  1.86572012 1.160918    0    0    0
#> [4,] -0.1167263  1.46745495 1.160918    0    0    0
#> 
#> , , 3
#> 
#>            [,1]        [,2]     [,3] [,4] [,5]       [,6]
#> [1,] -0.1149802 -0.04394616 1.161095    0    0 0.04929141
#> [2,] -0.1149802  1.15070343 1.161095    0    0 0.04929141
#> [3,] -0.1149802  1.87211788 1.161095    0    0 0.04929141
#> [4,] -0.1149802  1.47090555 1.161095    0    0 0.04929141
#> 
#> , , 4
#> 
#>            [,1]        [,2]     [,3] [,4]        [,5]       [,6]
#> [1,] -0.1148495 -0.02742033 1.170138    0 -0.05988147 0.04396767
#> [2,] -0.1148495  1.15686892 1.170138    0 -0.05988147 0.04396767
#> [3,] -0.1148495  1.87434529 1.170138    0 -0.05988147 0.04396767
#> [4,] -0.1148495  1.49279648 1.170138    0 -0.05988147 0.04396767
#> 
#> , , 5
#> 
#>           [,1]        [,2]     [,3]        [,4]        [,5]       [,6]
#> [1,] -0.107081 -0.01651856 1.174211 -0.04771683 -0.05850389 0.04550589
#> [2,] -0.107081  1.16539663 1.174211 -0.04771683 -0.05850389 0.04550589
#> [3,] -0.107081  1.87624005 1.174211 -0.04771683 -0.05850389 0.04550589
#> [4,] -0.107081  1.48565611 1.174211 -0.04771683 -0.05850389 0.04550589

Each row in each of these slices represents the fixed effects plus the random effects for a given cluster, e.g., the second row represents \(\hat{\boldsymbol{\beta}}+\hat{\mathbf{u}}_2\).

The predict() function is available for making predictions on new data.

x.new <- x[1:3, ]
cluster.new <- cluster[1:3]
predict(fit, x.new, cluster.new)
#>            [,1]        [,2]       [,3]       [,4]        [,5]
#> [1,] -0.2922322  0.40829027  0.3588954  0.3840867  0.35454510
#> [2,] -0.2922322 -0.35024290 -0.3451526 -0.2907129 -0.31701760
#> [3,] -0.2922322 -0.07945345 -0.1067836 -0.0751470 -0.06503485

Again, the columns represent predictions for different values of \(\lambda\).

In practice, \(\lambda\) usually needs to be cross-validated. The cv.glmmsel() function provides a convenient way to perform cross-validation.

fit <- cv.glmmsel(x, y, cluster)

glmmsel() does not need to be run after using cv.glmmsel(), as the latter calls the former and saves the result as fit$fit.

The coef() and predict() functions applied to the output of cv.glmmsel() return the result corresponding to the value of \(\lambda\) that minimises the cross-validation loss.

coef(fit)
#>            [,1]        [,2]     [,3] [,4] [,5] [,6]
#> [1,] -0.1167263 -0.03627693 1.160918    0    0    0
#> [2,] -0.1167263  1.14477022 1.160918    0    0    0
#> [3,] -0.1167263  1.86572012 1.160918    0    0    0
#> [4,] -0.1167263  1.46745495 1.160918    0    0    0
predict(fit, x.new, cluster.new)
#> [1]  0.40829027 -0.35024290 -0.07945345

Non-Gaussian likelihoods

Currently, glmmsel supports Gaussian likelihoods (default) and binomial likelihoods. To use a binomial likelihood and perform a logistic linear mixed model fit, set family = 'binomial'.

y <- rbinom(n, 1, 1 / (1 + exp(- xb)))
fit <- cv.glmmsel(x, y, cluster, family = 'binomial')
coef(fit)
#>           [,1]       [,2]      [,3] [,4] [,5] [,6]
#> [1,] 0.1712199 -1.2805969 0.9336272    0    0    0
#> [2,] 0.1712199  0.7217031 0.9336272    0    0    0
#> [3,] 0.1712199  1.6736120 0.9336272    0    0    0
#> [4,] 0.1712199  1.6302119 0.9336272    0    0    0

Algorithms

The primary algorithm driving glmmsel is coordinate descent. Sometimes when the predictors are strongly correlated, the models fit by coordinate descent can be improved using local search. This algorithm runs on top of coordinate descent. To use local search, set local.search = TRUE.

x <- 0.2 * matrix(rnorm(n * p), n, p) + 0.8 * matrix(rnorm(n), n, p)
xb <- rowSums(x * sweep(u, 2, beta, '+')[cluster, ])
y <- rnorm(n, xb)
fit <- cv.glmmsel(x, y, cluster)
coef(fit)
#>           [,1]      [,2]     [,3] [,4]      [,5] [,6]
#> [1,] 0.0772794 0.2402767 1.476905    0 -0.569285    0
#> [2,] 0.0772794 0.8580355 1.476905    0 -0.569285    0
#> [3,] 0.0772794 1.4901887 1.476905    0 -0.569285    0
#> [4,] 0.0772794 1.8003791 1.476905    0 -0.569285    0
fit <- cv.glmmsel(x, y, cluster, local.search = TRUE)
coef(fit)
#>            [,1]        [,2]     [,3] [,4] [,5] [,6]
#> [1,] 0.04734797 -0.09505438 1.264529    0    0    0
#> [2,] 0.04734797  0.47932137 1.264529    0    0    0
#> [3,] 0.04734797  1.11757935 1.264529    0    0    0
#> [4,] 0.04734797  1.48196842 1.264529    0    0    0

The correct predictors are not selected without local search in this high-correlation example.

mirror server hosted at Truenetwork, Russian Federation.