Type: | Package |
Title: | Sparse Gaussian Graphical Modeling with Variable Clustering |
Version: | 0.1.0 |
Date: | 2025-10-13 |
Depends: | R (≥ 3.5.0) |
Imports: | Rcpp (≥ 1.0.9), dplyr, parallel, rlang, stats |
Suggests: | mvtnorm |
LinkingTo: | Rcpp, RcppEigen |
Description: | Perform sparse estimation of a Gaussian graphical model (GGM) with node aggregation through variable clustering. Currently, the package implements the clusterpath estimator of the Gaussian graphical model (CGGM) (Touw, Alfons, Groenen & Wilms, 2025; <doi:10.48550/arXiv.2407.00644>). |
License: | GPL (≥ 3) |
URL: | https://github.com/aalfons/clusterGGM |
BugReports: | https://github.com/aalfons/clusterGGM/issues |
Author: | Daniel J.W. Touw |
Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.3 |
NeedsCompilation: | yes |
Packaged: | 2025-10-13 15:58:42 UTC; alfons |
Repository: | CRAN |
Date/Publication: | 2025-10-17 20:20:13 UTC |
Sparse Gaussian Graphical Modeling with Variable Clustering
Description
Perform sparse estimation of a Gaussian graphical model (GGM) with node aggregation through variable clustering. Currently, the package implements the clusterpath estimator of the Gaussian graphical model (CGGM) (Touw, Alfons, Groenen & Wilms, 2025; <doi:10.48550/arXiv.2407.00644>).
Details
The DESCRIPTION file:
Package: | clusterGGM |
Type: | Package |
Title: | Sparse Gaussian Graphical Modeling with Variable Clustering |
Version: | 0.1.0 |
Date: | 2025-10-13 |
Depends: | R (>= 3.5.0) |
Imports: | Rcpp (>= 1.0.9), dplyr, parallel, rlang, stats |
Suggests: | mvtnorm |
LinkingTo: | Rcpp, RcppEigen |
Description: | Perform sparse estimation of a Gaussian graphical model (GGM) with node aggregation through variable clustering. Currently, the package implements the clusterpath estimator of the Gaussian graphical model (CGGM) (Touw, Alfons, Groenen & Wilms, 2025; <doi:10.48550/arXiv.2407.00644>). |
License: | GPL (>= 3) |
URL: | https://github.com/aalfons/clusterGGM |
BugReports: | https://github.com/aalfons/clusterGGM/issues |
Authors@R: | c(person("Daniel J.W.", "Touw", role = "aut", comment = c(ORCID = "0000-0003-3074-5401")), person("Andreas", "Alfons", email = "alfons@ese.eur.nl", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-2513-3788")), person("Ines", "Wilms", role = "aut", comment = c(ORCID = "0000-0003-3269-4601")), person("Patrick J.F.", "Groenen", role = "ths", comment = c(ORCID = "0000-0001-6683-8971", "PhD advisor of Daniel J.W. Touw"))) |
Author: | Daniel J.W. Touw [aut] (ORCID: <https://orcid.org/0000-0003-3074-5401>), Andreas Alfons [aut, cre] (ORCID: <https://orcid.org/0000-0002-2513-3788>), Ines Wilms [aut] (ORCID: <https://orcid.org/0000-0003-3269-4601>), Patrick J.F. Groenen [ths] (ORCID: <https://orcid.org/0000-0001-6683-8971>, PhD advisor of Daniel J.W. Touw) |
Maintainer: | Andreas Alfons <alfons@ese.eur.nl> |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.3 |
Index of help topics:
cggm Clusterpath Estimator of the Gaussian Graphical Model cggm_cv Cross Validation for the Clusterpath Estimator of the Gaussian Graphical Model cggm_refit Refit the Gaussian Graphical Model for a Given Aggregation and Sparsity Structure clusterGGM-package Sparse Gaussian Graphical Modeling with Variable Clustering clusterpath_weights Compute the Weight Matrix for the Clusterpath Penalty cv_folds Create Cross-Validation Folds get_Theta Extract the Estimated Precision Matrix get_clusters Extract the Cluster Assignment lasso_weights Compute the Weight Matrix for the Lasso Penalty min_clusters Calculate the Minimum Number of Clusters
Author(s)
Daniel J.W. Touw [aut] (ORCID: <https://orcid.org/0000-0003-3074-5401>), Andreas Alfons [aut, cre] (ORCID: <https://orcid.org/0000-0002-2513-3788>), Ines Wilms [aut] (ORCID: <https://orcid.org/0000-0003-3269-4601>), Patrick J.F. Groenen [ths] (ORCID: <https://orcid.org/0000-0001-6683-8971>, PhD advisor of Daniel J.W. Touw)
Maintainer: Andreas Alfons <alfons@ese.eur.nl>
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
Useful links:
Clusterpath Estimator of the Gaussian Graphical Model
Description
Compute the clusterpath estimator of the Gaussian Graphical Model (CGGM) for fixed values of the tuning parameters to obtain a sparse estimate with variable clustering of the precision matrix or the covariance matrix.
Usage
cggm(
S,
W_cpath,
lambda_cpath,
W_lasso = NULL,
lambda_lasso = 0,
eps_lasso = 0.005,
gss_tol = 0.005,
conv_tol = 1e-07,
fusion_threshold = NULL,
tau = 0.001,
max_iter = 5000,
expand = FALSE,
max_difference = 0.01,
verbose = 0
)
Arguments
S |
The sample covariance matrix of the data. |
W_cpath |
The weight matrix used in the clusterpath penalty. |
lambda_cpath |
A numeric vector of tuning parameters for regularization. Should be a sequence of monotonically increasing values. |
W_lasso |
The weight matrix used in the lasso penalty. Defaults to
|
lambda_lasso |
The penalty parameter used for the lasso penalty.
Defaults to |
eps_lasso |
Parameter that governs the quadratic approximation of the
lasso penalty. Within the interval |
gss_tol |
The tolerance value used in the golden section search (GSS) algorithm. Defaults to 0.005. |
conv_tol |
The tolerance used to determine convergence. Defaults to
|
fusion_threshold |
The threshold for fusing two clusters. If
|
tau |
The parameter used to determine the fusion threshold. Defaults to 0.001. |
max_iter |
The maximum number of iterations allowed for the optimization
algorithm. Defaults to |
expand |
Determines whether the vector |
max_difference |
The maximum allowed difference between consecutive
solutions of Theta if |
verbose |
Determines the amount of information printed during the optimization. Slows down the algorithm significantly. Defaults to 0. |
Value
An object of class "CGGM"
with the following components:
A , R |
Lists of matrices. Each pair of matrices with the same index
parametrize the estimated precision matrix for the corresponding value
of the aggregation parameter |
clusters |
An integer matrix in which each row contains the cluster
assignment of each variable for the corresponding value of the aggregation
parameter |
lambdas |
A vector with the values for the aggregation parameter
|
Theta |
List of matrices. Contains the solution to the minimization
procedure for each value of the aggregation parameter |
losses |
A vector with the values of the minimized CGGM loss function
for each value of the aggregation parameter |
cluster_counts |
An integer vector containing the number of clusters
obtained for each value of the aggregation parameter |
loss_progression |
A list of vectors. Contains, for each value of the
aggregation parameter |
loss_timings |
A list of vectors. Contains, for each value of the
aggregation parameter |
fusion_threshold |
The threshold value used to determine whether two clusters should be clustered. |
cluster_solution_index |
An integer vector containing the index of the
value of the aggregation parameter |
n |
The number of values of the aggregation parameter
|
inputs |
A list of the inputs of the function, used internally and in
|
Note
The function interface and output structure are still experimental and may change in the next version.
Author(s)
Daniel J.W. Touw
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
clusterpath_weights()
, lasso_weights()
,
cggm_refit()
, cggm_cv()
Examples
## CGGM can be used to estimate a clustered precision matrix
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Estimate the covariance matrix
S <- cov(X)
# Compute the weight matrix for the clusterpath (clustering) weights
W_cpath <- clusterpath_weights(S, phi = 1, k = 2)
# Compute the weight matrix for the lasso (sparsity) weights
W_lasso <- lasso_weights(S)
# Set values to be used for the aggregation parameter
lambdas <- seq(0, 0.2, by = 0.01)
# Estimate the precision matrix for each value of the aggregation
# parameter and a fixed value of the sparsity parameter
fit <- cggm(S, W_cpath = W_cpath, lambda_cpath = lambdas,
W_lasso = W_lasso, lambda_lasso = 0.2)
# The index of the first value for lambda for which there are 2 clusters
keep <- fit$cluster_solution_index[2]
# Accessor function that retrieve the solution with 2 clusters
get_Theta(fit, index = keep)
get_clusters(fit, index = keep)
# Often, it is not clear which values of the aggregation parameter
# make up the right sequence. But it can be expanded automatically.
fit <- cggm(S, W_cpath = W_cpath, lambda_cpath = lambdas,
W_lasso = W_lasso, lambda_lasso = 0.2,
expand = TRUE)
# A solution with 2 clusters
keep <- fit$cluster_solution_index[2]
get_Theta(fit, index = keep)
get_clusters(fit, index = keep)
## CGGM can also be used to estimate a clustered covariance matrix
# Generate data
set.seed(3)
Sigma <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = Sigma)
# Estimate the covariance matrix and compute its inverse
S <- cov(X)
S_inv <- solve(S)
# Compute the weight matrix for the clusterpath (clustering) weights.
# The input is now the sample precision matrix.
W_cpath <- clusterpath_weights(S_inv, phi = 1, k = 2)
# Compute the weight matrix for the lasso (sparsity) weights.
# The input is again the sample precision matrix.
W_lasso <- lasso_weights(S_inv)
# Set values to be used for the aggregation parameter
lambdas <- seq(0, 0.2, by = 0.01)
# Use the sample precision matrix to estimate the covariance matrix
# for each value of the aggregation parameter and a fixed value of
# the sparsity parameter
fit <- cggm(S_inv, W_cpath = W_cpath, lambda_cpath = lambdas,
W_lasso = W_lasso, lambda_lasso = 0.2, expand = TRUE)
# A solution with 2 clusters
keep <- fit$cluster_solution_index[2]
get_Theta(fit, index = keep)
get_clusters(fit, index = keep)
Cross Validation for the Clusterpath Estimator of the Gaussian Graphical Model
Description
Perform cross validation to tune the weight matrix parameters phi
and k
(for k-nearest-neighbors) as well as the aggregation parameter
lambda_cpath
and the sparsity parameter lambda_lasso
of the
clusterpath estimator of the Gaussian Graphical Model (CGGM) in order to
obtain a sparse estimate with variable clustering of the precision matrix or
the covariance matrix. The scoring metric is the negative log-likelihood
(lower is better).
Usage
cggm_cv(
X,
tune_grid,
kfold = 5,
folds = NULL,
connected = TRUE,
fit = TRUE,
refit = TRUE,
lasso_unit_weights = FALSE,
estimate_Sigma = FALSE,
verbose = 0,
n_jobs = 1,
...
)
Arguments
X |
The |
tune_grid |
A data frame with values of the tuning parameters. Each row
is a combination of parameters that is evaluated. The columns have the names
of the tuning parameters and should include |
kfold |
The number of folds. Defaults to 5. |
folds |
Optional argument to manually set the folds for the cross
validation procedure. If this is not |
connected |
Logical, indicating whether connectedness of the weight
matrix should be ensured. Defaults to |
fit |
Logical, indicating whether the cross-validation procedure
should consider the result from |
refit |
Logical, indicating whether the cross-validation procedure
should also consider the refitted result from |
lasso_unit_weights |
Logical, indicating whether the weights in the
sparsity penalty should be all one or decreasing in the magnitude of the
corresponding element of the inverse of the sample covariance matrix.
Defaults to |
estimate_Sigma |
Logical, indicating whether CGGM should be used to
estimate the covariance matrix based on the sample precision matrix. Defaults
to |
verbose |
Determines the amount of information printed during the
cross validation. Defaults to |
n_jobs |
Number of parallel jobs used for cross validation. If 0 or
smaller, uses the maximum available number of physical cores. Defaults to
|
... |
Additional arguments to be passed down to |
Value
An object of class "CGGM_CV"
with the following components:
fit |
A list with cross-validation results for CGGM without the refitting step. It consists of four components:
|
refit |
A list with cross-validation results for CGGM including the
refitting step. It contains the same four components as above, except that
|
raw_cv_results |
A list of raw cross-validation results before restructuring. |
best |
A character string indicating whether the optimal model fit
without the refitting step ( |
Note
The function interface and output structure are still experimental and may change in the next version.
Author(s)
Daniel J.W. Touw, modifications by Andreas Alfons
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
clusterpath_weights()
, lasso_weights()
,
cggm()
, cggm_refit()
Examples
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Use cross-validation to select the tuning parameters
fit_cv <- cggm_cv(
X = X,
tune_grid = expand.grid(
phi = 1,
k = 2,
lambda_lasso = c(0, 0.02),
lambda = seq(0, 0.2, by = 0.01)
),
folds = cv_folds(nrow(X), 5)
)
# The best solution has 2 clusters
get_Theta(fit_cv)
get_clusters(fit_cv)
Refit the Gaussian Graphical Model for a Given Aggregation and Sparsity Structure
Description
Estimate the parameters of a clustered and sparse precision matrix or
covariance matrix based on a restricted negative log-likelihood loss
function. The restrictions are given by the provided aggregation and
sparsity structure. This function is different from cggm()
,
as there are no aggregation and sparsity penalties on the precision or
covariance matrix.
Usage
cggm_refit(cggm_output, verbose = 0)
Arguments
cggm_output |
An object of class |
verbose |
Determines the amount of information printed during the
optimization. Defaults to |
Value
An object of class "CGGM_refit"
with the following components:
A , R |
Lists of matrices. Each pair of matrices with the same index
parametrize the estimated precision matrix after the refitting step given
the aggregation structure found with the corresponding value of the
aggregation parameter |
clusters |
An integer matrix in which each row contains the cluster
assignment of each variable for the corresponding value of the aggregation
parameter |
lambdas |
A vector with the values for the aggregation parameter
|
Theta |
List of matrices. Contains the solution to the minimization
procedure for each value of the aggregation parameter |
cluster_counts |
An integer vector containing the number of clusters
obtained for each value of the aggregation parameter |
cluster_solution_index |
An integer vector containing the index of the
value of the aggregation parameter |
n |
The number of values of the aggregation parameter
|
Note
The function interface and output structure are still experimental and may change in the next version.
Author(s)
Daniel J.W. Touw
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
Examples
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Estimate the covariance matrix
S <- cov(X)
# Compute the weight matrix for the clusterpath (clustering) weights
W_cpath <- clusterpath_weights(S, phi = 1, k = 2)
# Compute the weight matrix for the lasso (sparsity) weights
W_lasso <- lasso_weights(S)
# Set values to be used for the aggregation parameter
lambdas <- seq(0, 0.2, by = 0.01)
# Estimate the precision matrix while automatically expanding
# the sequence of values for the aggregation parameter
fit <- cggm(S, W_cpath = W_cpath, lambda_cpath = lambdas,
W_lasso = W_lasso, lambda_lasso = 0.2,
expand = TRUE)
# Apply the refitting step to the results, estimating the
# precision matrix based on the clustering and sparsity
# patterns but without additional shrinkage
refit <- cggm_refit(fit)
# A solution with 2 clusters
keep <- refit$cluster_solution_index[2]
get_Theta(refit, index = keep)
get_clusters(refit, index = keep)
Compute the Weight Matrix for the Clusterpath Penalty
Description
Compute the (possibly sparse) weight matrix for the clusterpath penalty in the clusterpath estimator of the Gaussian graphical model (CGGM). Weights are computed based on a distance measure so that variables that are close are clustered more quickly (higher weight) than variables that are far apart (lower weight). Only neighboring variables thereby receive a nonzero weight. Additionally, groups of variables that would not be connected via nonzero weights due to the sparsity of the weight matrix can still be connected by applying a minimum spanning tree algorithm.
Usage
clusterpath_weights(S, phi, k, connected = TRUE)
Arguments
S |
The sample covariance matrix of the data. |
phi |
Tuning parameter of the weights. |
k |
The number of nearest neighbors that should be used to set weights
to a nonzero value. If |
connected |
A logical indicating whether a connected weight matrix
should be enforced. Defaults to |
Value
A weight matrix for the clusterpath penalty.
Author(s)
Daniel J.W. Touw
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
lasso_weights()
, cggm()
,
cggm_refit()
, cggm_cv()
Examples
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Estimate the covariance matrix
S <- cov(X)
# Compute the weight matrix for the clusterpath (clustering) weights
W_cpath <- clusterpath_weights(S, phi = 1, k = 2)
W_cpath
Create Cross-Validation Folds
Description
Obtain indices for splitting observations into K
blocks to be to be
folded into training and test data during K
-fold cross-validation.
Usage
cv_folds(n, K = 5L)
Arguments
n |
an integer giving the number of observations to be split. |
K |
an integer giving the number of blocks into which the observations should be split (the default is five). |
Value
A list of indices giving the blocks of observations to be folded into training and test data during cross-validation.
Author(s)
Andreas Alfons
See Also
cggm_cv()
Examples
cv_folds(20, K = 5)
Extract the Estimated Precision Matrix
Description
Extract a (block-structured and sparse) precision matrix obtained via the clusterpath estimator of the Gaussian graphical model (CGGM).
Usage
get_Theta(object, ...)
## S3 method for class 'CGGM'
get_Theta(object, index, ...)
## S3 method for class 'CGGM_refit'
get_Theta(object, index, ...)
## S3 method for class 'CGGM_CV'
get_Theta(object, which = NULL, ...)
Arguments
object |
an object from which to extract the precision matrix. |
... |
additional arguments are currently ignored. |
index |
an integer specifying the step along the clusterpath for which to extract the precision matrix. |
which |
a character string specifying for which solution to extract
the precision matrix. Possible values are |
Value
The estimated (block-structured and sparse) precision matrix.
For the "CGGM_CV"
method (see cggm_cv()
), the returned
precision matrix corresponds to the optimal values of the tuning parameters.
Author(s)
Daniel J.W. Touw
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
cggm()
, cggm_refit()
, cggm_cv()
Examples
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Estimate the covariance matrix
S <- cov(X)
# Compute the weight matrix for the clusterpath (clustering) weights
W_cpath <- clusterpath_weights(S, phi = 1, k = 2)
# Compute the weight matrix for the lasso (sparsity) weights
W_lasso <- lasso_weights(S)
# Set values to be used for the aggregation parameter
lambdas <- seq(0, 0.2, by = 0.01)
# Estimate the precision matrix while automatically expanding
# the sequence of values for the aggregation parameter
fit <- cggm(S, W_cpath = W_cpath, lambda_cpath = lambdas,
W_lasso = W_lasso, lambda_lasso = 0.2,
expand = TRUE)
# Precision matrix with 2 clusters
get_Theta(fit, index = fit$cluster_solution_index[2])
# Apply the refitting step to the results, estimating the
# precision matrix based on the clustering and sparsity
# patterns but without additional shrinkage
refit <- cggm_refit(fit)
# Precision matrix with 2 clusters
get_Theta(refit, index = refit$cluster_solution_index[2])
Extract the Cluster Assignment
Description
Extract a cluster assignment obtained via the clusterpath estimator of the Gaussian graphical model (CGGM).
Usage
get_clusters(object, ...)
## S3 method for class 'CGGM'
get_clusters(object, index, ...)
## S3 method for class 'CGGM_refit'
get_clusters(object, index, ...)
## S3 method for class 'CGGM_CV'
get_clusters(object, which = NULL, ...)
Arguments
object |
an object from which to extract the cluster assignment. |
... |
additional arguments are currently ignored. |
index |
an integer specifying the step along the clusterpath for which to extract the cluster assignment. |
which |
a character string specifying for which solution to extract
the cluster assignment. Possible values are |
Value
An integer vector giving the obtained cluster assignment for each variable.
For the "CGGM_CV"
method (see cggm_cv()
), the returned
cluster assignment corresponds to the optimal values of the tuning
parameters.
Author(s)
Daniel J.W. Touw
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
cggm()
, cggm_refit()
, cggm_cv()
Examples
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Estimate the covariance matrix
S <- cov(X)
# Compute the weight matrix for the clusterpath (clustering) weights
W_cpath <- clusterpath_weights(S, phi = 1, k = 2)
# Compute the weight matrix for the lasso (sparsity) weights
W_lasso <- lasso_weights(S)
# Set values to be used for the aggregation parameter
lambdas <- seq(0, 0.2, by = 0.01)
# Estimate the precision matrix while automatically expanding
# the sequence of values for the aggregation parameter
fit <- cggm(S, W_cpath = W_cpath, lambda_cpath = lambdas,
W_lasso = W_lasso, lambda_lasso = 0.2,
expand = TRUE)
# Cluster membership for 2 clusters
get_clusters(fit, index = fit$cluster_solution_index[2])
# Apply the refitting step to the results, estimating the
# precision matrix based on the clustering and sparsity
# patterns but without additional shrinkage
refit <- cggm_refit(fit)
# Cluster membership for 2 clusters
get_clusters(refit, index = refit$cluster_solution_index[2])
Compute the Weight Matrix for the Lasso Penalty
Description
Compute the weight matrix for the lasso penalty in the clusterpath estimator of the Gaussian graphical model (CGGM).
Usage
lasso_weights(S, unit = FALSE)
Arguments
S |
The sample covariance matrix of the data. |
unit |
A logical indicating whether the weights should be all one or
based on the inverse of |
Value
A weight matrix for the lasso penalty.
Author(s)
Daniel J.W. Touw
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
clusterpath_weights()
, cggm()
,
cggm_refit()
, cggm_cv()
Examples
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Estimate the covariance matrix
S <- cov(X)
# Compute the weight matrix for the lasso (sparsity) weights
W_lasso <- lasso_weights(S)
W_lasso
Calculate the Minimum Number of Clusters
Description
Compute the minimum number of clusters achievable by the clusterpath penalty using the provided weight matrix.
Usage
min_clusters(W)
Arguments
W |
The weight matrix for the clusterpath penalty. |
Value
An integer giving the minimum number of clusters.
Author(s)
Daniel J.W. Touw
References
D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.
See Also
clusterpath_weights()
, cggm()
,
cggm_cv()
Examples
# Generate data
set.seed(3)
Theta <- matrix(
c(2, 1, 0, 0,
1, 2, 0, 0,
0, 0, 4, 1,
0, 0, 1, 4),
nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))
# Estimate the covariance matrix
S <- cov(X)
# Compute the weight matrix for the clusterpath (clustering) weights
# without enforcing connectedness
W_cpath <- clusterpath_weights(S, phi = 1, k = 1, connected = FALSE)
# The smallest number of clusters is 2
min_clusters(W_cpath)
# Compute the weight matrix for the clusterpath (clustering) weights
# with enforcing connectedness (default behavior)
W_cpath <- clusterpath_weights(S, phi = 1, k = 1, connected = TRUE)
# The smallest number of clusters is 1
min_clusters(W_cpath)