| Type: | Package |
| Title: | Scalable Causal Discovery and Model Selection on Mixed Datasets with 'rCausalMGM' |
| Version: | 1.0 |
| Date: | 2026-02-09 |
| Author: | Tyler C Lovelace [aut], Max Dudek [aut], Jack Fiore [aut], Panayiotis V Benos [aut, cre] |
| Maintainer: | Panayiotis V Benos <pbenos@ufl.edu> |
| Description: | Scalable methods for learning causal graphical models from mixed data, including continuous, discrete, and censored variables. The package implements CausalMGM, which combines a convex, score-based approach for learning an initial moralized graph with a producer-consumer scheme that enables efficient parallel conditional independence testing in constraint-based causal discovery algorithms. The implementation supports high-dimensional datasets and provides individual access to core components of the workflow, including MGM and the PC-Stable and FCI-Stable causal discovery algorithms. To support practical applications, the package includes multiple model selection strategies, including information criteria based on likelihood and model complexity, cross-validation for out-of-sample likelihood estimation, and stability-based approaches that assess graph robustness across subsamples. |
| License: | GPL-3 |
| Imports: | Rcpp (≥ 1.0.3), survival |
| LinkingTo: | BH, Rcpp, RcppArmadillo, RcppThread |
| Suggests: | Rgraphviz, graph |
| RoxygenNote: | 7.3.2 |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| Packaged: | 2026-02-22 23:54:06 UTC; tyler |
| Repository: | CRAN |
| Date/Publication: | 2026-03-03 10:20:02 UTC |
Structural Hamming Distance (SHD)
Description
Calculate the Structural Hamming Distance (SHD) between two graphs.
Usage
SHD(graph1, graph2)
Arguments
graph1 |
A graph object |
graph2 |
A graph object |
Value
The SHD between the two graph objects.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
SHD(g, cpdag(sim$graph))
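The counting convention behind SHD() is not spelled out above. The sketch below uses one common convention: every node pair whose edge marks differ between the two graphs adds 1, so a missing edge, an extra edge, and a wrongly oriented edge each cost one. `shdSketch` is a hypothetical base-R helper, not part of rCausalMGM. It works on plain edge-mark matrices (an assumed encoding) rather than on rCausalMGM graph objects.

```r
# Hypothetical helper (not part of rCausalMGM): SHD between two graphs stored
# as p x p edge-mark matrices with a common node order. Assumed encoding:
#   A[i, j] == 1 & A[j, i] == 0  ->  directed edge i --> j
#   A[i, j] == 1 & A[j, i] == 1  ->  undirected edge i --- j
#   A[i, j] == 0 & A[j, i] == 0  ->  no edge
shdSketch <- function(A, B) {
  stopifnot(all(dim(A) == dim(B)))
  p <- nrow(A)
  shd <- 0
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      # Each unordered pair contributes at most 1 error
      if (A[i, j] != B[i, j] || A[j, i] != B[j, i]) shd <- shd + 1
    }
  }
  shd
}

nodes <- c("X1", "X2", "X3")
truth <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
truth["X1", "X2"] <- 1                    # X1 --> X2
truth["X3", "X2"] <- 1                    # X3 --> X2 (collider at X2)

est <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
est["X1", "X2"] <- 1                      # correct edge
est["X2", "X3"] <- 1                      # reversed edge: 1 error
est["X1", "X3"] <- est["X3", "X1"] <- 1   # extra undirected edge: 1 error

shdSketch(truth, est)   # 2
```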
Convert an adjacency matrix into a graph
Description
Convert an adjacency matrix into a graph
Usage
adjMat2Graph(adj, nodes, directed = FALSE)
Arguments
adj |
The adjacency matrix, p x p, with non-zero values indicating the presence of an adjacency. |
nodes |
The names of the nodes, length p. |
directed |
TRUE if the graph should be directed. The default is FALSE. |
Value
A graph object representing the adjacency matrix.
Examples
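mat <- matrix(sample(c(0,1), 16, replace=TRUE), nrow=4)
mat <- mat + t(mat)   # symmetrize: an undirected graph needs a symmetric matrix
diag(mat) <- 0        # drop self-loops created on the diagonal
nodes <- c("X1", "X2", "X3", "X4")
g <- adjMat2Graph(mat, nodes)
print(g)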
Combined graph recovery metrics
Description
Calculate the SHD, precision, recall, F1, and Matthews correlation coefficient (MCC) for the adjacencies and orientations of an estimated graph compared to the ground truth. This is the concatenated output of the SHD and of the adjacency and orientation precision-recall metrics.
Usage
allMetrics(estimate, groundTruth, groundTruthDAG = NULL)
Arguments
estimate |
An estimated graph object |
groundTruth |
A ground truth graph object of the same type as the estimated graph object |
groundTruthDAG |
A ground truth graph object containing the true causal DAG. Only necessary for calculating the orientation precision, recall, F1, and MCC for partial ancestral graphs (PAGs). |
Value
The SHD together with the adjacency and orientation precision, recall, F1, and MCC between the two graph objects.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
allMetrics(g, cpdag(sim$graph))
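To make the adjacency part of these metrics concrete, here is a minimal base-R sketch that ignores orientation and counts each unordered node pair once. `adjMetricsSketch` is a hypothetical helper operating on plain 0/1 matrices. It is not the package's implementation, whose exact treatment of edge types may differ.

```r
# Hypothetical helper (not part of rCausalMGM): adjacency precision, recall,
# F1 and MCC from two p x p matrices. A pair (i, j) counts as adjacent if
# either entry [i, j] or [j, i] is non-zero; orientation is ignored.
adjMetricsSketch <- function(est, truth) {
  estAdj <- (est != 0) | t(est != 0)
  trueAdj <- (truth != 0) | t(truth != 0)
  pairs <- upper.tri(estAdj)               # each unordered pair once
  e <- estAdj[pairs]
  g <- trueAdj[pairs]
  TP <- as.numeric(sum(e & g))             # numeric avoids integer overflow
  FP <- as.numeric(sum(e & !g))
  FN <- as.numeric(sum(!e & g))
  TN <- as.numeric(sum(!e & !g))
  precision <- TP / (TP + FP)
  recall <- TP / (TP + FN)
  F1 <- 2 * precision * recall / (precision + recall)
  MCC <- (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  c(precision = precision, recall = recall, F1 = F1, MCC = MCC)
}

truth <- matrix(0, 4, 4)
truth[1, 2] <- truth[2, 3] <- truth[3, 4] <- 1   # true edges: 1-2, 2-3, 3-4
est <- matrix(0, 4, 4)
est[1, 2] <- est[2, 3] <- est[1, 4] <- 1         # estimated: 1-2, 2-3, 1-4
m <- adjMetricsSketch(est, truth)
round(m, 3)   # precision, recall, F1 = 0.667; MCC = 0.333
```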
Runs bootstrapping for a causal graph on the dataset.
Description
Runs bootstrapping for a causal graph on the dataset. This function can be used to estimate the stability of edge adjacencies and orientations in the causal graph. It returns an ensemble graph which consists of the most common edges across bootstrap samples. The ensemble graph is constructed based on edge-wise probabilities, so it is not guaranteed to be a valid CPDAG or PAG. The ensemble graph's stabilities entry contains information about the frequency of each possible orientation for each edge that appears at least once across bootstrap samples.
Usage
bootstrap(
data,
graph,
knowledge = NULL,
numBoots = 20L,
threads = -1L,
replace = FALSE,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for estimating the MGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
graph |
A graph object containing the graph to estimate the stability of through bootstrapping. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
numBoots |
The number of bootstrap samples to run. The default is 20. |
threads |
An integer value denoting the number of threads to use for parallelization. The default value is -1, which uses all available CPUs. |
replace |
A logical value indicating whether to use sampling with replacement or to draw subsamples of size floor(0.632 * N). The default value is FALSE. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graph object representing an ensemble graph learned from bootstrapped samples. For each adjacency observed across the bootstrap graphs, if absence is not the most frequent outcome, the edge orientation with the highest frequency is included in the ensemble graph. The object also contains a 'stabilities' data frame that records the frequencies of all possible edge orientations for each observed adjacency. The ensemble graph may not correspond to a valid CPDAG or PAG and is not guaranteed to represent a causal graph.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
g.boot <- bootstrap(sim$data, g)
print(g.boot)
print(g.boot$stabilities[1:6,])
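The two mechanics described above, the subsample size used when replace = FALSE and the per-pair ensemble rule, can be sketched in base R. `ensembleEdgeSketch` is a hypothetical helper. The edge labels ("-->", "---", "none") are illustrative strings, not the column names of the stabilities data frame. Ties are broken arbitrarily here because the documentation does not say how they are handled.

```r
# With replace = FALSE, each bootstrap draws floor(0.632 * N) distinct rows
N <- 200
idx <- sample.int(N, size = floor(0.632 * N), replace = FALSE)
length(idx)   # 126

# Hypothetical helper (not part of rCausalMGM): apply the ensemble rule to one
# node pair. 'outcomes' holds the edge seen for that pair in each bootstrap
# graph; "none" marks an absent edge.
ensembleEdgeSketch <- function(outcomes) {
  freq <- table(outcomes) / length(outcomes)   # relative frequencies
  best <- names(freq)[which.max(freq)]         # ties: first name in sorted order
  if (best == "none") NA_character_ else best  # absence wins -> no edge
}

ensembleEdgeSketch(c(rep("-->", 9), rep("---", 4), rep("none", 7)))  # "-->"
ensembleEdgeSketch(c(rep("none", 12), rep("-->", 8)))                 # NA
```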
Runs the BOSS causal discovery algorithm on the dataset
Description
Runs the BOSS causal discovery algorithm on the dataset
Usage
boss(
data,
numStarts = 3L,
penalty = 2,
threads = -1L,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
numStarts |
The number of restarts (with different randomly sampled initial topological orders). Restarts reduce the variance that can result from being stuck with an unfavorable initial starting order. The default is 3. |
penalty |
A numeric value that represents the strength of the penalty for model complexity. The default value is 2, which corresponds to twice the BIC penalty. |
threads |
An integer value denoting the number of threads to use for parallelization. The default value is -1, which uses all available CPUs. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
The CPDAG learned by BOSS
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- boss(sim$data)
print(g)
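BOSS belongs to the family of permutation-based methods. For a fixed causal order, each variable's parents are chosen among its predecessors by a penalized score, and the search then moves over orders, with numStarts random restarts. The sketch below covers only the first step, for continuous data with linear-Gaussian models fitted by lm(). `penBIC` and `parentsGivenOrder` are hypothetical helpers. Treating `mult` as a multiplier on the BIC complexity term (so mult = 1 is the ordinary BIC) is an assumption about how the penalty argument behaves, and greedy forward selection stands in for whatever per-node search the package uses.

```r
# Hypothetical helpers (not part of rCausalMGM), continuous data only.
# 'mult' is an assumed analogue of 'penalty': mult = 1 reproduces BIC().
penBIC <- function(data, y, parents, mult = 1) {
  rhs <- if (length(parents)) parents else "1"
  fit <- lm(reformulate(rhs, response = y), data = data)
  ll <- logLik(fit)
  -2 * as.numeric(ll) + mult * attr(ll, "df") * log(nrow(data))
}

# For a fixed order, pick each variable's parents among its predecessors by
# greedy forward selection on the penalized score (lower is better).
parentsGivenOrder <- function(data, order, mult = 2) {
  pa <- setNames(vector("list", length(order)), order)
  for (k in seq_along(order)) {
    y <- order[k]
    cand <- order[seq_len(k - 1)]          # only predecessors may be parents
    cur <- character(0)
    best <- penBIC(data, y, cur, mult)
    repeat {
      rest <- setdiff(cand, cur)
      if (length(rest) == 0) break
      s <- vapply(rest, function(v) penBIC(data, y, c(cur, v), mult), numeric(1))
      if (min(s) >= best) break            # no candidate improves the score
      cur <- c(cur, rest[which.min(s)])
      best <- min(s)
    }
    pa[[y]] <- cur
  }
  pa
}

set.seed(42)
n <- 500
X1 <- rnorm(n)
X2 <- 2 * X1 + rnorm(n)
X3 <- 2 * X2 + rnorm(n)
dat <- data.frame(X1, X2, X3)
pa <- parentsGivenOrder(dat, c("X1", "X2", "X3"))
str(pa)
```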
Calculate the CoxMGM graph on a dataset.
Description
Calculate the CoxMGM graph on a dataset. The dataset must contain at least one censored variable formatted as a Surv object from the survival package.
Usage
coxmgm(
data,
lambda = as.numeric(c(0.2, 0.2, 0.2, 0.2, 0.2)),
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for estimating the CoxMGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. All censored variables must be a survival::Surv object. Any rows with missing values will be dropped. |
lambda |
A numeric vector of five values for the regularization parameter lambda: the first for continuous-continuous edges, the second for continuous-discrete, the third for discrete-discrete, the fourth for continuous-survival, and the fifth for discrete-survival. Defaults to c(0.2, 0.2, 0.2, 0.2, 0.2). If a single value is provided, all five values in the vector will be set to that value. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print updates on the progress of optimizing MGM. The default is FALSE. |
Value
The calculated CoxMGM graph
Examples
sim <- simRandomDAG(200, 25, 1)
ig <- coxmgm(sim$data)
print(ig)
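Since coxmgm requires at least one survival::Surv column, here is a minimal sketch of building such a data.frame by hand. The simulated event and censoring times are illustrative assumptions only. The column types follow the requirements listed above: numeric, factor, and Surv.

```r
library(survival)

set.seed(1)
n <- 100
x <- rnorm(n)                                   # continuous variable
g <- factor(sample(c("a", "b"), n, replace = TRUE))  # discrete variable
tEvent <- rexp(n, rate = exp(0.5 * x))          # event times depend on x
tCens <- rexp(n, rate = 0.2)                    # independent censoring times

dat <- data.frame(x = x, g = g)
# Observed time is the earlier of event and censoring; status 1 = event seen
dat$surv <- Surv(pmin(tEvent, tCens), as.numeric(tEvent <= tCens))
str(dat)
```

A frame built this way could then be passed as `coxmgm(dat)`, or with separate penalties per edge type in the documented order, e.g. `coxmgm(dat, lambda = c(0.2, 0.2, 0.2, 0.15, 0.15))`.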
Implements k-fold cross-validation for CoxMGM
Description
Calculate the solution path for a CoxMGM graph on a dataset with k-fold cross-validation. The dataset must contain at least one censored variable formatted as a Surv object from the survival package. This function returns the graph that minimizes negative log(pseudolikelihood) and the graph selected by the one standard error rule.
Usage
coxmgmCV(
data,
lambdas = NULL,
nLambda = 30L,
nfolds = 5L,
foldid = NULL,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for estimating the CoxMGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. All censored variables must be a survival::Surv object. Any rows with missing values will be dropped. |
lambdas |
A numeric vector containing the values of lambda to learn a CoxMGM with. The default value is NULL, in which case a log-spaced vector of nLambda values for lambda will be supplied instead. |
nLambda |
A numeric value indicating the number of lambda values to test when the lambdas vector is NULL. The default is 30. |
nfolds |
An integer value defining the number of folds to be used for cross-validation if foldid is NULL. The default value is 5. |
foldid |
An integer vector containing, for each sample, a value in the range 1 to K (where K is the number of folds) that identifies which test set that sample belongs to. This enables users to define their own cross-validation splits, for example when stratified cross-validation is needed. The default value is NULL. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphCV object that contains the minimum and one standard error rule selected graphs.
Examples
sim <- simRandomDAG(200, 25, 1)
ig.cv <- coxmgmCV(sim$data)
print(ig.cv)
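The foldid argument makes stratified splits possible. For survival data, a natural choice is to stratify on the event indicator so that every fold sees a similar share of censored samples. `stratifiedFolds` is a hypothetical helper that deals shuffled members of each stratum round-robin into K folds.

```r
# Hypothetical helper (not part of rCausalMGM): a stratified 'foldid' vector
# in which every fold receives a similar share of each stratum.
stratifiedFolds <- function(strata, K = 5) {
  foldid <- integer(length(strata))
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # Balanced fold labels for this stratum, randomly permuted
    foldid[idx] <- rep_len(seq_len(K), length(idx))[sample.int(length(idx))]
  }
  foldid
}

set.seed(11)
status <- rep(c(0, 1), times = c(60, 140))   # 0 = censored, 1 = event
foldid <- stratifiedFolds(status, K = 5)
table(foldid, status)   # 12 censored and 28 events in every fold
```

The resulting vector would be supplied as `coxmgmCV(data, foldid = foldid)`, with status taken from the dataset's own Surv column.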
Estimates a solution path for CoxMGM
Description
Calculate the solution path for a CoxMGM graph on a dataset. The dataset must contain at least one censored variable formatted as a Surv object from the survival package. It also returns the models selected by the BIC and AIC scores.
Usage
coxmgmPath(data, lambdas = NULL, nLambda = 30L, rank = FALSE, verbose = FALSE)
Arguments
data |
A data.frame containing the dataset to be used for estimating the CoxMGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. All censored variables must be a survival::Surv object. Any rows with missing values will be dropped. |
lambdas |
A numeric vector containing the values of lambda to learn a CoxMGM with. The default value is NULL, in which case a log-spaced vector of nLambda values for lambda will be supplied instead. |
nLambda |
A numeric value indicating the number of lambda values to test when the lambdas vector is NULL. The default is 30. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphPath object that contains CoxMGM graphs learned by the solution path, as well as the BIC and AIC selected models
Examples
sim <- simRandomDAG(200, 25, 1)
ig.path <- coxmgmPath(sim$data)
print(ig.path)
Calculate the CPDAG for a given DAG
Description
Create the completed partially directed acyclic graph (CPDAG) for the input directed acyclic graph (DAG). The CPDAG represents the Markov equivalence class of the true causal DAG. Because algorithms such as PC can only recover the causal DAG up to its Markov equivalence class, assessments of causal structure recovery should compare their output to the CPDAG rather than to the causal DAG.
Usage
cpdag(graph)
Arguments
graph |
The graph object used to generate the CPDAG. Should be the ground-truth causal DAG |
Value
The CPDAG corresponding to the input DAG
Examples
sim <- simRandomDAG(200, 25, deg=2)
sim$cpdag <- cpdag(sim$graph)
print(sim$cpdag)
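Two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures (unshielded colliders a --> c <-- b with a and b non-adjacent). These v-structures therefore stay directed in the CPDAG. The sketch below finds them in a DAG stored as a plain adjacency matrix, where A[i, j] == 1 means i --> j. `vStructures` is a hypothetical helper. A full CPDAG construction also applies further orientation rules, which this sketch omits.

```r
# Hypothetical helper (not part of rCausalMGM): list unshielded colliders
# a --> m <-- b (a, b non-adjacent) in a DAG adjacency matrix.
vStructures <- function(A) {
  nodes <- colnames(A)
  adj <- (A != 0) | t(A != 0)            # skeleton, direction ignored
  out <- list()
  for (m in seq_along(nodes)) {
    pa <- which(A[, m] != 0)             # parents of node m
    if (length(pa) < 2) next
    pairs <- combn(pa, 2)
    for (k in seq_len(ncol(pairs))) {
      a <- pairs[1, k]
      b <- pairs[2, k]
      if (!adj[a, b]) out[[length(out) + 1]] <- nodes[c(a, m, b)]
    }
  }
  out
}

v <- paste0("X", 1:4)
A <- matrix(0, 4, 4, dimnames = list(v, v))
A["X1", "X3"] <- 1
A["X2", "X3"] <- 1
A["X3", "X4"] <- 1
vStructures(A)      # one v-structure: X1 --> X3 <-- X2

B <- A
B["X1", "X2"] <- 1  # X1 and X2 now adjacent, so the collider is shielded
vStructures(B)      # empty list
```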
A function to create a prior knowledge object for use with causal discovery algorithms
Description
A function to create a prior knowledge object for use with causal discovery algorithms
Usage
createKnowledge(
tiers = list(),
forbiddenWithinTier = NULL,
forbidden = list(),
required = list()
)
Arguments
tiers |
A list of vectors of variable names, ordered from the first tier to the last, where variables in tier t can only be ancestors of variables in tiers t+1, ..., T and descendants of variables in tiers 1, ..., t-1. If tiers are used, all variables must be in a tier, and no variable can be in multiple tiers. |
forbiddenWithinTier |
A vector of logical values, one per tier, indicating whether edges between variables within the same tier are forbidden. The value is NULL by default, which results in forbiddenWithinTier being set to FALSE for each tier, so within-tier edges are allowed. |
forbidden |
A list containing vectors of node pairs that forbid a specific directed edge. For example, to forbid A --> B, add c("A", "B") to forbidden. |
required |
A list containing vectors of node pairs that require the presence of a specific directed edge. For example, to require B --> A, add c("B", "A") to required. |
Value
A knowledge object that can be passed to causal discovery algorithms.
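The semantics of tiers, forbiddenWithinTier, and forbidden can be made concrete with a small check of whether a single directed edge is permitted. `edgeAllowed` is a hypothetical helper that mirrors the argument descriptions above. The order in which the package applies these constraints internally is not documented, so treat the precedence here as an assumption.

```r
# Hypothetical helper (not part of rCausalMGM): is the directed edge
# from --> to permitted by the given prior knowledge?
edgeAllowed <- function(from, to, tiers = list(), forbiddenWithinTier = NULL,
                        forbidden = list()) {
  # Explicitly forbidden directed edges
  for (e in forbidden) if (identical(e, c(from, to))) return(FALSE)
  if (length(tiers) > 0) {
    tierOf <- function(v) which(vapply(tiers, function(tier) v %in% tier, logical(1)))
    tf <- tierOf(from)
    tt <- tierOf(to)
    if (tf > tt) return(FALSE)     # a later tier cannot cause an earlier tier
    if (tf == tt) {
      fwt <- if (is.null(forbiddenWithinTier)) rep(FALSE, length(tiers)) else forbiddenWithinTier
      if (fwt[tf]) return(FALSE)   # within-tier edges forbidden for this tier
    }
  }
  TRUE
}

tiers <- list(c("Age", "Sex"), c("Smoking"), c("Cancer"))
edgeAllowed("Smoking", "Cancer", tiers)                                        # TRUE
edgeAllowed("Cancer", "Smoking", tiers)                                        # FALSE
edgeAllowed("Age", "Sex", tiers, forbiddenWithinTier = c(TRUE, FALSE, FALSE))  # FALSE
edgeAllowed("Age", "Cancer", tiers, forbidden = list(c("Age", "Cancer")))      # FALSE
```

The same constraints expressed with this function's documented arguments would be `createKnowledge(tiers = tiers, forbiddenWithinTier = c(TRUE, FALSE, FALSE), forbidden = list(c("Age", "Cancer")))`.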
Implements k-fold cross-validation for FCI-Stable
Description
Runs k-fold cross-validation to select the value of alpha and orientation rule for FCI-Stable. Returns a graphCV object containing the causal graphical models that minimize the negative log(pseudo-likelihood) and the sparsest model within one standard error of the minimum.
Usage
fciCV(
data,
initialGraph = NULL,
knowledge = NULL,
orientRule = as.character(c("majority", "maxp", "conservative")),
alphas = NULL,
nfolds = 5L,
foldid = NULL,
threads = -1L,
fdr = FALSE,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
initialGraph |
An undirected rCausalMGM graph object containing the initial skeleton of adjacencies used in the causal discovery algorithm. This graph can be learned by 'mgm' or learned by another method and imported into an undirected rCausalMGM graph object from its adjacency matrix. The default is NULL, in which case a fully connected graph is used as the initial skeleton. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
orientRule |
A vector of strings to determine which of the orientation rules to test in the cross-validation procedure to select the optimal model. The default is a vector that contains the "majority", "maxp", and "conservative" orientation rules. |
alphas |
A numeric vector containing values of alpha to test in the cross-validation procedure. The default value is NULL, in which case we set alpha = c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2). |
nfolds |
An integer value defining the number of folds to be used for cross-validation if foldid is NULL. The default value is 5. |
foldid |
An integer vector containing, for each sample, a value in the range 1 to K (where K is the number of folds) that identifies which test set that sample belongs to. This enables users to define their own cross-validation splits, for example when stratified cross-validation is needed. The default value is NULL. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which uses all available CPUs. |
fdr |
A logical value indicating whether to use false discovery rate control for the discovery of adjacencies in the causal graph. The default value is FALSE. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphCV object containing the PAGs selected by the minimum and one standard error rule.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g.cv <- fciCV(sim$data)
print(g.cv)
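The one standard error rule picks the sparsest candidate whose mean cross-validated loss lies within one standard error of the best candidate. `oneSE` is a hypothetical helper. The loss and edge-count values below are made-up numbers for illustration, paired with the default alpha grid.

```r
# Hypothetical helper (not part of rCausalMGM): one-standard-error rule.
# cvMean, cvSE: mean and standard error of the held-out negative
# log(pseudo-likelihood) per candidate; nEdges: size of each candidate graph.
oneSE <- function(cvMean, cvSE, nEdges) {
  iMin <- which.min(cvMean)
  ok <- which(cvMean <= cvMean[iMin] + cvSE[iMin])   # within one SE of minimum
  c(min = iMin, oneSE = ok[which.min(nEdges[ok])])   # sparsest such candidate
}

alphas <- c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2)   # default grid
cvMean <- c(105, 101, 99.5, 98, 98.4, 99, 100)          # illustrative values
cvSE <- rep(2, 7)
nEdges <- c(20, 28, 33, 40, 46, 51, 55)
res <- oneSE(cvMean, cvSE, nEdges)
alphas[res]   # 0.05 (minimum) and 0.01 (one SE rule)
```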
Runs the causal discovery algorithm FCI-Stable on a dataset.
Description
Runs the causal discovery algorithm FCI-Stable on a dataset. The FCI-Stable algorithm is designed to recover the Markov equivalence class of causal MAGs that could give rise to the observed conditional independence relationships in the causally insufficient case. This means that FCI-Stable can still learn the Markov equivalence class of the true MAG even in the presence of latent confounders and/or selection bias. The resulting graph is a partial ancestral graph (PAG).
Usage
fciStable(
data,
initialGraph = NULL,
knowledge = NULL,
orientRule = as.character(c("majority")),
alpha = 0.05,
threads = -1L,
possDsep = TRUE,
fdr = FALSE,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
initialGraph |
An undirected rCausalMGM graph object containing the initial skeleton of adjacencies used in the causal discovery algorithm. This graph can be learned by 'mgm' or learned by another method and imported into an undirected rCausalMGM graph object from its adjacency matrix. The default is NULL, in which case a fully connected graph is used as the initial skeleton. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
orientRule |
Determines which of the four possible orientation rules will be utilized to orient colliders in the FCI-Stable algorithm. Possible options are "majority", "maxp", "conservative", and "sepsets". The default value is "majority". Additionally, a vector of valid orientation rules can be provided, and fciStable will return a list containing the graphs learned with each. |
alpha |
A numeric value containing the significance threshold alpha for the conditional independence tests used during constraint-based causal discovery. This parameter directly controls graph sparsity, with low values of alpha yielding sparse graphs and high values yielding dense graphs. The default value is 0.05. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which uses all available CPUs. |
possDsep |
A logical value indicating whether to perform the possible-D-Sep search stage of the FCI algorithm. The possible-D-Sep search is necessary for correctness but can be computationally expensive in dense or high-dimensional graphs. If set to FALSE, the RFCI rule R0 will be applied to remove some of the extraneous adjacencies that would have been removed by the possible-D-Sep search. The default value is TRUE. |
fdr |
A logical value indicating whether to use false discovery rate control for the discovery of adjacencies in the causal graph. The default value is FALSE. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
The PAG learned by FCI-Stable.
Examples
sim <- simRandomDAG(200, 50, deg=3)
g <- fciStable(sim$data)
print(g)
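The orientRule options differ in how they decide whether an unshielded triple X - Z - Y is a collider. The sketch below follows the usual textbook definitions of three of them. Majority: collider if Z is in fewer than half of the separating sets found for X and Y, non-collider if more than half, ambiguous at exactly half. Conservative: collider only if Z is in none, non-collider only if Z is in all. MaxP: decide from the separating set with the largest p-value. `classifyTriple` is a hypothetical helper, not the package's code, and the "sepsets" rule is not covered.

```r
# Hypothetical helper (not part of rCausalMGM). For each conditioning set that
# rendered X and Y independent, 'inSepset' says whether it contains Z and
# 'pvals' gives the corresponding test p-value.
classifyTriple <- function(inSepset, pvals, rule = c("majority", "maxp", "conservative")) {
  rule <- match.arg(rule)
  share <- mean(inSepset)                # fraction of sepsets containing Z
  switch(rule,
    majority = if (share < 0.5) "collider" else if (share > 0.5) "noncollider" else "ambiguous",
    maxp = if (inSepset[which.max(pvals)]) "noncollider" else "collider",
    conservative = if (share == 0) "collider" else if (share == 1) "noncollider" else "ambiguous")
}

inSepset <- c(TRUE, FALSE, FALSE)
pvals <- c(0.40, 0.08, 0.12)
classifyTriple(inSepset, pvals, "majority")      # "collider"
classifyTriple(inSepset, pvals, "maxp")          # "noncollider"
classifyTriple(inSepset, pvals, "conservative")  # "ambiguous"
```

The three rules can disagree on the same evidence, which is one reason fciCV treats the orientation rule as a tuning choice.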
Implements StARS for FCI-Stable
Description
Runs StARS to select the value of alpha for FCI-Stable based on adjacency stability. Returns a graphSTARS object containing the PAG selected by StARS and the adjacency instabilities for each alpha.
Usage
fciStars(
data,
initialGraph = NULL,
knowledge = NULL,
orientRule = as.character(c("majority")),
alphas = NULL,
gamma = 0.01,
numSub = 20L,
subSize = -1L,
leaveOneOut = FALSE,
threads = -1L,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
initialGraph |
An undirected rCausalMGM graph object containing the initial skeleton of adjacencies used in the causal discovery algorithm. This graph can be learned by 'mgm' or learned by another method and imported into an undirected rCausalMGM graph object from its adjacency matrix. The default is NULL, in which case a fully connected graph is used as the initial skeleton. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
orientRule |
Determines which of the four possible orientation rules will be utilized to orient colliders in the FCI-Stable algorithm. Possible options are "majority", "maxp", "conservative", and "sepsets". The default value is "majority". |
alphas |
A numeric vector containing values of alpha to test in the StARS procedure. The default value is NULL, in which case we set alphas = c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2). |
gamma |
The threshold for edge instability. The default value is 0.01, and it is not recommended to change this value. |
numSub |
The number of subsamples of the dataset used to estimate edge instability. The default value is 20. |
subSize |
The number of samples to be drawn without replacement for each subsample. The default value is -1. When subSize is -1, it is set to min(floor(0.75 * N), floor(10*sqrt(N))), where N is the number of samples. |
leaveOneOut |
If TRUE, performs leave-one-out subsampling. Defaults to FALSE. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which uses all available CPUs. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphSTARS object containing the PAG selected by StARS and the instabilities at each value of alpha.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g.stars <- fciStars(sim$data)
print(g.stars)
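StARS can be sketched in three steps: the default subsample size documented for subSize = -1, a total instability score per alpha, and a selection rule. In the original StARS formulation, an edge selected in a fraction theta of subsamples has instability 2 * theta * (1 - theta), and these values are averaged over all node pairs. Choosing the largest alpha whose monotonized instability stays at or below gamma is an assumption about how fciStars selects. The helpers are hypothetical, and the instability values are made up for illustration.

```r
# Default subsample size documented above for subSize = -1
starsSubSize <- function(N) min(floor(0.75 * N), floor(10 * sqrt(N)))

# Hypothetical helper: total instability for one alpha, given a list of
# symmetric 0/1 adjacency matrices (one per subsample)
totalInstability <- function(adjList) {
  theta <- Reduce(`+`, adjList) / length(adjList)  # edge selection frequency
  xi <- 2 * theta * (1 - theta)                    # per-edge instability
  mean(xi[upper.tri(xi)])                          # average over node pairs
}

# Hypothetical helper: largest alpha with monotonized instability <= gamma
selectAlpha <- function(alphas, D, gamma = 0.01) {
  o <- order(alphas)                 # sparse (small alpha) to dense
  ok <- which(cummax(D[o]) <= gamma)
  if (length(ok)) alphas[o][max(ok)] else min(alphas)
}

starsSubSize(200)   # 141

A1 <- matrix(0, 3, 3); A1[1, 2] <- A1[2, 1] <- 1
A2 <- A1;              A2[2, 3] <- A2[3, 2] <- 1
totalInstability(list(A1, A2))   # 1/6

alphas <- c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2)
D <- c(0.001, 0.003, 0.006, 0.009, 0.012, 0.011, 0.02)   # illustrative values
selectAlpha(alphas, D)   # 0.05
```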
A function to generate a data.frame for objects of the graph class. It incorporates adjacency and orientation frequency if estimates of edge stability are available.
Description
A function to generate a data.frame for objects of the graph class. It incorporates adjacency and orientation frequency if estimates of edge stability are available.
Usage
graphTable(graph, stabilities = NULL)
Arguments
graph |
The graph object |
stabilities |
The stability data.frame from bootstrapping or StEPS. If NULL, the stabilities entry of the graph object is used. If that is also NULL, only edge interactions are returned. The default is NULL. |
Value
A data.frame containing source, target, and interaction columns for each edge in the graph. If stabilities are available, then the adjFrequency and orientation frequencies (if applicable) are returned for each edge.
Runs the GRaSP causal discovery algorithm on the dataset
Description
Runs the GRaSP causal discovery algorithm on the dataset
Usage
grasp(
data,
depth = 2L,
numStarts = 3L,
penalty = 2,
bossInit = FALSE,
threads = -1L,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
depth |
The maximum search depth used in the depth-first search in GRaSP. The default is 2. |
numStarts |
The number of restarts (with different randomly sampled initial topological orders). Restarts reduce the variance that can result from being stuck with an unfavorable initial starting order. The default is 3. |
penalty |
A numeric value that represents the strength of the penalty for model complexity. The default value is 2, which corresponds to twice the BIC penalty. |
bossInit |
A logical value indicating whether to initialize the causal order for GRaSP with the forward search procedure of BOSS. The default is FALSE. |
threads |
An integer value denoting the number of threads to use for parallelization. The default value is -1, which uses all available CPUs. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
The CPDAG learned by GRaSP
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- grasp(sim$data)
print(g)
Implements Grow-Shrink algorithm for Markov blanket identification
Description
Runs the Grow-Shrink algorithm to find the Markov blanket of a feature in a dataset
Usage
growShrinkMB(data, target, penalty = 1, rank = FALSE, verbose = FALSE)
Arguments
data |
A data.frame containing the dataset to be used for Markov blanket identification, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
target |
A string denoting the name of the target variable to identify the Markov blanket of. |
penalty |
A numeric value that represents the strength of the penalty for model complexity. The default value is 1, which corresponds to the BIC score. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
The list of features in the Markov blanket and the BIC score
Examples
sim <- simRandomDAG(200, 25, deg=2)
mb <- growShrinkMB(sim$data, "X1")
print(mb)
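Grow-Shrink first adds variables to a candidate Markov blanket while they help explain the target, then removes any that are no longer needed. The sketch below is a score-based variant for continuous data. It compares linear regressions of the target with a penalized BIC, which is meaningful because in a Gaussian model the Markov blanket is exactly the set of variables with non-zero regression coefficients. `penBIC` and `growShrinkSketch` are hypothetical helpers, and treating `mult` like the penalty argument (mult = 1 is the plain BIC) is an assumption. The package's own implementation handles mixed data and may differ in detail.

```r
# Hypothetical helpers (not part of rCausalMGM), continuous data only.
penBIC <- function(data, y, vars, mult = 1) {
  rhs <- if (length(vars)) vars else "1"
  fit <- lm(reformulate(rhs, response = y), data = data)
  ll <- logLik(fit)
  -2 * as.numeric(ll) + mult * attr(ll, "df") * log(nrow(data))
}

growShrinkSketch <- function(data, target, mult = 1) {
  cand <- setdiff(names(data), target)
  mb <- character(0)
  best <- penBIC(data, target, mb, mult)
  repeat {   # Grow: add the variable that most improves the score
    rest <- setdiff(cand, mb)
    if (length(rest) == 0) break
    s <- vapply(rest, function(v) penBIC(data, target, c(mb, v), mult), numeric(1))
    if (min(s) >= best) break
    mb <- c(mb, rest[which.min(s)]); best <- min(s)
  }
  repeat {   # Shrink: drop the variable whose removal most improves the score
    if (length(mb) == 0) break
    s <- vapply(mb, function(v) penBIC(data, target, setdiff(mb, v), mult), numeric(1))
    if (min(s) >= best) break
    mb <- setdiff(mb, mb[which.min(s)]); best <- min(s)
  }
  list(mb = mb, score = best)
}

set.seed(7)
n <- 500
X1 <- rnorm(n); X2 <- X1 + rnorm(n); X3 <- X2 + rnorm(n); X4 <- rnorm(n)
dat <- data.frame(X1, X2, X3, X4)   # chain X1 --> X2 --> X3, X4 independent
res <- growShrinkSketch(dat, "X2")
sort(res$mb)   # expected to contain X1 and X3
```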
Load a graph from a ".txt" file
Description
Load a graph from a ".txt" file
Usage
loadGraph(filename)
Arguments
filename |
The graph file |
Value
The graph as a graph object, which can be passed into search functions
Calculate the Mixed Graphical Model (MGM) graph on a dataset.
Description
Calculate the MGM graph on a dataset. The dataset may contain continuous and discrete variables. In the case that it contains only continuous variables, MGM reduces to a pseudo-likelihood estimate of the graphical LASSO, and in the case that it contains only discrete variables, MGM reduces to a pseudo-likelihood estimate of a pairwise Markov random field.
Usage
mgm(data, lambda = as.numeric(c(0.2, 0.2, 0.2)), rank = FALSE, verbose = FALSE)
Arguments
data |
A data.frame containing the dataset to be used for estimating the MGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
lambda |
A numeric vector of three values for the regularization parameter lambda: the first for continuous-continuous edges, the second for continuous-discrete, and the third for discrete-discrete. Defaults to c(0.2, 0.2, 0.2). If a single value is provided, all three values in the vector will be set to that value. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print updates on the progress of optimizing MGM. The default is FALSE. |
Value
The calculated MGM graph
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- mgm(sim$data)
print(g)
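The rank argument, shared by most functions here, applies a nonparanormal transform. A common form replaces each continuous value by the standard-normal quantile of its rank. The exact variant used when rank = TRUE (for example, any Winsorization of extreme ranks) is not documented here, so this sketch is an assumption. `npnSketch` and `npnData` are hypothetical helpers. The key property, invariance to monotone transformations of a variable, holds for any rank-based version.

```r
# Hypothetical helpers (not part of rCausalMGM): rank-based normal scores
npnSketch <- function(x) {
  n <- length(x)
  qnorm(rank(x) / (n + 1))   # ranks mapped to standard-normal quantiles
}

# Apply to numeric columns only; factors are left untouched
npnData <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], npnSketch)
  df
}

set.seed(3)
x <- rexp(200)
all.equal(npnSketch(x), npnSketch(log(x)))   # TRUE: monotone transform ignored
d <- npnData(data.frame(a = rexp(10), g = factor(rep(c("u", "v"), 5))))
str(d)
```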
Implements k-fold cross-validation for MGM
Description
Calculate the solution path for an MGM graph on a dataset with k-fold cross-validation. This function returns the graph that minimizes negative log(pseudolikelihood) and the graph selected by the one standard error rule.
Usage
mgmCV(
data,
lambdas = NULL,
nLambda = 30L,
nfolds = 5L,
foldid = NULL,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for estimating the MGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
lambdas |
A numeric vector containing the values of lambda to learn an MGM with. The default value is NULL, in which case a log-spaced vector of nLambda values for lambda will be supplied instead. |
nLambda |
A numeric value indicating the number of lambda values to test when the lambdas vector is NULL. The default is 30. |
nfolds |
An integer value defining the number of folds to be used for cross-validation if foldid is NULL. The default value is 5. |
foldid |
An integer vector containing, for each sample, a value in the range 1 to K (where K is the number of folds) that identifies which test set that sample belongs to. This enables users to define their own cross-validation splits, for example when stratified cross-validation is needed. The default value is NULL. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphCV object that contains the minimum and one standard error rule selected graphs.
Examples
sim <- simRandomDAG(200, 25, deg=2)
ig.cv <- mgmCV(sim$data)
print(ig.cv)
Estimates a solution path for MGM
Description
Calculate the solution path for an MGM graph on a dataset. It also returns the models selected by the BIC and AIC scores.
Usage
mgmPath(data, lambdas = NULL, nLambda = 30L, rank = FALSE, verbose = FALSE)
Arguments
data |
A data.frame containing the dataset to be used for estimating the MGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
lambdas |
A numeric vector containing the values of lambda to learn an MGM with. The default value is NULL, in which case a log-spaced vector of nLambda values for lambda will be supplied instead. |
nLambda |
A numeric value indicating the number of lambda values to test when the lambdas vector is NULL. The default is 30. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphPath object that contains the MGM graphs learned along the solution path, as well as the BIC- and AIC-selected models.
Examples
sim <- simRandomDAG(200, 25, deg=2)
ig.path <- mgmPath(sim$data)
print(ig.path)
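For reference, here is a generic sketch of information-criterion selection along a path, assuming AIC = -2 logL + 2k and BIC = -2 logL + k log(n). The helper `select_ic`, the log-likelihoods, and the parameter counts `df` are hypothetical; mgmPath scores models with its own log pseudo-likelihood and complexity measure, which this sketch does not reproduce.

```r
# Hypothetical helper (not part of rCausalMGM): generic AIC/BIC model selection
select_ic <- function(lambdas, loglik, df, n) {
  aic <- -2 * loglik + 2 * df       # AIC penalizes each parameter by 2
  bic <- -2 * loglik + log(n) * df  # BIC penalizes each parameter by log(n)
  list(lambda_aic = lambdas[which.min(aic)],
       lambda_bic = lambdas[which.min(bic)])
}

# Made-up path summaries: smaller lambda gives a better fit but more parameters
lambdas <- c(0.4, 0.2, 0.1)
loglik  <- c(-1000, -955, -930)
df      <- c(5, 20, 40)
sel <- select_ic(lambdas, loglik, df, n = 200)
sel$lambda_aic  # 0.1
sel$lambda_bic  # 0.2
```

Because log(200) > 2, BIC penalizes extra parameters more heavily than AIC and here selects the sparser model.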
Implements k-fold cross-validation for MGM-FCI-Stable
Description
Runs k-fold cross-validation to select the values of lambda and alpha and the orientation rule for MGM-FCI-Stable. Returns a graphCV object containing the causal graphical models that minimize the negative log(pseudo-likelihood) and the sparsest model within one standard error of the minimum.
Usage
mgmfciCV(
data,
knowledge = NULL,
cvType = "random",
orientRule = as.character(c("majority", "maxp", "conservative")),
lambdas = NULL,
nLambda = 20L,
alphas = NULL,
numPoints = 60L,
nfolds = 5L,
foldid = NULL,
threads = -1L,
fdr = FALSE,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for estimating the MGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
cvType |
A string determining whether to perform random search or grid search cross-validation, indicated by "random" or "grid" respectively. The default value is "random". |
orientRule |
A vector of strings to determine which of the orientation rules to test in the cross-validation procedure to select the optimal model. The default is a vector that contains the "majority", "maxp", and "conservative" orientation rules. |
lambdas |
A numeric vector containing the values of lambda to learn an MGM with. The default value is NULL, in which case a log-spaced vector of nLambda values for lambda will be supplied instead. |
nLambda |
A numeric value indicating the number of lambda values to test when the lambdas vector is NULL. The default is 20. |
alphas |
A numeric vector containing values of alpha to test in the cross-validation procedure. The default value is NULL, in which case we set alpha = c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2). |
numPoints |
An integer value indicating the number of points to draw uniformly from the search space when performing random search cross-validation. The default is 60, approximately the number of random draws needed for a 95% chance of sampling at least one model in the top 5% of the search space (1 - 0.95^60 ≈ 0.95). |
nfolds |
An integer value defining the number of folds to be used for cross-validation if foldid is NULL. The default value is 5. |
foldid |
An integer vector containing, for each sample, a value from 1 to K (the number of folds) that identifies the test fold to which that sample belongs. This enables users to define their own cross-validation splits, for example when stratified cross-validation is needed. The default value is NULL. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which will use all available CPUs. |
fdr |
A logical value indicating whether to use false discovery rate control for the discovery of adjacencies in the causal graph. The default value is FALSE. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphCV object containing the PAGs selected by the minimum and one standard error rule.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g.cv <- mgmfciCV(sim$data)
print(g.cv)
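The difference between grid search and random search can be sketched in base R. The lambda values below are hypothetical, the alphas and orientation rules mirror the documented defaults, and the sketch assumes random search draws distinct configurations from a discrete grid; the package may instead sample lambda and alpha from continuous ranges.

```r
set.seed(42)
# Hypothetical search space; the lambda values are illustrative only
lambdas <- exp(seq(log(0.5), log(0.05), length.out = 20))
alphas <- c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2)
orientRule <- c("majority", "maxp", "conservative")

# Grid search would evaluate every combination
grid <- expand.grid(lambda = lambdas, alpha = alphas, orientRule = orientRule,
                    stringsAsFactors = FALSE)
nrow(grid)  # 420 candidate configurations

# Random search evaluates only numPoints distinct configurations
numPoints <- 60
random_points <- grid[sample(nrow(grid), numPoints), ]
head(random_points)
```

With 420 grid points and 5 folds, grid search would require 2100 model fits, whereas random search with numPoints = 60 requires 300.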
Implements k-fold cross-validation for MGM-PC-Stable
Description
Runs k-fold cross-validation to select the values of lambda and alpha and the orientation rule for MGM-PC-Stable. Returns a graphCV object containing the causal graphical models that minimize the negative log(pseudo-likelihood) and the sparsest model within one standard error of the minimum.
Usage
mgmpcCV(
data,
knowledge = NULL,
cvType = "random",
orientRule = as.character(c("majority", "maxp", "conservative")),
lambdas = NULL,
nLambda = 20L,
alphas = NULL,
numPoints = 60L,
nfolds = 5L,
foldid = NULL,
threads = -1L,
fdr = FALSE,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for estimating the MGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
cvType |
A string determining whether to perform random search or grid search cross-validation, indicated by "random" or "grid" respectively. The default value is "random". |
orientRule |
A vector of strings to determine which of the orientation rules to test in the cross-validation procedure to select the optimal model. The default is a vector that contains the "majority", "maxp", and "conservative" orientation rules. |
lambdas |
A numeric vector containing the values of lambda to learn an MGM with. The default value is NULL, in which case a log-spaced vector of nLambda values for lambda will be supplied instead. |
nLambda |
A numeric value indicating the number of lambda values to test when the lambdas vector is NULL. The default is 20. |
alphas |
A numeric vector containing values of alpha to test in the cross-validation procedure. The default value is NULL, in which case we set alpha = c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2). |
numPoints |
An integer value indicating the number of points to draw uniformly from the search space when performing random search cross-validation. The default is 60, approximately the number of random draws needed for a 95% chance of sampling at least one model in the top 5% of the search space (1 - 0.95^60 ≈ 0.95). |
nfolds |
An integer value defining the number of folds to be used for cross-validation if foldid is NULL. The default value is 5. |
foldid |
An integer vector containing, for each sample, a value from 1 to K (the number of folds) that identifies the test fold to which that sample belongs. This enables users to define their own cross-validation splits, for example when stratified cross-validation is needed. The default value is NULL. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which will use all available CPUs. |
fdr |
A logical value indicating whether to use false discovery rate control for the discovery of adjacencies in the causal graph. The default value is FALSE. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphCV object containing the CPDAGs selected by the minimum and one standard error rule.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g.cv <- mgmpcCV(sim$data)
print(g.cv)
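The arithmetic behind the default numPoints = 60 can be checked directly. The sketch assumes each random draw independently lands in the top 5% of the search space with probability 0.05; `p_hit` is a hypothetical helper, not a package function.

```r
# Probability that at least one of n independent uniform draws lands in the top 5%
p_hit <- function(n, top = 0.05) 1 - (1 - top)^n

p_hit(60)                                # about 0.954
ceiling(log(1 - 0.95) / log(1 - 0.05))   # smallest n with p_hit(n) >= 0.95: 59
```

So 59 draws already reach the 95% level; 60 is the conventional round number.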
Calculate the moral graph for a given DAG
Description
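Create the moral graph for the input directed acyclic graph (DAG). The moral graph is the undirected graph obtained by connecting every pair of nodes that share a common child and then dropping all edge orientations. It is the undirected structure that MGM aims to recover, so it is the appropriate ground truth for evaluating undirected models.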
Usage
moral(graph)
Arguments
graph |
The graph object used to generate the moral graph. Should be the ground-truth causal DAG. |
Value
The moral graph corresponding to the input DAG
Examples
sim <- simRandomDAG(200, 25, deg=2)
sim$moral <- moral(sim$graph)
print(sim$moral)
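The moralization step can be sketched in base R on a plain adjacency matrix. This is a hypothetical helper, not the package's moral(), and it assumes the encoding A[i, j] == 1 for a directed edge i -> j rather than rCausalMGM's graph object.

```r
# Hypothetical helper (not the package's moral()): moralize a DAG adjacency matrix,
# where A[i, j] == 1 encodes a directed edge i -> j.
moralize <- function(A) {
  M <- (A | t(A)) * 1                 # drop edge directions
  for (child in seq_len(ncol(A))) {
    parents <- which(A[, child] == 1)
    if (length(parents) > 1) {
      M[parents, parents] <- 1        # "marry" every pair of parents of this child
    }
  }
  diag(M) <- 0                        # remove self-loops introduced above
  M
}

# Collider X -> Z <- Y: moralization adds the undirected edge X - Y
nodes <- c("X", "Y", "Z")
A <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
A["X", "Z"] <- 1
A["Y", "Z"] <- 1
M <- moralize(A)
M["X", "Y"]  # 1
```

The added X - Y edge is why an undirected model can contain adjacencies that are absent from the causal DAG.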
Calculate the PAG for a given DAG and set of latent variables
Description
Create the partial ancestral graph (PAG) for the input directed acyclic graph (DAG), treating the variables listed in latent as unobserved. The PAG represents the Markov equivalence class of the true causal maximal ancestral graph (MAG). Because the causal MAG is identifiable from observational data only up to its Markov equivalence class, the output of the FCI algorithms should be compared to the PAG rather than to the causal MAG.
Usage
pag(graph, latent = NULL)
Arguments
graph |
The graph object used to generate the PAG. Should be the ground-truth causal DAG. |
latent |
The names of latent (unobserved) variables in the causal DAG. The default is NULL. |
Value
The PAG corresponding to the input DAG
Examples
sim <- simRandomDAG(200, 25, deg=2)
sim$pag <- pag(sim$graph)
print(sim$pag)
Implements k-fold cross-validation for PC-Stable
Description
Runs k-fold cross-validation to select the value of alpha and orientation rule for PC-Stable. Returns a graphCV object containing the causal graphical models that minimize the negative log(pseudo-likelihood) and the sparsest model within one standard error of the minimum.
Usage
pcCV(
data,
initialGraph = NULL,
knowledge = NULL,
orientRule = as.character(c("majority", "maxp", "conservative")),
alphas = NULL,
nfolds = 5L,
foldid = NULL,
threads = -1L,
fdr = FALSE,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
initialGraph |
An undirected rCausalMGM graph object containing the initial skeleton of adjacencies used in the causal discovery algorithm. This graph can be learned by 'mgm' or learned by another method and imported into an undirected rCausalMGM graph object from its adjacency matrix. The default is NULL, in which case a fully connected graph is used as the initial skeleton. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
orientRule |
A vector of strings to determine which of the orientation rules to test in the cross-validation procedure to select the optimal model. The default is a vector that contains the "majority", "maxp", and "conservative" orientation rules. |
alphas |
A numeric vector containing values of alpha to test in the cross-validation procedure. The default value is NULL, in which case we set alpha = c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2). |
nfolds |
An integer value defining the number of folds to be used for cross-validation if foldid is NULL. The default value is 5. |
foldid |
An integer vector containing, for each sample, a value from 1 to K (the number of folds) that identifies the test fold to which that sample belongs. This enables users to define their own cross-validation splits, for example when stratified cross-validation is needed. The default value is NULL. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which will use all available CPUs. |
fdr |
A logical value indicating whether to use false discovery rate control for the discovery of adjacencies in the causal graph. The default value is FALSE. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphCV object containing the CPDAGs selected by the minimum and one standard error rule.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g.cv <- pcCV(sim$data)
print(g.cv)
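The foldid argument accepts user-defined splits. A minimal sketch of building a stratified split in base R is shown below; `make_stratified_foldid` and the `strata` labels are hypothetical and not part of rCausalMGM.

```r
# Hypothetical helper (not part of rCausalMGM): stratified fold assignment
make_stratified_foldid <- function(strata, nfolds = 5L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  foldid <- integer(length(strata))
  for (level in unique(strata)) {
    idx <- which(strata == level)
    # Spread folds 1..nfolds evenly within this stratum, then shuffle them
    foldid[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
  }
  foldid
}

strata <- rep(c("case", "control"), times = c(50, 30))
foldid <- make_stratified_foldid(strata, nfolds = 5, seed = 1)
table(strata, foldid)  # 10 cases and 6 controls in every fold
```

The resulting vector could then be passed as the foldid argument of pcCV, mgmCV, mgmpcCV, or mgmfciCV, with the stratifying variable taken from your own data.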
Runs the causal discovery algorithm PC-Stable on a dataset.
Description
Runs the causal discovery algorithm PC-Stable on a dataset. The PC-Stable algorithm is designed to recover the Markov equivalence class of causal DAGs that could give rise to the observed conditional independence relationships under the assumption of causal sufficiency. A dataset is causally sufficient if all variables relevant to the causal process are observed (i.e. there are no latent confounders). The resulting graph is a completed partially directed acyclic graph (CPDAG) containing directed edges where the causal orientation can be uniquely determined and undirected edges where multiple orientations are possible.
Usage
pcStable(
data,
initialGraph = NULL,
knowledge = NULL,
orientRule = as.character(c("majority")),
alpha = 0.05,
threads = -1L,
fdr = FALSE,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
initialGraph |
An undirected rCausalMGM graph object containing the initial skeleton of adjacencies used in the causal discovery algorithm. This graph can be learned by 'mgm' or learned by another method and imported into an undirected rCausalMGM graph object from its adjacency matrix. The default is NULL, in which case a fully connected graph is used as the initial skeleton. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
orientRule |
Determines which of the four possible orientation rules will be utilized to orient colliders in the PC-Stable algorithm. Possible options are "majority", "maxp", "conservative", and "sepsets". The default value is "majority". Additionally, a vector of valid orientation rules can be provided, and pcStable will return a list containing the graphs learned with each. |
alpha |
A numeric value containing the significance threshold alpha for the conditional independence tests used during constraint-based causal discovery. This parameter directly controls graph sparsity, with low values of alpha yielding sparse graphs and high values yielding dense graphs. The default value is 0.05. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which will use all available CPUs. |
fdr |
A logical value indicating whether to use false discovery rate control for the discovery of adjacencies in the causal graph. The default value is FALSE. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
The CPDAG learned by PC-Stable.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
print(g)
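To make the collider orientation rules more concrete, here is a minimal sketch of how the "majority", "conservative", and "maxp" rules are commonly defined for an unshielded triple X - Z - Y, given the separating sets of X and Y found during the search and their p-values. The helper `classify_triple` and its inputs are hypothetical; the package's choice of candidate separating sets and its tie-breaking may differ, and the classic "sepsets" rule is not modeled.

```r
# Hypothetical helper (not part of rCausalMGM). For an unshielded triple X - Z - Y,
# `sepsets` holds conditioning sets that separated X and Y, and `pvalues` holds
# the corresponding independence-test p-values.
classify_triple <- function(z, sepsets, pvalues,
                            rule = c("majority", "maxp", "conservative")) {
  rule <- match.arg(rule)
  has_z <- vapply(sepsets, function(s) z %in% s, logical(1))  # does each sepset contain Z?
  frac <- mean(has_z)
  switch(rule,
    # collider if Z appears in fewer than half of the separating sets
    majority = if (frac < 0.5) "collider" else if (frac > 0.5) "noncollider" else "ambiguous",
    # collider only if Z is in none of them, noncollider only if Z is in all of them
    conservative = if (frac == 0) "collider" else if (frac == 1) "noncollider" else "ambiguous",
    # trust the single separating set with the largest p-value
    maxp = if (has_z[which.max(pvalues)]) "noncollider" else "collider"
  )
}

sepsets <- list(c("W"), c("Z", "W"), character(0))
pvalues <- c(0.20, 0.60, 0.10)
classify_triple("Z", sepsets, pvalues, "majority")      # "collider"
classify_triple("Z", sepsets, pvalues, "conservative")  # "ambiguous"
classify_triple("Z", sepsets, pvalues, "maxp")          # "noncollider"
```

The same evidence yields three different decisions, which is why the cross-validation functions treat the orientation rule as a tuning parameter.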
Implements StARS for PC-Stable
Description
Runs StARS to select the value of alpha for PC-Stable based on adjacency stability. Returns a graphSTARS object containing the CPDAG selected by StARS and the adjacency instabilities for each alpha.
Usage
pcStars(
data,
initialGraph = NULL,
knowledge = NULL,
orientRule = as.character(c("majority")),
alphas = NULL,
gamma = 0.01,
numSub = 20L,
subSize = -1L,
leaveOneOut = FALSE,
threads = -1L,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for causal discovery, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
initialGraph |
An undirected rCausalMGM graph object containing the initial skeleton of adjacencies used in the causal discovery algorithm. This graph can be learned by 'mgm' or learned by another method and imported into an undirected rCausalMGM graph object from its adjacency matrix. The default is NULL, in which case a fully connected graph is used as the initial skeleton. |
knowledge |
A knowledge object containing prior knowledge about the causal interactions in a dataset. This knowledge can be used to forbid or require certain edges in the causal graph, helping to inform causal discovery and prevent orientations known to be nonsensical. The default is NULL, in which case no prior knowledge is provided to the causal discovery algorithm. |
orientRule |
Determines which of the four possible orientation rules will be utilized to orient colliders in the PC-Stable algorithm. Possible options are "majority", "maxp", "conservative", and "sepsets". The default value is "majority". |
alphas |
A numeric vector containing values of alpha to test in the StARS procedure. The default value is NULL, in which case we set alpha = c(0.001, 0.005, 0.01, 0.05, 0.1, 0.15, 0.2). |
gamma |
The threshold for edge instability. The default value is 0.01, and it is not recommended to change this value. |
numSub |
The number of subsamples of the dataset used to estimate edge instability. The default value is 20. |
subSize |
The number of samples to be drawn without replacement for each subsample. The default value is -1. When subSize is -1, it is set to min(floor(0.75 * N), floor(10*sqrt(N))), where N is the number of samples. |
leaveOneOut |
If TRUE, performs leave-one-out subsampling. Defaults to FALSE. |
threads |
An integer value denoting the number of threads to use for parallelization of independence tests. The default value is -1, which will use all available CPUs. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphSTARS object containing the CPDAG selected by StARS and the instabilities at each value of alpha.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g.stars <- pcStars(sim$data)
print(g.stars)
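The StARS idea behind pcStars can be sketched in base R: estimate how often each edge appears across subsamples, turn that frequency into an instability score, and pick the densest alpha whose (monotonized) average instability stays below gamma. The helpers below are hypothetical, operate on plain 0/1 adjacency matrices, and assume the selection mirrors the original StARS rule; the package's implementation may differ in details such as the fallback case.

```r
# Hypothetical helpers (not part of rCausalMGM)
edge_instability <- function(adj_list) {
  # adj_list: symmetric 0/1 adjacency matrices, one per subsample
  theta <- Reduce(`+`, adj_list) / length(adj_list)  # selection frequency of each edge
  xi <- 2 * theta * (1 - theta)                       # instability of each edge
  mean(xi[upper.tri(xi)])                             # average over all possible edges
}

stars_select <- function(alphas, instability, gamma = 0.01) {
  ord <- order(alphas)                    # sparse (small alpha) to dense (large alpha)
  mono <- cummax(instability[ord])        # make instability non-decreasing in density
  ok <- which(mono <= gamma)
  if (length(ok) == 0) return(alphas[ord][1])  # nothing is stable enough: use the sparsest
  alphas[ord][max(ok)]                    # densest graph that is still stable
}

A1 <- matrix(c(0, 1, 0,
               1, 0, 0,
               0, 0, 0), 3, 3, byrow = TRUE)  # edge 1-2
A2 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), 3, 3, byrow = TRUE)  # edges 1-2 and 2-3
edge_instability(list(A1, A2))  # (0 + 0 + 0.5) / 3, about 0.1667
stars_select(c(0.001, 0.01, 0.05, 0.1), c(0.001, 0.005, 0.02, 0.03))  # 0.01
```

An edge found in every subsample or in none has zero instability; an edge found in half of them has the maximum instability of 0.5.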
A plot override function for the graph class
Description
A plot override function for the graph class
Usage
## S3 method for class 'graph'
plot(x, nodes = c(), nodeAttr = list(), edgeAttr = list(), ...)
Arguments
x |
The graph object |
nodes |
A subset of nodes in the graph to plot. If only a single node is supplied, then that node and its Markov blanket will be plotted. |
nodeAttr |
A list of options to modify graph nodes (e.g. fontsize). |
edgeAttr |
A list of options to modify graph edges. |
... |
Additional plot arguments |
Value
No return value, the function plots a graph object.
A plot override function for the graphCV class
Description
A plot override function for the graphCV class
Usage
## S3 method for class 'graphCV'
plot(x, ...)
Arguments
x |
The graphCV object |
... |
Additional plot arguments |
Value
No return value. This function plots graph sparsity, quantified by the average Markov blanket size for causal graphs or the regularization parameter for undirected graphs, against -log(pseudo-likelihood), with lines indicating the selected models.
A plot override function for the graphPath class
Description
A plot override function for the graphPath class
Usage
## S3 method for class 'graphPath'
plot(x, ...)
Arguments
x |
The graphPath object |
... |
Additional plot arguments |
Value
No return value. This function plots graph sparsity, quantified by the regularization parameter, against the AIC and BIC scores along a solution path, with lines indicating the selected models.
A plot override function for the graphSTARS class
Description
A plot override function for the graphSTARS class
Usage
## S3 method for class 'graphSTARS'
plot(x, ...)
Arguments
x |
The graphSTARS object |
... |
Additional plot arguments |
Value
No return value. This function plots graph sparsity, quantified by the significance threshold alpha, against the average edge instability used for stability-based model selection, with a horizontal line indicating the instability threshold and a vertical line indicating the selected threshold.
A plot override function for the graphSTEPS class
Description
A plot override function for the graphSTEPS class
Usage
## S3 method for class 'graphSTEPS'
plot(x, ...)
Arguments
x |
The graphSTEPS object |
... |
Additional plot arguments |
Value
No return value. This function plots graph sparsity, quantified by the regularization parameters, against the average edge instability used for stability-based model selection, with a horizontal line indicating the instability threshold and vertical lines indicating the selected regularization parameters.
Combined adjacency and orientation precision-recall metrics
Description
Calculate the precision, recall, F1, and Matthews correlation coefficient (MCC) for the adjacencies and orientations of an estimated graph compared to the ground truth. This is the concatenated output of the adjacency PR metrics and the orientation PR metrics.
Usage
prMetrics(estimate, groundTruth, groundTruthDAG = NULL)
Arguments
estimate |
An estimated graph object |
groundTruth |
A ground truth graph object of the same type as the estimated graph object |
groundTruthDAG |
A ground truth graph object containing the true causal DAG. Only needed to calculate the orientation precision, recall, F1, and MCC for partial ancestral graphs (PAGs). |
Value
The adjacency and orientation precision, recall, F1, and MCC between the two graph objects
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
prMetrics(g, cpdag(sim$graph))
Adjacency Precision-Recall Metrics
Description
Calculate the skeleton precision, recall, F1, and Matthews correlation coefficient (MCC) between an estimated and ground truth graph.
Usage
prMetricsAdjacency(estimate, groundTruth)
Arguments
estimate |
An estimated graph object |
groundTruth |
A ground truth graph object |
Value
The skeleton precision, recall, F1, and MCC between the two graph objects
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
prMetricsAdjacency(g, cpdag(sim$graph))
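The skeleton metrics can be sketched on plain adjacency matrices: each unordered pair of nodes is classified as a true positive, false positive, false negative, or true negative, ignoring orientation. The helper `adjacency_pr` is hypothetical and not the package's prMetricsAdjacency; how the package handles edge cases (for example, an empty estimate) may differ.

```r
# Hypothetical helper (not the package's prMetricsAdjacency)
adjacency_pr <- function(est, truth) {
  E <- (est != 0) | (t(est) != 0)       # adjacency ignoring orientation
  G <- (truth != 0) | (t(truth) != 0)
  e <- E[upper.tri(E)]                  # each unordered pair counted once
  g <- G[upper.tri(G)]
  tp <- as.numeric(sum(e & g));  fp <- as.numeric(sum(e & !g))
  fn <- as.numeric(sum(!e & g)); tn <- as.numeric(sum(!e & !g))
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  f1 <- 2 * precision * recall / (precision + recall)
  mcc <- (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(precision = precision, recall = recall, F1 = f1, MCC = mcc)
}

# Truth: 1-2, 2-3, 3-4; estimate: 1-2, 2-3, 1-4
truth <- matrix(0, 4, 4); truth[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1; truth <- truth + t(truth)
est <- matrix(0, 4, 4); est[cbind(c(1, 2, 1), c(2, 3, 4))] <- 1; est <- est + t(est)
res <- adjacency_pr(est, truth)
res  # precision 2/3, recall 2/3, F1 2/3, MCC 1/3
```

Unlike F1, MCC also uses the true negatives, which matters for sparse graphs where most node pairs are non-adjacent.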
Causal Orientation Precision-Recall Metrics for CPDAGs
Description
Calculate the causal orientation precision, recall, and F1 between an estimated CPDAG and the ground-truth causal DAG.
Usage
prMetricsCausal(estimate, groundTruthDAG)
Arguments
estimate |
An estimated graph object. |
groundTruthDAG |
A ground truth graph object of the type "directed acyclic graph". |
Value
The causal orientation precision, recall, and F1 between the two graph objects
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
prMetricsCausal(g, sim$graph)
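One common way to score causal orientations against a DAG is sketched below: only directed edges in the estimate make a causal claim, precision is the fraction of those claims that match the DAG, and recall is the fraction of DAG edges recovered with the correct direction. The helper `causal_pr` and the matrix encoding (est[i, j] == 1 and est[j, i] == 0 for i -> j; both entries 1 for an undirected edge) are assumptions; prMetricsCausal may define these quantities differently.

```r
# Hypothetical helper (not the package's prMetricsCausal)
causal_pr <- function(est, dag) {
  directed_est <- est == 1 & t(est) == 0   # directed edges i -> j in the estimate
  true_dir <- dag == 1                     # edges i -> j in the true DAG
  tp <- sum(directed_est & true_dir)       # correctly oriented edges
  precision <- tp / sum(directed_est)      # NaN if the estimate has no directed edges
  recall <- tp / sum(true_dir)
  c(precision = precision, recall = recall,
    F1 = 2 * precision * recall / (precision + recall))
}

# Nodes 1..4 = X, Y, Z, W. True DAG: X -> Z, Y -> Z, Z -> W
dag <- matrix(0, 4, 4); dag[1, 3] <- 1; dag[2, 3] <- 1; dag[3, 4] <- 1
# Estimate: X -> Z (correct), Z -> Y (reversed), Z - W (undirected)
est <- matrix(0, 4, 4); est[1, 3] <- 1; est[3, 2] <- 1; est[3, 4] <- 1; est[4, 3] <- 1
res <- causal_pr(est, dag)
res  # precision 0.5, recall 1/3, F1 0.4
```

Under this definition, leaving an edge undirected lowers recall but never precision.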
Orientation Precision-Recall Metrics
Description
Calculate the orientation precision, recall, F1, and Matthews correlation coefficient (MCC) between an estimated and ground truth graph.
Usage
prMetricsOrientation(estimate, groundTruth, groundTruthDAG = NULL)
Arguments
estimate |
An estimated graph object |
groundTruth |
A ground truth graph object of the same type as the estimated graph object |
groundTruthDAG |
A ground truth graph object containing the true causal DAG. Only needed to calculate the orientation precision, recall, F1, and MCC for partial ancestral graphs (PAGs). |
Value
The orientation precision, recall, F1, and MCC between the two graph objects
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- pcStable(sim$data)
prMetricsOrientation(g, cpdag(sim$graph))
A print override function for the graph class
Description
A print override function for the graph class
Usage
## S3 method for class 'graph'
print(x, ...)
Arguments
x |
The graph object |
... |
Additional print arguments |
Value
No return value, the function prints a summary of the graph object.
A print override function for the graphCV class
Description
A print override function for the graphCV class
Usage
## S3 method for class 'graphCV'
print(x, ...)
Arguments
x |
The graphCV object |
... |
Additional print arguments |
Value
No return value, the function prints a summary of the graphCV object.
A print override function for the graphPath class
Description
A print override function for the graphPath class
Usage
## S3 method for class 'graphPath'
print(x, ...)
Arguments
x |
The graphPath object |
... |
Additional print arguments |
Value
No return value, the function prints a summary of the graphPath object.
A print override function for the graphSTARS class
Description
A print override function for the graphSTARS class
Usage
## S3 method for class 'graphSTARS'
print(x, ...)
Arguments
x |
The graphSTARS object |
... |
Additional print arguments |
Value
No return value, the function prints a summary of the graphSTARS object.
A print override function for the graphSTEPS class
Description
A print override function for the graphSTEPS class
Usage
## S3 method for class 'graphSTEPS'
print(x, ...)
Arguments
x |
The graphSTEPS object |
... |
Additional print arguments |
Value
No return value, the function prints a summary of the graphSTEPS object.
A print override function for the knowledge class
Description
A print override function for the knowledge class
Usage
## S3 method for class 'knowledge'
print(x, ...)
Arguments
x |
The knowledge object |
... |
Additional print arguments |
Value
No return value, the function prints a summary of the knowledge object.
Display a graph object as text.
Description
Display a graph object as text. This is the same format as written in ".txt" save files.
Usage
printGraph(graph)
Arguments
graph |
The graph object |
Value
No return value, this function prints the full details of a graph object, including nodes, edges, the algorithm used to learn the model, and relevant hyperparameters.
Examples
sim <- simRandomDAG(200, 25, deg=2)
g <- mgm(sim$data)
printGraph(g)
Save a graph to a file. Supported file types are ".txt" and ".sif".
Description
Save a graph to a file. Supported file types are ".txt" and ".sif".
Usage
saveGraph(graph, filename)
Arguments
graph |
The graph object |
filename |
The graph filename |
Value
No return value. For ".txt" files, this function saves the full details of a graph object, including nodes, edges, the algorithm used to learn the model, and relevant hyperparameters. This format can then be read back into R with the loadGraph function.
A function to simulate a random forward DAG and data from a structural equation model (SEM).
Description
A function to simulate a random forward DAG and data from a structural equation model (SEM).
Usage
simRandomDAG(
n = 1000,
p = 50,
r = 0,
discFrac = 0.5,
deg = 3,
coefMin = 0.5,
coefMax = 1.5,
noiseMin = 1,
noiseMax = 2,
censorRate = 0.3,
seed = NULL
)
Arguments
n |
The sample size of the generated dataset. The default is 1000. |
p |
The number of features in the generated dataset. The default is 50. |
r |
The number of censored features in the generated dataset. The default is 0. |
discFrac |
The fraction of variables in the dataset that are discrete. The default is 0.5. |
deg |
The average graph degree for the simulated graph. The default is 3. |
coefMin |
The lower bound on the magnitude of the effect size. The default is 0.5. |
coefMax |
The upper bound on the magnitude of the effect size. The default is 1.5. |
noiseMin |
The lower bound on the standard deviation of the Gaussian noise for continuous variables. The default is 1. |
noiseMax |
The upper bound on the standard deviation of the Gaussian noise for continuous variables. The default is 2. |
censorRate |
The rate at which censored variables are censored. The default is 0.3. |
seed |
The random seed for generating the simulated DAG. The default is NULL. |
Value
A list containing the simulated dataset (data) and the corresponding ground-truth causal DAG (graph).
Examples
sim <- simRandomDAG(200, 25)
print(sim$graph)
print(sim$data[1:6,])
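The general idea of simulating from a random forward DAG can be sketched for continuous variables only. This is a hypothetical simplification, not the package's simRandomDAG: it ignores discrete and censored variables, draws each allowed edge with probability deg / (p - 1) so that the expected average degree is deg, and assumes effect sizes with a random sign whose magnitude lies between coefMin and coefMax.

```r
# Hypothetical sketch (not the package's simRandomDAG): continuous variables only.
sim_linear_dag <- function(n = 1000, p = 50, deg = 3, coefMin = 0.5, coefMax = 1.5,
                           noiseMin = 1, noiseMax = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  prob <- deg / (p - 1)                  # expected average degree equals deg
  A <- matrix(0, p, p)
  # Forward DAG: an edge i -> j is allowed only when i < j
  A[upper.tri(A)] <- rbinom(p * (p - 1) / 2, 1, prob)
  # Edge weights: magnitude in [coefMin, coefMax] with a random sign (assumption)
  B <- A * runif(p * p, coefMin, coefMax) * sample(c(-1, 1), p * p, replace = TRUE)
  sds <- runif(p, noiseMin, noiseMax)    # Gaussian noise standard deviations
  X <- matrix(0, n, p)
  for (j in seq_len(p)) {                # columns are already in topological order
    X[, j] <- X %*% B[, j] + rnorm(n, sd = sds[j])
  }
  nm <- paste0("X", seq_len(p))
  dimnames(A) <- list(nm, nm)
  colnames(X) <- nm
  list(data = as.data.frame(X), adjacency = A)
}

sim <- sim_linear_dag(n = 200, p = 25, deg = 2, seed = 1)
2 * sum(sim$adjacency) / 25   # realized average degree, about 2 in expectation
```

Restricting edges to i < j guarantees acyclicity and lets each variable be generated in a single pass after its parents.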
Calculate the undirected skeleton for a given DAG
Description
Create the skeleton graph for the input directed acyclic graph (DAG). The skeleton graph is the undirected graph that contains the same adjacencies as the input DAG.
Usage
skeleton(graph)
Arguments
graph |
The graph object used to generate the skeleton graph. Should be the ground-truth causal DAG. |
Value
The skeleton graph corresponding to the input DAG
Examples
sim <- simRandomDAG(200, 25, deg=2)
sim$skeleton <- skeleton(sim$graph)
print(sim$skeleton)
Implements StEPS and StARS for MGM
Description
Selects lambda values for the MGM algorithm using StEPS and StARS. Returns a graphSTEPS object that contains the MGMs selected by StEPS and StARS as well as the instability at each value of lambda.
Usage
steps(
data,
lambdas = NULL,
nLambda = 30L,
gamma = 0.05,
numSub = 20L,
subSize = -1L,
leaveOneOut = FALSE,
threads = -1L,
rank = FALSE,
verbose = FALSE
)
Arguments
data |
A data.frame containing the dataset to be used for estimating the MGM, with each row representing a sample and each column representing a variable. All continuous variables must be of the numeric type, while categorical variables must be factor or character. Any rows with missing values will be dropped. |
lambdas |
A numeric vector containing the values of lambda to learn an MGM with. The default value is NULL, in which case a log-spaced vector of nLambda values for lambda will be supplied instead. |
nLambda |
A numeric value indicating the number of lambda values to test when the lambdas vector is NULL. The default is 30. |
gamma |
The threshold for edge instability. The default value is 0.05, and it is not recommended to change this value. |
numSub |
The number of subsamples of the dataset used to estimate edge instability. The default value is 20. |
subSize |
The number of samples to be drawn without replacement for each subsample. The default value is -1. When subSize is -1, it is set to min(floor(0.75 * N), floor(10*sqrt(N))), where N is the number of samples. |
leaveOneOut |
If TRUE, performs leave-one-out subsampling. Defaults to FALSE. |
threads |
An integer value denoting the number of threads to use for parallelization of learning MGMs across subsamples. The default value is -1, which will use all available CPUs. |
rank |
A logical value indicating whether to use the nonparanormal transform to learn rank-based associations. The default is FALSE. |
verbose |
A logical value indicating whether to print progress updates. The default is FALSE. |
Value
A graphSTEPS object containing the MGMs selected by StEPS and StARS, as well as the instability of each edge type at each value of lambda.
Examples
sim <- simRandomDAG(200, 25, deg=2)
ig.steps <- steps(sim$data)
print(ig.steps)
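The default subsample size used when subSize = -1 (documented above for both steps and pcStars) can be computed directly. `default_subsize` is a hypothetical helper that simply restates that formula.

```r
# Default subsample size used when subSize = -1, per the documented formula
default_subsize <- function(N) min(floor(0.75 * N), floor(10 * sqrt(N)))

sapply(c(100, 200, 1000), default_subsize)  # 75 141 316
```

For the 200-sample simulations used in these examples, each subsample therefore contains 141 rows; the 10 * sqrt(N) term takes over once N exceeds about 178.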