| Type: | Package |
| Title: | Conditionally Symmetric Multidimensional Gaussian Mixture Model |
| Version: | 0.5.0 |
| Description: | Implements the conditionally symmetric multidimensional Gaussian mixture model (csmGmm) for large-scale testing of composite null hypotheses in genetic association applications such as mediation analysis, pleiotropy analysis, and replication analysis. In such analyses, we typically have J sets of K test statistics where K is a small number (e.g. 2 or 3) and J is large (e.g. 1 million). For each one of the J sets, we want to know if we can reject all K individual nulls. Please see the vignette for a quickstart guide. The paper describing these methods is "Testing a Large Number of Composite Null Hypotheses Using Conditionally Symmetric Multidimensional Gaussian Mixtures in Genome-Wide Studies" by Sun R, McCaw Z, & Lin X (Journal of the American Statistical Association 2025, <doi:10.1080/01621459.2024.2422124>). |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | dplyr, mvtnorm, curl, data.table, ggplot2, rlang, magrittr, stats, utils |
| Suggests: | knitr, rmarkdown, R.utils |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-06-15 18:39:44 UTC; rsun3 |
| Author: | Ryan Sun [aut, cre], Emily Kim [aut] |
| Maintainer: | Ryan Sun <ryansun.work@gmail.com> |
| Depends: | R (≥ 4.1.0) |
| Repository: | CRAN |
| Date/Publication: | 2026-06-16 15:20:02 UTC |
calc_dens_cor.R
Description
For J*2 matrix of J sets, calculate the density of bivariate normal under fitted c-csmGmm.
Usage
calc_dens_cor(x, Zmat, corMat, log = FALSE)
Arguments
x |
2*1 vector of means. |
Zmat |
J*2 matrix of test statistics. |
corMat |
2*2 matrix describing correlation structure of test statistics. |
log |
return logarithm of density |
Value
A J*1 vector of densities for each row of Zmat.
Examples
x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5))
calc_dens_cor(x, Zmat, corMat = cor(Zmat))
calc_dens_ind.R
Description
Calculate J bivariate normal densities (both dimensions are independent) under fitted csmGmm.
Usage
calc_dens_ind_2d(x, Zmat)
Arguments
x |
2*1 vector of means. |
Zmat |
J*2 matrix of test statistics. |
Value
A J*1 vector of densities for each row of Zmat.
Examples
x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5))
calc_dens_ind_2d(x, Zmat)
Calculate J trivariate normal densities (all dimensions are independent) under fitted csmGmm.
Description
Calculate J trivariate normal densities (all dimensions are independent) under fitted csmGmm.
Usage
calc_dens_ind_3d(x, Zmat)
Arguments
x |
3*1 vector of means. |
Zmat |
J*3 matrix of test statistics. |
Value
A J*1 vector of densities for each row of Zmat.
Examples
x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5), rnorm(10^5))
calc_dens_ind_3d(x, Zmat)
Calculate the density of K-dimensional multivariate normal (all dimensions are independent) under fitted acsGmm.
Description
Calculate the density of K-dimensional multivariate normal (all dimensions are independent) under fitted acsGmm.
Usage
calc_dens_ind_multiple(x, Zmat)
Arguments
x |
K*1 vector of means. |
Zmat |
J*K matrix of test statistics. |
Value
A J*1 vector of densities for each row of Zmat.
Examples
x <- c(0, 0)
Zmat <- cbind(rnorm(10^5), rnorm(10^5), rnorm(10^5), rnorm(10^5))
calc_dens_ind_multiple(x, Zmat)
check_incongruous.R
Description
Check the number of sets of test statistics that have a higher (less significant) lfdr value than other sets with test statistics of uniformly smaller magnitudes.
Usage
check_incongruous(zMatrix, lfdrVec)
Arguments
zMatrix |
J*K vector of all test statistics. |
lfdrVec |
J*1 vector of lfdr values corresponding to each set of test statistics. |
Value
A vector with all the indices of all sets that have a higher lfdr value those a set with smaller test statistic magnitudes.
Examples
zMatrix <- cbind(rnorm(10^4), rnorm(10^4))
lfdrVec <- runif(10^4)
check_incongruous(zMatrix = zMatrix, lfdrVec = lfdrVec)
create_plots.R
Description
Creates lfdr-value Manhttan plots and z-score plots to visualize composite null hypothesis testing results
Usage
create_plots(plotData)
Arguments
plotData |
data.frame with columns labeled (exactly) Chr, BP, cum_avg_lfdr, Z_Dataset1, Z_Dataset2 |
Value
A list with the elements:
manPlot |
A ggplot object of the Manhattan plot of the lfdr-values at which each SNP is significant |
zPlot |
A ggplot object of a scatterplot comparing the z-scores for the first two datasets |
Examples
set.seed(0)
plotData <- data.frame(Chr = rep(1:22, each=100), BP=runif(n=2200, min=1, max=10^8),
cum_avg_lfdr=runif(n=2200, min=0, max=22),
Z_Dataset1=rnorm(n=2200), Z_Dataset2=rnorm(n=2200))
create_plots(plotData = plotData)
Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=2 case.
Description
Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=2 case.
Usage
find_2d(x, allTestStats)
Arguments
x |
Scalar, which row of allTestStats to check. |
allTestStats |
J*K vector of all test statistics. |
Value
A scalar denoting the number of sets with lower lfdr and test statistics of lower magnitude. 0 means congruous result.
Examples
zMatrix <- cbind(rnorm(10^4), rnorm(10^4))
find_2d(x = 5, allTestStats = zMatrix)
Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=3 case.
Description
Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=3 case.
Usage
find_3d(x, allTestStats)
Arguments
x |
Scalar, which row of allTestStats to check. |
allTestStats |
J*K vector of all test statistics. |
Value
A scalar denoting the number of sets with lower lfdr and test statistics of lower magnitude. 0 means congruous result.
Examples
zMatrix <- cbind(rnorm(10^4), rnorm(10^4), rnorm(10^4))
find_3d(x = 5, allTestStats = zMatrix)
Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=4 case.
Description
Tells if row x if allTestStats is an incongruous result (has a higher lfdr than a set of test statistics with lower magnitudes). For K=4 case.
Usage
find_4d(x, allTestStats)
Arguments
x |
Scalar, which row of allTestStats to check. |
allTestStats |
J*K vector of all test statistics. |
Value
A scalar denoting the number of sets with lower lfdr and test statistics of lower magnitude. 0 means congruous result.
Examples
zMatrix <- cbind(rnorm(10^4), rnorm(10^4), rnorm(10^4), rnorm(10^4))
find_4d(x = 5, allTestStats = zMatrix)
find_max_means.R
Description
Find maximum means for each dimension in null settings.
Usage
find_max_means(muInfo)
Arguments
muInfo |
A list with 2^K elements, where each element is a matrix with K rows and Mb columns. |
Value
A K*1 vector of the maximum means for each dimension under the null.
Examples
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3, 0, 6), nrow=2),
matrix(data=c(3, 0, 6, 0), nrow=2), matrix(data=c(8, 8), nrow=2))
find_max_means(initMuList)
generate_init_lists.R
Description
Generates initMuList and initPiList parameters for symm_fit_ind_EM() function.
Usage
generate_init_lists(K)
Arguments
K |
Scalar, number of datasets in the analysis. |
Value
A list with the elements:
initMuList |
List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary representation |
initPiList |
List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary representation. |
Examples
K <- 2
generate_init_lists(K)
prepare_csmgmm_data.R
Description
Prepares J*K matrix of summary statistics for GWAS replication studies where J is the number of SNPs and K is the number of cohorts.
Usage
prepare_csmgmm_data(datasets, dataset_names = NULL, z_cap = 8.1)
Arguments
datasets |
list holding K datasets to use for replication analysis |
dataset_names |
optional vector of names for each dataset; otherwise function uses Dataset1,.., DatasetK as default. |
z_cap |
optional number to cap the most extreme z-scores, we recommend setting at 8.1 as R is not very accurate for very large z-scores |
Value
A list with the elements:
extra_dat |
A J*K matrix with J matching SNPs across K datasets, holds relevant SNP information. |
clean_dat |
J*K matrix of test statistics, holds J Z-scores for each dataset. |
n_snps |
A scalar returning the number of matching SNPs across all K datasets. |
Examples
library(dplyr)
set.seed(0)
J <- 1000
alleles <- c("A", "C", "T", "G")
dat1 <- data.frame(zstat = rnorm(J)) %>% mutate(se = runif(n=J, min=0.0005, max=0.0015)) %>%
mutate(beta = zstat * se, chr=1, pos=1:J, a1=sample(x=alleles, J, replace=TRUE),
a2=sample(x=alleles, J, replace=TRUE))
dat2 <- data.frame(zstat = rnorm(J)) %>% mutate(se = runif(n=J, min=0.0005, max=0.0015)) %>%
mutate(beta = zstat * se, chr=1, pos=1:J, a1=sample(x=alleles, J, replace=TRUE),
a2=sample(x=alleles, J, replace=TRUE))
dat3 <- data.frame(zstat = rnorm(J)) %>% mutate(se = runif(n=J, min=0.0005, max=0.0015)) %>%
mutate(beta = zstat * se, chr=1, pos=1:J, a1=sample(x=alleles, J, replace=TRUE),
a2=sample(x=alleles, J, replace=TRUE))
datasets = list(dat1, dat2, dat3)
prepare_csmgmm_data(datasets = datasets)
process_lfdr_results.R
Description
Sorts lfdrResults and calculates cumulative averages to return significant SNPs.
Usage
process_lfdr_results(orig_data, lfdrResults, fdr_threshold = 0.1)
Arguments
orig_data |
data.frame with J rows, one for each composite null hypothesis being tested, holds data such as position, z-statistics, etc. |
lfdrResults |
J*1 vector of all lfdr-values, the jth value corresponds to the jth row of orig_data. |
fdr_threshold |
Scalar between 0 and 1 determining the percentage of false discoveries, default set to 0.1. |
Value
A list with elements:
all_snps_info |
A data.frame of all SNPs, sorted by the lfdr-value at which they are significant (cum_avg_lfdr column) |
sig_snps_info |
A data.frame of the significant SNPs, sorted by the lfdr-value at which they are significant (cum_avg_lfdr column) |
n_significant |
A scalar returning the number of significant SNPs. |
Examples
set.seed(0)
orig_data <- data.frame(
variant_id = sample(c("A","B","C","D"), 10^4, replace = TRUE),
Chr = sample(1:22, 10^4, replace = TRUE),
BP = sample(1e6:2e6, 10^4),
Z_Dataset1 = rnorm(10^4),
Z_Dataset2 = rnorm(10^4)
)
lfdrResults <- runif(10^4)
process_lfdr_results(orig_data = orig_data, lfdrResults = lfdrResults, fdr_threshold = 0.05)
read_gwas.R
Description
Read 1 GWAS Catalog summary-statistics file into the current environment.
Usage
read_gwas_one(
url,
cache_dir = tempdir(),
overwrite = FALSE,
max_tries = 5,
fread_args = list()
)
Arguments
url |
Full https URL to a .tsv.gz file on the GWAS Catalog FTP. |
cache_dir |
Directory to keep downloaded files (default: tempdir()). Use a persistent path (e.g. "~/gwas_cache") to avoid re-downloading across R sessions. |
overwrite |
If TRUE, redownload even if a cached file exists. |
max_tries |
Number of download attempts before giving up. |
fread_args |
Named list of extra args forwarded to data.table::fread (e.g. list(select = c("variant_id", "p_value"))). |
Details
Strategy: download each .tsv.gz to a local cache directory, then parse with data.table::fread (which transparently handles gzip via local path). This is more robust than streaming over HTTPS for files of this size (~1 GB each): downloads can be resumed/retried, and re-reads are cheap once cached.
Value
A data.table with the downloaded data
Examples
# Ensure internet is available before running the example
if (curl::has_internet()) {
try({
dat <- read_gwas_one(paste0("http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/",
"GCST005001-GCST006000/GCST005527/tsoi_2012_23143594_pso_efo0000676_1_ichip.sumstats.tsv.gz"))
head(dat)
})
}
Read 2-5 GWAS Catalog summary-statistics files into the current environment.
Description
Read 2-5 GWAS Catalog summary-statistics files into the current environment.
Usage
read_gwas_set(urls, ...)
Arguments
urls |
Character vector of 2-5 https URLs to .tsv.gz files. If the vector has names, those names are used in the returned list; otherwise entries are named gwas1, gwas2, ... |
... |
Forwarded to read_gwas_one (cache_dir, overwrite, max_tries, fread_args). |
Value
Named list of data.tables, one per input URL.
Examples
# Ensure internet is available before running the example
if (curl::has_internet()) {
try({
dat <- read_gwas_set(c(
paste0("http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/",
"GCST005001-GCST006000/GCST005527/tsoi_2012_23143594_pso_efo0000676_1_ichip.sumstats.tsv.gz"),
paste0("http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/",
"GCST003001-GCST004000/GCST003044/CD_trans_ethnic_association_summ_stats_b37.txt.gz")
))
gwas1 <- dat$gwas1
gwas2 <- dat$gwas2
head(gwas1)
})
}
symm_fit_cor.R
Description
Fit the correlated csmGmm for sets of correlated elements. Currently restricted to K=2.
Usage
symm_fit_cor_EM(
testStats,
corMat,
initMuList,
initPiList,
eps = 10^(-5),
checkpoint = FALSE
)
Arguments
testStats |
J*K matrix of test statistics where J is the number of sets and K is number of elements in each set. |
corMat |
K*K matrix that describes the correlation structure of each set. |
initMuList |
List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group. |
initPiList |
List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group. |
eps |
Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value. |
checkpoint |
Boolean, set to TRUE to print iterations of EM. |
Value
A list with the elements:
muInfo |
List with same dimensions as initMuList, holds the final mean parameters. |
piInfo |
List with same dimensions as initPiList, holds the final probability parameters. |
iter |
Number of iterations run in EM algorithm. |
lfdrResults |
J*1 vector of all lfdr statistics. |
Examples
set.seed(0)
corMat <- matrix(data=c(1, 0.3, 0.3, 1), nrow=2)
testStats <- rbind(mvtnorm::rmvnorm(n=200, mean=c(3, 0), sigma=corMat),
mvtnorm::rmvnorm(n=200, mean=c(0, 4), sigma=corMat),
mvtnorm::rmvnorm(n=100, mean=c(7, 7), sigma=corMat),
mvtnorm::rmvnorm(n=10^4 - 500, mean=c(0, 0), sigma=corMat))
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2),
matrix(data=c(4, 0), nrow=2), matrix(data=c(5, 5), nrow=2))
initPiList <- list(c(0.9), c(0.04), c(0.04), c(0.02))
results <- symm_fit_cor_EM(testStats = testStats, corMat = cor(testStats),
initMuList = initMuList, initPiList = initPiList)
symm_fit_cor_fulllik.R
Description
Full likelihood, block correlation, blocks of size 2
Usage
symm_fit_cor_EM_fulllik(
testStats,
corMat,
initMuList,
initPiList,
eps = 10^(-5),
checkpoint = FALSE
)
Arguments
testStats |
J*K matrix of test statistics where J is the number of sets and K is number of elements in each set. |
corMat |
K*K matrix that describes the correlation structure of each 2 by 2 block. |
initMuList |
List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group. |
initPiList |
List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group. |
eps |
Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value. |
checkpoint |
Boolean, set to TRUE to print iterations of EM. |
Value
A list with the elements:
muInfo |
List with same dimensions as initMuList, holds the final mean parameters. |
piInfo |
List with same dimensions as initPiList, holds the final probability parameters. |
iter |
Number of iterations run in EM algorithm. |
lfdrResults |
J*1 vector of all lfdr statistics. |
Examples
set.seed(0)
testStats <- cbind(rnorm(10^4), rnorm(10^4))
testStats[1:100, 1] <- rnorm(100, mean=3)
testStats[101:200, 1] <- rnorm(100, mean=5)
testStats[201:300, 2] <- rnorm(100, mean=4)
testStats[301:400, 1:2] <- rnorm(200, mean=7)
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2, ncol=1),
matrix(data=c(3, 0), nrow=2, ncol=1), matrix(data=c(6, 6), nrow=2, ncol=1))
initPiList <- list(c(0.9), c(0.04),c(0.04), c(0.02))
results <- symm_fit_cor_EM_fulllik(testStats = testStats, corMat=diag(c(1,1)),
initMuList = initMuList, initPiList = initPiList)
symm_fit_cor_noAssumption.R
Description
Fit the correlated csmGmm for sets of correlated elements, but we don't assume that the means in the composite alternative are greater in magnitude than those in the composite null.
Usage
symm_fit_cor_EM_noAssumption(
testStats,
corMat,
initMuList,
initPiList,
eps = 10^(-5),
checkpoint = FALSE
)
Arguments
testStats |
J*K matrix of test statistics where J is the number of sets and K is number of elements in each set. |
corMat |
K*K matrix that describes the correlation structure of each set. |
initMuList |
List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group. |
initPiList |
List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group. |
eps |
Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value. |
checkpoint |
Boolean, set to TRUE to print iterations of EM. |
Value
A list with the elements:
muInfo |
List with same dimensions as initMuList, holds the final mean parameters. |
piInfo |
List with same dimensions as initPiList, holds the final probability parameters. |
iter |
Number of iterations run in EM algorithm. |
lfdrResults |
J*1 vector of all lfdr statistics. |
Examples
set.seed(0)
corMat <- matrix(data=c(1, 0.3, 0.3, 1), nrow=2)
testStats <- rbind(mvtnorm::rmvnorm(n=200, mean=c(3, 0), sigma=corMat),
mvtnorm::rmvnorm(n=200, mean=c(0, 4), sigma=corMat),
mvtnorm::rmvnorm(n=100, mean=c(7, 7), sigma=corMat),
mvtnorm::rmvnorm(n=10^4 - 500, mean=c(0, 0), sigma=corMat))
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2),
matrix(data=c(4, 0), nrow=2), matrix(data=c(5, 5), nrow=2))
initPiList <- list(c(0.9), c(0.04), c(0.04), c(0.02))
results <- symm_fit_cor_EM_noAssumption(testStats = testStats,
corMat = cor(testStats), initMuList = initMuList, initPiList = initPiList)
symm_fit_cor_rho.R
Description
Fit the correlated csmGmm for sets of correlated elements. Also fits the correlation parameter in EM algorithm.
Usage
symm_fit_cor_EM_rho(
testStats,
initRho,
initMuList,
initPiList,
eps = 10^(-5),
checkpoint = FALSE
)
Arguments
testStats |
J*K matrix of test statistics where J is the number of sets and K is number of elements in each set. |
initRho |
Initial value of rho, any reasonable guess should be ok. |
initMuList |
List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary group. |
initPiList |
List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary group. |
eps |
Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value. |
checkpoint |
Boolean, set to TRUE to print iterations of EM. |
Value
A list with the elements:
muInfo |
List with same dimensions as initMuList, holds the final mean parameters. |
piInfo |
List with same dimensions as initPiList, holds the final probability parameters. |
iter |
Number of iterations run in EM algorithm. |
lfdrResults |
J*1 vector of all lfdr statistics. |
Examples
set.seed(0)
corMat <- matrix(data=c(1, 0.3, 0.3, 1), nrow=2)
testStats <- rbind(mvtnorm::rmvnorm(n=200, mean=c(3, 0), sigma=corMat),
mvtnorm::rmvnorm(n=200, mean=c(0, 4), sigma=corMat),
mvtnorm::rmvnorm(n=100, mean=c(7, 7), sigma=corMat),
mvtnorm::rmvnorm(n=10^4 - 500, mean=c(0, 0), sigma=corMat))
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3), nrow=2),
matrix(data=c(4, 0), nrow=2), matrix(data=c(5, 5), nrow=2))
initPiList <- list(c(0.9), c(0.04), c(0.04), c(0.02))
results <- symm_fit_cor_EM_rho(testStats = testStats,
initRho = 0.1, initMuList = initMuList, initPiList = initPiList)
symm_fit_ind.R
Description
Fit the conditionally symmetric multidimensional Gaussian mixture model for sets of independent elements
Usage
symm_fit_ind_EM(
testStats,
initMuList,
initPiList,
sameDirAlt = FALSE,
eps = 10^(-5),
checkpoint = FALSE
)
Arguments
testStats |
J*K matrix of test statistics where J is the number of sets and K is number of elements in each set. |
initMuList |
List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary representation. |
initPiList |
List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary representation. |
sameDirAlt |
Boolean, set to TRUE for replication testing, which uses a smaller alternative space. |
eps |
Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value. |
checkpoint |
Boolean, set to TRUE to print iterations of EM |
Value
A list with the elements:
muInfo |
List with same dimensions as initMuList, holds the final mean parameters. |
piInfo |
List with same dimensions as initPiList, holds the final mixture proportions |
iter |
Number of iterations run in EM algorithm. |
lfdrResults |
J*1 vector of all lfdr statistics. |
Examples
set.seed(0)
testStats <- cbind(rnorm(10^4), rnorm(10^4))
testStats[1:200, 1] <- rnorm(100, mean=3)
testStats[201:400, 1] <- rnorm(100, mean=5)
testStats[401:600, 2] <- rnorm(100, mean=3)
testStats[601:800, 2] <- rnorm(100, mean=5)
testStats[801:1000, 1:2] <- rnorm(200, mean=7)
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3, 0, 5), nrow=2, ncol=2),
matrix(data=c(3, 0, 5, 0), nrow=2, ncol=2), matrix(data=c(7, 7), nrow=2, ncol=1))
initPiList <- list(c(0.9), c(0.02, 0.02),c(0.02, 0.02), c(0.02))
results <- symm_fit_ind_EM(testStats = testStats, initMuList = initMuList, initPiList = initPiList)
symm_fit_ind_noAssumption.R
Description
Fit the conditionally symmetric multidimensional Gaussian mixture model for sets of independent elements, but we don't assume that the means in the composite alternative are greater in magnitude than those in the composite null.
Usage
symm_fit_ind_EM_noAssumption(
testStats,
initMuList,
initPiList,
sameDirAlt = FALSE,
eps = 10^(-5),
checkpoint = FALSE
)
Arguments
testStats |
J*K matrix of test statistics where J is the number of sets and K is number of elements in each set. |
initMuList |
List of 2^K elements where each element is a matrix with K rows and number of columns equal to the number of possible mean vectors for that binary representation. |
initPiList |
List of 2^K elements where each element is a vector with number of elements equal to the number of possible mean vectors for that binary representation. |
sameDirAlt |
Boolean, set to TRUE for replication testing, which uses a smaller alternative space. |
eps |
Scalar, stop the EM algorithm when L2 norm of difference in parameters is less than this value. |
checkpoint |
Boolean, set to TRUE to print iterations of EM |
Value
A list with the elements:
muInfo |
List with same dimensions as initMuList, holds the final mean parameters. |
piInfo |
List with same dimensions as initPiList, holds the final mixture proportions |
iter |
Number of iterations run in EM algorithm. |
lfdrResults |
J*1 vector of all lfdr statistics. |
Examples
set.seed(0)
testStats <- cbind(rnorm(10^4), rnorm(10^4))
testStats[1:200, 1] <- rnorm(100, mean=3)
testStats[201:400, 1] <- rnorm(100, mean=5)
testStats[401:600, 2] <- rnorm(100, mean=3)
testStats[601:800, 2] <- rnorm(100, mean=5)
testStats[801:1000, 1:2] <- rnorm(200, mean=7)
initMuList <- list(matrix(data=0, nrow=2, ncol=1), matrix(data=c(0, 3, 0, 5), nrow=2, ncol=2),
matrix(data=c(3, 0, 5, 0), nrow=2, ncol=2), matrix(data=c(7, 7), nrow=2, ncol=1))
initPiList <- list(c(0.9), c(0.02, 0.02),c(0.02, 0.02), c(0.02))
results <- symm_fit_ind_EM_noAssumption(testStats = testStats,
initMuList = initMuList, initPiList = initPiList)