Title: Collapsed Variational Inference for Dirichlet Process (DP) Mixture Model
Version: 0.1.2
Description: Collapsed Variational Inference for a Dirichlet Process (DP) mixture model with unknown covariance matrix structure and DP concentration parameter. It enables efficient clustering of high-dimensional data and is significantly faster than traditional MCMC methods. The package provides 8 parameterisations, with corresponding prior choices, for the unknown covariance matrix, from which the user can choose.
Encoding: UTF-8
RoxygenNote: 7.3.3
License: MIT + file LICENSE
Imports: ggplot2, patchwork, Rcpp, Rfast, rlang, parallel, stats
Suggests: knitr, rmarkdown, pbapply, testthat (≥ 3.0.0)
Config/testthat/edition: 3
URL: https://github.com/annesh07/vimixr
BugReports: https://github.com/annesh07/vimixr/issues
LinkingTo: Rcpp, RcppEigen
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2026-01-09 10:15:09 UTC; ap15
Author: Annesh Pal [aut, cre], Boris Hejblum [aut]
Maintainer: Annesh Pal <sistm.soft.maintain@gmail.com>
Repository: CRAN
Date/Publication: 2026-01-12 20:30:02 UTC

Update of the variational parameters

Description

Update of the variational parameters

Usage

CVI_update_function(
  fixed_variance = FALSE,
  covariance_type = "diagonal",
  cluster_specific_covariance = TRUE,
  variance_prior_type = c("IW", "decomposed", "sparse", "off-diagonal normal"),
  X,
  inverts,
  params
)

Arguments

fixed_variance

whether the covariance is fixed or estimated. Default is FALSE which means it is estimated.

covariance_type

The assumed type of the covariance matrix. Can be either "diagonal" if it is the identity multiplied by a scalar, or "full" for a fully unspecified covariance matrix.

cluster_specific_covariance

whether the covariance is shared across estimated clusters or is cluster specific. Default is TRUE which means it is cluster specific.

variance_prior_type

character string specifying the type of prior distribution for the covariance matrix. Can be either "IW" or "decomposed" if cluster_specific_covariance is FALSE, and either "IW", "sparse" or "off-diagonal normal" if it is TRUE.

X

the data matrix

inverts

a list of inverses

params

a list of required arguments

Value

Updated parameters
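
The Usage above lists variance_prior_type with a character vector as its default. In R this pattern is commonly resolved with match.arg(), which picks the first element when the argument is omitted. Whether the package resolves it this way is an assumption; the helper pick_prior below is hypothetical and only illustrates the idiom.

```r
# Hypothetical helper illustrating the match.arg() idiom; not part of vimixr.
pick_prior <- function(variance_prior_type = c("IW", "decomposed", "sparse",
                                               "off-diagonal normal")) {
  # returns the single matched choice, or errors on an unknown value
  match.arg(variance_prior_type)
}

pick_prior()          # argument omitted: first choice, "IW"
pick_prior("sparse")  # explicit choice
```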


General ELBO function

Description

General ELBO function

Usage

ELBO_function(
  fixed_variance = FALSE,
  covariance_type = "diagonal",
  cluster_specific_covariance = TRUE,
  variance_prior_type = c("IW", "decomposed", "sparse", "off-diagonal normal"),
  X,
  inverts,
  params
)

Arguments

fixed_variance

whether the covariance is fixed or estimated. Default is FALSE which means it is estimated.

covariance_type

The assumed type of the covariance matrix. Can be either "diagonal" if it is the identity multiplied by a scalar, or "full" for a fully unspecified covariance matrix.

cluster_specific_covariance

whether the covariance is shared across estimated clusters or is cluster specific. Default is TRUE which means it is cluster specific.

variance_prior_type

character string specifying the type of prior distribution for the covariance matrix. Can be either "IW" or "decomposed" if cluster_specific_covariance is FALSE, and either "IW", "sparse" or "off-diagonal normal" if it is TRUE.

X

the data matrix

inverts

a list of inverses

params

a list of required arguments

Value

ELBO values


cum_clustprop

Description

Calculate the columnwise sum of rowwise cumulative probability

Usage

cum_clustprop(P1)

Arguments

P1

probability matrix

Value

rowwise cumulative probability
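
As a point of reference, the quantity described above can be sketched in base R. This assumes that "rowwise cumulative probability" means a cumulative sum along each row of P1, followed by column sums. The name cum_clustprop_ref is hypothetical, and the compiled implementation may differ in detail (for instance, in the direction of the cumulative sum).

```r
# Hypothetical base-R reference; not the package's compiled code.
cum_clustprop_ref <- function(P1) {
  # cumulative sum along each row; apply() returns it transposed
  row_cum <- t(apply(P1, 1, cumsum))
  # sum each column over all rows
  colSums(row_cum)
}

P1 <- matrix(c(0.2, 0.3, 0.5,
               0.6, 0.1, 0.3), nrow = 2, byrow = TRUE)
cum_clustprop_ref(P1)
```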


cum_clustprop_var

Description

Calculate the columnwise sum of rowwise cumulative probability for variance

Usage

cum_clustprop_var(P1)

Arguments

P1

probability matrix

Value

rowwise cumulative probability for variance


Collapsed variational inference for non-parametric Bayesian mixture models

Description

Collapsed variational inference for non-parametric Bayesian mixture models

Usage

cvi_npmm(
  X,
  variational_params,
  prior_shape_alpha,
  prior_rate_alpha,
  post_shape_alpha,
  post_rate_alpha,
  prior_mean_eta,
  post_mean_eta,
  log_prob_matrix = NULL,
  maxit = 100,
  n_inits = 5,
  Seed = NULL,
  parallel = FALSE,
  covariance_type = "full",
  fixed_variance = FALSE,
  cluster_specific_covariance = TRUE,
  variance_prior_type = c("IW", "decomposed", "sparse", "off-diagonal normal"),
  ...
)

Arguments

X

input data as a matrix

variational_params

number of clusters in the variational distribution

prior_shape_alpha

shape parameter of Gamma prior for the DP concentration parameter alpha. Default is 0.001

prior_rate_alpha

rate parameter of Gamma prior for the DP concentration parameter alpha. Default is 0.001

post_shape_alpha

initial value for posterior update of shape parameter for alpha. Default is 0.001

post_rate_alpha

initial value for posterior update of rate parameter for alpha. Default is 0.001

prior_mean_eta

mean vector of MVN prior for the DP mean parameters. Default is zero vector

post_mean_eta

initial value of posterior update for the DP mean parameter

log_prob_matrix

logarithm of cluster allocation probability matrix. Default is NULL

maxit

maximum number of iterations. Default is 100

n_inits

Number of random initialisations if log_prob_matrix and other case-specific hyperparameters are NULL. Default is 5

Seed

Seeds for random initialisation; either a vector of n_inits integers or NULL. Default is NULL.

parallel

Logical input for parallelisation. Default is FALSE

covariance_type

whether the covariance matrix is considered "diagonal" or "full". Default is "full"

fixed_variance

whether the covariance matrix of the data is considered known (fixed, TRUE) or unknown (FALSE). Default is FALSE

cluster_specific_covariance

whether the covariance matrix is specific to each cluster (TRUE) or shared across all clusters (FALSE). Default is TRUE

variance_prior_type

For unknown and full covariance matrix, choice of matrix prior is either Inverse-Wishart ('IW') or Cholesky-decomposed ('decomposed'). For unknown, full and cluster-specific covariance matrix, choice of matrix prior is either Inverse-Wishart ('IW'), element-wise Gamma and Laplace distributed ('sparse') or element-wise Gamma and Normal distributed ('off-diagonal normal')

...

additional parameters, further details given below

Details

The following models are supported in vimixr, listing their required input arguments in ... when calling cvi_npmm():

Value

cvi_npmm() returns a list with the following elements:

Examples


X <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.5), ncol = 2))

# fixed diagonal covariance
res <- cvi_npmm(X, variational_params = 20, prior_shape_alpha = 0.001,
         prior_rate_alpha = 0.001, post_shape_alpha = 0.001,
         post_rate_alpha = 0.001, prior_mean_eta = matrix(0, 1, ncol(X)),
         post_mean_eta = matrix(0.001, 20, ncol(X)),
         log_prob_matrix = t(apply(matrix(-3, nrow(X), 20), 1,
                             function(x){x/sum(x)})), maxit = 100,
         fixed_variance = TRUE, covariance_type = "diagonal",
         prior_precision_scalar_eta = 0.001,
         post_precision_scalar_eta = matrix(0.001, 20, 1),
         cov_data = diag(ncol(X)))
summary(res)
plot(res)


Root for a0 hyper-parameter for Sparse DPMM

Description

Root for a0 hyper-parameter for Sparse DPMM

Usage

eBa0(
  logP,
  X,
  a_min = min(1e-08, 1/ncol(X)),
  a_max = max(1e+06, ncol(X)),
  grid_points = min(ncol(X), 10000)
)

Arguments

logP

log of probability allocation matrix

X

observed data

a_min

minimum value of a0 for grid search

a_max

maximum value of a0 for grid search

grid_points

number of points for grid search

Value

No return value, called for side effects.


ELBO calculating functions depending on type of model for covariance matrix

Description

ELBO calculating functions depending on type of model for covariance matrix

Usage

elbo_fixed_diagonal(X, inverts, params)

Arguments

X

the data matrix

inverts

a list of inverses

params

a list of required arguments

Value

No return value, called for side effects.


Generate random log Probability matrix if not provided

Description

Generate random log Probability matrix if not provided

Usage

generate_log_prob(N, T0, seed0)

Arguments

N

number of rows of the data matrix

T0

number of variational clusters

seed0

seed for generating log Probability matrix

Value

No return value, called for side effects.
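
For orientation, one way to build a random log-probability matrix of size N x T0 is sketched below. This is an assumption about the general idea, not the package's initialisation scheme. The name random_log_prob and the choice of exponential weights are hypothetical, and unlike the documented function the sketch returns the matrix.

```r
# Hypothetical sketch; the package's own initialisation may differ.
random_log_prob <- function(N, T0, seed0 = NULL) {
  if (!is.null(seed0)) set.seed(seed0)
  # draw positive weights, then normalise each row to sum to one
  W <- matrix(rexp(N * T0), nrow = N, ncol = T0)
  P <- W / rowSums(W)
  log(P)
}

logP <- random_log_prob(N = 5, T0 = 3, seed0 = 1)
rowSums(exp(logP))  # each row sums to 1 up to floating point error
```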


mat_mult

Description

Calculate matrix multiplication

Usage

mat_mult(A, B)

Arguments

A

matrix

B

matrix

Value

A %*% B


mat_mult_t

Description

Calculate the matrix product of A, B and the transpose of C

Usage

mat_mult_t(A, B, C)

Arguments

A

matrix

B

matrix

C

matrix

Value

A %*% B %*% t(C)
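
The returned product A %*% B %*% t(C) can be checked against base R on small inputs. The sketch below uses arbitrary dimensions and also shows the equivalent tcrossprod() form, which avoids forming t(C) explicitly.

```r
set.seed(1)
A <- matrix(rnorm(6), 2, 3)
B <- matrix(rnorm(12), 3, 4)
C <- matrix(rnorm(20), 5, 4)

ref <- A %*% B %*% t(C)        # 2 x 5 result
alt <- tcrossprod(A %*% B, C)  # same product, x %*% t(y) in one call
all.equal(ref, alt)
```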


Function to check the list of type-specific arguments

Description

Function to check the list of type-specific arguments

Usage

params_check(
  params,
  fixed_variance = FALSE,
  covariance_type = "diagonal",
  cluster_specific_covariance = TRUE,
  variance_prior_type = c("IW", "decomposed", "sparse", "off-diagonal normal")
)

Arguments

params

the list of required parameters

fixed_variance

whether covariance is assumed fixed or not; can be TRUE or FALSE

covariance_type

structure of covariance matrix; can be "diagonal" or "full"

cluster_specific_covariance

whether covariance matrix is cluster specific or not; can be TRUE or FALSE

variance_prior_type

prior distribution for the covariance matrix; can be "IW" or "decomposed" when cluster_specific_covariance = FALSE, or can be "IW", "sparse" or "off-diagonal normal" otherwise

Value

Stops with an error if the required arguments are not present in params
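
A check of this kind can be sketched as below. The helper check_required is hypothetical and is not the package's implementation; the argument names are taken from the fixed-diagonal example for cvi_npmm() and serve only as illustration.

```r
# Hypothetical helper; not part of vimixr.
check_required <- function(params, required) {
  missing_names <- setdiff(required, names(params))
  if (length(missing_names) > 0) {
    stop("Missing required argument(s): ",
         paste(missing_names, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

params <- list(cov_data = diag(2))
check_required(params, "cov_data")  # passes silently

# a missing entry triggers an error; capture its message for inspection
msg <- tryCatch(
  check_required(params, c("cov_data", "prior_precision_scalar_eta")),
  error = function(e) conditionMessage(e)
)
msg
```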


S3 plotting function for 'CVIoutput' objects

Description

S3 plotting function for 'CVIoutput' objects

Usage

## S3 method for class 'CVIoutput'
plot(x, ...)

Arguments

x

a CVIoutput object

...

additional arguments

Value

A ggplot object representing visualisation


quadratic_form_diag

Description

Calculate the diagonal of the quadratic form of A with B

Usage

quadratic_form_diag(A, B)

Arguments

A

matrix

B

matrix

Value

diag(A %*% B %*% t(A))
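
The diagonal of A %*% B %*% t(A) does not require the full product: entry i equals the sum over j of (A %*% B)[i, j] * A[i, j]. The base R sketch below checks this identity on arbitrary inputs. It is illustrative only and says nothing about how the compiled function is written.

```r
set.seed(1)
A <- matrix(rnorm(12), 4, 3)
B <- crossprod(matrix(rnorm(9), 3, 3))  # 3 x 3 symmetric matrix

ref <- diag(A %*% B %*% t(A))  # forms the full 4 x 4 product first
alt <- rowSums((A %*% B) * A)  # computes only the diagonal entries
all.equal(ref, alt)
```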


CVI implementation for one set of initial parameters

Description

CVI implementation for one set of initial parameters

Usage

run_single(
  config,
  X,
  N,
  D,
  T0,
  prior_shape_alpha,
  prior_rate_alpha,
  post_shape_alpha,
  post_rate_alpha,
  prior_mean_eta,
  post_mean_eta,
  fixed_variance,
  covariance_type,
  cluster_specific_covariance,
  variance_prior_type,
  maxit,
  varargs
)

Arguments

config

List of inputs that are generated if not user-provided

X

the data matrix

N

number of samples (rows) of X

D

number of dimensions (columns) of X

T0

number of variational clusters

prior_shape_alpha

shape parameter of Gamma prior for the DP concentration parameter alpha. Default is 0.001

prior_rate_alpha

rate parameter of Gamma prior for the DP concentration parameter alpha. Default is 0.001

post_shape_alpha

initial value for posterior update of shape parameter for alpha. Default is 0.001

post_rate_alpha

initial value for posterior update of rate parameter for alpha. Default is 0.001

prior_mean_eta

mean vector of MVN prior for the DP mean parameters. Default is zero vector

post_mean_eta

initial value of posterior update for the DP mean parameter

fixed_variance

whether the covariance matrix of the data is considered known (fixed, TRUE) or unknown (FALSE).

covariance_type

whether the covariance matrix is considered "diagonal" or "full".

cluster_specific_covariance

whether the covariance matrix is specific to each cluster (TRUE) or shared across all clusters (FALSE).

variance_prior_type

For unknown and full covariance matrix, choice of matrix prior is either Inverse-Wishart ('IW') or Cholesky-decomposed ('decomposed'). For unknown, full and cluster-specific covariance matrix, choice of matrix prior is either Inverse-Wishart ('IW'), element-wise Gamma and Laplace distributed ('sparse') or element-wise Gamma and Normal distributed ('off-diagonal normal')

maxit

Maximum number of iterations for variational updates

varargs

List of case specific parameters

Value

a list with the following elements:


sweep_3D

Description

A C++ alternative to the sweep() function from base R

Usage

sweep_3D(A, R, dims, n_threads = 4L)

Arguments

A

a 3D array

R

a vector

dims

dimensions in 3D

n_threads

number of threads

Value

sweep(A, 3, R, "*")
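
The base R call named in the Value can be illustrated on a small array: each slice along the third dimension is multiplied by the matching element of R. The inputs below are arbitrary.

```r
A <- array(1:24, dim = c(2, 3, 4))
R <- c(1, 10, 100, 1000)

out <- sweep(A, 3, R, "*")
# slice k of the third dimension is multiplied by R[k]
all.equal(out[, , 2], A[, , 2] * 10)
```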


t_mat_mult

Description

Calculate the matrix product of the transpose of A with B and C

Usage

t_mat_mult(A, B, C)

Arguments

A

matrix

B

matrix

C

matrix

Value

t(A) %*% B %*% C
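
The product t(A) %*% B %*% C can likewise be reproduced in base R. The sketch uses arbitrary dimensions and the crossprod() form, which avoids forming t(A) explicitly.

```r
set.seed(1)
A <- matrix(rnorm(6), 3, 2)
B <- matrix(rnorm(12), 3, 4)
C <- matrix(rnorm(20), 4, 5)

ref <- t(A) %*% B %*% C        # 2 x 5 result
alt <- crossprod(A, B) %*% C   # crossprod(x, y) is t(x) %*% y
all.equal(ref, alt)
```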
