| Title: | Coverage Correlation Coefficient and Testing for Independence |
| Version: | 1.1.0 |
| Maintainer: | Tengyao Wang <t.wang59@lse.ac.uk> |
| Description: | Computes the coverage correlation coefficient introduced in <doi:10.48550/arXiv.2508.06402> , a statistical measure that quantifies dependence between two random vectors by computing the union volume of data-centered hypercubes in a uniform space. |
| License: | GPL-3 |
| Imports: | transport, MASS, fields, grDevices |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-07-02 13:27:52 UTC; monaazadkia |
| Author: | Tengyao Wang [aut, cre], Mona Azadkia [aut, ctb], Xuzhi Yang [aut, ctb] |
| Depends: | R (≥ 3.5.0) |
| Repository: | CRAN |
| Date/Publication: | 2026-07-02 16:20:02 UTC |
Dataset: CD8+ T cell gene expression data
Description
The CD8T dataset provides the gene expression data of fetal CD8+ T cells obtained in a single-cell RNA-seq experiment.
Usage
data(CD8T)
Format
A data frame with 9369 rows (cells) and 1000 columns (genes).
Source
Suo et al., Science (2022).
References
Suo, C., Dann, E., Goh, I., Jardine, L., Kleshchevnikov, V., Park, J.-E., Botting, R. A., et al. "Mapping the developing human immune system across organs." Science 376(6597), eabo0510 (2022).
Monge–Kantorovich ranks (uniform OT via squared distances)
Description
Computes the optimal matching that maps each observation in X to a
reference point in U using uniform weights and squared Euclidean cost.
Internally uses transport::transport(method = "networkflow", p = 2).
In 1D, this reduces to a rank-based matching
sort(U)[rank(X, ties.method = "random")].
Usage
MK_rank(X, U)
Arguments
X |
Numeric vector of length |
U |
Numeric vector of length |
Details
Rows must match:
nrow(X) == nrow(U)(otherwise an error is thrown).Columns must match:
ncol(X) == ncol(U)(otherwise an error is thrown).Weights are uniform (
1/n) and the cost matrix is the sum of squared coordinate differences across columns.In 1D, ties in
Xare broken at random viaties.method = "random"; useset.seed()for reproducibility.
Value
If ncol(X) == 1, a numeric vector of length n
containing the entries of U reordered to match the ranks of
X. Otherwise, a numeric n \times d matrix whose i-th row
is the matched row of U corresponding to the i-th row of
X.
Dependencies
Requires the transport package.
Examples
# 1D example (set seed for reproducible tie-breaking)
set.seed(1)
x <- rnorm(10)
u <- seq(0, 1, length.out = 10)
MK_rank(x, u)
# 2D example
set.seed(42)
X <- matrix(rnorm(200), ncol = 2) # 100 x 2
U <- matrix(runif(200), ncol = 2) # 100 x 2
R <- MK_rank(X, U)
dim(R) # 100 2
Coverage-based Dependence Measure with Optional Visualisation
Description
Computes the coverage correlation coefficient between input x and y, as introduced in the arXiv preprint. This coefficient measures the dependence between two random variables or vectors.
Usage
coverage_correlation(
x,
y,
visualise = FALSE,
method = c("auto", "exact", "approx"),
M = NULL,
na.rm = TRUE
)
Arguments
x |
Numeric vector or matrix. |
y |
Numeric vector or matrix with the same number of rows as |
visualise |
Logical; if |
method |
Character string specifying the computation method. Options are |
M |
Integer; Number of Monte Carlo integration sample points (used when |
na.rm |
Logical; if |
Details
The procedure is as follows:
Calculate the rank transformations
(r_x, r_y)of the inputsxandy.Construct small cubes (in 2D, squares) of volume
n^{-1}centered at each rank-transformed point.Compute the total area of the union of these cubes, intersected with
[0,1]^dwhered = d_x + d_y.
The coverage correlation coefficient is then calculated based on this union area.
For more details, please refer to the original paper: the arXiv preprint.
The method argument controls how the computation is performed:
-
"exact": Computes the exact value. -
"approx": Uses a Monte Carlo approximation withMsample points. -
"auto": Automatically selects a method based on the total number of columns inxandy: if more than 6,"approx"is used (withM = nrow(x)^{1.5}ifMis not provided); otherwise,"exact"is used.
Value
A list with four elements:
-
stat– The numeric value of the coverage correlation coefficient. -
pval– The p-value, calculated using the exact variance under the null hypothesis of independence betweenxandy. -
method– A character string indicating the computation method used. -
mc_se– A numeric value. If method "approx" was usedmc_seis the standard error of the Monte Carlo approximation, otherwise it is 0.
Examples
set.seed(1)
n <- 100
x <- runif(n)
y <- sin(3*x) + runif(n) * 0.01
coverage_correlation(x, y, visualise = TRUE)
Coverage-based Dependence Measure for K Random Vectors
Description
Computes the coverage correlation coefficient jointly across K input
variables or vectors X_1, \ldots, X_K, following exactly the same logic
as coverage_correlation but generalised from a pair to an
arbitrary number of inputs. The coefficient measures the joint dependence
among the K inputs.
Usage
coverage_correlation_K(
data,
method = c("auto", "exact", "approx"),
M = NULL,
na.rm = TRUE
)
Arguments
data |
A list of |
method |
Character string specifying the computation method. Options are |
M |
Integer; Number of Monte Carlo integration sample points (used when |
na.rm |
Logical; if |
Details
The procedure is as follows:
Calculate the rank transformation of each input
X_k.Construct small cubes of volume
n^{-1}centered at each rank-transformed point in[0,1]^d, whered = d_1 + \cdots + d_K.Compute the total volume of the union of these cubes, intersected with
[0,1]^d.
The coverage correlation coefficient is then calculated based on this union volume.
For more details, please refer to the original paper: the arXiv preprint.
The method argument controls how the computation is performed:
-
"exact": Computes the exact value. -
"approx": Uses a Monte Carlo approximation withMsample points. -
"auto": Automatically selects a method based on the total number of columns across all inputs: if more than 6,"approx"is used (withM = n^{1.5}ifMis not provided); otherwise,"exact"is used.
Value
A list with four elements:
-
stat– The numeric value of the coverage correlation coefficient. -
pval– The p-value, calculated using the exact variance under the null hypothesis of mutual independence among the inputs. -
method– A character string indicating the computation method used. -
mc_se– A numeric value. If method "approx" was usedmc_seis the standard error of the Monte Carlo approximation, otherwise it is 0.
See Also
Examples
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- sin(3 * x1) + runif(n) * 0.01
x3 <- x1 * x2 + runif(n) * 0.1
coverage_correlation_K(list(x1, x2, x3))
Coverage-based Dependence Measure for K Random Vectors with Optional Reference Grids
Description
Identical to coverage_correlation_K, except that the reference
points used for the Monge–Kantorovich rank transformation may be supplied
directly via grid rather than being generated internally from
runif. This is useful for reproducibility, for using low-discrepancy
(e.g. quasi-Monte Carlo) reference grids, or for comparing results across
calls with a fixed reference set.
Usage
coverage_correlation_K_grid(
data,
grid = NULL,
method = c("auto", "exact", "approx"),
M = NULL,
na.rm = TRUE
)
Arguments
data |
A list of |
grid |
A list of |
method |
Character string specifying the computation method. Options are |
M |
Integer; Number of Monte Carlo integration sample points (used when |
na.rm |
Logical; if |
Details
When every input in data is one-dimensional, grid may be
omitted; in that case each reference grid defaults to the deterministic
uniform grid \{1/n, 2/n, \ldots, 1\}, which is the natural 1D reference
set and makes the rank transformation deterministic.
The method argument controls how the computation is performed:
-
"exact": Computes the exact value. -
"approx": Uses a Monte Carlo approximation withMsample points. -
"auto": Automatically selects a method based on the total number of columns across all inputs: if more than 6,"approx"is used (withM = n^{1.5}ifMis not provided); otherwise,"exact"is used.
When na.rm = TRUE, rows with NA in any input are removed, and
the corresponding rows of any supplied grids are removed as well so that the
reference grids stay aligned with the data. Default grids are constructed
after NA removal.
Value
A list with four elements:
-
stat– The numeric value of the coverage correlation coefficient. -
pval– The p-value, calculated using the exact variance under the null hypothesis of mutual independence among the inputs, if K > 2 returns NA. -
method– A character string indicating the computation method used. -
mc_se– A numeric value. If method "approx" was usedmc_seis the standard error of the Monte Carlo approximation, otherwise it is 0.
See Also
coverage_correlation_K, coverage_correlation_grid
Examples
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- sin(3 * x1) + runif(n) * 0.01
x3 <- x1 * x2 + runif(n) * 0.1
# all 1D: grid defaults to the uniform grid {1/n, ..., 1} for each input
coverage_correlation_K_grid(list(x1, x2, x3))
# supplying grids explicitly still works
g <- list(runif(n), runif(n), runif(n))
coverage_correlation_K_grid(list(x1, x2, x3), grid = g)
Coverage-based Dependence Measure with Optional Reference Grids
Description
Identical to coverage_correlation, except that the reference
points used for the Monge–Kantorovich rank transformation may be supplied
directly via u and v rather than being generated internally
from runif. This is useful for reproducibility, for using
low-discrepancy (e.g. quasi-Monte Carlo) reference grids, or for comparing
results across calls with a fixed reference set.
Usage
coverage_correlation_grid(
x,
y,
u = NULL,
v = NULL,
visualise = FALSE,
method = c("auto", "exact", "approx"),
M = NULL,
na.rm = TRUE
)
Arguments
x |
Numeric vector or matrix. |
y |
Numeric vector or matrix with the same number of rows as |
u |
Numeric vector or matrix of reference points for |
v |
Numeric vector or matrix of reference points for |
visualise |
Logical; if |
method |
Character string specifying the computation method. Options are |
M |
Integer; Number of Monte Carlo integration sample points (used when |
na.rm |
Logical; if |
Details
When both x and y are one-dimensional, u and v
may be omitted; in that case they default to the deterministic uniform grid
\{1/n, 2/n, \ldots, 1\}, which is the natural 1D reference set and
makes the rank transformation deterministic.
The method argument controls how the computation is performed:
-
"exact": Computes the exact value. -
"approx": Uses a Monte Carlo approximation withMsample points. -
"auto": Automatically selects a method based on the total number of columns inxandy: if more than 6,"approx"is used (withM = nrow(x)^{1.5}ifMis not provided); otherwise,"exact"is used.
When na.rm = TRUE, rows with NA in x or y are
removed, and the corresponding rows of any user-supplied u and
v are removed as well so that the reference grids stay aligned with
the data. Default grids are constructed after NA removal.
Value
A list with four elements:
-
stat– The numeric value of the coverage correlation coefficient. -
pval– The p-value, calculated using the exact variance under the null hypothesis of independence betweenxandy. -
method– A character string indicating the computation method used. -
mc_se– A numeric value. If method "approx" was usedmc_seis the standard error of the Monte Carlo approximation, otherwise it is 0.
See Also
Examples
set.seed(1)
n <- 100
x <- runif(n)
y <- sin(3*x) + runif(n) * 0.01
# 1D: u and v default to the uniform grid {1/n, ..., 1}
coverage_correlation_grid(x, y, visualise = TRUE)
# supplying grids explicitly still works
u <- runif(n)
v <- runif(n)
coverage_correlation_grid(x, y, u, v)
Coverage-based Dependence Measure (unified interface)
Description
A single front-end that dispatches to the appropriate coverage correlation
routine based on its inputs. It bundles together
coverage_correlation, coverage_correlation_grid,
coverage_correlation_K, and
coverage_correlation_K_grid.
Usage
covercorr(
x,
y = NULL,
reference = c("random", "deterministic"),
grid = NULL,
visualise = FALSE,
method = c("auto", "exact", "approx"),
M = NULL,
na.rm = TRUE
)
Arguments
x |
A numeric vector, matrix, data frame, data table, or a list of numeric vectors/matrices. See Details for how each type is dispatched. |
y |
Optional numeric vector or matrix with the same number of rows as
|
reference |
Character string, either |
grid |
Optional list of reference-point sets, one per variable, each
matching the shape of the variable it corresponds to (same number of rows
and columns). In the pairwise case |
visualise |
Logical; if |
method |
Character string specifying the computation method. Options are |
M |
Integer; Number of Monte Carlo integration sample points (used when |
na.rm |
Logical; if |
Details
Dispatch rules, based on the first argument x:
If
yis supplied, the pairwise case is used on(x, y).If
xis a list, each element is treated as one variable (possibly multivariate) and theK-input case is used.If
xis a matrix, data frame, or data table (andyis not supplied), each column is treated as a univariate variable: a two-column input uses the pairwise case, and an input with more than two columns uses theK-input case withKequal to the number of columns.
The reference argument selects the reference points used for the
Monge–Kantorovich rank transformation:
-
"random": independentrunifreference points are generated internally (routes tocoverage_correlation/coverage_correlation_K). -
"deterministic": the deterministic uniform grid\{1/n, \ldots, 1\}is used for each (univariate) variable, unless the user supplies their own reference points viagrid. Routes tocoverage_correlation_grid/coverage_correlation_K_grid.
Supplying grid implies reference = "deterministic"
automatically.
Value
A list with elements stat, pval, method, and
mc_se; see coverage_correlation for details.
See Also
coverage_correlation,
coverage_correlation_grid,
coverage_correlation_K,
coverage_correlation_K_grid
Examples
set.seed(1)
n <- 100
x <- runif(n)
y <- sin(3 * x) + runif(n) * 0.01
# pairwise, random reference points
covercorr(x, y)
# pairwise, deterministic grid, with visualisation
covercorr(x, y, reference = "deterministic", visualise = TRUE)
# pairwise, user-supplied reference points (implies deterministic)
covercorr(x, y, grid = list(runif(n), runif(n)))
# K variables as a matrix; supply custom grids for each column
z <- x * y + runif(n) * 0.1
covercorr(cbind(x, y, z), grid = list(runif(n), runif(n), runif(n)))
Total volume of union of rectangles
Description
Total volume of union of rectangles
Usage
covered_volume(zmin, zmax)
Arguments
zmin |
n x d matrix of bottomleft coordinates, one row per rectangle |
zmax |
n x d matrix of topright coordinates, one row per rectangle |
Details
This is a wrapper of the C_covered_volume_partitioned function in C
Value
a numeric value of the volume of the union
Total volume of union of rectangles using Monte Carlo integration
Description
Total volume of union of rectangles using Monte Carlo integration
Usage
covered_volume_mc(zmin_s, zmax_s, M)
Arguments
zmin_s |
n x d matrix of bottomleft coordinates, one row per rectangle |
zmax_s |
n x d matrix of topright coordinates, one row per rectangle |
M |
number of Monte Carlo integration sample points |
Details
This is a wrapper of the C_covered_volume_mc function in C
Value
a list of the estimated volume of the union and its standard error
Total volume of union of rectangles using volume hashing
Description
Total volume of union of rectangles using volume hashing
Usage
covered_volume_partitioned(zmin, zmax)
Arguments
zmin |
n x d matrix of bottomleft coordinates, one row per rectangle |
zmax |
n x d matrix of topright coordinates, one row per rectangle |
Details
This is a wrapper of the C_covered_volume_partitioned function in C
Value
a numeric value of the volume of the union
Plot a collection of axis-aligned rectangles in the unit square
Description
Draws rectangles specified by their xmin, xmax, ymin,
and ymax, optionally adding them to an existing plot. When
add = FALSE, a fresh [0,1]\times[0,1] plot with a grid and
equal aspect ratio is created.
Usage
plot_rectangles(xmin, xmax, ymin, ymax, add = FALSE)
Arguments
xmin |
Numeric vector of left x-coordinates. |
xmax |
Numeric vector of right x-coordinates (same length as |
ymin |
Numeric vector of bottom y-coordinates (same length as |
ymax |
Numeric vector of top y-coordinates (same length as |
add |
Logical; if |
Value
Invisibly returns NULL. Use this function for its plotting output, not for a returned value.
Split rectangles by wrapping them around edges of [0,1]^d
Description
Split rectangles by wrapping them around edges of [0,1]^d
Usage
split_rectangles(zmin, zmax)
Arguments
zmin |
n x d matrix of bottom-left coordinates, one row per rectangle |
zmax |
n x d matrix of top-right coordinates, one row per rectangle |
Details
This is a wrapper of the C_split_rectangles function implemented in C
Value
a list of zmin and zmax, describing the bottom-left and top-right coordinates of splitted rectangles
Variance of the the excess vacancy
Description
Exact formula for n times the variance of the excess vacancy.
For independent X and Y, the variance of the coverage correlation
coefficient is obtained by dividing the returned value by n(1 - e^{-1})^2.
check the arXiv preprint for more details
Usage
variance_formula(n, d)
Arguments
n |
sample size |
d |
dimension |
Value
variance formula in paper
Visualise the joint density of two rank vectors on the unit square
Description
Produces an image plot of the log-ratio of an estimated joint density to the
uniform density over [0,1]^2, given the (Monge–Kantorovich) rank
transformations of two variables. Regions where the pair of variables is more
concentrated than under independence appear in one colour and regions where
it is sparser appear in another, making departures from independence easy to
read off visually.
Usage
visualise_density(x_rank, y_rank)
Arguments
x_rank |
Numeric vector of rank values in |
y_rank |
Numeric vector of rank values in |
Details
The density is estimated with a two-dimensional kernel density estimate
(kde2d). To avoid edge effects near the boundary of the
unit square, the points are first tiled toroidally (see the internal
wrap_points helper) before the kernel density estimate is computed on
[0,1]^2. The estimate is normalised by its mean over the unit square,
so that a value of zero on the log scale corresponds to the uniform density.
Value
Invisibly returns NULL. Use this function for its plotting
output, not for a returned value.
See Also
Examples
set.seed(1)
n <- 500
x <- runif(n)
y <- sin(3 * x) + runif(n) * 0.1
x_rank <- MK_rank(as.matrix(x), runif(n))
y_rank <- MK_rank(as.matrix(y), runif(n))
visualise_density(x_rank, y_rank)