Help for package scPairs

Version:

0.1.8

Title:

Identifying Synergistic Gene Pairs in Single-Cell and Spatial Transcriptomics

Description:

Discovers synergistic gene pairs in single-cell RNA-seq and spatial transcriptomics data. Unlike conventional pairwise co-expression analyses that rely on a single correlation metric, scPairs integrates 14 complementary metrics across five orthogonal evidence layers to compute a composite synergy score with optional permutation-based significance testing. The five evidence layers span cell-level co-expression (Pearson, Spearman, biweight midcorrelation, mutual information, ratio consistency), neighbourhood-aware smoothing (KNN-smoothed correlation, neighbourhood co-expression, cluster pseudo-bulk, cross-cell-type, neighbourhood synergy), prior biological knowledge (GO/KEGG co-annotation Jaccard, pathway bridge score), trans-cellular interaction, and spatial co-variation (Lee's L, co-location quotient). This multi-scale design enables researchers to move beyond simple co-expression towards a comprehensive characterisation of cooperative gene regulation at transcriptomic and spatial resolution. For more information, see the package documentation at https://github.com/zhaoqing-wang/scPairs.

License:

MIT + file LICENSE

URL:

https://github.com/zhaoqing-wang/scPairs

BugReports:

https://github.com/zhaoqing-wang/scPairs/issues

Depends:

R (≥ 4.1.0)

Imports:

data.table, ggplot2, ggraph, ggrepel, igraph, methods, Matrix, patchwork, Seurat (≥ 4.0), SeuratObject, stats, tidygraph, tidyr

Suggests:

AnnotationDbi, org.Mm.eg.db, org.Hs.eg.db, crayon, ggExtra, RANN, testthat (≥ 3.0.0)

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.3

Date:

2026-02-28

NeedsCompilation:

Packaged:

2026-02-28 10:16:08 UTC; Runaw

Author:

Zhaoqing Wang [aut, cre]

Maintainer:

Zhaoqing Wang <zhaoqingwang@mail.sdu.edu.cn>

Repository:

CRAN

Date/Publication:

2026-03-05 10:30:02 UTC

Align spatial coordinates and expression matrix

Description

Align spatial coordinates and expression matrix

Usage

.align_spatial_data(coords, mat, min_cells = 20)

Arguments

coords

Data.frame with x, y columns, rownames = cell barcodes.

mat

Expression matrix (genes x cells).

min_cells

Integer; minimum cells required.

Value

A list with aligned coords, mat, and n. Returns NULL if insufficient cells.
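A minimal sketch of what "aligning" plausibly means here. It assumes both inputs are restricted to their shared cell barcodes and put in a common order. align_spatial_sketch() is a hypothetical name, not the package's internal code.

```r
# align_spatial_sketch() is a hypothetical illustration, not scPairs code.
align_spatial_sketch <- function(coords, mat, min_cells = 20) {
  # barcodes present in both the coordinate table and the matrix columns
  shared <- intersect(rownames(coords), colnames(mat))
  if (length(shared) < min_cells) return(NULL)   # too few cells to analyse
  list(
    coords = coords[shared, c("x", "y"), drop = FALSE],
    mat    = mat[, shared, drop = FALSE],          # same cell order as coords
    n      = length(shared)
  )
}

# toy example: 25 located cells, 30 profiled cells, 22 in common (c4..c25)
coords <- data.frame(x = runif(25), y = runif(25), row.names = paste0("c", 1:25))
mat <- matrix(rpois(3 * 30, 2), nrow = 3,
              dimnames = list(paste0("g", 1:3), paste0("c", 4:33)))
aligned <- align_spatial_sketch(coords, mat)
aligned$n
```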

Assign cells to micro-environment bins using k-means on PCA embedding

Description

Cells are clustered by k-means on the embedding into micro-environment bins, so that each bin approximates a local neighbourhood of the tissue. Crucially, cells of all types within the same bin are considered to share a micro-environment and can therefore interact.

Usage

.assign_microenv_bins(embed, n_bins)

Arguments

embed

Numeric matrix (n_cells x n_dims) of PCA (or other) embedding.

n_bins

Integer; number of micro-environment bins.

Value

Integer vector of bin assignments (1..n_bins).
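The binning step can be sketched with stats::kmeans(), assuming plain k-means with a fixed seed. The nstart and iter.max values and the name assign_bins_sketch() are illustrative choices, not taken from scPairs.

```r
assign_bins_sketch <- function(embed, n_bins, seed = 1) {
  # k-means uses random starts; fixing the seed makes bins reproducible
  # (note: this resets the global RNG state)
  set.seed(seed)
  km <- stats::kmeans(embed, centers = n_bins, nstart = 5, iter.max = 50)
  km$cluster   # integer vector, one bin label in 1..n_bins per cell
}

set.seed(42)
embed <- matrix(rnorm(500 * 10), nrow = 500)   # 500 cells x 10 PCs
bins <- assign_bins_sketch(embed, n_bins = 20)
table(bins)
```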

Biweight midcorrelation between two numeric vectors

Description

Implements the Tukey biweight robust correlation (Langfelder & Horvath 2012). Retained for single-pair computation in AssessGenePair().

Usage

.bicor(x, y)

Arguments

x, y

Numeric vectors of equal length.

Value

Scalar biweight midcorrelation.
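The Langfelder & Horvath (2012) definition fits in a few lines of R. This sketch assumes the raw (unscaled) MAD and the usual tuning constant 9. It is not the package's code; bicor_sketch() is a hypothetical name, and it returns NaN when a MAD is zero.

```r
bicor_sketch <- function(x, y) {
  # centred values multiplied by Tukey biweights; points beyond 9 MADs get weight 0
  weigh <- function(v) {
    med <- stats::median(v)
    m <- stats::mad(v, constant = 1)     # raw median absolute deviation
    u <- (v - med) / (9 * m)
    w <- (1 - u^2)^2 * (abs(u) < 1)
    (v - med) * w
  }
  a <- weigh(x)
  b <- weigh(y)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

x <- 1:20
y <- c(1:19, -100)   # one gross outlier in y
cor(x, y)            # Pearson is dragged below zero by the outlier
bicor_sketch(x, y)   # the outlier receives weight 0, so the trend survives
```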

Compute biweight midcorrelation matrix for all gene pairs

Description

Vectorised implementation that processes all genes at once using matrix operations. For genes where the MAD is zero (common in sparse scRNA-seq), falls back to Pearson on non-zero cells.

Usage

.bicor_matrix(mat)

Arguments

mat

Dense numeric matrix, genes in rows, cells in columns.

Value

Symmetric numeric matrix of biweight midcorrelations.
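A vectorised sketch of the same idea: compute the biweight-weighted, centred rows once, scale each row to unit length, and a single tcrossprod() yields every pairwise coefficient. The name bicor_matrix_sketch() is hypothetical, and the zero-MAD fallback to Pearson on non-zero cells described above is deliberately left out.

```r
bicor_matrix_sketch <- function(mat) {
  med <- apply(mat, 1, stats::median)
  mad_raw <- apply(mat, 1, stats::mad, constant = 1)
  centred <- mat - med                 # med recycles down rows (one value per gene)
  u <- centred / (9 * mad_raw)
  w <- (1 - u^2)^2 * (abs(u) < 1)      # Tukey biweights
  a <- centred * w
  a <- a / sqrt(rowSums(a^2))          # unit-length rows
  tcrossprod(a)                        # genes x genes matrix of coefficients
}

set.seed(7)
m <- matrix(rnorm(5 * 40), nrow = 5)   # 5 genes x 40 cells
r <- bicor_matrix_sketch(m)
r2 <- bicor_matrix_sketch(rbind(1:10, -(1:10)))
```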

Compute bridge score and bridge genes for a single pair

Description

Compute bridge score and bridge genes for a single pair

Usage

.bridge_score(
  gene1,
  gene2,
  prior_net,
  expressed_genes,
  min_shared = 1,
  top_n = 20
)

Compute pathway bridge score for gene pairs

Description

For a gene pair (A, B), identifies intermediate "bridge" genes C such that C shares functional annotations with BOTH A and B, AND C is expressed in the current dataset. The bridge score (defined under Details) reflects the strength of this indirect connectivity.

Usage

.bridge_score_batch(pair_dt, prior_net, expressed_genes, min_shared = 1)

Arguments

pair_dt

data.table with gene1, gene2 columns.

prior_net

Prior network from .build_prior_network().

expressed_genes

Character vector of genes expressed in the dataset (bridge genes must be expressed to be relevant).

min_shared

Integer; minimum shared terms between a bridge gene and each member of the pair. Default 1.

Details

\text{bridge\_score} = \frac{n_{\text{bridges}}}{\sqrt{|\text{terms}_A| \cdot |\text{terms}_B|}}

This captures synergistic relationships where two genes are not directly co-annotated but are connected through shared pathway intermediaries.

Value

A list with:

scores: Numeric vector of bridge scores.
bridges: List of character vectors; bridge genes for each pair.
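The bridge-gene search and the formula under Details can be sketched on a gene-to-terms list like the gene_to_terms component returned by .build_prior_network(). Assumptions: the focal genes themselves are never counted as bridges, and bridge_score_sketch() is a hypothetical name rather than the package's implementation.

```r
bridge_score_sketch <- function(gene_a, gene_b, gene_to_terms,
                                expressed_genes, min_shared = 1) {
  terms_a <- gene_to_terms[[gene_a]]
  terms_b <- gene_to_terms[[gene_b]]
  if (length(terms_a) == 0 || length(terms_b) == 0)
    return(list(score = 0, bridges = character(0)))
  # candidate bridges: annotated, expressed, and not the focal genes
  candidates <- setdiff(intersect(names(gene_to_terms), expressed_genes),
                        c(gene_a, gene_b))
  is_bridge <- vapply(candidates, function(g) {
    t <- gene_to_terms[[g]]
    length(intersect(t, terms_a)) >= min_shared &&
      length(intersect(t, terms_b)) >= min_shared
  }, logical(1))
  bridges <- candidates[is_bridge]
  list(score = length(bridges) / sqrt(length(terms_a) * length(terms_b)),
       bridges = bridges)
}

g2t <- list(A = c("t1", "t2"), B = c("t3", "t4"), C = c("t1", "t3"),
            D = "t2", E = c("t2", "t4"))
res <- bridge_score_sketch("A", "B", g2t, expressed_genes = names(g2t))
res_no_e <- bridge_score_sketch("A", "B", g2t, expressed_genes = c("A", "B", "C", "D"))
```

C and E each share a term with both A and B, so they are bridges; D only touches A. Dropping E from the expressed set halves the score.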

Build a single heatmap panel for cross-cell-type correlation

Description

Build a single heatmap panel for cross-cell-type correlation

Usage

.build_heatmap_panel(
  detail,
  ct_levels,
  gene_src,
  gene_nbr,
  direction,
  global_r,
  show_n,
  diverging,
  n_col = "n_bins_valid",
  r_max = NULL
)

Build a KNN graph from a Seurat reduction embedding

Description

Build a KNN graph from a Seurat reduction embedding

Usage

.build_knn_graph(object, reduction = "pca", k = 20, dims = 1:30)

Arguments

object

Seurat object.

reduction

Character; reduction to use (default "pca").

k

Integer; number of nearest neighbours (default 20).

dims

Integer vector; dimensions to use (default 1:30).

Value

A sparse row-standardised weight matrix (n_cells x n_cells).

Build a common minimal theme for panel plots

Description

Build a common minimal theme for panel plots

Usage

.build_panel_theme(axis_text_size = NULL, title_size = 10)

Arguments

axis_text_size

Numeric; axis text size. NULL = hide axis text.

title_size

Numeric; title text size.

Value

A ggplot2 theme object.

Build a prior knowledge gene interaction network

Description

Constructs a gene-gene interaction list from available annotation sources. The function queries GO (Biological Process), KEGG pathways, and optionally user-supplied interaction databases. It returns a list structure that can be used for scoring gene pairs.

Usage

.build_prior_network(
  organism = "mouse",
  genes = NULL,
  sources = c("GO", "KEGG"),
  custom_pairs = NULL,
  min_genes = 5,
  max_genes = 500,
  verbose = TRUE
)

Arguments

organism

Character; organism identifier for annotation lookup. Supported: "mouse" (Mus musculus) or "human" (Homo sapiens).

genes

Character vector of gene symbols to include (typically the features in the Seurat object). Restricts the network to relevant genes.

sources

Character vector; knowledge sources to use. Any subset of c("GO", "KEGG", "custom"). Default: c("GO", "KEGG").

custom_pairs

Optional data.frame with columns gene1, gene2 (and optionally source, weight) for user-supplied interactions. This can include CellChatDB, CellPhoneDB, SCENIC regulon targets, etc.

min_genes

Integer; minimum number of genes in a GO/KEGG term for it to be included (avoids overly narrow terms). Default 5.

max_genes

Integer; maximum genes per term (avoids overly broad terms like "protein binding"). Default 500.

verbose

Logical.

Value

A list with components:

gene_sets: Named list of character vectors; each element is a functional term/pathway mapping to its member genes.
gene_to_terms: Named list; for each gene, the set of terms it belongs to.
interactions: data.table with columns gene1, gene2, source, n_shared_terms, jaccard.
organism: Character.
n_genes: Integer; number of genes covered.
n_terms: Integer; number of functional terms.
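A sketch of two pieces of this construction. It assumes gene sets are filtered by size (min_genes, max_genes) and inverted into a gene-to-terms map, and that co-annotation is scored as the Jaccard index of two genes' term sets. Both function names are hypothetical, and no GO/KEGG lookup happens here; the gene sets are supplied directly.

```r
prior_sketch <- function(gene_sets, min_genes = 5, max_genes = 500) {
  sizes <- lengths(gene_sets)
  keep <- gene_sets[sizes >= min_genes & sizes <= max_genes]
  # invert term -> genes into gene -> terms
  long <- data.frame(
    term = rep(names(keep), lengths(keep)),
    gene = as.character(unlist(keep, use.names = FALSE))
  )
  list(gene_sets = keep, gene_to_terms = split(long$term, long$gene))
}

term_jaccard_sketch <- function(gene_to_terms, g1, g2) {
  a <- gene_to_terms[[g1]]
  b <- gene_to_terms[[g2]]
  u <- length(union(a, b))
  if (u == 0) return(0)
  length(intersect(a, b)) / u   # |shared terms| / |all terms of either gene|
}

sets <- list(T1 = paste0("g", 1:6), T2 = paste0("g", 4:9), T3 = c("g1", "g2"))
pn <- prior_sketch(sets)   # T3 (2 genes) falls below min_genes = 5
term_jaccard_sketch(pn$gene_to_terms, "g1", "g4")
```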

Build standardised scPairs result object

Description

Build standardised scPairs result object

Usage

.build_result(pair_dt, features, object, has_spatial, params, mode = "all")

Build a spatial KNN weight matrix

Description

Shared helper used by both Lee's L and CLQ computations.

Usage

.build_spatial_knn(coords, k, row_standardise = TRUE)

Arguments

coords

Data.frame with x, y columns.

k

Integer; number of spatial nearest neighbours.

row_standardise

Logical; if TRUE, return row-standardised weights. If FALSE, return binary indicator matrix.

Value

A sparse matrix (n x n).
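The KNN weight matrix can be sketched with a brute-force distance matrix and Matrix::sparseMatrix(). This is an illustration under stated assumptions (Euclidean distance, self excluded, equal weights 1/k when row-standardised); the package may use a faster neighbour search, and spatial_knn_sketch() is a hypothetical name.

```r
spatial_knn_sketch <- function(coords, k, row_standardise = TRUE) {
  xy <- as.matrix(coords[, c("x", "y")])
  n <- nrow(xy)
  d <- as.matrix(stats::dist(xy))   # O(n^2) memory: acceptable for a sketch only
  diag(d) <- Inf                    # a cell is not its own neighbour
  nn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))   # n x k indices
  Matrix::sparseMatrix(
    i = rep(seq_len(n), each = k),
    j = as.vector(t(nn)),           # neighbours of cell 1, then cell 2, ...
    x = if (row_standardise) 1 / k else 1,
    dims = c(n, n)
  )
}

set.seed(3)
coords <- data.frame(x = runif(50), y = runif(50))
W <- spatial_knn_sketch(coords, k = 6)                          # row-standardised
B <- spatial_knn_sketch(coords, k = 6, row_standardise = FALSE) # binary
```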

Single-pair cluster-level correlation

Description

Single-pair cluster-level correlation

Usage

.cluster_cor(x, y, cluster_ids)

Cluster-level (pseudo-bulk) correlation for gene pairs

Description

Computes the Pearson correlation of cluster-level mean expression. This captures patterns where two genes tend to be expressed in the same cell populations, even if they are never co-detected in the same cell.

Usage

.cluster_cor_batch(mat, cluster_ids, pair_dt)

Arguments

mat

Expression matrix (genes x cells).

cluster_ids

Factor of cluster assignments.

pair_dt

data.table with gene1, gene2.

Value

Numeric vector of cluster-level correlations.
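A minimal sketch of the pseudo-bulk idea, assuming unweighted per-cluster means. The example shows two genes that are never detected in the same cell yet still correlate perfectly at the cluster level; cluster_cor_sketch() is a hypothetical name.

```r
cluster_cor_sketch <- function(x, y, cluster_ids) {
  cl <- droplevels(as.factor(cluster_ids))
  mx <- as.numeric(tapply(x, cl, mean))   # mean expression of gene 1 per cluster
  my <- as.numeric(tapply(y, cl, mean))   # mean expression of gene 2 per cluster
  stats::cor(mx, my)
}

x  <- c(1, 0, 2, 0, 3, 0)
y  <- c(0, 1, 0, 2, 0, 3)
cl <- rep(c("a", "b", "c"), each = 2)
any(x > 0 & y > 0)          # FALSE: never co-detected in one cell
cluster_cor_sketch(x, y, cl)
```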

Compute biweight midcorrelation (unified interface)

Description

Dispatches to matrix mode when y is NULL (batch computation for all gene pairs), or vector mode when both x and y are numeric vectors (single-pair computation).

Usage

.compute_bicor(x, y = NULL)

Arguments

x

Dense numeric matrix (genes x cells) for batch mode, or numeric vector for single-pair mode.

y

NULL for batch mode, or numeric vector for single-pair mode.

Value

Symmetric matrix of biweight midcorrelations (batch) or scalar biweight midcorrelation (single-pair).

Compute cluster-level pseudo-bulk correlation (unified interface)

Description

Compute cluster-level pseudo-bulk correlation (unified interface)

Usage

.compute_cluster_cor(x, y = NULL, cluster_ids, pair_dt = NULL)

Arguments

x

Expression matrix (genes x cells) for batch mode, or numeric vector for single-pair mode.

y

NULL for batch mode, or numeric vector.

cluster_ids

Factor of cluster assignments.

pair_dt

data.table with gene1, gene2 (batch only).

Value

Numeric vector (batch) or scalar (single-pair).

Compute pairwise co-expression metrics for a set of genes

Description

Calculates Pearson correlation, Spearman correlation, biweight midcorrelation, mutual information (discretised), and expression ratio consistency across cell clusters. Designed for speed: uses vectorised matrix algebra where possible and avoids per-pair loops for correlation.

Usage

.compute_coexpression(
  mat,
  features,
  cluster_ids = NULL,
  cor_method = c("pearson", "spearman", "biweight"),
  n_mi_bins = 5,
  min_cells_expressed = 10,
  verbose = TRUE
)

Arguments

mat

Numeric (or sparse) matrix, genes in rows, cells in columns. Should be log-normalised expression values.

features

Character vector of gene names to include (must be rownames of mat).

cluster_ids

Factor or character vector of cluster assignments, length = ncol(mat). Used for ratio-consistency calculation. NULL to skip that metric.

cor_method

Character vector of correlation types to compute. Any subset of c("pearson", "spearman", "biweight").

n_mi_bins

Integer; number of bins for mutual information discretisation. Set to 0 to skip MI.

min_cells_expressed

Integer; minimum number of cells where both genes of a pair must be expressed (> 0) to retain the pair.

verbose

Logical; print progress messages.

Details

Performance (v0.1.1): Co-expression filtering, biweight midcorrelation, mutual information, and ratio consistency are all vectorised where possible. The co-expression filter uses a single matrix crossproduct instead of per-pair loops. Biweight midcorrelation uses a fast vectorised implementation that processes all pairs at once via matrix operations. Mutual information uses pre-computed bin matrices. Ratio consistency is vectorised over clusters using matrix operations.
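The single-crossproduct co-expression filter can be sketched as follows, assuming "expressed" means a value > 0 and a dense base-R matrix. tcrossprod() of the 0/1 detection matrix counts, for every gene pair at once, the cells in which both genes are detected. coexpr_filter_sketch() is a hypothetical name.

```r
coexpr_filter_sketch <- function(mat, min_cells_expressed = 10) {
  detected <- (mat > 0) * 1      # genes x cells, 1 = detected
  co <- tcrossprod(detected)     # co[i, j] = number of cells with both genes > 0
  keep <- which(upper.tri(co) & co >= min_cells_expressed, arr.ind = TRUE)
  data.frame(gene1  = rownames(mat)[keep[, 1]],
             gene2  = rownames(mat)[keep[, 2]],
             n_both = co[keep])
}

mat <- rbind(g1 = c(1, 1, 1, 0),
             g2 = c(1, 1, 0, 0),
             g3 = c(0, 0, 0, 1))
res <- coexpr_filter_sketch(mat, min_cells_expressed = 2)
res
```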

Biweight midcorrelation (Langfelder & Horvath, 2012) is a robust alternative to Pearson correlation that down-weights outlier observations, which is particularly valuable for noisy single-cell data.

Ratio consistency measures whether the expression ratio of two genes is stable across clusters: for each cluster the median log-ratio is computed, and consistency = 1 - CV (coefficient of variation) of those medians, bounded to [0, 1]. High values indicate that the two genes maintain a fixed stoichiometric relationship across cell populations, a hallmark of genuine co-regulation.

Mutual information captures non-linear dependencies missed by correlation. Expression values are discretised into equal-frequency bins and MI is estimated via the plug-in estimator.

Value

A data.table with columns: gene1, gene2, plus any computed metric columns (cor_pearson, cor_spearman, cor_biweight, mi_score, ratio_consistency).
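A sketch of ratio consistency under explicit assumptions: log-ratios log(x / y) are taken only over cells where both genes are > 0, clusters with fewer than min_cells such cells are dropped, and the CV is sd / |mean| of the cluster medians. The exact ratio definition and the min_cells guard are illustrative guesses, not documented behaviour; ratio_consistency_sketch() is hypothetical.

```r
ratio_consistency_sketch <- function(x, y, cluster_ids, min_cells = 3) {
  both <- x > 0 & y > 0                         # co-expressing cells only (assumption)
  lr <- log(x[both] / y[both])
  cl <- as.factor(cluster_ids)[both]
  med <- tapply(lr, cl, function(v)
    if (length(v) >= min_cells) stats::median(v) else NA_real_)
  med <- as.numeric(med)
  med <- med[!is.na(med)]                       # drop empty / undersized clusters
  if (length(med) < 2) return(NA_real_)
  cv <- stats::sd(med) / abs(mean(med))
  if (!is.finite(cv)) return(0)                 # mean log-ratio of 0: CV undefined
  max(0, min(1, 1 - cv))
}

y  <- rep(c(1, 2, 3), 3)
cl <- rep(c("a", "b", "c"), each = 3)
ratio_consistency_sketch(2 * y, y, cl)                          # fixed 2:1 ratio
ratio_consistency_sketch(y * rep(c(1, 2, 4), each = 3), y, cl)  # ratio drifts
```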

Compute cross-cell-type interaction score (unified interface)

Description

Dispatches to batch mode when pair_dt is provided (matrix in x), or single-pair mode when x and y are vectors.

Usage

.compute_cross_celltype(
  x,
  y = NULL,
  pair_dt = NULL,
  cluster_ids,
  embed,
  n_bins = 50,
  min_cells_per_bin = 5,
  min_bins = 8,
  min_pct_expressed = 0.01
)

Arguments

x

Expression matrix (genes x cells) for batch mode, or numeric vector for single-pair mode.

y

NULL for batch mode, or numeric vector.

pair_dt

data.table with gene1, gene2 columns (batch only).

cluster_ids

Factor of cell-type assignments.

embed

PCA embedding matrix.

n_bins

Integer; number of micro-environment bins.

min_cells_per_bin

Integer; minimum cells per type per bin.

min_bins

Integer; minimum valid bins per type pair.

min_pct_expressed

Numeric; minimum fraction (0-1) of cells in a cell type that must express the gene.

Value

Numeric vector of scores (batch) or list with detailed results (single-pair).

Compute mutual information (unified interface)

Description

Dispatches to batch mode when pair_idx is provided, or single-pair mode when x and y are vectors.

Usage

.compute_mi(x, y = NULL, pair_idx = NULL, n_bins = 5)

Arguments

x

Dense numeric matrix (genes x cells) for batch mode, or numeric vector for single-pair mode.

y

NULL for batch mode, or numeric vector for single-pair mode.

pair_idx

2-row integer matrix of pair indices (batch mode only).

n_bins

Number of bins for discretisation.

Value

Numeric vector of MI values (batch) or scalar MI (single-pair).

Compute neighbourhood co-expression score (unified interface)

Description

Compute neighbourhood co-expression score (unified interface)

Usage

.compute_neighbourhood_score(
  x,
  y = NULL,
  pair_dt = NULL,
  W,
  k = NULL,
  expr_threshold = 0
)

Arguments

x

Expression matrix (genes x cells) for batch mode, or numeric vector for single-pair mode.

y

NULL for batch mode, or numeric vector.

pair_dt

data.table with gene1, gene2 (batch only).

W

KNN weight matrix.

k

Neighbourhood size (single-pair only).

expr_threshold

Numeric; expression above this value counts as "expressed".

Value

Numeric vector (batch) or scalar (single-pair).

Unified Core Metric Engine for scPairs

Description

Shared computation backbone invoked by all three discovery functions (FindAllPairs, FindGenePairs, AssessGenePair). Accepts a pre-built pair table and computes the requested metric layers (expression, neighbourhood, prior knowledge, spatial) on demand.

Users control which layers to compute via dedicated flags. When mode = "prior_only", only prior knowledge metrics are returned; when mode = "expression", only expression-based metrics are computed; the default mode = "all" computes every available layer.

Usage

.compute_pair_metrics(
  mat,
  pair_dt,
  cluster_ids,
  object,
  mode = c("all", "expression", "prior_only"),
  cor_method = c("pearson", "spearman", "biweight"),
  n_mi_bins = 5,
  min_cells_expressed = 10,
  use_neighbourhood = TRUE,
  neighbourhood_k = 20,
  neighbourhood_reduction = "pca",
  smooth_alpha = 0.3,
  use_prior = TRUE,
  organism = "mouse",
  custom_pairs = NULL,
  use_spatial = TRUE,
  spatial_k = 6,
  n_perm = 0,
  weights = NULL,
  verbose = TRUE
)

Arguments

mat

Expression matrix (genes x cells, dense or sparse).

pair_dt

data.table with columns gene1, gene2.

cluster_ids

Factor of cluster assignments.

object

Seurat object (needed for KNN graph / embedding).

mode

Character; one of "all" (default), "expression", "prior_only".

cor_method

Character vector of correlation methods.

n_mi_bins

Integer; bins for mutual information.

min_cells_expressed

Integer; minimum co-expressing cells.

use_neighbourhood

Logical; compute neighbourhood metrics.

neighbourhood_k

Integer; KNN k.

neighbourhood_reduction

Character; reduction for KNN.

smooth_alpha

Numeric; self-weight for KNN smoothing.

use_prior

Logical; compute prior knowledge metrics.

organism

Character; "mouse" or "human".

custom_pairs

Optional data.frame of custom interactions.

use_spatial

Logical; compute spatial metrics.

spatial_k

Integer; spatial neighbourhood k.

n_perm

Integer; permutations for score integration.

weights

Named numeric; metric weights.

verbose

Logical.

Value

A list with:

pair_dt: Updated data.table with all metric columns.
prior_net: Prior network object (if computed).
W: KNN weight matrix (if computed).
has_spatial: Logical.
has_neighbourhood: Logical.

Compute ratio consistency (unified interface)

Description

Dispatches to batch mode when pair_idx is provided, or single-pair mode when x and y are vectors.

Usage

.compute_ratio_consistency(x, y = NULL, pair_idx = NULL, cluster_ids)

Arguments

x

Dense numeric matrix (genes x cells) for batch mode, or numeric vector for single-pair mode.

y

NULL for batch mode, or numeric vector for single-pair mode.

pair_idx

2-row integer matrix of pair indices (batch mode only).

cluster_ids

Factor of cluster assignments.

Value

Numeric vector of ratio consistency values (batch) or scalar (single-pair).

Compute KNN-smoothed correlation (unified interface)

Description

Dispatches to batch mode when pair_dt is provided (matrix in x), or single-pair mode when x and y are vectors.

Usage

.compute_smoothed_cor(
  x,
  y = NULL,
  pair_dt = NULL,
  W,
  alpha = 0.3,
  method = "pearson"
)

Arguments

x

Expression matrix (genes x cells) for batch mode, or a 1-row matrix / numeric vector for single-pair mode.

y

NULL for batch mode, or a 1-row matrix / numeric vector.

pair_dt

data.table with gene1, gene2 columns (batch only).

W

KNN weight matrix.

alpha

Self-weight for smoothing.

method

Correlation method.

Value

Numeric vector (batch) or scalar (single-pair).
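KNN smoothing with a self-weight can be sketched as x_smooth = alpha * x + (1 - alpha) * W x, assuming that is how the self-weight alpha enters; smoothed_cor_sketch() is a hypothetical name. In the toy example, two genes are detected in neighbouring but distinct cells: their raw correlation is negative, while the smoothed one is positive.

```r
smoothed_cor_sketch <- function(x, y, W, alpha = 0.3, method = "pearson") {
  # blend each cell's own value with the weighted mean of its neighbours
  xs <- alpha * x + (1 - alpha) * as.numeric(W %*% x)
  ys <- alpha * y + (1 - alpha) * as.numeric(W %*% y)
  stats::cor(xs, ys, method = method)
}

# 4 cells forming two neighbour pairs (1-2, 3-4); rows sum to 1
W <- matrix(c(0, 1, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
x <- c(1, 0, 0, 0)   # gene A only in cell 1
y <- c(0, 1, 0, 0)   # gene B only in its neighbour, cell 2
cor(x, y)
smoothed_cor_sketch(x, y, W)
```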

Compute co-location quotient (CLQ) for gene pairs in spatial data

Description

The co-location quotient measures whether cells expressing gene A are disproportionately located near cells expressing gene B, relative to a random spatial arrangement. CLQ > 1 indicates spatial co-location (the two expression patterns are spatially attracted); CLQ < 1 indicates spatial segregation; CLQ = 1 indicates random spatial mixing.

Usage

.compute_spatial_clq(
  coords,
  mat,
  pair_dt,
  k = 6,
  expr_threshold = 0,
  verbose = TRUE
)

Arguments

coords

Data.frame with x, y columns, rownames = cell barcodes.

mat

Expression matrix (genes x cells).

pair_dt

data.table with columns gene1, gene2.

k

Integer; neighbourhood size.

expr_threshold

Numeric; threshold above which a gene is considered "expressed" in a cell.

verbose

Logical.

Details

For each cell i expressing gene A, we count how many of its k nearest neighbours express gene B, and compare to the global proportion of cells expressing gene B. The CLQ is the ratio of observed-to-expected proportions:

CLQ_{A \to B} = \frac{1}{N_A} \sum_{i \in A} \frac{n_{iB}/k}{N_B / N}

where N_A and N_B are numbers of cells expressing A and B, n_{iB} is the number of B-expressing neighbours of cell i, k is the neighbourhood size, and N is the total number of cells.

The symmetric CLQ is computed as the geometric mean of CLQ_{A \to B} and CLQ_{B \to A}.

Performance (v0.1.1): The neighbour expression counts are computed via matrix multiplication.
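The CLQ formula maps onto two matrix-vector products with a binary KNN matrix (one entry of 1 per neighbour). A sketch, assuming a binary W_bin with exactly k neighbours per cell; clq_sketch() is a hypothetical name.

```r
clq_sketch <- function(x, y, W_bin, k, expr_threshold = 0) {
  a <- x > expr_threshold
  b <- y > expr_threshold
  N <- length(x)
  n_iB <- as.numeric(W_bin %*% as.numeric(b))   # B-expressing neighbours of each cell
  n_iA <- as.numeric(W_bin %*% as.numeric(a))
  clq_ab <- mean(n_iB[a] / k) / (sum(b) / N)    # CLQ A -> B
  clq_ba <- mean(n_iA[b] / k) / (sum(a) / N)    # CLQ B -> A
  sqrt(clq_ab * clq_ba)                         # symmetric: geometric mean
}

W_bin <- matrix(c(0, 1, 0, 0,
                  1, 0, 0, 0,
                  0, 0, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)
x <- c(1, 0, 0, 0)
clq_sketch(x, c(0, 1, 0, 0), W_bin, k = 1)   # B sits next to A: attraction
clq_sketch(x, c(0, 0, 1, 0), W_bin, k = 1)   # B in the other pair: segregation
```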

Value

The input pair_dt with added column spatial_clq.

References

Leslie, T.F. & Kronenfeld, B.J. (2011). The colocation quotient: a new measure of spatial association between categorical subsets of points. Geographical Analysis, 43(3), 306-326.

Compute bivariate spatial autocorrelation (Lee's L) for gene pairs

Description

Lee's L statistic (Lee, 2001) generalises Moran's I to the bivariate case, measuring the spatial co-variation of two variables simultaneously. A positive L means that the neighbourhood-averaged expression of the two genes tends to rise and fall together across the tissue; a negative L indicates spatial segregation.

Usage

.compute_spatial_lee(coords, mat, pair_dt, k = 6, n_perm = 199, verbose = TRUE)

Arguments

coords

Data.frame with columns x, y and rownames = cell barcodes.

mat

Expression matrix (genes x cells).

pair_dt

data.table with columns gene1, gene2.

k

Integer; number of spatial nearest neighbours (default 6 for Visium hexagonal grids; 15 for general ST).

n_perm

Integer; number of permutations for empirical p-values. Set to 0 to skip.

verbose

Logical.

Details

For efficiency, a k-nearest-neighbour spatial weights matrix is used rather than a full distance matrix.

Performance (v0.1.1): The spatial lag is computed via sparse matrix multiplication (W %*% t(mat)) for all genes at once, replacing the previous per-pair R-level loop. Lee's L for all pairs is then computed via vectorised column inner products. Permutation testing is similarly vectorised: the entire gene expression matrix is permuted and all pairs evaluated per permutation.

Lee's L statistic is defined as:

L(x,y) = \frac{n}{S_0} \cdot \frac{(Wx)^\top (Wy)}{(\sum x_i^2)^{1/2} (\sum y_i^2)^{1/2}}

where x and y are mean-centred expression vectors, W is a row-standardised spatial weight matrix and S_0 = n (under row-standardisation), so the leading factor n/S_0 equals 1. The statistic ranges from -1 to 1.

Value

The input pair_dt with added columns spatial_lee_L and optionally spatial_lee_p.
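A sketch of Lee's L with a row-standardised W, assuming mean-centred inputs so that n / S_0 = 1. lee_L_sketch() is a hypothetical name and no permutation test is included.

```r
lee_L_sketch <- function(x, y, W) {
  zx <- x - mean(x)                  # mean-centre both variables
  zy <- y - mean(y)
  lx <- as.numeric(W %*% zx)         # spatial lags
  ly <- as.numeric(W %*% zy)
  sum(lx * ly) / (sqrt(sum(zx^2)) * sqrt(sum(zy^2)))
}

# 4 spots forming two neighbour pairs; rows sum to 1
W <- matrix(c(0, 1, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
x <- c(1, 1, 0, 0)
lee_L_sketch(x, x, W)    # identical, spatially smooth patterns
lee_L_sketch(x, -x, W)   # mirror-image patterns
```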

References

Lee, S.-I. (2001). Developing a bivariate spatial association measure: An integration of Pearson's r and Moran's I. Journal of Geographical Systems, 3(4), 369-385.

Cross-cell-type interaction score for a single gene pair

Description

Measures whether gene A expressed in one cell type correlates with gene B expressed in a different cell type sharing the same tissue micro-environment.

Usage

.cross_celltype(
  x,
  y,
  cluster_ids,
  embed,
  n_bins = 50,
  min_cells_per_bin = 5,
  min_bins = 8,
  min_pct_expressed = 0.01
)

Arguments

x

Numeric vector; expression of gene1 (length n_cells).

y

Numeric vector; expression of gene2 (length n_cells).

cluster_ids

Factor of cluster / cell-type assignments.

embed

Numeric matrix of PCA embedding (n_cells x n_dims).

n_bins

Integer; number of micro-environment bins. Default 50.

min_cells_per_bin

Integer; minimum cells of a type per bin. Default 5.

min_bins

Integer; minimum valid bins per type pair. Default 8.

min_pct_expressed

Numeric; minimum percentage of cells (0-1) in a cell type that must express a gene for that type to be considered. Default 0.01 (1%). Prevents spurious correlations with very sparse genes.

Details

Unlike the KNN-graph-based approach, this method does not require cells of different types to be neighbours in PCA/UMAP space. Instead, it partitions cells into micro-environment bins (using k-means on the embedding) and computes pseudo-bulk correlations across bins.

Value

A list with components:

score: Geometric mean of |agg_r(A->B)| and |agg_r(B->A)|.
r_ab: Weighted mean r for gene1-in-source, gene2-in-neighbour.
r_ba: Weighted mean r for gene2-in-source, gene1-in-neighbour.
n_type_pairs: Number of cell-type pairs contributing.
per_celltype_pair: data.frame with per-type-pair breakdown.

Compute cross-cell-type interaction scores for gene pairs

Description

For each gene pair (A, B), this metric measures whether expression of gene A in cells of one type correlates with expression of gene B in cells of a different type that share the same tissue micro-environment.

Usage

.cross_celltype_batch(
  mat,
  pair_dt,
  cluster_ids,
  embed,
  n_bins = 50,
  min_cells_per_bin = 5,
  min_bins = 8,
  min_pct_expressed = 0.01
)

Arguments

mat

Expression matrix (genes x cells, dense or sparse).

pair_dt

data.table with columns gene1, gene2.

cluster_ids

Factor of cell-type / cluster assignments.

embed

Numeric matrix of PCA embedding (n_cells x n_dims).

n_bins

Integer; number of micro-environment bins. Default 50.

min_cells_per_bin

Integer; minimum cells of a given type in a bin for that bin to contribute to the correlation. Default 5.

min_bins

Integer; minimum bins with both types present to compute a correlation for a given cell-type pair. Default 8.

min_pct_expressed

Numeric; minimum percentage of cells (0-1) in a cell type that must express a gene for that type to be considered. Default 0.01 (1%).

Value

Numeric vector of cross-cell-type interaction scores (one per row of pair_dt).

Algorithm

Partition all cells into micro-environment bins using k-means on the PCA embedding. Each bin represents a local tissue context containing cells of multiple types.
For each cell-type pair (type_i, type_j, i != j), compute pseudo-bulk expression per bin: mean(gene_A) in type_i cells of that bin and mean(gene_B) in type_j cells of that bin.
Correlate the paired pseudo-bulk vectors across bins (Pearson r). This gives the directed score A -> B.
Repeat for B -> A.
Aggregate across all cell-type pairs using a weighted mean (weighted by number of bins with sufficient cells of both types).
Final score = geometric mean of |aggregated r(A->B)| and |aggregated r(B->A)|.
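The algorithm steps can be sketched in R, assuming the micro-environment bins have already been computed (for example by k-means on the embedding) and omitting the min_pct_expressed filter. cross_celltype_sketch() is a hypothetical name. In the toy data, gene A is expressed in type T1 and gene B in type T2, both tracking the bin index, so both directed scores equal 1.

```r
cross_celltype_sketch <- function(x, y, cluster_ids, bins,
                                  min_cells_per_bin = 5, min_bins = 8) {
  bins <- factor(bins)
  types <- levels(factor(cluster_ids))

  # per-bin mean of v over the cells flagged in `keep`;
  # NA where a bin holds fewer than min_cells_per_bin such cells
  bin_means <- function(v, keep) {
    n <- as.numeric(tapply(keep, bins, sum))
    m <- as.numeric(tapply(ifelse(keep, v, 0), bins, sum)) / n
    m[n < min_cells_per_bin] <- NA
    m
  }

  # directed score ga -> gb: bin-count-weighted mean r over ordered type pairs
  directed <- function(ga, gb) {
    r <- numeric(0)
    w <- numeric(0)
    for (ti in types) for (tj in types) {
      if (ti == tj) next
      ma <- bin_means(ga, cluster_ids == ti)
      mb <- bin_means(gb, cluster_ids == tj)
      ok <- !is.na(ma) & !is.na(mb)
      if (sum(ok) < min_bins || stats::sd(ma[ok]) == 0 ||
          stats::sd(mb[ok]) == 0) next
      r <- c(r, stats::cor(ma[ok], mb[ok]))
      w <- c(w, sum(ok))
    }
    if (length(r) == 0) NA_real_ else sum(r * w) / sum(w)
  }

  sqrt(abs(directed(x, y)) * abs(directed(y, x)))
}

bins  <- rep(1:10, each = 10)                       # 10 bins of 10 cells
types <- rep(rep(c("T1", "T2"), each = 5), 10)      # 5 cells of each type per bin
x <- ifelse(types == "T1", bins, 0)                 # gene A, in T1 cells
y <- ifelse(types == "T2", bins, 0)                 # gene B, in T2 cells
cross_celltype_sketch(x, y, types, bins)
```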

Format subtitle with global cross-cell-type stats

Description

Format subtitle with global cross-cell-type stats

Usage

.cross_subtitle(score, r_ab, r_ba)

Extract pairwise values from a symmetric matrix given a pair table

Description

Extract pairwise values from a symmetric matrix given a pair table

Usage

.extract_pair_vals(sym_mat, pair_dt)
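When the matrix carries gene names as dimnames, indexing it with a two-column matrix of (row, column) positions extracts all pairs in one vectorised step. A sketch under that assumption; extract_pair_vals_sketch() is hypothetical.

```r
extract_pair_vals_sketch <- function(sym_mat, pair_dt) {
  i <- match(pair_dt$gene1, rownames(sym_mat))
  j <- match(pair_dt$gene2, colnames(sym_mat))
  sym_mat[cbind(i, j)]   # one value per pair; NA for genes not in the matrix
}

genes <- c("a", "b", "c")
m <- matrix(c(1, .2, .3,
              .2, 1, .4,
              .3, .4, 1), nrow = 3, dimnames = list(genes, genes))
pairs <- data.frame(gene1 = c("a", "b"), gene2 = c("c", "c"))
extract_pair_vals_sketch(m, pairs)
```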

Extract pairs data.frame from any scPairs result type

Description

All three result classes (scPairs_result, scPairs_gene_result, scPairs_pair_result) and plain data.frames are supported.

Usage

.extract_pairs_df(result, require_cols = c("gene1", "gene2", "synergy_score"))

Arguments

result

An scPairs result object or data.frame.

require_cols

Character vector of required columns.

Value

A data.frame with the requested columns.

Extract normalised expression matrix (genes x cells, dense or sparse)

Description

For speed-critical operations the matrix is kept sparse (dgCMatrix) when possible. Only the data slot (log-normalised) is used by default.

Usage

.get_expression_matrix(object, features = NULL, assay = NULL, slot = "data")

Arguments

object

Seurat object.

features

Character vector of gene names. NULL = all genes.

assay

Character; assay name. NULL = default assay.

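slot

Character; data slot to use. Default "data" (log-normalised values).

Value

Expression matrix (genes x cells), kept sparse (dgCMatrix) when possible.

Extract spatial coordinates from a Seurat object

Description

Attempts GetTissueCoordinates() first; falls back to common meta.data column patterns.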

Usage

.get_spatial_coords(object)

Arguments

object

Seurat object.

Value

A data.frame with columns x and y, rownames = cell barcodes.

Detect whether a Seurat object contains spatial information

Description

Detect whether a Seurat object contains spatial information

Usage

.has_spatial(object)

Arguments

object

A Seurat object.

Value

Logical scalar.

Integrate multi-evidence scores into a composite synergy score

Description

Combines co-expression metrics (correlation, mutual information, ratio consistency) and spatial metrics (Lee's L, CLQ) into a single synergy score. Each metric is rank-normalised to [0, 1] before the weighted combination, so that no metric dominates merely because of its scale.

Usage

.integrate_scores(
  pair_dt,
  weights = NULL,
  n_perm = 0,
  mat = NULL,
  cluster_ids = NULL,
  coords = NULL,
  spatial_k = 6,
  verbose = TRUE
)

Arguments

pair_dt

data.table from .compute_coexpression and optionally spatial functions, containing any subset of metric columns.

weights

Named numeric vector of metric weights. Names must match column names in pair_dt; any missing metric is silently skipped. Default weights:

cor_pearson = 1, cor_spearman = 1, cor_biweight = 1.5
mi_score = 1, ratio_consistency = 1.2
spatial_lee_L = 1.5, spatial_clq = 1.2

n_perm

Integer; permutations for composite score p-value (0 to skip).

mat

Expression matrix (needed for permutation null).

cluster_ids

Cluster factor (for permutation null).

coords

Spatial coordinates (for spatial permutation null).

spatial_k

KNN k for spatial metrics in permutation null.

verbose

Logical.

Details

An empirical p-value is obtained by permuting cell labels.

Performance (v0.1.1): Permutation testing now reuses the vectorised co-expression and spatial pipelines. P-values are computed via a single vectorised comparison instead of per-pair vapply. The null score matrix uses colSums(null >= observed) for batch comparison.
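A sketch of the integration and labelling steps under assumptions the text does not pin down: ranks are mapped linearly so the lowest value scores 0 and the highest 1, the empirical p-value uses the common (count + 1) / (n_perm + 1) correction, and p_adj comes from stats::p.adjust(method = "BH"). The confidence thresholds follow those listed under Value; all function names are hypothetical.

```r
# rank-normalise one metric to [0, 1] (lowest = 0, highest = 1)
rank01 <- function(v) {
  r <- rank(v, ties.method = "average", na.last = "keep")
  (r - 1) / max(sum(!is.na(v)) - 1, 1)
}

# weighted mean of rank-normalised metrics; weights without a column are skipped
composite_score_sketch <- function(metrics, weights) {
  cols <- intersect(names(weights), colnames(metrics))
  score <- numeric(nrow(metrics))
  for (cn in cols) score <- score + weights[[cn]] * rank01(metrics[[cn]])
  score / sum(weights[cols])
}

# empirical p-value; `null` is an n_perm x n_pairs matrix of permuted scores
empirical_p_sketch <- function(observed, null) {
  (colSums(sweep(null, 2, observed, ">=")) + 1) / (nrow(null) + 1)
}

confidence_sketch <- function(p_adj) {
  ifelse(p_adj < 0.01, "High",
         ifelse(p_adj < 0.05, "Medium",
                ifelse(p_adj < 0.1, "Low", "NS")))
}

metrics <- data.frame(cor_pearson = c(0.1, 0.5, 0.9), mi_score = c(0.2, 0.1, 0.3))
w <- c(cor_pearson = 1, mi_score = 1, spatial_clq = 1.2)  # no spatial_clq column
s <- composite_score_sketch(metrics, w)
p <- empirical_p_sketch(c(1, 0), matrix(0.5, nrow = 2, ncol = 2))
p_adj <- stats::p.adjust(p, method = "BH")
```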

Value

pair_dt with added columns:

synergy_score: composite score in [0, 1].
p_value: permutation-based p-value (if n_perm > 0).
p_adj: BH-adjusted p-value.
rank: rank by synergy score (1 = strongest).
confidence: categorical label, "High" (p_adj < 0.01), "Medium" (< 0.05), "Low" (< 0.1), "NS" otherwise.

Print a progress message (respects verbose flag)

Description

Print a progress message (respects verbose flag)

Usage

.msg(..., verbose = TRUE)

Arguments

...

Parts of the message (passed to paste0).

verbose

Logical.

Plug-in mutual information estimator with equal-frequency binning

Description

Single-pair version retained for AssessGenePair().

Usage

.mutual_info(x, y, n_bins = 5)

Arguments

x, y

Numeric vectors.

n_bins

Number of bins.

Value

Scalar MI in nats.
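An equal-frequency, plug-in MI estimate can be sketched as below. It assumes ties are broken by position (rank(ties.method = "first")); with zero-inflated data this spreads tied zeros across bins, which the package may handle differently. mutual_info_sketch() is hypothetical and returns nats.

```r
mutual_info_sketch <- function(x, y, n_bins = 5) {
  # equal-frequency bins: each bin receives about length(v) / n_bins values
  eq_freq_bins <- function(v) {
    r <- rank(v, ties.method = "first")
    ceiling(r * n_bins / length(v))
  }
  bx <- eq_freq_bins(x)
  by <- eq_freq_bins(y)
  pxy <- table(bx, by) / length(x)           # joint probabilities
  expected <- outer(rowSums(pxy), colSums(pxy))
  nz <- pxy > 0                              # 0 * log(0) contributes nothing
  sum(pxy[nz] * log(pxy[nz] / expected[nz]))
}

mutual_info_sketch(1:100, 1:100)                             # perfect dependence
mutual_info_sketch(rep(1:10, 10), rep(1:10, each = 10))      # independent layout
```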

Batch mutual information computation for all gene pairs

Description

Vectorised implementation that pre-computes bin assignments for all genes once, then uses fast table lookups for each pair.

Usage

.mutual_info_batch(mat, pair_idx, n_bins = 5)

Arguments

mat

Dense numeric matrix, genes in rows, cells in columns.

pair_idx

2-row integer matrix of pair indices (from combn).

n_bins

Number of bins for discretisation.

Value

Numeric vector of MI values for each pair.
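The pre-computed-bins strategy can be sketched as follows: equal-frequency bin codes are computed once per gene (ties broken by position, an assumption), and each pair's joint histogram is then a single tabulate() over combined codes. mi_batch_sketch() is hypothetical; pair_idx is a 2-row index matrix such as combn() returns.

```r
mi_batch_sketch <- function(mat, pair_idx, n_bins = 5) {
  n <- ncol(mat)
  # genes x cells matrix of bin codes in 1..n_bins, computed once for all genes
  bins <- t(apply(mat, 1, function(v)
    ceiling(rank(v, ties.method = "first") * n_bins / n)))
  vapply(seq_len(ncol(pair_idx)), function(p) {
    bx <- bins[pair_idx[1, p], ]
    by <- bins[pair_idx[2, p], ]
    # code (bx, by) as one integer and count all cells in a single pass
    joint <- tabulate((bx - 1) * n_bins + by, nbins = n_bins^2) / n
    pxy <- matrix(joint, n_bins, n_bins, byrow = TRUE)   # rows = bx, cols = by
    expected <- outer(rowSums(pxy), colSums(pxy))
    nz <- pxy > 0
    sum(pxy[nz] * log(pxy[nz] / expected[nz]))
  }, numeric(1))
}

mat <- rbind(1:100, 1:100, rep(1:10, 10))
mi_batch_sketch(mat, combn(3, 2))   # pairs (1,2), (1,3), (2,3)
```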

Single-pair neighbourhood co-expression score

Description

Single-pair neighbourhood co-expression score

Usage

.neighbourhood_coexpr(x, y, W, k = NULL)

Neighbourhood co-expression score for gene pairs

Description

For each cell expressing gene A, we compute the fraction of its k neighbours that express gene B (and vice versa). The neighbourhood co-expression score (NCS) is the geometric mean of the two directional enrichments, as defined under Details.

Usage

.neighbourhood_coexpr_batch(mat, pair_dt, W, expr_threshold = 0)

Arguments

mat

Expression matrix (genes x cells, dense).

pair_dt

data.table with gene1, gene2.

W

KNN weight matrix (row-standardised).

expr_threshold

Numeric; expression above this is "expressed".

Details

NCS_{A \to B} = \frac{\text{mean fraction of B-expressing neighbours over A-expressing cells}}{\text{global fraction of B-expressing cells}}

NCS = \sqrt{NCS_{A \to B} \cdot NCS_{B \to A}}

A score > 1 indicates that cells expressing one gene have more neighbours expressing the other gene than expected by chance.

Value

Numeric vector of NCS values.
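With a row-standardised W, multiplying by a 0/1 expression indicator gives each cell's fraction of expressing neighbours, so the NCS reduces to two matrix-vector products. A sketch, assuming W has a zero diagonal; ncs_sketch() is hypothetical.

```r
ncs_sketch <- function(x, y, W, expr_threshold = 0) {
  ex <- as.numeric(x > expr_threshold)
  ey <- as.numeric(y > expr_threshold)
  frac_b <- as.numeric(W %*% ey)   # per cell: fraction of neighbours expressing B
  frac_a <- as.numeric(W %*% ex)
  a_to_b <- mean(frac_b[ex == 1]) / mean(ey)   # NCS A -> B
  b_to_a <- mean(frac_a[ey == 1]) / mean(ex)   # NCS B -> A
  sqrt(a_to_b * b_to_a)
}

W <- matrix(c(0, 1, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
ncs_sketch(c(1, 0, 0, 0), c(0, 1, 0, 0), W)
```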

Single-pair neighbourhood synergy score

Description

Single-pair neighbourhood synergy score

Usage

.neighbourhood_synergy(x, y, W)

Compute neighbourhood synergy score for gene pairs

Description

Unlike the neighbourhood co-expression score (which measures whether A- and B-expressing cells tend to share neighbours), this metric explicitly quantifies directional neighbourhood enrichment, as defined under Details.

Usage

.neighbourhood_synergy_batch(mat, pair_dt, W)

Arguments

mat

Expression matrix (genes x cells).

pair_dt

data.table with gene1, gene2.

W

KNN weight matrix.

Details

For cells highly expressing gene A (top quartile), is gene B's expression in their neighbourhood higher than B's average neighbourhood expression across all cells? The score is the ratio:

NS = \frac{\text{mean neighbourhood expression of B over top-quartile A cells}}{\text{mean neighbourhood expression of B over all cells}}

This captures paracrine-like interactions where one gene's product in one cell influences the expression of another gene in nearby cells.

Value

Numeric vector of neighbourhood synergy scores.
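The neighbourhood synergy ratio can be sketched the same way for one directed pair A to B. The helper ns_sketch() is hypothetical, W is again assumed to be row-standardised, and "top quartile" is interpreted as expression at or above the 75th percentile of gene A; the package may define the quartile differently.

```r
# Hypothetical helper (not part of scPairs): neighbourhood synergy A -> B.
ns_sketch <- function(x, y, W) {
  neigh_expr_b <- as.vector(W %*% y)                      # neighbourhood-averaged B
  top_a <- x >= stats::quantile(x, 0.75, names = FALSE)   # assumed top-quartile rule
  mean(neigh_expr_b[top_a]) / mean(neigh_expr_b)
}

W <- matrix(c(0, 1, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
x <- c(5, 0, 0, 0)  # cell 1 is the only high-A cell
y <- c(0, 3, 0, 0)  # B is expressed only in cell 1's neighbour
ns_sketch(x, y, W)  # 3 / 0.75 = 4
```

For sparse genes whose 75th percentile is 0, this rule selects every cell and the score collapses to 1, which is why the exact top-quartile rule matters.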

Enhanced bridge gene network (internal)

Description

Uses MDS on the full Jaccard-distance matrix (focal pair + all bridge genes) to derive angular positions that reflect inter-gene pathway similarity. Radial distance is then overridden by each bridge gene's total shared term count with the focal pair (more shared => closer to centre). This gives a layout where functionally similar bridge genes cluster angularly, and the most important bridging intermediaries sit in the inner ring.

Usage

.plot_bridge_network_enhanced(
  gene1,
  gene2,
  bridge_genes,
  shared_terms,
  mat,
  prior_net,
  top_n = 15,
  layout = "auto",
  label_size = 3,
  pt_size_range = c(3, 9),
  edge_width_range = c(0.4, 2),
  title = NULL,
  sim_threshold = 0.05
)

Arguments

gene1, gene2

Focal gene pair.

bridge_genes

Character vector of bridge genes.

shared_terms

Character vector of shared GO/KEGG terms.

mat

Expression matrix (genes x cells).

prior_net

Prior network object.

top_n

Maximum bridges to display.

layout

Ignored (kept for API compatibility); layout is MDS-radial.

label_size

Numeric; gene label font size.

pt_size_range

Numeric vector of length 2; node size range for bridge genes.

edge_width_range

Numeric vector of length 2; edge width range for focal edges.

title

Character; plot title.

sim_threshold

Numeric (0 to 1); minimum Jaccard similarity between bridge genes to draw a dotted similarity edge. Default 0.05.

Value

A ggplot object.
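The MDS-radial layout described above can be approximated with stats::cmdscale(). This simplified sketch rests on several assumptions: it runs MDS over the bridge genes only (not the focal pair), keeps each gene's MDS angle, and uses a hypothetical linear mapping from shared term count to radius. The function name and inputs are illustrative, not part of scPairs.

```r
# Simplified, hypothetical layout helper (not the scPairs implementation).
# term_sets:     named list; GO/KEGG terms annotated to each bridge gene
# shared_counts: named numeric; terms each bridge gene shares with the focal pair
mds_radial_layout <- function(term_sets, shared_counts) {
  genes <- names(term_sets)
  n <- length(genes)
  d <- matrix(0, n, n, dimnames = list(genes, genes))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      u <- length(union(term_sets[[i]], term_sets[[j]]))
      s <- length(intersect(term_sets[[i]], term_sets[[j]]))
      d[i, j] <- if (u == 0) 0 else 1 - s / u  # Jaccard distance
    }
  }
  xy <- stats::cmdscale(d, k = 2)      # classical MDS; needs >= 3 genes
  theta <- atan2(xy[, 2], xy[, 1])     # keep only the angular position
  sc <- shared_counts[genes]
  # Assumed mapping: most shared terms -> smallest radius, fewest -> radius 1
  r <- unname(1 - (sc - min(sc)) / (max(sc) - min(sc) + 1))
  data.frame(gene = genes,
             x = unname(r * cos(theta)),
             y = unname(r * sin(theta)),
             radius = r)
}

terms <- list(A = c("t1", "t2"), B = c("t2", "t3"),
              C = c("t3", "t4"), D = c("t4", "t1"))
lay <- mds_radial_layout(terms, c(A = 5, B = 3, C = 1, D = 1))
lay[order(lay$radius), ]   # A (5 shared terms) sits closest to the centre
```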

Prepare expression data for plotting a gene pair

Description

Extracts expression, converts to dense, aligns with embeddings.

Usage

.prepare_expression_data(object, gene1, gene2, assay = NULL, slot = "data")

Arguments

object

Seurat object.

gene1, gene2

Gene names.

assay

Assay name.

slot

Data slot.

Value

Dense matrix (2 x n_cells).

Compute prior interaction score for a single gene pair

Description

Compute prior interaction score for a single gene pair

Usage

.prior_score(gene1, gene2, prior_net)

Compute prior interaction score for gene pairs

Description

For each gene pair, computes the Jaccard similarity of their functional annotation sets (GO terms + KEGG pathways). A high Jaccard indicates that the two genes participate in many of the same biological processes – evidence for functional relatedness beyond mere co-expression.

Usage

.prior_score_batch(pair_dt, prior_net)

Arguments

pair_dt

data.table with gene1, gene2 columns.

prior_net

Prior network from .build_prior_network().

Value

Numeric vector of prior scores in [0, 1].
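The Jaccard-based prior score can be illustrated with base R set operations. In this sketch the helper prior_jaccard() and the annot list (gene to GO/KEGG terms) are hypothetical stand-ins for the prior network object, and the term identifiers are placeholders rather than real annotations.

```r
# Hypothetical helper (not part of scPairs): Jaccard similarity between the
# annotation sets of two genes. `annot` is an assumed named list mapping each
# gene to its GO terms and KEGG pathways.
prior_jaccard <- function(gene1, gene2, annot) {
  s1 <- unique(annot[[gene1]])
  s2 <- unique(annot[[gene2]])
  u <- length(union(s1, s2))
  if (u == 0) return(0)            # neither gene is annotated
  length(intersect(s1, s2)) / u    # |shared| / |union|, always in [0, 1]
}

# Placeholder term identifiers, not real annotations.
annot <- list(
  GeneX = c("GO_term_1", "GO_term_2", "KEGG_pathway_1"),
  GeneY = c("GO_term_1", "KEGG_pathway_1", "GO_term_3")
)
prior_jaccard("GeneX", "GeneY", annot)  # 2 shared / 4 in union = 0.5
prior_jaccard("GeneX", "GeneZ", annot)  # GeneZ unannotated: 0 shared = 0
```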

Expression ratio consistency across clusters

Description

Single-pair version retained for AssessGenePair(). For each cluster, computes the median of log2(expr_g1 + 1) - log2(expr_g2 + 1), then returns 1 minus the coefficient of variation (CoV) of the cluster medians, bounded to [0, 1].

Usage

.ratio_consistency(x, y, clusters)

Arguments

x, y

Numeric vectors (expression of gene 1, gene 2).

clusters

Factor of cluster assignments.

Value

Scalar in [0, 1].
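A single-pair version of this computation can be sketched with tapply(). The helper name is hypothetical, CoV is taken here as sd / |mean| of the cluster medians, and returning 0 when the CoV is undefined is an assumption rather than documented package behaviour.

```r
# Hypothetical helper (not the scPairs implementation).
ratio_consistency_sketch <- function(x, y, clusters) {
  log_ratio <- log2(x + 1) - log2(y + 1)
  med <- tapply(log_ratio, clusters, stats::median)  # one median per cluster
  m <- mean(med)
  if (length(med) < 2 || m == 0) return(0)           # CoV undefined (assumed 0)
  cv <- stats::sd(med) / abs(m)
  min(max(1 - cv, 0), 1)                             # bound to [0, 1]
}

clusters <- c("a", "a", "b", "b")
# Same log-ratio (1) in both clusters: perfectly consistent.
ratio_consistency_sketch(c(3, 3, 7, 7), c(1, 1, 3, 3), clusters)  # 1
# Log-ratio flips sign between clusters: mean of medians is 0, so 0.
ratio_consistency_sketch(c(3, 3, 0, 0), c(0, 0, 3, 3), clusters)  # 0
```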

Batch ratio consistency computation for all gene pairs

Description

Vectorised implementation that computes log-ratios and cluster medians using matrix operations instead of per-pair tapply calls.

Usage

.ratio_consistency_batch(mat, pair_idx, cluster_ids)

Arguments

mat

Dense numeric matrix, genes in rows, cells in columns.

pair_idx

2-row integer matrix of pair indices.

cluster_ids

Factor of cluster assignments.

Value

Numeric vector of ratio consistency values.

Resolve cluster IDs from Seurat object

Description

Resolve cluster IDs from Seurat object

Usage

.resolve_cluster_ids(object, cluster_col = NULL)

Fast row-wise variance for sparse matrices

Description

Uses Matrix::rowMeans and Matrix::rowSums directly on sparse matrices to avoid unnecessary densification.

Usage

.row_vars(x)

Arguments

x

Sparse or dense matrix (genes x cells).

Value

Named numeric vector of variances.
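The idea of computing variances without densifying can be sketched with the identity var = (sum(x^2) - n * mean^2) / (n - 1). This is an illustration, not the package's code; the helper name is hypothetical and the n - 1 (sample variance) denominator is an assumption.

```r
library(Matrix)

# Hypothetical sparse-friendly row variance (sample variance, n - 1 denominator).
row_vars_sketch <- function(x) {
  n <- ncol(x)
  mu <- Matrix::rowMeans(x)
  sq <- Matrix::rowSums(x^2)   # squaring keeps zeros at zero, so x stays sparse
  (sq - n * mu^2) / (n - 1)
}

m <- Matrix::sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 2), x = c(2, 4, 5),
                          dims = c(2, 4), dimnames = list(c("g1", "g2"), NULL))
row_vars_sketch(m)   # g1 = 11/3, g2 = 6.25
```

Note that the subtraction form can lose precision when the mean is large relative to the spread; a two-pass computation on dense data avoids that at the cost of memory.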

Select highly-variable genes or top-expressed genes for analysis

Description

When the number of genes is very large, the gene set is pre-filtered to a tractable size. Priority order: user-supplied genes > Seurat VariableFeatures > top genes by mean expression.

Usage

.select_features(object, features = NULL, n_top = 2000, assay = NULL)

Arguments

object

Seurat object.

features

Character vector of gene names; NULL for auto-selection.

n_top

Integer; maximum number of genes to select.

assay

Character; assay name.

Value

Character vector of gene names.
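The priority order can be sketched without a Seurat object by passing the variable features in explicitly (for example from Seurat::VariableFeatures()). The helper and its variable_features argument are hypothetical; not capping user-supplied genes at n_top follows the FindAllPairs() note that n_top_genes applies when features = NULL, but is otherwise an assumption.

```r
# Hypothetical sketch of the documented priority order (not the scPairs code).
# expr: genes x cells matrix with gene rownames.
select_features_sketch <- function(expr, features = NULL,
                                   variable_features = character(0),
                                   n_top = 2000) {
  if (!is.null(features)) {
    return(intersect(features, rownames(expr)))          # 1. user-supplied genes
  }
  if (length(variable_features) > 0) {
    genes <- intersect(variable_features, rownames(expr))  # 2. variable features
  } else {
    mu <- rowMeans(expr)                                 # 3. rank by mean expression
    genes <- names(sort(mu, decreasing = TRUE))
  }
  head(genes, n_top)
}

expr <- matrix(c(1, 1,
                 5, 5,
                 3, 3), nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"), NULL))
select_features_sketch(expr, n_top = 2)                          # "g2" "g3"
select_features_sketch(expr, variable_features = c("g3", "gX"))  # "g3"
```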

Smooth expression vectors using a KNN weight matrix

Description

For each cell, the smoothed expression is a weighted combination of the cell's own expression and the weighted average of its neighbours' expression values:

\tilde{x}_i = \alpha \cdot x_i + (1 - \alpha) \cdot \sum_{j \in N(i)} w_{ij} x_j

Usage

.smooth_expression(mat, W, alpha = 0.3)

Arguments

mat

Expression matrix (genes x cells).

W

Sparse weight matrix (n_cells x n_cells), row-standardised.

alpha

Numeric in [0, 1]; self-weight. Default 0.3.

Value

Smoothed expression matrix (same dimensions).
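For a genes x cells matrix, the smoothing formula above can be written as a single matrix product, because sum_j w_ij x_gj is entry [g, i] of mat %*% t(W). The helper below is a hypothetical sketch, not the package's code.

```r
# Hypothetical sketch of the smoothing formula in matrix form.
# mat: genes x cells; W: cells x cells, row-standardised.
smooth_expression_sketch <- function(mat, W, alpha = 0.3) {
  alpha * mat + (1 - alpha) * (mat %*% t(W))
}

W <- matrix(c(0,   1, 0,
              0.5, 0, 0.5,
              0,   1, 0), nrow = 3, byrow = TRUE)  # each row sums to 1
mat <- matrix(c(1, 0, 0,
                0, 0, 3), nrow = 2, byrow = TRUE,
              dimnames = list(c("g1", "g2"), NULL))
sm <- smooth_expression_sketch(mat, W, alpha = 0.3)
sm
# values: g1 = 0.30, 0.35, 0.00; g2 = 0.00, 1.05, 0.90
```

Running stats::cor() on two rows of the smoothed matrix gives a KNN-smoothed correlation in the spirit of .smoothed_cor_batch(); whether that function smooths exactly this way is an assumption.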

Compute smoothed correlation for a single gene pair

Description

Compute smoothed correlation for a single gene pair

Usage

.smoothed_cor(x, y, W, alpha = 0.3)

Compute correlation on KNN-smoothed expression for all gene pairs

Description

Compute correlation on KNN-smoothed expression for all gene pairs

Usage

.smoothed_cor_batch(mat, pair_dt, W, alpha = 0.3, method = "pearson")

Arguments

mat

Expression matrix (genes x cells).

pair_dt

data.table with gene1, gene2 columns.

W

KNN weight matrix.

alpha

Self-weight for smoothing.

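method

Character; correlation method. Default "pearson".

Validate inputs for plotting functions

Description

Checks that both gene names exist and, if a reduction is given, that it is available.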

Usage

.validate_plot_inputs(
  object,
  gene1,
  gene2,
  assay = NULL,
  slot = "data",
  reduction = NULL
)

Arguments

object

Seurat object.

gene1, gene2

Gene names.

assay

Assay name (NULL = default).

slot

Data slot.

reduction

Reduction name (NULL to skip check).

Value

A list with validated assay.

Validate that input is a Seurat object

Description

Validate that input is a Seurat object

Usage

.validate_seurat(object, require_spatial = FALSE)

Arguments

object

Object to validate.

require_spatial

Logical; if TRUE, require spatial coordinates.

Value

TRUE invisibly; stops with informative error otherwise.

Metrics that should be absolute-valued before rank normalisation

Description

Metrics that should be absolute-valued before rank normalisation

Usage

ABS_METRICS

Format

An object of class character of length 6.

Assess the Synergy of a Specific Gene Pair

Description

Given two genes, AssessGenePair performs an in-depth evaluation of their co-regulatory relationship. In addition to the standard multi-metric scoring, it computes:

Per-cluster co-expression – correlation within each cell cluster.
Expression distribution overlap – Jaccard index of expressing cells.
Permutation-based significance – 999 permutations by default.

Usage

AssessGenePair(
  object,
  gene1,
  gene2,
  assay = NULL,
  slot = "data",
  cluster_col = NULL,
  mode = c("all", "expression", "prior_only"),
  use_prior = TRUE,
  organism = "mouse",
  custom_pairs = NULL,
  use_neighbourhood = TRUE,
  neighbourhood_k = 20,
  neighbourhood_reduction = "pca",
  smooth_alpha = 0.3,
  use_spatial = TRUE,
  spatial_k = 6,
  n_perm = 999,
  verbose = TRUE
)

Arguments

object

A Seurat object.

gene1

Character; first gene.

gene2

Character; second gene.

assay

Character; assay name.

slot

Character; data slot.

cluster_col

Character; cluster column.

mode

Character; "all", "expression", or "prior_only".

use_prior

Logical; prior knowledge scores.

organism

Character; "mouse" or "human".

custom_pairs

Optional data.frame of custom interactions.

use_neighbourhood

Logical; neighbourhood metrics.

neighbourhood_k

Integer; KNN k.

neighbourhood_reduction

Character; reduction for KNN.

smooth_alpha

Numeric; self-weight for smoothing.

use_spatial

Logical.

spatial_k

Integer; spatial KNN k.

n_perm

Integer; permutations (default 999).

verbose

Logical.

Details

The mode parameter controls which layers are scored:

"all" (default) – full multi-evidence assessment.
"expression" – expression and neighbourhood metrics only.
"prior_only" – prior knowledge scores only.

Value

A list with class "scPairs_pair_result":

gene1, gene2: The query genes.
pairs: Single-row data.table with all metric columns and synergy_score, rank, confidence (same format as FindAllPairs output for unified downstream processing).
metrics: Named list of all computed metrics.
per_cluster: data.frame of per-cluster correlations.
synergy_score: Composite score.
p_value: Permutation p-value.
confidence: Categorical confidence label.
jaccard_index: Expression overlap Jaccard index.
has_spatial: Logical.
n_cells: Integer.
mode: Character.

Examples


# Assess the injected co-expressed pair GENE3 & GENE4.
result <- AssessGenePair(scpairs_testdata,
                         gene1   = "GENE3",
                         gene2   = "GENE4",
                         mode    = "expression",
                         verbose = FALSE)
print(result)
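Two of the extras listed in the Description, the expression-overlap Jaccard index and a permutation p-value, can be sketched in base R. The helper names are hypothetical, and the permutation scheme (shuffle gene2 across cells, Pearson r as the statistic, one-sided, add-one correction) is an assumption rather than the documented AssessGenePair() procedure.

```r
# Hypothetical helpers (not part of scPairs).
expr_jaccard <- function(x, y, threshold = 0) {
  ex <- x > threshold
  ey <- y > threshold
  u <- sum(ex | ey)
  if (u == 0) 0 else sum(ex & ey) / u   # co-expressing / expressing either
}

perm_pvalue <- function(x, y, n_perm = 999) {
  obs <- stats::cor(x, y)
  null_r <- replicate(n_perm, stats::cor(x, sample(y)))  # break the pairing
  (1 + sum(null_r >= obs)) / (n_perm + 1)
}

expr_jaccard(c(1, 1, 0, 0), c(1, 0, 1, 0))  # 1 shared / 3 expressing = 1/3

set.seed(42)
x <- rpois(200, 2)
y <- x + rpois(200, 1)     # y tracks x, so the pair is co-expressed
p <- perm_pvalue(x, y)
p                          # 1 / 1000 when no shuffle reaches the observed r
```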

Confidence classification thresholds (based on adjusted p-values)

Description

Confidence classification thresholds (based on adjusted p-values)

Usage

CONFIDENCE_THRESHOLDS

Format

An object of class list of length 3.

Default analysis parameters

Description

Default analysis parameters

Usage

DEFAULT_PARAMS

Format

An object of class list of length 13.

Default metric weights for score integration

Description

Default metric weights for score integration

Usage

DEFAULT_WEIGHTS

Format

An object of class numeric of length 14.

Discover All Synergistic Gene Pairs

Description

The primary discovery function of scPairs. Given a Seurat object, FindAllPairs identifies synergistic gene pairs by integrating multiple lines of evidence: co-expression, neighbourhood smoothing, prior biological knowledge, and spatial co-variation.

Usage

FindAllPairs(
  object,
  features = NULL,
  n_top_genes = 2000,
  assay = NULL,
  slot = "data",
  cluster_col = NULL,
  mode = c("all", "expression", "prior_only"),
  cor_method = c("pearson", "spearman", "biweight"),
  n_mi_bins = 5,
  min_cells_expressed = 10,
  use_prior = TRUE,
  organism = "mouse",
  custom_pairs = NULL,
  use_neighbourhood = TRUE,
  neighbourhood_k = 20,
  neighbourhood_reduction = "pca",
  smooth_alpha = 0.3,
  use_spatial = TRUE,
  spatial_k = 6,
  n_perm = 0,
  weights = NULL,
  top_n = NULL,
  verbose = TRUE
)

Arguments

object

A Seurat object (scRNA-seq or spatial).

features

Character vector of gene names to consider. NULL (default) uses Seurat VariableFeatures; if unavailable, selects the top n_top_genes by mean expression.

n_top_genes

Integer; maximum number of genes to analyse when features = NULL. Default 2000.

assay

Character; assay to use. Default: DefaultAssay(object).

slot

Character; data slot. Default "data" (log-normalised).

cluster_col

Character; column in meta.data with cluster IDs. NULL = use Idents(object).

mode

Character; "all", "expression", or "prior_only".

cor_method

Character vector; correlation methods to compute. Default c("pearson", "spearman", "biweight").

n_mi_bins

Integer; bins for mutual information. 0 = skip MI.

min_cells_expressed

Integer; minimum co-expressing cells to keep a pair. Default 10.

use_prior

Logical; integrate prior knowledge (GO/KEGG). Default TRUE.

organism

Character; "mouse" or "human".

custom_pairs

Optional data.frame with columns gene1, gene2.

use_neighbourhood

Logical; compute neighbourhood-aware metrics. Default TRUE.

neighbourhood_k

Integer; KNN k. Default 20.

neighbourhood_reduction

Character; reduction for KNN. Default "pca".

smooth_alpha

Numeric in [0,1]; self-weight for KNN smoothing.

use_spatial

Logical; compute spatial metrics when available.

spatial_k

Integer; spatial KNN k.

n_perm

Integer; permutations for p-values. 0 = skip.

weights

Named numeric; metric weights for score integration.

top_n

Integer or NULL; return only top n pairs.

verbose

Logical.

Details

Metrics are rank-normalised and combined via weighted summation. Optional permutation testing provides empirical p-values.

The mode parameter controls which metric layers are computed:

"all" (default) – compute all available metrics.
"expression" – expression and neighbourhood metrics only (no prior knowledge).
"prior_only" – prior knowledge scores only (fast).

Value

A list with class "scPairs_result" containing:

pairs: data.table of gene pairs with all metric columns, synergy_score, rank, p_value (if permutation), p_adj, confidence.
parameters: List of analysis parameters.
n_genes: Number of genes analysed.
n_cells: Number of cells.
has_spatial: Logical.
mode: Character; the mode used.

Examples

# scpairs_testdata is a built-in Seurat object with 100 cells x 20 genes.
# GENE3 & GENE4 are injected as the top co-expressed pair.
result <- FindAllPairs(scpairs_testdata,
                       n_top_genes = 20,
                       top_n       = 10,
                       mode        = "expression",
                       verbose     = FALSE)
print(result)
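The "rank-normalised and combined via weighted summation" step in Details can be illustrated with a small metric table. This is a sketch under stated assumptions: the helper is hypothetical, rank / n is assumed as the normalisation, abs_metrics mimics the role of ABS_METRICS, and dividing by the total weight is an assumption.

```r
# Hypothetical sketch (not the scPairs code).
# metrics: data.frame, one row per gene pair, one column per metric.
# weights: named numeric (cf. DEFAULT_WEIGHTS).
synergy_sketch <- function(metrics, weights, abs_metrics = character(0)) {
  cols <- intersect(names(weights), names(metrics))
  norm <- vapply(cols, function(m) {
    v <- metrics[[m]]
    if (m %in% abs_metrics) v <- abs(v)  # strong negative values rank high too
    rank(v) / length(v)                  # rank-normalise to (0, 1]
  }, numeric(nrow(metrics)))
  w <- weights[cols]
  as.vector(norm %*% w) / sum(w)
}

metrics <- data.frame(pearson = c(0.9, -0.8, 0.1),
                      prior   = c(0.2,  0.5, 0.0))
res <- synergy_sketch(metrics, c(pearson = 2, prior = 1),
                      abs_metrics = "pearson")
res   # 8/9, 7/9, 1/3
```

Ranking makes metrics on different scales (correlations, Jaccard, mutual information) comparable before weighting, at the cost of discarding the size of gaps between pairs.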

Find Synergistic Partners for a Given Gene

Description

Given a gene of interest, FindGenePairs identifies and ranks all genes that act synergistically with it. Uses the same multi-evidence framework as FindAllPairs() but focuses computation on pairs involving the query gene, making it much faster for targeted queries.

Usage

FindGenePairs(
  object,
  gene,
  candidates = NULL,
  n_top_genes = 2000,
  assay = NULL,
  slot = "data",
  cluster_col = NULL,
  mode = c("all", "expression", "prior_only"),
  cor_method = c("pearson", "spearman", "biweight"),
  n_mi_bins = 5,
  min_cells_expressed = 10,
  use_prior = TRUE,
  organism = "mouse",
  custom_pairs = NULL,
  use_neighbourhood = TRUE,
  neighbourhood_k = 20,
  neighbourhood_reduction = "pca",
  smooth_alpha = 0.3,
  use_spatial = TRUE,
  spatial_k = 6,
  n_perm = 0,
  weights = NULL,
  top_n = NULL,
  verbose = TRUE
)

Arguments

object

A Seurat object.

gene

Character; the query gene name.

candidates

Character vector of candidate partner genes. NULL = auto-select.

n_top_genes

Integer; max candidates when candidates = NULL.

assay

Character; assay name.

slot

Character; data slot.

cluster_col

Character; cluster column in meta.data.

mode

Character; "all", "expression", or "prior_only".

cor_method

Correlation methods.

n_mi_bins

Bins for mutual information.

min_cells_expressed

Minimum cells co-expressing both genes.

use_prior

Logical; compute prior knowledge metrics.

organism

Character; "mouse" or "human".

custom_pairs

Optional data.frame of custom interactions.

use_neighbourhood

Logical; compute neighbourhood metrics.

neighbourhood_k

Integer; KNN k.

neighbourhood_reduction

Character; reduction for KNN.

smooth_alpha

Numeric; self-weight for smoothing.

use_spatial

Logical; compute spatial metrics.

spatial_k

Integer; spatial neighbourhood k.

n_perm

Integer; permutations for p-values.

weights

Named numeric; metric weights.

top_n

Integer; return only top partners.

verbose

Logical.

Details

The mode parameter controls which metric layers are computed:

"all" (default) – all available metrics.
"expression" – expression and neighbourhood only.
"prior_only" – prior knowledge scores only.

Value

A list with class "scPairs_gene_result":

query_gene: The input gene.
pairs: data.table of partners ranked by synergy score.
parameters: Analysis parameters.
n_candidates: Number of candidates tested.
n_cells: Number of cells.
has_spatial: Logical.
mode: Character.

Examples

# Find synergistic partners of GENE3.  GENE4 is expected to rank first.
result <- FindGenePairs(scpairs_testdata,
                        gene    = "GENE3",
                        top_n   = 10,
                        mode    = "expression",
                        verbose = FALSE)
print(result)

Plot Bridge Gene Network

Description

Draws a publication-ready radial bridge gene network showing the prior-knowledge connections between a focal gene pair via shared GO/KEGG pathway intermediaries (bridge genes). Focal genes are placed at the centre; bridge genes are arranged around them at a radial distance that decreases as their shared pathway count with the focal pair increases (more shared terms place a gene closer to the centre, reflecting stronger biological relevance). Solid edges connect focal genes to bridge genes: red edges originate from gene1, blue edges from gene2. Edge width encodes shared term count. Pairwise Jaccard similarity between bridge genes is overlaid as thin dotted lines whose opacity reflects similarity strength, revealing functional clusters among the intermediaries.

Usage

PlotBridgeNetwork(
  object,
  gene1,
  gene2,
  organism = "mouse",
  prior_net = NULL,
  top_bridges = 15,
  layout = "auto",
  assay = NULL,
  slot = "data",
  label_size = 3,
  pt_size_range = c(3, 9),
  edge_width_range = c(0.4, 2),
  sim_threshold = 0.05,
  title = NULL
)

Arguments

object

A Seurat object.

gene1

Character; first focal gene.

gene2

Character; second focal gene.

organism

Character; "mouse" or "human". Used when prior_net is NULL.

prior_net

Optional prior network object from .build_prior_network(). If NULL, built automatically using organism.

top_bridges

Integer; maximum number of bridge genes to display. Default 15.

layout

Ignored; layout is always radial (kept for API compatibility).

assay

Character; assay name. NULL uses the default assay.

slot

Character; data layer/slot. Default "data".

label_size

Numeric; gene label font size. Default 3.

pt_size_range

Numeric vector of length 2; minimum and maximum node sizes for bridge genes. Default c(3, 9).

edge_width_range

Numeric vector of length 2; minimum and maximum edge widths for focal-to-bridge connections, scaled by shared term count. Default c(0.4, 2).

sim_threshold

Numeric (0 to 1); minimum Jaccard similarity between two bridge genes required to draw a dotted similarity edge. Default 0.05.

title

Character; plot title. NULL generates a default title.

Value

A ggplot object. Focal genes appear as large red nodes at the centre. Bridge genes are arranged radially, sized by node degree and coloured by mean expression. Solid coloured edges (red = gene1, blue = gene2) connect focal genes to bridge genes, with width proportional to shared term count. Thin dotted grey lines between bridge genes encode Jaccard pathway similarity.

Examples

## Not run: 
# Requires Bioconductor annotation packages (org.Hs.eg.db or org.Mm.eg.db)
PlotBridgeNetwork(seurat_obj, gene1 = "Adora2a", gene2 = "Ido1",
                  organism = "mouse")

## End(Not run)

Plot Cross-Cell-Type Interaction Heatmap

Description

Visualises the cross-cell-type interaction structure for a gene pair as a heatmap. Each tile represents a directed cell-type pair (source type \to neighbour type), coloured by the Pearson correlation between gene A expression in the source cells and gene B expression in the neighbouring cells.

This is the primary visualisation for the trans-cellular synergy metric introduced in scPairs 0.1.3. It reveals which cell-type interfaces carry the cross-type signal (e.g. Adora2a in T-cells correlated with Ido1 in dendritic cells).

Usage

PlotPairCrossType(
  object,
  gene1,
  gene2,
  result = NULL,
  assay = NULL,
  slot = "data",
  cluster_col = NULL,
  neighbourhood_k = 20,
  neighbourhood_reduction = "pca",
  min_cross_pairs = 30,
  min_pct_expressed = 0.01,
  show_n = TRUE,
  show_reverse = TRUE,
  diverging = TRUE,
  title = NULL
)

Arguments

object

A Seurat object.

gene1

Character; first gene.

gene2

Character; second gene.

result

Optional scPairs_pair_result from AssessGenePair. If NULL, the pair is assessed internally.

assay

Character; assay name. Default: DefaultAssay(object).

slot

Character; data slot. Default "data".

cluster_col

Character; meta.data column with cell-type labels. NULL = use Idents(object).

neighbourhood_k

Integer; k for KNN graph. Default 20.

neighbourhood_reduction

Character; reduction for KNN graph. Default "pca".

min_cross_pairs

Integer; minimum cross-type pairs per tile. Tiles with fewer pairs are greyed out. Default 30.

min_pct_expressed

Numeric; minimum fraction of cells (0 to 1) in a cell type that must express a gene. Default 0.01 (1%). Prevents spurious correlations driven by very sparse genes.

show_n

Logical; annotate each tile with the number of cross-type neighbour pairs. Default TRUE.

show_reverse

Logical; if TRUE (default), show a second panel for the reverse direction (gene2 in source \to gene1 in neighbour).

diverging

Logical; use a diverging red–white–blue colour scale centred at 0. Default TRUE.

title

Character; overall title. NULL = auto-generated.

Value

A ggplot heatmap (or a two-panel patchwork when show_reverse = TRUE) with source and neighbour cell types on the axes and the cross-type correlation encoded by colour.

Examples

# scpairs_testdata has clusters (seurat_clusters) and PCA already built in.
PlotPairCrossType(scpairs_testdata,
                 gene1 = "GENE3",
                 gene2 = "GENE4")
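The tile values described above can be sketched from a neighbour index matrix. The helper is hypothetical and its inputs are assumptions: knn is an n_cells x k integer matrix of neighbour indices (for instance the nn.idx matrix returned by RANN::nn2(), with the self column dropped), same-type tiles are left NA, and tiles below min_pairs or with constant expression are left NA.

```r
# Hypothetical sketch (not the scPairs code): correlation of gene A in source
# cells of type s with gene B in their KNN neighbours of type nb (s != nb).
cross_type_cor <- function(x, y, cell_type, knn, min_pairs = 30) {
  cell_type <- as.character(cell_type)
  src <- rep(seq_len(nrow(knn)), times = ncol(knn))  # source cell of each edge
  nbr <- as.vector(knn)                              # its neighbour (column-major)
  types <- sort(unique(cell_type))
  out <- matrix(NA_real_, length(types), length(types),
                dimnames = list(source = types, neighbour = types))
  for (s in types) {
    for (nb in types) {
      if (s == nb) next                              # same-type tiles left NA
      keep <- cell_type[src] == s & cell_type[nbr] == nb
      a <- x[src[keep]]                              # gene A in source cells
      b <- y[nbr[keep]]                              # gene B in neighbours
      if (length(a) >= min_pairs && stats::sd(a) > 0 && stats::sd(b) > 0) {
        out[s, nb] <- stats::cor(a, b)
      }
    }
  }
  out
}

n <- 30
cell_type <- rep(c("T", "DC"), each = n)
knn <- matrix(c((n + 1):(2 * n), 1:n), ncol = 1)  # T cell i <-> DC cell i + n
x <- c(1:n, rep(0, n))             # gene A varies in T cells only
y <- c(rep(0, n), 2 * (1:n) + 1)   # gene B in DC cells tracks A in the T partner
res <- cross_type_cor(x, y, cell_type, knn)
res   # T -> DC is 1; DC -> T is NA (A is constant in DC cells)
```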

Plot Gene Pair Co-Expression on UMAP / Dimensionality Reduction

Description

Displays the co-expression of two genes on the UMAP (or other reduction) embedding. Three panels show: gene 1 expression, gene 2 expression, and their element-wise product (co-expression intensity). This allows visual assessment of whether co-expressing cells cluster together.

Usage

PlotPairDimplot(
  object,
  gene1,
  gene2,
  reduction = "umap",
  assay = NULL,
  slot = "data",
  pt_size = 0.5,
  alpha = 0.8,
  title = NULL
)

Arguments

object

A Seurat object with a dimensionality reduction.

gene1

Character; first gene.

gene2

Character; second gene.

reduction

Character; reduction to use. Default "umap".

assay

Character; assay.

slot

Character; data slot.

pt_size

Numeric; point size.

alpha

Numeric; point alpha.

title

Character; overall title.

Value

A combined ggplot (patchwork) of three panels: individual gene expression and their co-expression product, all overlaid on the dimensionality-reduction embedding.

Examples

# scpairs_testdata has a real UMAP embedding; GENE3 & GENE4 are co-expressed.
PlotPairDimplot(scpairs_testdata, gene1 = "GENE3", gene2 = "GENE4")

Plot Synergy Score Heatmap

Description

Displays a symmetric heatmap of synergy scores among a set of genes. Useful for visualising the overall co-expression landscape of top synergistic genes or genes of interest.

Usage

PlotPairHeatmap(
  result,
  top_n = 30,
  genes = NULL,
  cluster_genes = TRUE,
  low_color = "#F7FBFF",
  high_color = "#08306B",
  title = "Gene pair synergy heatmap"
)

Arguments

result

An scPairs_result or scPairs_gene_result object, or a data.frame with gene1, gene2, synergy_score.

top_n

Integer; include the top N genes by number of significant partnerships. Default 30.

genes

Character vector; specific genes to include. NULL = auto.

cluster_genes

Logical; cluster rows/columns by score similarity. Default TRUE.

low_color

Character; colour for low scores.

high_color

Character; colour for high scores.

title

Character; plot title.

Value

A ggplot object; rows and columns are genes, fill encodes synergy score.

Examples

result <- FindAllPairs(scpairs_testdata,
                       n_top_genes = 20,
                       top_n       = 15,
                       mode        = "expression",
                       verbose     = FALSE)
PlotPairHeatmap(result, top_n = 10)
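The symmetric matrix behind such a heatmap can be sketched from a gene1 / gene2 / synergy_score table. The helper is hypothetical; filling missing pairs and the diagonal with 0, and ordering genes with hclust() on Euclidean distances as a stand-in for cluster_genes = TRUE, are assumptions.

```r
# Hypothetical sketch (not the scPairs code): long pair table -> symmetric matrix.
pairs_to_matrix <- function(pairs) {
  genes <- sort(unique(c(pairs$gene1, pairs$gene2)))
  m <- matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
  idx <- cbind(match(pairs$gene1, genes), match(pairs$gene2, genes))
  m[idx] <- pairs$synergy_score
  m[idx[, 2:1]] <- pairs$synergy_score  # mirror to make it symmetric
  m
}

pairs <- data.frame(gene1 = c("A", "A", "B"), gene2 = c("B", "C", "C"),
                    synergy_score = c(0.9, 0.2, 0.4))
m <- pairs_to_matrix(pairs)
ord <- stats::hclust(stats::dist(m))$order  # group genes with similar profiles
m[ord, ord]
```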

Plot Gene Interaction Network

Description

Draws a publication-ready gene interaction network from scPairs results. Nodes represent genes; edges represent synergistic relationships. Edge width encodes synergy score; edge colour encodes confidence. Node size optionally reflects the number of significant partners (degree centrality).

Usage

PlotPairNetwork(
  result,
  top_n = 50,
  min_score = 0,
  confidence = NULL,
  layout = "fr",
  node_color = "#2C3E50",
  edge_palette = c(High = "#E74C3C", Medium = "#F39C12", Low = "#95A5A6", NS = "#D5D8DC"),
  label_size = 3.5,
  title = NULL,
  show_legend = TRUE
)

Arguments

result

An object of class "scPairs_result", "scPairs_gene_result", or a data.frame / data.table with columns gene1, gene2, synergy_score.

top_n

Integer; show only the top N edges. Default 50.

min_score

Numeric; minimum synergy score to display an edge.

confidence

Character vector; filter to these confidence levels (e.g., c("High", "Medium")). NULL = no filter.

layout

Character; ggraph layout algorithm. Default "fr" (Fruchterman-Reingold).

node_color

Character; colour for nodes. Default "#2C3E50".

edge_palette

Named character vector of colours for each confidence level (High, Medium, Low, NS). Default red-orange-grey scheme.

label_size

Numeric; node label font size.

title

Character; plot title.

show_legend

Logical.

Value

A ggplot object; nodes are genes and edges are gene pairs, with edge width encoding synergy score and edge colour encoding confidence.

Examples

result <- FindAllPairs(scpairs_testdata,
                       n_top_genes = 20,
                       top_n       = 10,
                       mode        = "expression",
                       verbose     = FALSE)
PlotPairNetwork(result, top_n = 8)

Scatter Plot of Two Genes (Cell-Level)

Description

Plots cell-level expression of gene1 vs. gene2 as a scatter plot, coloured by cluster identity. Marginal density curves (optional) help reveal cluster-specific co-expression patterns.

Usage

PlotPairScatter(
  object,
  gene1,
  gene2,
  group_by = NULL,
  assay = NULL,
  slot = "data",
  pt_size = 0.5,
  alpha = 0.6,
  add_density = FALSE,
  title = NULL
)

Arguments

object

Seurat object.

gene1

Character; x-axis gene.

gene2

Character; y-axis gene.

group_by

Character; colour cells by this meta.data column.

assay

Character; assay.

slot

Character; data slot.

pt_size

Numeric.

alpha

Numeric.

add_density

Logical; add marginal density plots. Requires the ggExtra package.

title

Character.

Value

A ggplot scatter plot; cells are coloured by cluster/group, with optional marginal density panels when ggExtra is installed.

Examples

PlotPairScatter(scpairs_testdata, "GENE3", "GENE4",
               group_by = "seurat_clusters")

Enhanced Co-Expression Visualization with Neighbourhood Smoothing

Description

A six-panel visualization that shows both raw and KNN-smoothed expression for a gene pair on the UMAP (or other reduction) embedding. The top row shows raw expression (gene1, gene2, product); the bottom row shows KNN-smoothed expression. This is particularly informative for gene pairs that are not co-expressed in the same cell but share neighbourhood-level co-expression patterns.

Usage

PlotPairSmoothed(
  object,
  gene1,
  gene2,
  reduction = "umap",
  smooth_reduction = "pca",
  k = 20,
  alpha = 0.3,
  assay = NULL,
  slot = "data",
  pt_size = 0.3,
  pt_alpha = 0.8,
  title = NULL
)

Arguments

object

A Seurat object with a dimensionality reduction.

gene1

Character; first gene.

gene2

Character; second gene.

reduction

Character; reduction for plotting. Default "umap".

smooth_reduction

Character; reduction for KNN graph. Default "pca".

k

Integer; neighbourhood size for smoothing. Default 20.

alpha

Numeric in [0,1]; self-weight for smoothing. Default 0.3.

assay

Character; assay.

slot

Character; data slot.

pt_size

Numeric; point size.

pt_alpha

Numeric; point alpha.

title

Character; overall title.

Value

A combined ggplot (patchwork) with 6 panels: three showing raw expression and three showing KNN-smoothed expression, for gene1, gene2, and their co-expression product.

Examples

# scpairs_testdata has PCA (smooth_reduction) and UMAP (reduction) ready.
PlotPairSmoothed(scpairs_testdata, gene1 = "GENE3", gene2 = "GENE4")

Plot Spatial Co-Expression Map

Description

For spatial transcriptomics data, visualises the spatial distribution of two genes and their co-expression product on the tissue. Three panels are shown side by side:

Expression of gene 1.
Expression of gene 2.
Co-expression product (gene1 * gene2), highlighting spots where both genes are simultaneously active.

Usage

PlotPairSpatial(
  object,
  gene1,
  gene2,
  assay = NULL,
  slot = "data",
  pt_size = 1.2,
  alpha = 0.8,
  title = NULL
)

Arguments

object

A Seurat object with spatial coordinates.

gene1

Character; first gene.

gene2

Character; second gene.

assay

Character; assay.

slot

Character; data slot.

pt_size

Numeric; point size.

alpha

Numeric; point alpha.

title

Character; overall title.

Value

A combined ggplot (patchwork) with three panels: spatial expression of gene1, spatial expression of gene2, and their co-expression product, overlaid on physical tissue coordinates.

Examples

## Not run: 
# Requires a Seurat object with spatial assay (e.g. Visium, MERFISH)
PlotPairSpatial(spatial_obj, gene1 = "CD8A", gene2 = "CD8B")

## End(Not run)

Comprehensive Synergy Summary Plot

Description

A multi-panel publication-ready figure combining:

Raw UMAP co-expression (3 panels)
KNN-smoothed UMAP (3 panels)
Per-cluster expression comparison
Metric radar/bar chart

Usage

PlotPairSummary(
  object,
  gene1,
  gene2,
  result = NULL,
  reduction = "umap",
  smooth_reduction = "pca",
  k = 20,
  alpha = 0.3,
  assay = NULL,
  slot = "data",
  pt_size = 0.3
)

Arguments

object

A Seurat object.

gene1

Character; first gene.

gene2

Character; second gene.

result

Optional scPairs_pair_result from AssessGenePair(). If NULL, assessment is run internally.

reduction

Character; reduction for plotting.

smooth_reduction

Character; reduction for KNN graph.

k

Integer; KNN k for smoothing.

alpha

Numeric; smoothing alpha.

assay

Character; assay.

slot

Character; data slot.

pt_size

Numeric; point size.

Value

A combined ggplot (patchwork) with up to 10 panels: raw UMAP co-expression (3 panels), KNN-smoothed UMAP (3 panels), per-cluster expression bar chart, and metric evidence bar chart.

Examples


PlotPairSummary(scpairs_testdata, gene1 = "GENE3", gene2 = "GENE4")

Visualize Synergistic Relationship Between Gene Pairs

Description

Publication-ready multi-panel visualization that integrates prior knowledge, expression evidence, and neighbourhood context to show the synergistic relationship between two genes. This goes beyond co-expression to reveal why two genes may be functionally synergistic.

Usage

PlotPairSynergy(
  object,
  gene1,
  gene2,
  prior_net = NULL,
  organism = "mouse",
  reduction = "umap",
  smooth_reduction = "pca",
  k = 20,
  alpha = 0.3,
  cluster_col = NULL,
  assay = NULL,
  slot = "data",
  top_bridges = 10,
  pt_size = 0.3
)

Arguments

object

A Seurat object.

gene1

Character; first gene.

gene2

Character; second gene.

prior_net

Optional prior network from .build_prior_network(). If NULL, built automatically.

organism

Character; "mouse" or "human". Used if prior_net is NULL.

reduction

Character; reduction for UMAP plotting.

smooth_reduction

Character; reduction for KNN.

k

Integer; KNN k.

alpha

Numeric; smoothing alpha.

cluster_col

Character; cluster column in meta.data.

assay

Character; assay.

slot

Character; data slot.

top_bridges

Integer; maximum bridge genes to show.

pt_size

Numeric; point size.

Value

A combined ggplot (patchwork) with up to 4 panels:

UMAP coloured by per-cell neighbourhood synergy score.
Bridge gene network showing shared GO/KEGG pathway intermediaries.
Per-cluster expression bar chart for both genes.
Multi-evidence metric comparison bar chart (expression + prior).

Falls back gracefully when prior knowledge is unavailable (panels 2 and 4 are omitted).

Examples

## Not run: 
# Requires Bioconductor annotation packages: org.Hs.eg.db or org.Mm.eg.db
# and AnnotationDbi.
PlotPairSynergy(scpairs_testdata, gene1 = "GENE3", gene2 = "GENE4",
                organism = "human")

## End(Not run)

Violin Plot of Pair Expression Across Clusters

Description

Displays side-by-side violin plots of two genes across cell clusters or groups, enabling visual assessment of whether their expression patterns are coordinated across populations.

Usage

PlotPairViolin(
  object,
  gene1,
  gene2,
  group_by = NULL,
  assay = NULL,
  slot = "data",
  pt_size = 0,
  title = NULL
)

Arguments

object

A Seurat object.

gene1

Character; first gene.

gene2

Character; second gene.

group_by

Character; meta.data column used for grouping. If NULL, the current cell identities (Idents) are used.

assay

Character; name of the assay to use.

slot

Character; data slot (layer) to read expression values from.

pt_size

Numeric; point size for jittered points (0 = no points).

title

Character; optional plot title.

Value

A ggplot with violin (and optional jitter) panels for gene1, gene2, and their expression product, split by group.

Examples

PlotPairViolin(scpairs_testdata, "GENE3", "GENE4",
              group_by = "seurat_clusters")

Metrics used as raw values (not absolute-valued)

Description

Metrics used as raw values (not absolute-valued)

Usage

RAW_METRICS

Format

An object of class character of length 8.

Standard column names for result data.tables

Description

Standard column names for result data.tables

Usage

RESULT_COLUMNS

Format

An object of class list of length 6.

Score-based confidence thresholds (used when no p-values are available)

Description

Score-based confidence thresholds (used when no p-values are available)

Usage

SCORE_CONFIDENCE_QUANTILES

Format

An object of class list of length 3.

Cross-cell-type interaction metrics

Description

Standard co-expression metrics measure whether two genes are expressed together in the same cell. However, many biologically important interactions operate across cell types – gene A expressed in one cell type signals to (or synergises with) gene B expressed in a neighbouring cell of a different type. Classic examples include Adora2a–Ido1 trans-cellular signalling and ligand–receptor pairs.

Details

This module provides a cross-cell-type interaction score that detects such trans-cellular synergies. The key insight is that "neighbouring" in this context means cells that could plausibly interact – not cells that are close in PCA/UMAP space (which groups cells of the same type together).

Algorithm: For each ordered pair of cell types (type_i, type_j where i != j):

1. Partition all cells into n_bins matched micro-environment bins by clustering the shared PCA embedding (k-means), so that type_i and type_j cells in the same bin are treated as sharing the same local tissue context.
2. Compute the pseudo-bulk expression of gene A in the type_i cells of each bin, and the pseudo-bulk expression of gene B in the type_j cells of each bin.
3. Correlate these paired pseudo-bulks across bins.

This captures whether tissue regions where type_i cells express high gene A tend to be regions where type_j cells express high gene B – exactly the signal expected from paracrine signalling or trans-cellular co-regulation.
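The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the scPairs implementation. The helper names assign_bins() and cross_type_cor() are hypothetical. Bins come from stats::kmeans() on a PCA-like embedding, following the internal binning helper's description. Pseudo-bulk is taken as the per-bin mean, and Spearman correlation is an assumed choice.

```r
# Hypothetical sketch (not the scPairs API) of the cross-cell-type score.

# Step 1: shared micro-environment bins via k-means on an embedding.
# Note: set.seed() here resets the global RNG; acceptable for a sketch.
assign_bins <- function(embedding, n_bins, seed = 1) {
  set.seed(seed)
  stats::kmeans(embedding, centers = n_bins, nstart = 5)$cluster
}

# Steps 2-3: per-bin pseudo-bulk of gene A in type_i and gene B in type_j,
# then correlate the paired pseudo-bulks across bins shared by both types.
cross_type_cor <- function(expr, gene_a, gene_b, cell_type, bins,
                           type_i, type_j, method = "spearman") {
  in_i <- cell_type == type_i
  in_j <- cell_type == type_j
  pb_a <- tapply(expr[gene_a, in_i], bins[in_i], mean)   # gene A, type_i cells
  pb_b <- tapply(expr[gene_b, in_j], bins[in_j], mean)   # gene B, type_j cells
  shared <- intersect(names(pb_a), names(pb_b))
  if (length(shared) < 3) return(NA_real_)               # too few bins to correlate
  stats::cor(as.numeric(pb_a[shared]), as.numeric(pb_b[shared]), method = method)
}

# Toy data: gene A in "T" cells and gene B in "Tumour" cells both track the
# first embedding axis, i.e. they vary together across micro-environments.
set.seed(42)
n <- 300
pca <- matrix(rnorm(n * 2), ncol = 2)
cell_type <- sample(c("T", "Tumour"), n, replace = TRUE)
expr <- rbind(
  GeneA = ifelse(cell_type == "T", pca[, 1], 0) + rnorm(n, sd = 0.3),
  GeneB = ifelse(cell_type == "Tumour", pca[, 1], 0) + rnorm(n, sd = 0.3)
)
bins <- assign_bins(pca, n_bins = 10)
r <- cross_type_cor(expr, "GeneA", "GeneB", cell_type, bins, "T", "Tumour")
```

Because the pair is ordered, swapping the gene arguments (gene B in T cells, gene A in tumour cells) asks a different biological question and generally gives a different score.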

Null-coalescing operator (if not already imported from Seurat)

Description

Null-coalescing operator (if not already imported from Seurat)

Usage

a %||% b

Arguments

a

First value.

b

Second value (returned if a is NULL).

Value

a if not NULL, otherwise b.
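The documented semantics fit in one line of R. The sketch below shows a common way to define such an operator; rlang uses the same idea, and base R ships its own %||% from R 4.4.0 onward, which is why a package supporting R (≥ 4.1.0) may define it itself. It is not necessarily the exact scPairs source.

```r
# Null-coalescing operator: return `a` unless it is NULL, otherwise `b`.
`%||%` <- function(a, b) if (is.null(a)) b else a

# Typical use: fall back to a default when an optional argument is NULL.
cluster_col <- NULL
group_var <- cluster_col %||% "seurat_clusters"  # "seurat_clusters"
k <- 20L %||% 30L                                # 20L (left side is not NULL)
```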

Compute neighbourhood-aware co-expression metrics

Description

Standard single-cell co-expression methods require two genes to be detected in the same cell. Due to the inherent sparsity of scRNA-seq (dropout, low capture efficiency) and the stochastic nature of transcription, many genuinely synergistic gene pairs are rarely or never co-detected in the same cell – especially when one or both genes are expressed at low levels.

Details

This module addresses the problem by leveraging neighbourhood information from the cell-cell similarity graph (typically derived from a UMAP / PCA embedding or spatial coordinates). Instead of asking "are these two genes detected in the same cell?", we ask:

KNN-smoothed correlation – After smoothing each gene's expression over its k nearest neighbours, do the smoothed profiles correlate?
Neighbourhood co-expression score – For cells expressing gene A, are their neighbours enriched for gene B expression (and vice versa)?
Cluster-level (pseudo-bulk) correlation – At the cluster level (aggregated expression), do the two genes correlate?

These three metrics form a complementary evidence layer that is particularly powerful for detecting synergistic pairs where cell-level co-expression is absent due to technical or biological sparsity.
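The three questions above can be made concrete with a small base-R sketch. All helper names are hypothetical, and the formulas are assumptions chosen for illustration, so the exact scPairs formulas may differ:

- Neighbours come from Euclidean distances in an embedding.
- Smoothing blends each cell's value with its neighbours' mean using a weight alpha.
- The neighbourhood score is the enrichment (observed over global mean) of one gene around cells expressing the other, averaged over both directions.
- Pseudo-bulk uses cluster means.

```r
# Hypothetical sketch of the three neighbourhood-aware metrics (not the scPairs API).

# Indices of the k nearest neighbours of each cell (self excluded); O(n^2).
knn_index <- function(emb, k) {
  d <- as.matrix(stats::dist(emb))
  diag(d) <- Inf
  matrix(unlist(lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])),
         ncol = k, byrow = TRUE)
}

# Mean of x over each cell's neighbours (row i of nn holds cell i's neighbours).
nbr_mean <- function(x, nn) rowMeans(matrix(x[as.vector(nn)], nrow = nrow(nn)))

# 1. KNN smoothing: blend own value with neighbour mean (alpha = neighbour weight).
knn_smooth <- function(x, nn, alpha = 0.3) (1 - alpha) * x + alpha * nbr_mean(x, nn)

# 2. Neighbourhood co-expression: enrichment of b around a-expressing cells
#    and of a around b-expressing cells, averaged (> 1 means enriched).
nbr_coexpr <- function(a, b, nn) {
  a_to_b <- mean(nbr_mean(b, nn)[a > 0]) / mean(b)
  b_to_a <- mean(nbr_mean(a, nn)[b > 0]) / mean(a)
  (a_to_b + b_to_a) / 2
}

# 3. Cluster-level (pseudo-bulk) correlation of per-cluster means.
pseudobulk_cor <- function(a, b, clusters) {
  stats::cor(as.numeric(tapply(a, clusters, mean)),
             as.numeric(tapply(b, clusters, mean)))
}

# Toy data: both genes follow the same gradient but suffer heavy dropout,
# so they are rarely detected in the same cell.
set.seed(1)
n <- 200
emb <- matrix(rnorm(n * 2), ncol = 2)
signal <- pmax(emb[, 1], 0)
a <- signal * rbinom(n, 1, 0.3)
b <- signal * rbinom(n, 1, 0.3)
nn <- knn_index(emb, k = 15)
clusters <- cut(emb[, 1], breaks = quantile(emb[, 1], 0:5 / 5), include.lowest = TRUE)

raw_r    <- cor(a, b)
smooth_r <- cor(knn_smooth(a, nn, alpha = 0.5), knn_smooth(b, nn, alpha = 0.5))
nbr      <- nbr_coexpr(a, b, nn)
pb_r     <- pseudobulk_cor(a, b, clusters)
```

In this toy setting, the smoothed correlation is expected to exceed the raw cell-level correlation, because smoothing pools the shared gradient over neighbours while averaging out the independent dropout in each gene.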

Shared Plotting Utilities for scPairs

Description

Internal functions providing common validation, data preparation, and theming logic used across all plot_*.R files.

Print method for scPairs_gene_result

Description

Print method for scPairs_gene_result

Usage

## S3 method for class 'scPairs_gene_result'
print(x, ...)

Arguments

x

An scPairs_gene_result object.

...

Ignored.

Value

The input object x, returned invisibly.

Print method for scPairs_pair_result

Description

Print method for scPairs_pair_result

Usage

## S3 method for class 'scPairs_pair_result'
print(x, ...)

Arguments

x

An scPairs_pair_result object.

...

Ignored.

Value

The input object x, returned invisibly.

Print method for scPairs_result

Description

Print method for scPairs_result

Usage

## S3 method for class 'scPairs_result'
print(x, ...)

Arguments

x

An scPairs_result object.

...

Ignored.

Value

The input object x, returned invisibly.

Prior Knowledge Integration for Synergistic Gene Pair Discovery

Description

Integrates biological prior knowledge databases to shift the scoring from pure co-expression towards functional synergy. Three complementary evidence layers are provided:

Prior interaction score – direct gene-gene functional annotation overlap from GO, KEGG, and curated interaction databases.
Pathway bridge score – indirect synergy through shared intermediary genes that connect the pair in biological pathways AND are expressed in the current dataset.
Neighbourhood synergy score – directional enrichment of gene B in the biological neighbourhood of gene A-expressing cells, capturing paracrine/juxtacrine interactions even when the two genes are never co-detected in the same cell.

These scores are designed to up-weight gene pairs with biological plausibility for functional synergy, even when cell-level co-expression is weak or absent.
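A toy version of the first two layers is sketched below. It assumes annotations are available as a named list mapping each gene to a set of GO/KEGG term identifiers; in practice these might come from AnnotationDbi with org.Hs.eg.db or org.Mm.eg.db. The term IDs and gene names are made up, and the helper names are hypothetical. Co-annotation is scored as the Jaccard index of the two term sets, matching the "GO/KEGG co-annotation Jaccard" metric named in the package description. Bridge genes are taken to be expressed genes that share at least one term with each member of the pair, which is one plausible reading of "shared intermediary genes".

```r
# Hypothetical sketch of two prior-knowledge layers (not the scPairs API).
# annot: named list, gene -> character vector of annotation term IDs.

# Prior interaction score: Jaccard index of the two genes' term sets.
jaccard_coannot <- function(annot, g1, g2) {
  s1 <- annot[[g1]]
  s2 <- annot[[g2]]
  u <- union(s1, s2)
  if (length(u) == 0) return(0)
  length(intersect(s1, s2)) / length(u)
}

# Bridge genes: expressed genes (other than the pair) that share at least
# one term with g1 AND at least one term with g2.
bridge_genes <- function(annot, g1, g2, expressed) {
  cand <- setdiff(intersect(names(annot), expressed), c(g1, g2))
  keep <- vapply(cand, function(g) {
    length(intersect(annot[[g]], annot[[g1]])) > 0 &&
      length(intersect(annot[[g]], annot[[g2]])) > 0
  }, logical(1))
  cand[keep]
}

# Made-up annotations: GeneA and GeneB share one term directly, and GeneC
# links them indirectly (it shares "T1" with GeneA and "T3" with GeneB).
annot <- list(
  GeneA = c("T1", "T2"),
  GeneB = c("T2", "T3"),
  GeneC = c("T1", "T3"),
  GeneD = c("T4")
)
expressed <- c("GeneA", "GeneB", "GeneC", "GeneD")

jac <- jaccard_coannot(annot, "GeneA", "GeneB")             # 1/3: {T2} / {T1, T2, T3}
bridges <- bridge_genes(annot, "GeneA", "GeneB", expressed) # "GeneC"
```

This section does not specify how scPairs converts the bridge set into a numeric score, so the sketch stops at the set itself.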

Centralized Schema Definitions for scPairs

Description

Single source of truth for column names, default parameters, metric weights, and classification thresholds used throughout the package.

Synthetic Seurat Test Object for scPairs Examples and Tests

Description

A minimal synthetic Seurat object with deliberately injected co-expression patterns, intended for use in package examples and unit tests. The object ships with normalised expression, variable-feature selection, scaled data, a 5-component PCA, and a 2-D UMAP embedding, so it can be passed directly to every scPairs discovery and visualisation function without any additional setup.

Usage

data(scpairs_testdata)

Format

A Seurat object with:

Assay RNA

counts: raw integer count matrix (20 genes x 100 cells).
data: log-normalised expression matrix.
scale.data: z-score-scaled matrix for all variable features.

Reductions

pca: 5-component principal-component embedding.
umap: 2-D UMAP embedding derived from the top 5 PCs.

Metadata

seurat_clusters: factor with three balanced cluster labels ("1", "2", "3").

Genes

GENE1–GENE20 (synthetic gene identifiers).

Cells

CELL001–CELL100.

Details

Two co-expression patterns are injected at data-generation time:

GENE3 & GENE4 – strongly correlated across all 100 cells (Pearson r approximately 0.89 in the normalised data). These are the recommended genes for discovery and assessment examples.
GENE1 & GENE2 – moderately correlated within cluster 1 only (cluster-specific pattern).
All remaining gene pairs are near-independent noise.

The data are generated with a fixed random seed (set.seed(7391)) so the object is fully reproducible. The generation script is provided in data-raw/make_testdata.R.

Source

Generated by data-raw/make_testdata.R with set.seed(7391).

Examples

# Load and inspect the object
data(scpairs_testdata)
scpairs_testdata

# Verify the injected GENE3 / GENE4 co-expression
# (LayerData() requires SeuratObject >= 5.0)
norm <- SeuratObject::LayerData(scpairs_testdata, layer = "data")
cor(as.numeric(norm["GENE3", ]), as.numeric(norm["GENE4", ]))  # approximately 0.89

Spatial Transcriptomics Metrics for scPairs

Description

This module combines bivariate spatial autocorrelation (Lee's L) and co-location quotient (CLQ) metrics for spatial transcriptomics data. The two metrics capture different aspects of spatial co-expression:

Lee's L captures spatial co-variation of continuous expression values.
CLQ measures whether cells expressing one gene are spatially attracted to (co-located with) cells expressing the other.
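Both statistics can be sketched in base R for a small dataset. This is an illustrative implementation under stated assumptions, not the scPairs code:

- The helper names are hypothetical.
- The spatial weights are a row-standardised k-nearest-neighbour matrix built from a dense distance matrix, which is fine for a few thousand spots at most.
- Lee's L uses its standard bivariate form.
- The CLQ is a simplified k-NN version. It compares the fraction of gene-B-positive neighbours around gene-A-positive cells with the global fraction N_B / (N - 1), without correcting for cells that express both genes.

```r
# Hypothetical sketch of the two spatial metrics (not the scPairs API).

# Row-standardised KNN weights: W[i, j] = 1/k if j is one of i's k nearest spots.
knn_weights <- function(coords, k = 6) {
  d <- as.matrix(stats::dist(coords))
  diag(d) <- Inf
  W <- matrix(0, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) W[i, order(d[i, ])[seq_len(k)]] <- 1 / k
  W
}

# Lee's L = (n / sum_i (sum_j w_ij)^2) * sum_i lagX_i * lagY_i
#           / (sqrt(sum (x - xbar)^2) * sqrt(sum (y - ybar)^2)),
# where lagX_i = sum_j w_ij (x_j - xbar) is the spatial lag of centred x.
lees_L <- function(x, y, W) {
  xc <- x - mean(x)
  yc <- y - mean(y)
  lag_x <- as.vector(W %*% xc)
  lag_y <- as.vector(W %*% yc)
  (length(x) / sum(rowSums(W)^2)) * sum(lag_x * lag_y) /
    (sqrt(sum(xc^2)) * sqrt(sum(yc^2)))
}

# Simplified CLQ (A -> B): mean fraction of B-positive neighbours around
# A-positive cells, divided by the global fraction N_B / (N - 1).
# W %*% b_pos gives that fraction directly because W is row-standardised.
clq <- function(a_pos, b_pos, W) {
  frac_b <- as.vector(W %*% as.numeric(b_pos))
  mean(frac_b[a_pos]) / (sum(b_pos) / (length(b_pos) - 1))
}

# Toy data: two genes share a left-to-right gradient; a third is random.
set.seed(3)
n <- 400
coords <- cbind(x = runif(n), y = runif(n))
g1 <- coords[, "x"] + rnorm(n, sd = 0.1)
g2 <- coords[, "x"] + rnorm(n, sd = 0.1)
g3 <- rnorm(n)
W <- knn_weights(coords, k = 6)

L_12   <- lees_L(g1, g2, W)             # strongly positive
L_13   <- lees_L(g1, g3, W)             # close to 0
clq_12 <- clq(g1 > 0.7, g2 > 0.7, W)    # > 1: B-positive cells cluster around A-positive cells
```

With row-standardised weights, the leading factor of Lee's L equals 1. The general form is kept so that the function also accepts unstandardised weights.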

Input Validation Helpers for scPairs

Description

Internal functions for validating user inputs to main scPairs functions. These ensure parameters are within reasonable ranges and provide helpful error messages.
