scPairs: Identifying Synergistic Gene Pairs in Single-Cell and Spatial Transcriptomics

Overview

scPairs Logo

scPairs is an R package for the systematic identification and multi-evidence evaluation of synergistic gene pairs in single-cell and spatial transcriptomics data. Unlike conventional pairwise co-expression analyses that rely on a single correlation metric, scPairs integrates 14 complementary metrics across five orthogonal evidence layers to compute a composite synergy score with optional permutation-based significance testing.

The five evidence layers span cell-level co-expression (Pearson, Spearman, biweight midcorrelation, mutual information, ratio consistency), neighbourhood-aware smoothing (KNN-smoothed correlation, neighbourhood co-expression, cluster pseudo-bulk, cross-cell-type, neighbourhood synergy), prior biological knowledge (GO/KEGG co-annotation Jaccard, pathway bridge score), trans-cellular interaction, and spatial co-variation (Lee’s L, co-location quotient). This multi-scale design enables researchers to move beyond simple co-expression towards a comprehensive characterisation of cooperative gene regulation at transcriptomic and spatial resolution.

Preparation
- 1.1 Installation
- 1.2 Dependencies
Quick Start
Architecture
Visualization Gallery
Functions
- 5.1 Discovery (3 functions)
- 5.2 Visualization (11 functions)
Built-in Test Dataset
Citation
License
Contact

1. Preparation

1.1 Installation

# Install from GitHub
if (!require("devtools")) install.packages("devtools")
devtools::install_github("zhaoqing-wang/scPairs")

# Optional: prior knowledge integration (GO/KEGG annotation)
if (!require("BiocManager")) install.packages("BiocManager")
BiocManager::install(c("org.Mm.eg.db", "org.Hs.eg.db", "AnnotationDbi"))

Install missing CRAN dependencies manually

install.packages(c("data.table", "ggplot2", "ggraph", "ggrepel",
                   "igraph", "Matrix", "patchwork", "Seurat",
                   "tidygraph", "tidyr"))
# Optional accelerators and extras
install.packages(c("RANN", "ggExtra", "crayon"))

1.2 Dependencies

Required: Seurat (≥ 4.0), data.table, ggplot2, ggraph, ggrepel, igraph, Matrix, patchwork, tidygraph, tidyr

Optional:

Package	Purpose
`org.Mm.eg.db` / `org.Hs.eg.db`	GO & KEGG prior knowledge (mouse / human)
`AnnotationDbi`	Gene annotation infrastructure
`RANN`	Fast approximate KNN (10–100× speedup for neighbourhood metrics)
`ggExtra`	Marginal density panels in `PlotPairScatter()`
`crayon`	Colourised startup message

2. Quick Start

library(scPairs)
sce <- readRDS("your_data.rds")  # Seurat object, normalised

2.1 Global Pair Discovery

# All 14 metrics (default)
result <- FindAllPairs(sce, n_top_genes = 1000, top_n = 200)

# Expression metrics only — no annotation databases required
result <- FindAllPairs(sce, mode = "expression")

# Prior knowledge only — fast pathway-based screening
result <- FindAllPairs(sce, mode = "prior_only", organism = "mouse")

# With permutation testing (empirical p-values)
result <- FindAllPairs(sce, n_top_genes = 500, n_perm = 999)

# Visualise
PlotPairNetwork(result, top_n = 50)
PlotPairHeatmap(result, top_n = 25)

2.2 Query-Centric Partner Search

tp53_partners <- FindGenePairs(sce, gene = "TP53", top_n = 20)
PlotPairNetwork(tp53_partners)
PlotPairDimplot(sce, gene1 = "TP53", gene2 = tp53_partners$pairs$gene2[1])

2.3 In-Depth Pair Assessment

assessment <- AssessGenePair(sce, gene1 = "CD8A", gene2 = "CD8B")
print(assessment)                                          # structured summary

PlotPairSmoothed(sce, gene1 = "CD8A", gene2 = "CD8B")    # raw vs KNN-smoothed
PlotPairSummary(sce,  gene1 = "CD8A", gene2 = "CD8B",
                result = assessment)                       # full evidence dashboard

# All visualisation functions accept any result class interchangeably
PlotPairNetwork(assessment)
PlotPairHeatmap(assessment)

2.4 Prior Knowledge & Bridge Gene Network

# Requires: BiocManager::install(c("org.Mm.eg.db", "AnnotationDbi"))
result <- AssessGenePair(sce, gene1 = "Adora2a", gene2 = "Ido1",
                         organism = "mouse")

# 6-panel synergy dashboard (expression + neighbourhood + prior evidence)
PlotPairSynergy(sce, gene1 = "Adora2a", gene2 = "Ido1")

# Standalone bridge gene network
PlotBridgeNetwork(sce, gene1 = "Adora2a", gene2 = "Ido1",
                  organism = "mouse", top_bridges = 15)

# Custom interaction databases (CellChatDB, CellPhoneDB, SCENIC, etc.)
custom_db <- data.frame(gene1 = c("Adora2a"), gene2 = c("Ido1"))
result <- FindGenePairs(sce, gene = "Adora2a", custom_pairs = custom_db)

2.5 Spatial Transcriptomics

Spatial metrics are detected and computed automatically for Visium, MERFISH, Slide-seq, and similar modalities:

result <- FindAllPairs(spatial_obj, n_top_genes = 500)   # spatial metrics auto-enabled
PlotPairSpatial(spatial_obj, gene1 = "EPCAM", gene2 = "KRT8")

3. Architecture

3.1 Computation Modes

Mode	Metrics Computed	Dependencies	Use Case
`"all"` (default)	All 14 metrics	All packages	Full multi-evidence analysis
`"expression"`	Co-expression + neighbourhood (10)	No annotation DBs	No external databases needed
`"prior_only"`	Prior knowledge scores only (2)	org.*.db, AnnotationDbi	Fast pathway screening

3.2 Five Evidence Layers (14 Metrics)

Layer	Metrics	Weight
Cell-level (5)	Pearson (`cor_pearson`), Spearman (`cor_spearman`), Biweight midcorrelation (`cor_biweight`), Mutual information (`mi_score`), Ratio consistency (`ratio_consistency`)	1.0 – 1.5
Neighbourhood (5)	KNN-smoothed correlation (`smoothed_cor`), Neighbourhood score (`neighbourhood_score`), Cluster correlation (`cluster_cor`), Cross-cell-type score (`cross_celltype_score`), Neighbourhood synergy (`neighbourhood_synergy`)	1.2 – 1.5
Prior knowledge (2)	GO/KEGG co-annotation Jaccard (`prior_score`), Pathway bridge score (`bridge_score`)	1.8 – 2.0
Spatial (2)	Lee’s L (`spatial_lee_L`), Co-location quotient (`spatial_clq`)	1.2 – 1.5

Metrics are rank-normalised to [0, 1] and combined via weighted summation:

\[\text{synergy\_score} = \frac{\sum_i w_i \cdot \text{rank\_norm}(m_i)}{\sum_i w_i}\]

3.3 Unified Output Schema

All three workflows produce a pairs data.table with consistent columns: gene1, gene2, synergy_score, rank, confidence, plus all metric columns. Every visualisation function accepts any result class interchangeably.

Confidence assignment:

With permutation p-values	Without p-values (score quantiles)
High: p_adj < 0.01	High: ≥ 95th percentile
Medium: p_adj < 0.05	Medium: ≥ 80th percentile
Low: p_adj < 0.10	Low: ≥ 50th percentile

4. Visualization Gallery

The following plots are generated from the built-in scpairs_testdata object (100 cells × 20 genes, synthetic co-expression patterns). GENE3 and GENE4 are the injected globally co-expressed pair (Pearson r ≈ 0.89).

Pair Discovery — Network & Heatmap

Gene Pair Network	Synergy Score Heatmap

Left: PlotPairNetwork() — synergy-weighted interaction graph; node size ∝ connectivity, edge width ∝ synergy score.
Right: PlotPairHeatmap() — pairwise synergy matrix; top-ranked pairs clustered by score.

Co-expression — UMAP Overlays

Raw expression (3-panel):

PlotPairDimplot() — UMAP coloured by individual gene expression and co-expression product. GENE3 and GENE4 show overlapping high-expression regions, revealing their synergistic co-activation.

Raw vs KNN-smoothed (6-panel):

PlotPairSmoothed() — top row: raw expression; bottom row: KNN-smoothed. Smoothing reduces single-cell noise and clarifies the co-expression domain across the embedding.

Expression Distribution & Scatter

Violin by Cluster	Gene–Gene Scatter

Left: PlotPairViolin() — expression distributions of GENE3, GENE4, and their product per cluster.
Right: PlotPairScatter() — cell-level gene–gene scatter coloured by cluster; positive correlation visible across all three groups.

5. Functions

5.1 Discovery (3 functions)

Function	Purpose
`FindAllPairs()`	Global discovery of synergistic gene pairs across all variable genes
`FindGenePairs()`	Find synergistic partners for a specific query gene
`AssessGenePair()`	In-depth assessment of a specific gene pair with per-cluster detail

5.2 Visualization (11 functions)

Function	Input	Purpose
`PlotPairNetwork()`	Any result class	Synergy-weighted gene interaction network
`PlotPairHeatmap()`	Any result class	Synergy score heatmap across top gene pairs
`PlotPairDimplot()`	Seurat object	UMAP co-expression overlay (3-panel)
`PlotPairSmoothed()`	Seurat object	Raw + KNN-smoothed UMAP expression (6-panel)
`PlotPairSummary()`	Seurat object	Comprehensive multi-panel summary figure
`PlotPairSpatial()`	Spatial Seurat	Spatial co-expression map (3-panel)
`PlotPairCrossType()`	Seurat object	Cross-cell-type interaction heatmap
`PlotPairViolin()`	Seurat object	Expression distributions by cluster
`PlotPairScatter()`	Seurat object	Gene–gene scatter with optional marginal densities
`PlotPairSynergy()`	Seurat object	Multi-evidence synergy dashboard (4-panel)
`PlotBridgeNetwork()`	Seurat object	Bridge gene network with pathway-weighted edges

6. Built-in Test Dataset

scpairs_testdata is a minimal synthetic Seurat object shipped with the package for immediate use in examples and tests:

data(scpairs_testdata)
scpairs_testdata
#> An object of class Seurat
#> 20 features across 100 samples within 1 assay
#> Active assay: RNA (20 features, 20 variable features)
#>  3 layers present: counts, data, scale.data
#>  2 dimensional reductions calculated: pca, umap

Property	Value
Genes	GENE1–GENE20 (synthetic)
Cells	CELL001–CELL100
Clusters	3 balanced clusters (`seurat_clusters`)
Reductions	PCA (5 components), UMAP (2 dimensions)
Co-expression	GENE3/GENE4 globally (r ≈ 0.89); GENE1/GENE2 cluster-1-specific
File size	26 KB (xz-compressed)

The dataset is generated by data-raw/make_testdata.R with set.seed(7391) for full reproducibility.

7. Citation

Wang Z (2026). scPairs: Identifying Synergistic Gene Pairs in Single-Cell and
Spatial Transcriptomics. R package version 0.1.8.
https://github.com/zhaoqing-wang/scPairs

8. License

MIT

9. Contact

Author: Zhaoqing Wang (ORCID) | Email: zhaoqingwang@mail.sdu.edu.cn | Issues: scPairs Issues

mirror server hosted at Truenetwork, Russian Federation.