R version: R version 4.5.2 (2025-10-31)
Bioconductor version: 3.22
Package: 0.1.2
Sharing biological data such as genomic, epigenomic, and transcriptomic data is a crucial initiative that enhances research transparency, promotes data reuse, and facilitates large-scale data reanalysis. Representative public repositories include the Gene Expression Omnibus (GEO) (Clough et al. 2023), the Sequence Read Archive (SRA) (Katz et al. 2021), and ArrayExpress (Sarkans et al. 2020), have greatly contributed to this effort. As the value of such data continues to increase, careful consideration is required when handling individual research datasets. Each study inherently involves various technical factors (e.g., experimenters, reagents, and measurement instruments), as well as biological factors (e.g., species, developmental stage, and tissue). Both technical and biological variations have been well documented to influence the reproducibility of microarray and RNA sequencing (RNA-Seq) experiments (Fischer and Hoffmann 2022).
Meta-analysis has emerged as a promising strategy for addressing these challenges, enabling the identification of reliable gene expression changes and underlying biological processes. It is a statistical method for integrating the results of multiple studies on the same topic, and is widely applied in genome research (Tseng, Ghosh, and Feingold 2012). For example, on the basis of the vote-counting method, earlier studies identified abiotic stress-responsive genes in A. thaliana and Oryza sativa (Tamura and Bono 2022; Shintani, Tamura, and Bono 2024). This method led to the development of a p53 Expression Score to capture p53-dependent gene expression (Fischer et al. 2016). Reanalysis of transcriptomic data across multiple studies is expected to identify significant differences that were not detected in individual studies. For example, see (De Toma 2021; Nozu et al. 2024).
In our previous study (Fukuda, Kawaguchi, and Fukushima 2025), we introduced the Stress Response score (SRscore) as a new metric using a modified vote-counting method based on methods reported in earlier studies (Ono and Bono 2021; Tamura and Bono 2022). When calculating fold-changes (FCs), whereas the HN-score (Ono and Bono 2021; Tamura and Bono 2022) restricts the predefined combinations of experimental and control groups, the SRscore considers all possible combinations.
We developed the R package SRscore to facilitate a simple and intuitive transcriptome meta-analysis across multiple research projects. The SRscore package was designed to address two key issues: (1) although some existing tools employ the vote-counting method, they are not structured for seamless integration into downstream analyses, and (2) the substantial effort and time required for meta-analyses often limits research efficiency. The SRscore package offers a standardized meta-analysis pipeline that can be readily utilized by researchers without specialized expertise in bioinformatics. Its applicability extends beyond stress response analysis, encompassing a broad spectrum of two-group comparisons, including organ-specific and drug-induced responses. The primary output, the SRscore, serves as an integrative metric that quantifies the number of studies supporting a particular finding, thereby providing an intuitively interpretable measure of reproducibility and consensus.
# library(devtools)
# install_github("fusk-kpu/SRscore", build_vignettes = TRUE)
library(SRscore)
expand_by_group() Creates a data frame containing all possible control-treatment sample combinations within each group.
calcSRratio() Calculates the gene expression ratio between control and treatment samples.
calcSRscore() Computes the SRscore, summarizing overall gene expression trends.
directly_calcSRscore() Executes all steps above and returns the results as a list.
find_diffexp() Retrieves expression ratios for the specified gene across experiments.
The SRscore package includes three demonstration datasets derived from microarray studies available in the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and our previous study (Fukuda, Kawaguchi, and Fukushima 2025). Of these, two datasets include six GEO series examining A. thaliana under abscisic acid (ABA) treatment conditions, corresponding to the accession numbers GSE6638, GSE7112, GSE19520, GSE28800, GSE39384, and GSE135489. The demo datasets are a practical resource for users to gain a basic understanding of the SRscore analysis workflow. The following datasets are available in the SRscore package.
The SRscore package includes metadata, provided as object MetadataABA. These datasets contain information corresponding to the microarray studies described above. The following details were extracted from GEO records associated with the listed GEO accession numbers.
The primary role of the metadata in the SRscore package is to define the sample combinations to be compared within each study. Consequently, the minimum required information comprises only the study ID and sample ID.
library(tibble)
data("MetadataABA")
tibble(MetadataABA)
#> # A tibble: 19 × 5
#> Series control_sample treated_sample treatment tissue
#> <chr> <chr> <chr> <chr> <chr>
#> 1 GSE135489 GSM4012617 GSM4012620 1mM shoot system
#> 2 GSE135489 GSM4012618 GSM4012621 1mM shoot system
#> 3 GSE135489 GSM4012619 GSM4012622 1mM shoot system
#> 4 GSE39384 GSM967132 GSM967140 10uM seedling
#> 5 GSE39384 GSM967133 GSM967141 10uM seedling
#> 6 GSE39384 GSM967148 GSM967156 10uM seedling
#> 7 GSE39384 GSM967149 GSM967157 10uM seedling
#> 8 GSE39384 GSM967164 GSM967172 10uM seedling
#> 9 GSE39384 GSM967165 GSM967173 10uM seedling
#> 10 GSE28800 GSM713249 GSM713255 10uM seedling
#> 11 GSE28800 GSM713250 GSM713256 10uM seedling
#> 12 GSE28800 GSM713251 GSM713257 10uM seedling
#> 13 GSE19520 GSM486916 GSM486928 50uM leaf
#> 14 GSE19520 GSM486917 GSM486929 50uM leaf
#> 15 GSE19520 GSM486918 GSM486930 50uM leaf
#> 16 GSE6638 GSM153922 GSM153924 0.5uM seedling
#> 17 GSE6638 GSM153923 GSM153925 0.5uM seedling
#> 18 GSE7112 GSM170896 GSM170897 50uM leaf
#> 19 GSE7112 GSM170923 GSM170931 50uM leaf
The package includes a gene expression dataset called TranscriptomeABA, which stores sample × gene expression profiles. All microarray data were downloaded from GEO as raw CEL files and preprocessed using the Robust Multi-array Average (RMA) (Irizarry et al. 2003), as implemented in the Bioconductor package affy (Gautier et al. 2004).
Users can also perform the analysis workflow with their own data by replacing the contents of the demo datasets, provided that the data conform to the same format. This flexibility allows the workflow to be extended to other platforms, such as RNA-Seq, to different omics fields, such as metabolomics, and to experimental conditions beyond stress responses, including drug treatments.
data("TranscriptomeABA")
tibble(TranscriptomeABA)
#> # A tibble: 1,000 × 39
#> ensembl_gene_id GSM153922 GSM153923 GSM153924 GSM153925 GSM170896 GSM170897
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G01010 6.56 7.17 6.66 6.92 7.13 7.01
#> 2 AT1G01030 6.85 6.68 7.01 6.28 6.65 6.74
#> 3 AT1G01040 6.61 7.47 6.81 7.32 6.70 6.28
#> 4 AT1G01050 11.7 11.8 11.8 11.9 11.5 11.4
#> 5 AT1G01060 9.68 10.7 9.30 10.7 9.68 9.29
#> 6 AT1G01070 7.21 7.34 6.81 5.90 8.15 8.87
#> 7 AT1G01080 9.39 10.6 8.71 10.4 11.2 11.1
#> 8 AT1G01090 13.2 12.5 12.6 12.3 12.0 10.8
#> 9 AT1G01100 13.6 13.8 14.1 13.8 12.7 12.8
#> 10 AT1G01110 6.89 6.62 6.95 6.73 6.12 6.28
#> # ℹ 990 more rows
#> # ℹ 32 more variables: GSM170923 <dbl>, GSM170931 <dbl>, GSM4012617 <dbl>,
#> # GSM4012618 <dbl>, GSM4012619 <dbl>, GSM4012620 <dbl>, GSM4012621 <dbl>,
#> # GSM4012622 <dbl>, GSM486916 <dbl>, GSM486917 <dbl>, GSM486918 <dbl>,
#> # GSM486928 <dbl>, GSM486929 <dbl>, GSM486930 <dbl>, GSM713249 <dbl>,
#> # GSM713250 <dbl>, GSM713251 <dbl>, GSM713255 <dbl>, GSM713256 <dbl>,
#> # GSM713257 <dbl>, GSM967132 <dbl>, GSM967133 <dbl>, GSM967140 <dbl>, …
These data were generated by integrating SRscores calculated separately from the experimental datasets under 11 different stress conditions (Fukuda, Kawaguchi, and Fukushima 2025). Given that the SRscore scale differs among different conditions, the values were standardized using z-scores. In subsequent Template Matching (Pavlidis and Noble 2001), we searched for genes with similar SRscore patterns under different stress conditions based on SRGA.
data("SRGA")
tibble(SRGA)
#> # A tibble: 1,000 × 13
#> ensembl_gene_id ABA Cold DC3000 Drought Heat `High-light` Hypoxia Osmotic
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G01010 0 0 3 1 0 0 0 0
#> 2 AT1G01030 0 0 1 1 0 0 0 0
#> 3 AT1G01040 0 0 0 0 0 0 0 0
#> 4 AT1G01050 0 0 0 0 0 0 0 0
#> 5 AT1G01060 0 4 -1 1 -1 -4 -2 1
#> 6 AT1G01070 2 0 0 0 0 0 -1 1
#> 7 AT1G01080 0 -1 -3 -2 -1 0 -1 -4
#> 8 AT1G01090 0 0 0 0 0 0 0 0
#> 9 AT1G01100 0 0 0 0 0 0 0 0
#> 10 AT1G01110 0 0 0 0 0 0 0 0
#> # ℹ 990 more rows
#> # ℹ 4 more variables: Oxidation <dbl>, Salt <dbl>, Wound <dbl>, SYMBOL <chr>
The expand_by_group() function generates all possible combinations between the two groups (control and stress-treated groups) in each research project.
grp <- "Series"
var1 <- "control_sample"
var2 <- "treated_sample"
ebg <- expand_by_group(
.data = MetadataABA,
grp = grp,
var1 = var1,
var2 = var2
)
unique_series <- unique(MetadataABA$Series)
unique_series
#> [1] "GSE135489" "GSE39384" "GSE28800" "GSE19520" "GSE6638" "GSE7112"
lapply(unique_series, function(x) subset(ebg, Series == x))
#> [[1]]
#> Series control_sample treated_sample
#> 1 GSE135489 GSM4012617 GSM4012620
#> 2 GSE135489 GSM4012617 GSM4012621
#> 3 GSE135489 GSM4012617 GSM4012622
#> 4 GSE135489 GSM4012618 GSM4012620
#> 5 GSE135489 GSM4012618 GSM4012621
#> 6 GSE135489 GSM4012618 GSM4012622
#> 7 GSE135489 GSM4012619 GSM4012620
#> 8 GSE135489 GSM4012619 GSM4012621
#> 9 GSE135489 GSM4012619 GSM4012622
#>
#> [[2]]
#> Series control_sample treated_sample
#> 28 GSE39384 GSM967132 GSM967140
#> 29 GSE39384 GSM967132 GSM967141
#> 30 GSE39384 GSM967132 GSM967156
#> 31 GSE39384 GSM967132 GSM967157
#> 32 GSE39384 GSM967132 GSM967172
#> 33 GSE39384 GSM967132 GSM967173
#> 34 GSE39384 GSM967133 GSM967140
#> 35 GSE39384 GSM967133 GSM967141
#> 36 GSE39384 GSM967133 GSM967156
#> 37 GSE39384 GSM967133 GSM967157
#> 38 GSE39384 GSM967133 GSM967172
#> 39 GSE39384 GSM967133 GSM967173
#> 40 GSE39384 GSM967148 GSM967140
#> 41 GSE39384 GSM967148 GSM967141
#> 42 GSE39384 GSM967148 GSM967156
#> 43 GSE39384 GSM967148 GSM967157
#> 44 GSE39384 GSM967148 GSM967172
#> 45 GSE39384 GSM967148 GSM967173
#> 46 GSE39384 GSM967149 GSM967140
#> 47 GSE39384 GSM967149 GSM967141
#> 48 GSE39384 GSM967149 GSM967156
#> 49 GSE39384 GSM967149 GSM967157
#> 50 GSE39384 GSM967149 GSM967172
#> 51 GSE39384 GSM967149 GSM967173
#> 52 GSE39384 GSM967164 GSM967140
#> 53 GSE39384 GSM967164 GSM967141
#> 54 GSE39384 GSM967164 GSM967156
#> 55 GSE39384 GSM967164 GSM967157
#> 56 GSE39384 GSM967164 GSM967172
#> 57 GSE39384 GSM967164 GSM967173
#> 58 GSE39384 GSM967165 GSM967140
#> 59 GSE39384 GSM967165 GSM967141
#> 60 GSE39384 GSM967165 GSM967156
#> 61 GSE39384 GSM967165 GSM967157
#> 62 GSE39384 GSM967165 GSM967172
#> 63 GSE39384 GSM967165 GSM967173
#>
#> [[3]]
#> Series control_sample treated_sample
#> 19 GSE28800 GSM713249 GSM713255
#> 20 GSE28800 GSM713249 GSM713256
#> 21 GSE28800 GSM713249 GSM713257
#> 22 GSE28800 GSM713250 GSM713255
#> 23 GSE28800 GSM713250 GSM713256
#> 24 GSE28800 GSM713250 GSM713257
#> 25 GSE28800 GSM713251 GSM713255
#> 26 GSE28800 GSM713251 GSM713256
#> 27 GSE28800 GSM713251 GSM713257
#>
#> [[4]]
#> Series control_sample treated_sample
#> 10 GSE19520 GSM486916 GSM486928
#> 11 GSE19520 GSM486916 GSM486929
#> 12 GSE19520 GSM486916 GSM486930
#> 13 GSE19520 GSM486917 GSM486928
#> 14 GSE19520 GSM486917 GSM486929
#> 15 GSE19520 GSM486917 GSM486930
#> 16 GSE19520 GSM486918 GSM486928
#> 17 GSE19520 GSM486918 GSM486929
#> 18 GSE19520 GSM486918 GSM486930
#>
#> [[5]]
#> Series control_sample treated_sample
#> 64 GSE6638 GSM153922 GSM153924
#> 65 GSE6638 GSM153922 GSM153925
#> 66 GSE6638 GSM153923 GSM153924
#> 67 GSE6638 GSM153923 GSM153925
#>
#> [[6]]
#> Series control_sample treated_sample
#> 68 GSE7112 GSM170896 GSM170897
#> 69 GSE7112 GSM170896 GSM170931
#> 70 GSE7112 GSM170923 GSM170897
#> 71 GSE7112 GSM170923 GSM170931
The calcSRratio() function calculates the SRratio, which is the average gene expression ratio calculated based on combinations. If the argument is.log2 is set to FALSE, the logarithmic conversion is performed using 2 as the base.
SRratio <- calcSRratio(
.data = TranscriptomeABA,
var1 = var1,
var2 = var2,
pair = ebg,
is.log2 = TRUE
)
tibble(SRratio)
#> # A tibble: 1,000 × 20
#> ensembl_gene_id GSM4012620 GSM4012621 GSM4012622 GSM486928 GSM486929
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G01010 0.110 0.161 0.364 0.326 0.195
#> 2 AT1G01030 -0.267 0.152 0.00155 0.253 0.0655
#> 3 AT1G01040 0.280 0.118 0.0964 -1.20 -1.62
#> 4 AT1G01050 -0.00145 -0.0498 0.0914 -0.134 -0.0378
#> 5 AT1G01060 -0.00587 0.221 0.220 -0.0879 -1.63
#> 6 AT1G01070 0.0928 0.0119 0.152 2.49 1.95
#> 7 AT1G01080 0.0545 0.0377 -0.129 -0.759 0.0305
#> 8 AT1G01090 -0.00156 -0.0615 0.0319 -1.01 -0.511
#> 9 AT1G01100 -0.139 -0.107 -0.0671 0.0846 0.851
#> 10 AT1G01110 0.0816 0.0434 0.141 -0.220 0.0114
#> # ℹ 990 more rows
#> # ℹ 14 more variables: GSM486930 <dbl>, GSM713255 <dbl>, GSM713256 <dbl>,
#> # GSM713257 <dbl>, GSM967140 <dbl>, GSM967141 <dbl>, GSM967156 <dbl>,
#> # GSM967157 <dbl>, GSM967172 <dbl>, GSM967173 <dbl>, GSM153924 <dbl>,
#> # GSM153925 <dbl>, GSM170897 <dbl>, GSM170931 <dbl>
Alternatively, you can calculate SRratio directly without using expand_by_group(), as in (Tamura and Bono 2022).
conventional_SRratio <- calcSRratio(
TranscriptomeABA,
var1 = var1,
var2 = var2,
pair = MetadataABA,
is.log2 = TRUE
)
tibble(conventional_SRratio)
#> # A tibble: 1,000 × 20
#> ensembl_gene_id GSM4012620 GSM4012621 GSM4012622 GSM967140 GSM967141
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G01010 0.244 0.0300 0.361 0.0989 -0.114
#> 2 AT1G01030 -0.277 -0.0872 0.251 -0.0687 -0.0984
#> 3 AT1G01040 0.333 0.0626 0.0982 -0.321 -0.147
#> 4 AT1G01050 0.0394 -0.0205 0.0213 0.0828 0.0433
#> 5 AT1G01060 -0.00125 0.173 0.265 0.104 -0.132
#> 6 AT1G01070 -0.146 0.113 0.289 -0.491 -0.110
#> 7 AT1G01080 -0.0353 -0.0239 0.0223 -0.248 0.121
#> 8 AT1G01090 0.0373 -0.00145 -0.0671 -0.0131 -0.0894
#> 9 AT1G01100 -0.0588 -0.185 -0.0700 0.0326 0.0345
#> 10 AT1G01110 0.206 -0.145 0.205 -0.230 0.00207
#> # ℹ 990 more rows
#> # ℹ 14 more variables: GSM967156 <dbl>, GSM967157 <dbl>, GSM967172 <dbl>,
#> # GSM967173 <dbl>, GSM713255 <dbl>, GSM713256 <dbl>, GSM713257 <dbl>,
#> # GSM486928 <dbl>, GSM486929 <dbl>, GSM486930 <dbl>, GSM153924 <dbl>,
#> # GSM153925 <dbl>, GSM170897 <dbl>, GSM170931 <dbl>
The SRscore summarizes gene expression variations across the entire dataset by classifying the SRratio according to a pre-specified threshold. However, although this threshold can be freely specified, by default, log2FC values above 1 (corresponding to a 2-fold change on the linear scale) are classified as an increase in expression, whereas values below -1 (corresponding to a 1/2-fold change on the linear scale) correspond to a decline in expression, and values between -1 and 1 signify no change. The SRscore is calculated by subtracting the total number of expression decrease determinations from the total number of expression increase determinations.
SRscore <- calcSRscore(SRratio, threshold = c(-2, 2))
head(SRscore)
#> ensembl_gene_id up dn unchange all score
#> 1 AT1G01010 0 0 19 19 0
#> 2 AT1G01030 0 0 19 19 0
#> 3 AT1G01040 0 0 19 19 0
#> 4 AT1G01050 0 0 19 19 0
#> 5 AT1G01060 0 0 19 19 0
#> 6 AT1G01070 3 0 16 19 3
tibble(SRscore)
#> # A tibble: 1,000 × 6
#> ensembl_gene_id up dn unchange all score
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G01010 0 0 19 19 0
#> 2 AT1G01030 0 0 19 19 0
#> 3 AT1G01040 0 0 19 19 0
#> 4 AT1G01050 0 0 19 19 0
#> 5 AT1G01060 0 0 19 19 0
#> 6 AT1G01070 3 0 16 19 3
#> 7 AT1G01080 0 0 19 19 0
#> 8 AT1G01090 0 0 19 19 0
#> 9 AT1G01100 0 0 19 19 0
#> 10 AT1G01110 0 0 19 19 0
#> # ℹ 990 more rows
Here, we introduce fundamental visualization functions to aid in the intuitive understanding of the SRscore, the primary output of the SRscore package. Utilizing these functions allows for the straightforward assessment of the distribution of SRscores and the identification of gene groups exhibiting distinctive SRscores.
plot_SRscore_distr() aggregates the number of genes for each SRscore value (excluding zero) and visualizes this distribution as a bar chart. Setting the log argument to TRUE (default is FALSE) converts the y-axis to a logarithmic scale, making distributions across a wide range of values easier to interpret.
plot_SRscore_distr(SRscore)
plot_SRscore_distr(SRscore, log = TRUE)
plot_SRscore_rank() sorts genes in descending order of SRscore and visualizes their distribution. The top or bottom range can be highlighted with color coding using the threshold argument (default is c(-1, 1)).
plot_SRscore_rank(SRscore)
plot_SRscore_top() extracts genes with the highest absolute SRscore values and visualizes them as a bar chart. The number of top genes to visualize can be flexibly specified using the argument top_n (default is 20).
plot_SRscore_top(SRscore)
plot_SRscore_top(SRscore, top_n = 30)
directly_calcSRscore() aggregates the results of expand_by_group(), calcSRratio(), and calcSRscore() into a list.
res <- directly_calcSRscore(
.data1 = MetadataABA,
grp = grp,
var1 = var1,
var2 = var2,
.data2 = TranscriptomeABA,
is.log2 = TRUE,
threshold = c(-2, 2)
)
names(res)
#> [1] "pairs" "SRratio" "SRscore"
tibble(res$SRscore)
#> # A tibble: 1,000 × 6
#> ensembl_gene_id up dn unchange all score
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G01010 0 0 19 19 0
#> 2 AT1G01030 0 0 19 19 0
#> 3 AT1G01040 0 0 19 19 0
#> 4 AT1G01050 0 0 19 19 0
#> 5 AT1G01060 0 0 19 19 0
#> 6 AT1G01070 3 0 16 19 3
#> 7 AT1G01080 0 0 19 19 0
#> 8 AT1G01090 0 0 19 19 0
#> 9 AT1G01100 0 0 19 19 0
#> 10 AT1G01110 0 0 19 19 0
#> # ℹ 990 more rows
The find_diffexp() function extracts the pre-calculated SRratio and corresponding metadata (research ID, group information, sample ID, etc.) for the specified gene. This enables users to intuitively understand how the expression of a specific gene changes in studies and samples.
set.seed(1)
res <- find_diffexp(
sample(SRratio$ensembl_gene_id, 1),
SRratio,
SRscore,
MetadataABA
)
tibble(res$result)
#> # A tibble: 19 × 6
#> AT1G10230 Series control_sample treated_sample treatment tissue
#> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 -0.228 GSE135489 GSM4012617 GSM4012620 1mM shoot system
#> 2 -0.178 GSE135489 GSM4012618 GSM4012621 1mM shoot system
#> 3 -0.246 GSE135489 GSM4012619 GSM4012622 1mM shoot system
#> 4 1.18 GSE19520 GSM486916 GSM486928 50uM leaf
#> 5 1.35 GSE19520 GSM486917 GSM486929 50uM leaf
#> 6 0.932 GSE19520 GSM486918 GSM486930 50uM leaf
#> 7 -0.433 GSE28800 GSM713249 GSM713255 10uM seedling
#> 8 -0.0363 GSE28800 GSM713250 GSM713256 10uM seedling
#> 9 0.0359 GSE28800 GSM713251 GSM713257 10uM seedling
#> 10 0.0581 GSE39384 GSM967132 GSM967140 10uM seedling
#> 11 0.229 GSE39384 GSM967133 GSM967141 10uM seedling
#> 12 0.214 GSE39384 GSM967148 GSM967156 10uM seedling
#> 13 0.477 GSE39384 GSM967149 GSM967157 10uM seedling
#> 14 0.150 GSE39384 GSM967164 GSM967172 10uM seedling
#> 15 0.418 GSE39384 GSM967165 GSM967173 10uM seedling
#> 16 -0.222 GSE6638 GSM153922 GSM153924 0.5uM seedling
#> 17 0.282 GSE6638 GSM153923 GSM153925 0.5uM seedling
#> 18 0.608 GSE7112 GSM170896 GSM170897 50uM leaf
#> 19 -0.417 GSE7112 GSM170923 GSM170931 50uM leaf
tibble(res$SRscore)
#> # A tibble: 1 × 6
#> ensembl_gene_id up dn unchange all score
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G10230 0 0 19 19 0
Multiple genes can also be specified:
set.seed(1)
res2 <- find_diffexp(
sample(SRratio$ensembl_gene_id, 10),
SRratio,
SRscore,
MetadataABA
)
tibble(res2$result)
#> # A tibble: 19 × 15
#> AT1G02490 AT1G03130 AT1G04040 AT1G04330 AT1G06160 AT1G06580 AT1G08430
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.0967 0.0417 -0.424 0.0660 -1.13 -0.182 0.247
#> 2 -0.0619 0.0828 -0.614 0.192 -0.676 -0.0644 0.323
#> 3 -0.228 -0.108 -0.600 0.190 -0.915 -0.0900 0.582
#> 4 -0.0803 -0.760 -1.68 0.371 -1.08 -0.0134 0.0308
#> 5 0.271 -0.840 -1.74 0.357 -1.48 0.322 0.117
#> 6 0.124 -0.937 -2.00 0.189 0.0219 -0.0672 0.342
#> 7 -0.229 -1.43 -0.335 0.0928 -2.44 -0.102 0.701
#> 8 -0.102 -0.393 -1.67 -0.0750 -3.00 0.0930 0.233
#> 9 -0.0385 -0.493 -1.82 0.0482 -2.86 -0.106 0.521
#> 10 -0.0524 -0.188 -0.00345 -0.166 -1.49 -0.0212 -0.229
#> 11 -0.0957 0.0790 0.107 -0.0390 -1.17 0.118 -0.567
#> 12 0.178 -0.0976 -0.0455 0.168 -1.79 -0.160 0.626
#> 13 0.287 0.0909 -0.0420 -0.223 -1.65 0.0514 -0.103
#> 14 0.0181 -0.126 -1.32 0.165 -1.80 -0.0835 -0.0648
#> 15 0.481 -0.0347 -1.14 -0.0000311 -1.90 0.145 -1.10
#> 16 0.171 -1.79 -1.67 -0.107 0.407 -0.263 0.121
#> 17 0.0578 -5.29 -0.869 -0.213 -0.111 0.206 0.719
#> 18 0.0334 -0.966 -0.466 0.172 0.150 0.211 0.0766
#> 19 -0.0158 0.0653 0.0327 -0.0908 0.169 0.0440 -0.104
#> # ℹ 8 more variables: AT1G10230 <dbl>, AT1G11290 <dbl>, AT1G11820 <dbl>,
#> # Series <chr>, control_sample <chr>, treated_sample <chr>, treatment <chr>,
#> # tissue <chr>
tibble(res2$SRscore)
#> # A tibble: 10 × 6
#> ensembl_gene_id up dn unchange all score
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AT1G02490 0 0 19 19 0
#> 2 AT1G03130 0 1 18 19 -1
#> 3 AT1G04040 0 1 18 19 -1
#> 4 AT1G04330 0 0 19 19 0
#> 5 AT1G06160 0 3 16 19 -3
#> 6 AT1G06580 0 0 19 19 0
#> 7 AT1G08430 0 0 19 19 0
#> 8 AT1G10230 0 0 19 19 0
#> 9 AT1G11290 0 0 19 19 0
#> 10 AT1G11820 0 0 19 19 0
The SRscore package can be applied to various downstream analyses (e.g., enrichment analysis, heatmap, and template matching).
With respect to downstream analysis, it is important to identify the biological functions and pathways associated with the listed genes extracted based on SRscore values. This can be performed based on enrichment analysis using the clusterProfiler package (Yu et al. 2012). In the present study, genes assigned an SRscore of 1 or higher were designated as up-regulated genes, for which Gene Ontology (GO) enrichment analysis was performed using the clusterProfiler package. This accordingly revealed that the GO term “response to abscisic acid,” associated with the ABA stress response, was the most significantly enriched.
library(clusterProfiler)
library(ggplot2)
ego <- enrichGO(
gene = SRscore$ensembl_gene_id[SRscore$score >= 1],
OrgDb = "org.At.tair.db",
keyType = "TAIR",
ont = "BP",
maxGSSize = 2000
)
dotplot(ego, showCategory = 5, font.size = 14) +
theme(text = element_text(size = 14))
Using the find_diffexp() function to extract the SRratio for a specified group of genes linked to metadata, and applying the ComplexHeatmap package (Gu, Eils, and Schlesner 2016), provides visual confirmation of the extracted information
library(ComplexHeatmap)
#> Loading required package: grid
#> ========================================
#> ComplexHeatmap version 2.26.0
#> Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
#> Github page: https://github.com/jokergoo/ComplexHeatmap
#> Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
#>
#> If you use it in published research, please cite either one:
#> - Gu, Z. Complex Heatmap Visualization. iMeta 2022.
#> - Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
#> genomic data. Bioinformatics 2016.
#>
#>
#> The new InteractiveComplexHeatmap package can directly export static
#> complex heatmaps into an interactive Shiny app with zero effort. Have a try!
#>
#> This message can be suppressed by:
#> suppressPackageStartupMessages(library(ComplexHeatmap))
#> ========================================
library(RColorBrewer)
cor_breaks <- seq(-2, 2, length.out = 51)
cor_color <- colorRampPalette(c("blue", "white", "red"))(51)
annotation_row <- res2$result[, c("treatment", "tissue")]
pal_treatment <- brewer.pal(length(unique(annotation_row$treatment)), "Set1")
pal_tissue <- brewer.pal(length(unique(annotation_row$tissue)), "Set2")
names(pal_treatment) <- unique(annotation_row$treatment)
names(pal_tissue) <- unique(annotation_row$tissue)
ComplexHeatmap::pheatmap(
as.matrix(res2$result[, sapply(res2$result, is.numeric)]),
breaks = cor_breaks,
color = cor_color,
cluster_rows = FALSE,
name = "SRratio",
annotation_row = annotation_row,
annotation_colors = list(
treatment = pal_treatment,
tissue = pal_tissue
)
)
By applying Template Matching method (Pavlidis and Noble 2001), other genes with similar patterns can be identified based on the specific SRscore pattern (template) of a given gene. Using galactinol synthase 3 (GolS3), one of the cold stress response genes, as an example, Figure 4 shows the results obtained when extracting the top five genes with the most similar SRscore patterns from the 1,000 genes contained within the sample data SRGA.
library(genefilter)
#>
#> Attaching package: 'genefilter'
#> The following object is masked from 'package:ComplexHeatmap':
#>
#> dist2
library(DT)
cl <- colnames(Filter(is.numeric, SRGA))
df <- as.matrix(column_to_rownames(SRGA, var = "ensembl_gene_id")[cl])
template <- "AT1G09350"
close_genes <- genefinder(
df,
ilist = template,
numResults = 5,
method = "euclidean"
)
datatable(
SRGA[SRGA$ensembl_gene_id == template, ],
options = list(dom = "lrtBip"),
rownames = FALSE
)
datatable(
SRGA[close_genes[[1]]$indices, ],
options = list(dom = "lrtBip"),
rownames = FALSE,
)
This study demonstrates the utility and applicability of the SRscore package (https://github.com/fusk-kpu/SRscore) by applying it to real datasets. This presents a rational and reproducible workflow for generating control-treatment comparisons, calculating gene expression ratios, and scoring expression patterns across multiple transcriptome datasets. The package streamlines transcriptome meta-analysis, enhances reproducibility, and enables researchers to focus on interpreting the biological significance of the results rather than expending effort on data organization and manual calculations. Furthermore, the core functions of the SRscore package are not restricted to transcriptomic data but can also be applied to other omics data, such as metabolomics, thereby making it a versatile tool for future applications.
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sequoia 15.5
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Asia/Tokyo
#> tzcode source: internal
#>
#> attached base packages:
#> [1] grid stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] DT_0.34.0 genefilter_1.92.0 RColorBrewer_1.1-3
#> [4] ComplexHeatmap_2.26.0 tibble_3.3.0 SRscore_0.1.2
#> [7] BiocStyle_2.38.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.50.0 circlize_0.4.17 shape_1.4.6.1
#> [4] rjson_0.2.23 xfun_0.55 bslib_0.9.0
#> [7] htmlwidgets_1.6.4 GlobalOptions_0.1.3 lattice_0.22-7
#> [10] Biobase_2.70.0 crosstalk_1.2.2 vctrs_0.6.5
#> [13] tools_4.5.2 generics_0.1.4 stats4_4.5.2
#> [16] parallel_4.5.2 AnnotationDbi_1.72.0 RSQLite_2.4.5
#> [19] cluster_2.1.8.1 blob_1.2.4 pkgconfig_2.0.3
#> [22] Matrix_1.7-4 S4Vectors_0.48.0 lifecycle_1.0.4
#> [25] compiler_4.5.2 Biostrings_2.78.0 tinytex_0.58
#> [28] Seqinfo_1.0.0 codetools_0.2-20 clue_0.3-66
#> [31] htmltools_0.5.9 sass_0.4.10 yaml_2.3.12
#> [34] pillar_1.11.1 crayon_1.5.3 jquerylib_0.1.4
#> [37] tidyr_1.3.2 cachem_1.1.0 magick_2.9.0
#> [40] iterators_1.0.14 foreach_1.5.2 tidyselect_1.2.1
#> [43] digest_0.6.39 dplyr_1.1.4 purrr_1.2.0
#> [46] bookdown_0.46 splines_4.5.2 fastmap_1.2.0
#> [49] colorspace_2.1-2 cli_3.6.5 magrittr_2.0.4
#> [52] survival_3.8-3 XML_3.99-0.20 utf8_1.2.6
#> [55] withr_3.0.2 bit64_4.6.0-1 XVector_0.50.0
#> [58] rmarkdown_2.30 httr_1.4.7 matrixStats_1.5.0
#> [61] bit_4.6.0 otel_0.2.0 png_0.1-8
#> [64] GetoptLong_1.1.0 memoise_2.0.1 evaluate_1.0.5
#> [67] knitr_1.51 IRanges_2.44.0 doParallel_1.0.17
#> [70] rlang_1.1.6 Rcpp_1.1.0 xtable_1.8-4
#> [73] glue_1.8.0 DBI_1.2.3 BiocManager_1.30.27
#> [76] BiocGenerics_0.56.0 rstudioapi_0.17.1 annotate_1.88.0
#> [79] jsonlite_2.0.0 R6_2.6.1 MatrixGenerics_1.22.0
Figure 1. Overview of the SRscore package
The SRscore package is an R package that enables the estimation of variations in gene expression across multiple datasets through meta-analysis. The SRscore calculation process is illustrated in three key steps within the workflow, each highlighted in corresponding colored boxes. (a) Metadata and gene expression data are loaded. (b) Sample combinations are determined to compare expression levels. (c) SRratio is calculated. The procedure is similar to that for fold change (FC); however, in this study, FC is calculated for all possible pairs and averaged. Batch effects are considered, and calculations are performed separately for each dataset (e.g., the GEO series). (d) SRscore is calculated. Based on the SRratio calculated for each dataset, gene expression changes are classified as follows (SRratio ≥ 1: expression increase, −1 < SRratio < 1: no change, SRratio ≤ −1: expression decrease). The number of expression decreases is subtracted from the number of expression increases to obtain the SRscore, and gene expression variation is evaluated based on its magnitude. (e) The SRratio, SRscore, etc., of genes (or groups of genes) of interest are checked while linking them to metadata.