First we need to install the SkeletalVis
R package and
the dependencies.
install.packages("SkeletalVis")
Once SkeletalVis
is installed the package needs to be
loaded into the current R session.
library(SkeletalVis)
First, we need to download the gene expression data and metadata. The
gene expression data is stored as a set of feather
files,
which allow fast assess to the each experiment’s foldchanges, pvalues.
Advanced users can manipulate and explore these files using standard R
functions, but for convenience we provide functions that allow common
tasks to be performed.
The load_skeletalvis
function will check if the
SkeletalVis
data has been downloaded previously and if not
will ask to download the files. The skeletalvis
variable
now shows where the data is stored on your system.
skeletalvis <- load_skeletalvis()
For this vignette we will work with a small truncated, demo version of SkeletalVis to demonstrate the functions. This demo version should not be used for analysis.
# Set demo=TRUE to load a small example database
skeletalvis <- load_skeletalvis(demo=TRUE)
We can explore the data in SkeletalVis in multiple ways, finding genes of interest, finding an experimental of interest or finding experiments that have gene expression profile similar to a query.
We can view the differential expression of a gene of interest with
the get_gene_fold_changes
function. Below we retrieve the
data for SOX9 across all expression profiles and optionally
return the FDR for those datasets with replicates. Note that in
SkeletalVis data from all species have been mapped to human gene
symbols.
# Retrieve the data for SOX9 in the expression profiles
gene_results <- get_gene_fold_changes(skeletalvis, "SOX9", return_fdr = TRUE)
head(gene_results)
## # A tibble: 6 × 16
## datasetID log2FoldChange FDR Gene accession comparisonsText Description
## <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 GSE15511… 3.11 0 SOX9 GSE155118 Sox5_6_9_OEvsc… Primary ch…
## 2 E-GEOD-6… 7.15 0 SOX9 E-GEOD-6… SOX9vseGFP SOX9 overe…
## 3 E-GEOD-5… -1.28 2.25e-195 SOX9 E-GEOD-5… Day18vsDay2 Temporal g…
## 4 GSE11026… 3.45 2.28e-151 SOX9 GSE11026… YAP KO_MSCvsWT… <NA>
## 5 GSE10447… -3.26 1.41e-117 SOX9 GSE104473 Mandibular Dis… Mouse mand…
## 6 GSE12516… 3.18 8.58e-112 SOX9 GSE125167 PLZF_10_days_d… PLZF knock…
## # ℹ 9 more variables: PMID <chr>, Species <chr>, Tissue <chr>, ExpType <chr>,
## # Perturbation <chr>, Effect <chr>, Type <chr>, foldChangeOnly <lgl>,
## # platform <chr>
Through the description column, which provides a short summary of the overall experiment, we can see that the dataset with the highest SOX9 expression is one where they have overexpressed that gene. Note that each experiment, detailed by the GEO accession number, may have multiple comparisons e.g condition1 vs control and condition2 vs control (called datasetIDs in SkeletalVis).
We can search for an individual dataset of interest by querying the metadata table. For example we might be interested in a particular species and disease. Again we can find the experiment where SOX9 has been experimentally perturbed e.g knocked out, by searching for the word “SOX9” anywhere in the experiment metadata table.
experiments <- search_skeletalvis(skeletalvis, "SOX9")
experiments
## ID
## 15 E-GEOD-69110
## 32 GSE155118
## Description PMID
## 15 SOX9 overexpression in chondrocytes 26146088
## 32 Primary chondrocytes infected with mock and Sox5/6/9 adenoviruses 33707608
## Species Tissue ExpType Perturbation Effect Type
## 15 Human Cartilage Gene perturbation SOX9 Overexpression RNASeq
## 32 Mouse Cartilage Gene perturbation Sox6 Sox8 Sox9 RNASeq
## foldChangeOnly platform
## 15 FALSE RNASeq
## 32 FALSE RNASeq
We can also search for words in particular columns. Below we find datasets where the tissue under study has been annotated as synovium.
experiments <- search_skeletalvis(skeletalvis, "synovium", columns = "Tissue")
head(experiments)
## ID
## 1 GSE58203
## 4 E-GEOD-1919
## Description
## 1 Stimulation of RA synovial fibroblasts with IL1 or PDGF-D
## 4 Human osteoarthritis or rheumatoid arthritis synovium treated with drug regimes
## PMID Species Tissue ExpType Perturbation Effect Type
## 1 24989895 Human Synovium Exogenous IL1 or PDGF-D Microarray
## 4 20858714 Human Synovium Disease RA and Osteoarthritis Microarray
## foldChangeOnly platform
## 1 FALSE Affy
## 4 TRUE Affy
Once we have a experiment of interest we can next view the comparisons of interest e.g Disease vs Control for that experiment and retrieve the datasetID will allow retrieval of that specific gene expression profile.
get_comparisons(skeletalvis, accession = "GSE155118" )
## accession comparison comparisonsText datasetID
## 1078 GSE155118 1 Sox5_6_9_OEvscontrol GSE155118_1
Here we can see experiment accession “GSE155118” has one comparison, comparing SOX5,6,9 overexpression to controls. The datasetID “GSE155118_1” can be used to select that specific gene expression profile. Full details of the experimental design and metadata can be found on the GEO database for that experiment GSE155118
Alternatively we can interactively search the table of metadata and to find experiments of interest and get the datasetID to retrieve the data for that individual experiment.
browse_skeletalvis(skeletalvis)
We can get the fold changes and pvalues (if applicable) for an individual dataset by querying by datasetID. This datasetID is the GEO accession number and the comparison number.
Retrieve the results for the experiment accession “GSE155118”, the first comparison so “GSE155118_1”
experiment_results <- get_experiment(skeletalvis, "GSE155118_1")
head(experiment_results)
## # A tibble: 6 × 3
## ID GSE155118_1_log2FoldChange GSE155118_1_FDR
## <chr> <dbl> <dbl>
## 1 SOX5 5.99 4.47e-177
## 2 DCLK3 5.60 4.78e- 8
## 3 SOX6 4.74 0
## 4 BMP7 4.69 2.28e- 78
## 5 ACAN 4.47 0
## 6 ANKRD33B 4.12 2.48e- 4
We can find particular genes of interest in this gene expression profile by subsetting the data.
genes_of_interest <- c("SOX5","SOX6","SOX9","ACAN")
experiment_results[ experiment_results$ID %in% genes_of_interest, ]
## # A tibble: 4 × 3
## ID GSE155118_1_log2FoldChange GSE155118_1_FDR
## <chr> <dbl> <dbl>
## 1 SOX5 5.99 4.47e-177
## 2 SOX6 4.74 0
## 3 ACAN 4.47 0
## 4 SOX9 3.11 0
We can also find all genes that have an absolute log2 fold change and FDR past a threshold e.g significant differentially expressed genes
# Data with an absolute fold change of 2 and FDR < 0.05
experiment_results_sig <- experiment_results[ abs(experiment_results$GSE155118_1_log2FoldChange) > log2(2) & experiment_results$GSE155118_1_FDR < 0.05, ]
# How many genes passing this threshold?
dim(experiment_results_sig)
## [1] 98 3
# Look at the top of the list
head(experiment_results_sig)
## # A tibble: 6 × 3
## ID GSE155118_1_log2FoldChange GSE155118_1_FDR
## <chr> <dbl> <dbl>
## 1 SOX5 5.99 4.47e-177
## 2 DCLK3 5.60 4.78e- 8
## 3 SOX6 4.74 0
## 4 BMP7 4.69 2.28e- 78
## 5 ACAN 4.47 0
## 6 ANKRD33B 4.12 2.48e- 4
We can plot the differential expression of an individual dataset with
either plotly
or ggplot2
for interactive and
static plots respectively. The number_points parameter controls the
number of the top up and down regulated points by FDR to plot.
volcano_plot(experiment_results, number_points=5, interactive = FALSE)
volcano_plot(experiment_results, interactive = TRUE)
We can also add labels for particular genes of interest on the volcano plot.
# Selected genes and top 10 up and down by FDR
volcano_plot(experiment_results, number_points = 5, selected_points = c("SOX5","SOX6","SOX9"))
# Just label the selected genes
volcano_plot(experiment_results, number_points = 0, selected_points = c("SOX5","SOX6","SOX9"))
We can also use the volcano plot to visualise the differential
expression data for a single gene from all the gene expression profiles
in SkeletalVis
in a similar way.
g <- volcano_plot(gene_results)
g
We can compare gene expression profiles against the SkeletalVis database using cosine similarity with the log2 fold changes. To interpret the results, the cosine similarity scores can be normalized to z-scores, representing how many standard deviations a score is from the mean similarity. This normalisation assesses of the strength of similarity compared to other gene expression profiles.
The input dataset should include Human gene symbols and log2 fold changes for all measured genes.
# Load an example built in dataset
data(query)
dim(query)
## [1] 42459 2
head(query)
## # A tibble: 6 × 2
## ID `E-MTAB-4304_1`
## <chr> <dbl>
## 1 A1BG -0.0241
## 2 A1BG-AS1 -0.0738
## 3 A1CF 0.0587
## 4 A2M -0.112
## 5 A2M-AS1 -0.139
## 6 A2ML1 0.0198
Genes not present in both the query and SkeletalVis fold change tables will be removed for the analysis.
# Get the similarity of this experiment against the skeletalvis databases
similarity_table <- experiment_similarity(skeletalvis, query)
We can view the results of the search, the table is sorted by zscore so the most similar datasets are at the top.
# Look at the top few rows
head(similarity_table)
## datasetID cosine zscore accession
## 1 E-GEOD-54461_8 0.20132325 2.7219157 E-GEOD-54461
## 2 E-MTAB-5879_6 0.14870332 2.0204709 E-MTAB-5879
## 3 GSE110268_2_4 0.08870696 1.2206952 GSE110268_2
## 4 GSE60401_2 0.08367723 1.1536470 GSE60401
## 5 E-GEOD-18648_1 0.06658287 0.9257722 E-GEOD-18648
## 6 GSE104473_4 0.06631227 0.9221650 GSE104473
## comparisonsText
## 1 Day18vsDay2
## 2 InjuryRepair.Day28vsControl.Day0
## 3 YAP KO_MSCvsWT_ESC
## 4 SXR_VK2vsDsRed_VK2
## 5 TGF.beta1vsnone
## 6 Mandibular Distraction_15_BCSPvsMandibular Fracture_15_BCSP
## Description
## 1 Temporal gene expression across osteoblastogenesis
## 2 Articular cartilage in a porcine model of early post-traumatic osteoarthritis
## 3 <NA>
## 4 SXR overexpression in ATDC5 cells
## 5 TGF-beta and BMP mediated gene expression in cultured sclerotome
## 6 Mouse mandible during distraction osteogenesis
## PMID Species Tissue ExpType
## 1 24945404 Mouse Bone Development
## 2 28671352 Pig Cartilage Disease
## 3 <NA> <NA> <NA> <NA>
## 4 25749104 Mouse Cell line Gene perturbation
## 5 20214815 Mouse Cartilage Exogenous
## 6 30356216 Mouse Bone Development
## Perturbation Effect Type
## 1 Osteoblast differentiation <NA> RNASeq
## 2 Anterior cruciate ligament (ACL) transection RNASeq
## 3 <NA> <NA> <NA>
## 4 SXR Overexpression Microarray
## 5 TGF-beta and/or BMP4 Microarray
## 6 RNASeq
## foldChangeOnly platform
## 1 FALSE RNASeq
## 2 FALSE RNASeq
## 3 NA <NA>
## 4 TRUE Affy-ST
## 5 FALSE Affy
## 6 FALSE RNASeq
We can view the most dissimilar datasets by looking at the bottom of the table.
# Look at the bottom few rows
tail(similarity_table)
## datasetID cosine zscore accession
## 31 GSE64141_1 -0.07951195 -1.021730 GSE64141
## 32 GSE69459_3 -0.08492247 -1.093855 GSE69459
## 33 GSE102292_4 -0.10211661 -1.323060 GSE102292
## 34 GSE12860_3 -0.11364283 -1.476709 GSE12860
## 35 E-GEOD-69110_1 -0.11797657 -1.534479 E-GEOD-69110
## 36 GSE12860_6 -0.13075295 -1.704794 GSE12860
## comparisonsText
## 31 after 15 days of differentiation in chondrogenic mediavsprior to differentiation
## 32 FOP-iMSCs (+ActA)vsresFOP-iMSCs (+ActA)
## 33 6 daysvs2 days
## 34 Chondrocytes stimulated with prioxicam treated RASFvsNDSF stimulated chondrocytes
## 35 SOX9vseGFP
## 36 Chondrocytes stimulated with methylprednisolone (urbason) treated RASFvsNDSF stimulated chondrocytes
## Description
## 31 Chondrogenic Differentiation of ATDC5 cells
## 32 FOP- or resFOP-iMSCs after chondrogenic differentiation
## 33 Mouse chondrocyte culture time course
## 34 Chondrocytes cultured with RA or healthy synovial fibroblast supernatant with drug treatment
## 35 SOX9 overexpression in chondrocytes
## 36 Chondrocytes cultured with RA or healthy synovial fibroblast supernatant with drug treatment
## PMID Species Tissue ExpType
## 31 26363184 Mouse Cell line Differentiation
## 32 26621707 Human Bone Disease
## 33 Mouse Cartilage Culture
## 34 19192274 Human Cartilage Exogenous
## 35 26146088 Human Cartilage Gene perturbation
## 36 19192274 Human Cartilage Exogenous
## Perturbation Effect Type
## 31 Differentiation Microarray
## 32 Fibrodysplasia ossificans progressiva Microarray
## 33 Microarray
## 34 RA synovial supernatant and drug Microarray
## 35 SOX9 Overexpression RNASeq
## 36 RA synovial supernatant and drug Microarray
## foldChangeOnly platform
## 31 TRUE Affy-ST
## 32 TRUE Affy-ST
## 33 TRUE Affy
## 34 TRUE Affy
## 35 FALSE RNASeq
## 36 TRUE Affy
We can plot the most similar datasets by plotting the similarity zscore against the rank of the dataset and label the top few datasets.
# Plot the similarity results and label the top 5 experiments
plot_similarity(similarity_table, top_n=5)
OATargets is a curated collection of literature reported gene perturbations and the results effect on osteoarthritis damage phenotypes.
We can view the table of curated data:
oagenes_curated <- view_curated_oagenes(skeletalvis)
head(oagenes_curated)
## PMID Gene Effect.on.gene.product Model Susceptibility.observed
## 1 24578232 A2M Exogenous Surgical - ACLT Protective
## 2 28743292 A2M Exogenous Surgical - ACLT Protective
## 3 31534049 ABAT Overexpression Surgical - MLI Detrimental
## 4 31534049 ABAT Knockdown Surgical - MLI Protective
## 5 31534049 ABAT Inhibition Surgical - MLI Protective
## 6 30813547 ACAN Misexpression Spontaneous Detrimental
## Inferred.gene.effect Delivery Species pub_date LastAuthor Type
## 1 Protective Articular cavity Rat 2014-07-01 Wei L Exogenous
## 2 Protective Articular cavity Rat 2017-07-25 Wei L Exogenous
## 3 Detrimental Articular cavity Mouse 2019-09-19 O'Keefe RJ Exogenous
## 4 Detrimental Articular cavity Mouse 2019-09-19 O'Keefe RJ Exogenous
## 5 Detrimental systemic Mouse 2019-09-19 O'Keefe RJ Exogenous
## 6 Protective Cartilage Mouse 2019-02-26 Aszodi A Genetic
## Intervention simpleModel effectConsensus NumStudies
## 1 Protein Surgical Protective 2
## 2 Protein Surgical Protective 2
## 3 Viral Surgical Detrimental 3
## 4 siRNA Surgical Detrimental 3
## 5 Vigabatrin Surgical Detrimental 3
## 6 Ageing Protective 2
We can also view the table of prioritised genes, created using a machine learning model trained on the known curated genes, protein-protein interaction information and the SkeletalVis gene expression data. Druggability (small molecule or antibody) from OpenTargets is given. Note that the curated genes already studied are not within this table.
oagenes_prioritised <- view_prioritised_oagenes(skeletalvis)
head(oagenes_prioritised)
## Gene Rank PredictedEffect Category_sm
## 1 IER3 1 Detrimental Unknown
## 2 SOCS3 2 Detrimental Unknown
## 3 NR4A2 3 Detrimental Discovery_Precedence_sm
## 4 BCL3 4 Detrimental Discovery_Precedence_sm
## 5 INHBA 5 Protective Discovery_Precedence_sm
## 6 NFKBIA 6 Detrimental Discovery_Precedence_sm
## Category_ab
## 1 Predicted_Tractable_ab_Medium_to_low_confidence
## 2 Predicted_Tractable_ab_Medium_to_low_confidence
## 3 Unknown
## 4 Predicted_Tractable_ab_High_confidence
## 5 Clinical_Precedence_ab
## 6 Predicted_Tractable_ab_High_confidence
We can also see a network of physically interacting OAGenes surrounding a gene of interest.
view_network(skeletalvis, query = "COL2A1")