Getting started

Installation

First we need to install the SkeletalVis R package and the dependencies.

install.packages("SkeletalVis")

Loading the package

Once SkeletalVis is installed the package needs to be loaded into the current R session.

library(SkeletalVis)

Accessing the SkeletalVis dataset

First, we need to download the gene expression data and metadata. The gene expression data is stored as a set of feather files, which allow fast assess to the each experiment’s foldchanges, pvalues. Advanced users can manipulate and explore these files using standard R functions, but for convenience we provide functions that allow common tasks to be performed.

The load_skeletalvis function will check if the SkeletalVis data has been downloaded previously and if not will ask to download the files. The skeletalvis variable now shows where the data is stored on your system.

skeletalvis <- load_skeletalvis()

For this vignette we will work with a small truncated, demo version of SkeletalVis to demonstrate the functions. This demo version should not be used for analysis.

# Set demo=TRUE to load a small example database
skeletalvis <- load_skeletalvis(demo=TRUE)

Exploring SkeletalVis

We can explore the data in SkeletalVis in multiple ways, finding genes of interest, finding an experimental of interest or finding experiments that have gene expression profile similar to a query.

Searching all datasets for a gene of interest

We can view the differential expression of a gene of interest with the get_gene_fold_changes function. Below we retrieve the data for SOX9 across all expression profiles and optionally return the FDR for those datasets with replicates. Note that in SkeletalVis data from all species have been mapped to human gene symbols.

# Retrieve the data for SOX9 in the expression profiles
gene_results <- get_gene_fold_changes(skeletalvis, "SOX9", return_fdr = TRUE)
head(gene_results)
## # A tibble: 6 × 16
##   datasetID log2FoldChange       FDR Gene  accession comparisonsText Description
##   <chr>              <dbl>     <dbl> <chr> <chr>     <chr>           <chr>      
## 1 GSE15511…           3.11 0         SOX9  GSE155118 Sox5_6_9_OEvsc… Primary ch…
## 2 E-GEOD-6…           7.15 0         SOX9  E-GEOD-6… SOX9vseGFP      SOX9 overe…
## 3 E-GEOD-5…          -1.28 2.25e-195 SOX9  E-GEOD-5… Day18vsDay2     Temporal g…
## 4 GSE11026…           3.45 2.28e-151 SOX9  GSE11026… YAP KO_MSCvsWT… <NA>       
## 5 GSE10447…          -3.26 1.41e-117 SOX9  GSE104473 Mandibular Dis… Mouse mand…
## 6 GSE12516…           3.18 8.58e-112 SOX9  GSE125167 PLZF_10_days_d… PLZF knock…
## # ℹ 9 more variables: PMID <chr>, Species <chr>, Tissue <chr>, ExpType <chr>,
## #   Perturbation <chr>, Effect <chr>, Type <chr>, foldChangeOnly <lgl>,
## #   platform <chr>

Through the description column, which provides a short summary of the overall experiment, we can see that the dataset with the highest SOX9 expression is one where they have overexpressed that gene. Note that each experiment, detailed by the GEO accession number, may have multiple comparisons e.g condition1 vs control and condition2 vs control (called datasetIDs in SkeletalVis).

Find a dataset of interest

We can search for an individual dataset of interest by querying the metadata table. For example we might be interested in a particular species and disease. Again we can find the experiment where SOX9 has been experimentally perturbed e.g knocked out, by searching for the word “SOX9” anywhere in the experiment metadata table.

experiments <- search_skeletalvis(skeletalvis, "SOX9")
experiments
##              ID
## 15 E-GEOD-69110
## 32    GSE155118
##                                                          Description     PMID
## 15                               SOX9 overexpression in chondrocytes 26146088
## 32 Primary chondrocytes infected with mock and Sox5/6/9 adenoviruses 33707608
##    Species    Tissue           ExpType   Perturbation         Effect   Type
## 15   Human Cartilage Gene perturbation           SOX9 Overexpression RNASeq
## 32   Mouse Cartilage Gene perturbation Sox6 Sox8 Sox9                RNASeq
##    foldChangeOnly platform
## 15          FALSE   RNASeq
## 32          FALSE   RNASeq

We can also search for words in particular columns. Below we find datasets where the tissue under study has been annotated as synovium.

experiments <- search_skeletalvis(skeletalvis, "synovium", columns = "Tissue")
head(experiments)
##            ID
## 1    GSE58203
## 4 E-GEOD-1919
##                                                                       Description
## 1                       Stimulation of RA synovial fibroblasts with IL1 or PDGF-D
## 4 Human osteoarthritis or rheumatoid arthritis synovium treated with drug regimes
##       PMID Species   Tissue   ExpType          Perturbation Effect       Type
## 1 24989895   Human Synovium Exogenous         IL1 or PDGF-D        Microarray
## 4 20858714   Human Synovium   Disease RA and Osteoarthritis        Microarray
##   foldChangeOnly platform
## 1          FALSE     Affy
## 4           TRUE     Affy

Once we have a experiment of interest we can next view the comparisons of interest e.g Disease vs Control for that experiment and retrieve the datasetID will allow retrieval of that specific gene expression profile.

get_comparisons(skeletalvis, accession = "GSE155118" )
##      accession comparison      comparisonsText   datasetID
## 1078 GSE155118          1 Sox5_6_9_OEvscontrol GSE155118_1

Here we can see experiment accession “GSE155118” has one comparison, comparing SOX5,6,9 overexpression to controls. The datasetID “GSE155118_1” can be used to select that specific gene expression profile. Full details of the experimental design and metadata can be found on the GEO database for that experiment GSE155118

Alternatively we can interactively search the table of metadata and to find experiments of interest and get the datasetID to retrieve the data for that individual experiment.

browse_skeletalvis(skeletalvis)

Retrieve a dataset of interest

We can get the fold changes and pvalues (if applicable) for an individual dataset by querying by datasetID. This datasetID is the GEO accession number and the comparison number.

Retrieve the results for the experiment accession “GSE155118”, the first comparison so “GSE155118_1”

experiment_results <- get_experiment(skeletalvis, "GSE155118_1")
head(experiment_results)
## # A tibble: 6 × 3
##   ID       GSE155118_1_log2FoldChange GSE155118_1_FDR
##   <chr>                         <dbl>           <dbl>
## 1 SOX5                           5.99       4.47e-177
## 2 DCLK3                          5.60       4.78e-  8
## 3 SOX6                           4.74       0        
## 4 BMP7                           4.69       2.28e- 78
## 5 ACAN                           4.47       0        
## 6 ANKRD33B                       4.12       2.48e-  4

Filtering a dataset of interest

We can find particular genes of interest in this gene expression profile by subsetting the data.

genes_of_interest <- c("SOX5","SOX6","SOX9","ACAN")

experiment_results[ experiment_results$ID %in% genes_of_interest, ]
## # A tibble: 4 × 3
##   ID    GSE155118_1_log2FoldChange GSE155118_1_FDR
##   <chr>                      <dbl>           <dbl>
## 1 SOX5                        5.99       4.47e-177
## 2 SOX6                        4.74       0        
## 3 ACAN                        4.47       0        
## 4 SOX9                        3.11       0

We can also find all genes that have an absolute log2 fold change and FDR past a threshold e.g significant differentially expressed genes

# Data with an absolute fold change of 2 and FDR < 0.05
experiment_results_sig <- experiment_results[ abs(experiment_results$GSE155118_1_log2FoldChange) > log2(2) & experiment_results$GSE155118_1_FDR < 0.05, ]

# How many genes passing this threshold?
dim(experiment_results_sig)
## [1] 98  3
# Look at the top of the list
head(experiment_results_sig)
## # A tibble: 6 × 3
##   ID       GSE155118_1_log2FoldChange GSE155118_1_FDR
##   <chr>                         <dbl>           <dbl>
## 1 SOX5                           5.99       4.47e-177
## 2 DCLK3                          5.60       4.78e-  8
## 3 SOX6                           4.74       0        
## 4 BMP7                           4.69       2.28e- 78
## 5 ACAN                           4.47       0        
## 6 ANKRD33B                       4.12       2.48e-  4

Volcano plots

We can plot the differential expression of an individual dataset with either plotly or ggplot2 for interactive and static plots respectively. The number_points parameter controls the number of the top up and down regulated points by FDR to plot.

volcano_plot(experiment_results, number_points=5, interactive = FALSE)

volcano_plot(experiment_results, interactive = TRUE)

We can also add labels for particular genes of interest on the volcano plot.

# Selected genes and top 10 up and down by FDR
volcano_plot(experiment_results, number_points = 5, selected_points = c("SOX5","SOX6","SOX9"))

# Just label the selected genes
volcano_plot(experiment_results, number_points = 0, selected_points = c("SOX5","SOX6","SOX9"))

We can also use the volcano plot to visualise the differential expression data for a single gene from all the gene expression profiles in SkeletalVis in a similar way.

g <- volcano_plot(gene_results)

g

Finding similar datasets

We can compare gene expression profiles against the SkeletalVis database using cosine similarity with the log2 fold changes. To interpret the results, the cosine similarity scores can be normalized to z-scores, representing how many standard deviations a score is from the mean similarity. This normalisation assesses of the strength of similarity compared to other gene expression profiles.

The input dataset should include Human gene symbols and log2 fold changes for all measured genes.

# Load an example built in dataset
data(query)

dim(query)
## [1] 42459     2
head(query)
## # A tibble: 6 × 2
##   ID       `E-MTAB-4304_1`
##   <chr>              <dbl>
## 1 A1BG             -0.0241
## 2 A1BG-AS1         -0.0738
## 3 A1CF              0.0587
## 4 A2M              -0.112 
## 5 A2M-AS1          -0.139 
## 6 A2ML1             0.0198

Genes not present in both the query and SkeletalVis fold change tables will be removed for the analysis.

# Get the similarity of this experiment against the skeletalvis databases
similarity_table <- experiment_similarity(skeletalvis, query)

We can view the results of the search, the table is sorted by zscore so the most similar datasets are at the top.

# Look at the top few rows
head(similarity_table)
##        datasetID     cosine    zscore    accession
## 1 E-GEOD-54461_8 0.20132325 2.7219157 E-GEOD-54461
## 2  E-MTAB-5879_6 0.14870332 2.0204709  E-MTAB-5879
## 3  GSE110268_2_4 0.08870696 1.2206952  GSE110268_2
## 4     GSE60401_2 0.08367723 1.1536470     GSE60401
## 5 E-GEOD-18648_1 0.06658287 0.9257722 E-GEOD-18648
## 6    GSE104473_4 0.06631227 0.9221650    GSE104473
##                                               comparisonsText
## 1                                                 Day18vsDay2
## 2                            InjuryRepair.Day28vsControl.Day0
## 3                                          YAP KO_MSCvsWT_ESC
## 4                                          SXR_VK2vsDsRed_VK2
## 5                                             TGF.beta1vsnone
## 6 Mandibular Distraction_15_BCSPvsMandibular Fracture_15_BCSP
##                                                                     Description
## 1                            Temporal gene expression across osteoblastogenesis
## 2 Articular cartilage in a porcine model of early post-traumatic osteoarthritis
## 3                                                                          <NA>
## 4                                             SXR overexpression in ATDC5 cells
## 5              TGF-beta and BMP mediated gene expression in cultured sclerotome
## 6                                Mouse mandible during distraction osteogenesis
##       PMID Species    Tissue           ExpType
## 1 24945404   Mouse      Bone       Development
## 2 28671352     Pig Cartilage           Disease
## 3     <NA>    <NA>      <NA>              <NA>
## 4 25749104   Mouse Cell line Gene perturbation
## 5 20214815   Mouse Cartilage         Exogenous
## 6 30356216   Mouse      Bone       Development
##                                   Perturbation         Effect       Type
## 1                   Osteoblast differentiation           <NA>     RNASeq
## 2 Anterior cruciate ligament (ACL) transection                    RNASeq
## 3                                         <NA>           <NA>       <NA>
## 4                                          SXR Overexpression Microarray
## 5                         TGF-beta and/or BMP4                Microarray
## 6                                                                 RNASeq
##   foldChangeOnly platform
## 1          FALSE   RNASeq
## 2          FALSE   RNASeq
## 3             NA     <NA>
## 4           TRUE  Affy-ST
## 5          FALSE     Affy
## 6          FALSE   RNASeq

We can view the most dissimilar datasets by looking at the bottom of the table.

# Look at the bottom few rows
tail(similarity_table)
##         datasetID      cosine    zscore    accession
## 31     GSE64141_1 -0.07951195 -1.021730     GSE64141
## 32     GSE69459_3 -0.08492247 -1.093855     GSE69459
## 33    GSE102292_4 -0.10211661 -1.323060    GSE102292
## 34     GSE12860_3 -0.11364283 -1.476709     GSE12860
## 35 E-GEOD-69110_1 -0.11797657 -1.534479 E-GEOD-69110
## 36     GSE12860_6 -0.13075295 -1.704794     GSE12860
##                                                                                         comparisonsText
## 31                     after 15 days of differentiation in chondrogenic mediavsprior to differentiation
## 32                                                              FOP-iMSCs (+ActA)vsresFOP-iMSCs (+ActA)
## 33                                                                                       6 daysvs2 days
## 34                    Chondrocytes stimulated with prioxicam treated RASFvsNDSF stimulated chondrocytes
## 35                                                                                           SOX9vseGFP
## 36 Chondrocytes stimulated with methylprednisolone (urbason) treated RASFvsNDSF stimulated chondrocytes
##                                                                                     Description
## 31                                                  Chondrogenic Differentiation of ATDC5 cells
## 32                                      FOP- or resFOP-iMSCs after chondrogenic differentiation
## 33                                                        Mouse chondrocyte culture time course
## 34 Chondrocytes cultured with RA or healthy synovial fibroblast supernatant with drug treatment
## 35                                                          SOX9 overexpression in chondrocytes
## 36 Chondrocytes cultured with RA or healthy synovial fibroblast supernatant with drug treatment
##        PMID Species    Tissue           ExpType
## 31 26363184   Mouse Cell line   Differentiation
## 32 26621707   Human      Bone           Disease
## 33            Mouse Cartilage           Culture
## 34 19192274   Human Cartilage         Exogenous
## 35 26146088   Human Cartilage Gene perturbation
## 36 19192274   Human Cartilage         Exogenous
##                             Perturbation         Effect       Type
## 31                       Differentiation                Microarray
## 32 Fibrodysplasia ossificans progressiva                Microarray
## 33                                                      Microarray
## 34      RA synovial supernatant and drug                Microarray
## 35                                  SOX9 Overexpression     RNASeq
## 36      RA synovial supernatant and drug                Microarray
##    foldChangeOnly platform
## 31           TRUE  Affy-ST
## 32           TRUE  Affy-ST
## 33           TRUE     Affy
## 34           TRUE     Affy
## 35          FALSE   RNASeq
## 36           TRUE     Affy

Plotting the similarities

We can plot the most similar datasets by plotting the similarity zscore against the rank of the dataset and label the top few datasets.

# Plot the similarity results and label the top 5 experiments
plot_similarity(similarity_table, top_n=5)

Visualising OATargets data

OATargets is a curated collection of literature reported gene perturbations and the results effect on osteoarthritis damage phenotypes.

We can view the table of curated data:

oagenes_curated <- view_curated_oagenes(skeletalvis)

head(oagenes_curated)
##       PMID Gene Effect.on.gene.product           Model Susceptibility.observed
## 1 24578232  A2M              Exogenous Surgical - ACLT              Protective
## 2 28743292  A2M              Exogenous Surgical - ACLT              Protective
## 3 31534049 ABAT         Overexpression  Surgical - MLI             Detrimental
## 4 31534049 ABAT              Knockdown  Surgical - MLI              Protective
## 5 31534049 ABAT             Inhibition  Surgical - MLI              Protective
## 6 30813547 ACAN          Misexpression     Spontaneous             Detrimental
##   Inferred.gene.effect         Delivery Species   pub_date LastAuthor      Type
## 1           Protective Articular cavity     Rat 2014-07-01      Wei L Exogenous
## 2           Protective Articular cavity     Rat 2017-07-25      Wei L Exogenous
## 3          Detrimental Articular cavity   Mouse 2019-09-19 O'Keefe RJ Exogenous
## 4          Detrimental Articular cavity   Mouse 2019-09-19 O'Keefe RJ Exogenous
## 5          Detrimental         systemic   Mouse 2019-09-19 O'Keefe RJ Exogenous
## 6           Protective        Cartilage   Mouse 2019-02-26   Aszodi A   Genetic
##   Intervention simpleModel effectConsensus NumStudies
## 1      Protein    Surgical      Protective          2
## 2      Protein    Surgical      Protective          2
## 3        Viral    Surgical     Detrimental          3
## 4        siRNA    Surgical     Detrimental          3
## 5   Vigabatrin    Surgical     Detrimental          3
## 6                   Ageing      Protective          2

We can also view the table of prioritised genes, created using a machine learning model trained on the known curated genes, protein-protein interaction information and the SkeletalVis gene expression data. Druggability (small molecule or antibody) from OpenTargets is given. Note that the curated genes already studied are not within this table.

oagenes_prioritised <- view_prioritised_oagenes(skeletalvis)

head(oagenes_prioritised)
##     Gene Rank PredictedEffect             Category_sm
## 1   IER3    1     Detrimental                 Unknown
## 2  SOCS3    2     Detrimental                 Unknown
## 3  NR4A2    3     Detrimental Discovery_Precedence_sm
## 4   BCL3    4     Detrimental Discovery_Precedence_sm
## 5  INHBA    5      Protective Discovery_Precedence_sm
## 6 NFKBIA    6     Detrimental Discovery_Precedence_sm
##                                       Category_ab
## 1 Predicted_Tractable_ab_Medium_to_low_confidence
## 2 Predicted_Tractable_ab_Medium_to_low_confidence
## 3                                         Unknown
## 4          Predicted_Tractable_ab_High_confidence
## 5                          Clinical_Precedence_ab
## 6          Predicted_Tractable_ab_High_confidence

We can also see a network of physically interacting OAGenes surrounding a gene of interest.

view_network(skeletalvis, query = "COL2A1")

mirror server hosted at Truenetwork, Russian Federation.