Assessing Usefulness of Databases for Evidence Synthesis

2026-06-08

About this vignette

When developing search strategies for evidence synthesis, it is standard practice to test different versions of a search against a set of studies already known to be relevant (benchmark studies). This helps strike the right balance between precision and sensitivity before screening.

Until now, this within-database testing has been the primary method of pre-screening search validation. With CiteSource, we can test search strategies across databases to assess the usefulness of certain databases before finalizing our database set. This vignette provides a workflow for testing a search strategy across multiple databases and against a set of benchmark studies.

In this example, we are running a search about loneliness and gambling addiction. We developed a search strategy for PsycInfo, our main database, and want to see if searching Web of Science and PubMed adds useful records and helps us find more of our benchmark studies.

Installation and setup

#install.packages("CiteSource")
library(CiteSource)

Import files from multiple sources

Here we import three database searches and a set of benchmark studies. The benchmark file is assigned cite_source = NA since it does not represent a database search, and cite_label = "benchmark" to identify it as the reference set.

citation_files <- list.files(path = "valid_data", pattern = "\\.ris", full.names = TRUE)
citation_files
#> [1] "valid_data/WoS_79.ris"      "valid_data/benchmark.ris"  
#> [3] "valid_data/psycinfo_64.ris" "valid_data/pubmed_46.ris"

# cite_sources and cite_labels are matched to citation_files by position,
# so they must follow the file order printed above.
citations <- read_citations(citation_files,
                            cite_sources = c("wos", NA, "psycinfo", "pubmed"),
                            cite_labels  = c("search", "benchmark", "search", "search"),
                            tag_naming   = "best_guess")
#> Note: the following cite_label value(s) are not in the standard vocabulary (search / screened / final): benchmark. Phase-analysis functions expect these exact labels.
#> Import completed - with the following details:
#>              file cite_source cite_string cite_label citations
#> 1      WoS_79.ris         wos        <NA>     search        79
#> 2   benchmark.ris        <NA>        <NA>  benchmark        13
#> 3 psycinfo_64.ris    psycinfo        <NA>     search        64
#> 4   pubmed_46.ris      pubmed        <NA>     search        46
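
Because read_citations() pairs each file with its cite_source and cite_label by position, and the order returned by list.files() can depend on the platform and locale, it may be safer to derive both vectors from the file names. The following is a minimal base R sketch under that assumption; the helper source_from_file() and its name-matching rule are hypothetical and not part of CiteSource.

```r
# File paths as listed in the import step above.
citation_files <- c("valid_data/WoS_79.ris", "valid_data/benchmark.ris",
                    "valid_data/psycinfo_64.ris", "valid_data/pubmed_46.ris")

# Hypothetical helper: map a file path to a source name by matching the
# start of the lower-cased file name; anything unmatched (here, the
# benchmark file) gets NA.
source_from_file <- function(path) {
  name  <- tolower(basename(path))
  known <- c("psycinfo", "pubmed", "wos")
  hit   <- known[vapply(known, function(k) startsWith(name, k), logical(1))]
  if (length(hit) == 1) hit else NA_character_
}

cite_sources <- vapply(citation_files, source_from_file, character(1),
                       USE.NAMES = FALSE)
# Files without a source are treated as the benchmark set.
cite_labels <- ifelse(is.na(cite_sources), "benchmark", "search")

# Inspect the pairing before importing.
data.frame(file = basename(citation_files),
           cite_source = cite_sources,
           cite_label = cite_labels)
```

The resulting cite_sources and cite_labels vectors could then be passed to read_citations() in place of the hand-written ones, so the pairing no longer depends on file order.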

Deduplication and source information

CiteSource merges duplicate records while preserving the cite_source and cite_label metadata fields, so the origin of each record is retained through deduplication.

unique_citations <- dedup_citations(citations)
n_unique         <- count_unique(unique_citations)
source_comparison <- compare_sources(unique_citations, comp_type = "sources")

Plot heatmap to compare source overlap

Heatmap by number of records

A heatmap shows the total number of records from each database and the count of overlapping records for each pair. Web of Science yielded the highest number of records on gambling addiction and loneliness; PubMed yielded the fewest.

plot_source_overlap_heatmap(source_comparison)

Heatmap by percentage of records

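The percentage heatmap shows what share of each row’s records were also found in each column. Because Web of Science returned more records than PsycInfo, the shared records make up a smaller share of its total: here, 44% of Web of Science records were also found in PsycInfo, while 55% of PsycInfo records were found in Web of Science.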

plot_source_overlap_heatmap(source_comparison, plot_type = "percentages")
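
To make the row-wise reading of the percentage heatmap concrete, here is a small base R sketch that computes the same kind of matrix from toy record IDs. The IDs and source sizes are invented for illustration, and this is not how CiteSource computes the plot internally.

```r
# Toy record IDs per source (illustrative only, not the vignette's data).
records <- list(
  psycinfo = c("r1", "r2", "r3", "r4"),
  pubmed   = c("r3", "r4", "r5"),
  wos      = c("r1", "r3", "r4", "r6", "r7", "r8")
)

# Shared-record counts for every pair of sources; the diagonal holds
# each source's total number of records.
overlap_n <- sapply(records, function(col_src) {
  sapply(records, function(row_src) length(intersect(row_src, col_src)))
})

# Divide each row by that row's total, so cell [i, j] is the percentage
# of source i's records that were also found in source j.
overlap_pct <- round(100 * overlap_n / diag(overlap_n), 1)
overlap_pct
```

In this toy data the three records shared by PsycInfo and Web of Science are 75% of PsycInfo's four records but only 50% of Web of Science's six, which is why the percentage matrix is not symmetric.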

Plot an upset plot to compare source overlap

An upset plot provides more detail about shared and unique records across all source combinations. Web of Science had the most unique records not found in any other database (n=29); PubMed had only four unique records. Twenty-four records were found in every database.

plot_source_overlap_upset(source_comparison, decreasing = c(TRUE, TRUE))
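
The counts behind an upset plot are the numbers of records belonging to each exact combination of sources. As a rough illustration with invented record IDs (not the vignette's data, and not CiteSource's own implementation), they can be tallied in base R as follows.

```r
# Toy record IDs per source (illustrative only).
records <- list(
  psycinfo = c("r1", "r2", "r3", "r4"),
  pubmed   = c("r3", "r4", "r5"),
  wos      = c("r1", "r3", "r4", "r6", "r7", "r8")
)
all_ids <- sort(unique(unlist(records)))

# Logical membership matrix: one row per record, one column per source.
membership <- sapply(records, function(ids) all_ids %in% ids)
rownames(membership) <- all_ids

# Label each record with the exact combination of sources that returned it.
combo <- apply(membership, 1, function(m) {
  paste(colnames(membership)[m], collapse = " & ")
})

# Count records per exclusive combination, largest first.
combo_counts <- sort(table(combo), decreasing = TRUE)
combo_counts
```

Each record is counted once, under the one combination that describes it, so a record found in all three sources does not also count toward any pairwise bar.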

Bar plots of unique and shared records

plot_contributions() visualizes unique and shared record counts by source, and can include the benchmark label to show how each database contributed to the benchmark set.

plot_contributions(n_unique, center = TRUE)

Analyzing unique contributions

To examine which records are exclusive to each database, filter n_unique for unique == TRUE and rejoin with unique_citations to recover full bibliographic data.

unique_psycinfo <- n_unique |>
  dplyr::filter(cite_source == "psycinfo", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")

unique_pubmed <- n_unique |>
  dplyr::filter(cite_source == "pubmed", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")

unique_wos <- n_unique |>
  dplyr::filter(cite_source == "wos", unique == TRUE) |>
  dplyr::inner_join(unique_citations, by = "duplicate_id")

# To export for manual review:
# export_csv(unique_pubmed, "pubmed_unique.csv")
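
The three blocks above repeat the same filter-and-join pattern. One way to avoid the repetition is a small helper applied to each source name. The sketch below uses base R and toy stand-in tables so that it runs on its own; the helper unique_to() is hypothetical, and the toy tables carry only the columns needed for the illustration.

```r
# Toy stand-ins for the two tables (column names follow the code above;
# the values are invented for illustration).
n_unique <- data.frame(
  duplicate_id = c(1, 2, 2, 3),
  cite_source  = c("psycinfo", "pubmed", "wos", "pubmed"),
  unique       = c(TRUE, FALSE, FALSE, TRUE)
)
unique_citations <- data.frame(
  duplicate_id = c(1, 2, 3),
  title        = c("Study A", "Study B", "Study C")
)

# Hypothetical helper: records found only in `source`, joined back to
# their bibliographic fields (base R equivalent of filter + inner_join).
unique_to <- function(source) {
  keep <- n_unique[n_unique$cite_source == source & n_unique$unique, ]
  merge(keep, unique_citations, by = "duplicate_id")
}

# One data frame per source, returned as a named list.
sources <- c(psycinfo = "psycinfo", pubmed = "pubmed", wos = "wos")
unique_by_source <- lapply(sources, unique_to)
unique_by_source$pubmed
```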

Record-level table

Filtering unique_citations to only the benchmark records and passing them to record_level_table() shows which databases contained each benchmark study.

unique_citations |>
  dplyr::filter(stringr::str_detect(cite_label, "benchmark")) |>
  record_level_table(return = "DT")

Search summary table

citation_summary_table() calculates sensitivity and precision scores for each database against the benchmark set, providing a concise overview of each source’s performance before screening begins.

citation_summary_table(unique_citations, screening_label = "benchmark")

                  Records        Contribution
Sources        total   unique       unique      Sensitivity   Precision
search
  pubmed          64       33       41.77%         71.11%
  wos             46       18       22.78%         51.11%
  psycinfo        13        7        8.86%         14.44%
  Total¹          90       58       64.44%
benchmark
  wos             39       14       17.72%         49.37%       84.78%
  pubmed          35        9       11.39%         44.30%       54.69%
  NA              27       27       34.18%         34.18%
  psycinfo         6        2        2.53%          7.59%       46.15%
  Total¹          79       52       65.82%                      87.78%

Included fields:

  • Total records are all records returned by that source, while unique records are found in only that source (or, in the Total rows, in only one source).

  • The unique contribution is the share of records only found in that source (or, in the Total rows, in only one source).

  • Sensitivity is the share of all (deduplicated) records retained at that stage that were found in that particular source.

  • Precision is the share of initial records in that source that are retained for inclusion at that stage.

¹ After deduplication
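
As a check on these definitions, the sensitivity and precision values in the benchmark rows can be reproduced from the table's own counts. The sketch below hard-codes those counts as printed in the table; the variable names are illustrative only.

```r
# Counts copied from the table above, using the source labels as printed.
benchmark_total <- 79                               # deduplicated benchmark-stage records
found   <- c(wos = 39, pubmed = 35, psycinfo = 6)   # benchmark-stage records per source
initial <- c(wos = 46, pubmed = 64, psycinfo = 13)  # search-stage records per source

# Sensitivity: share of all benchmark-stage records found in the source.
sensitivity <- round(100 * found / benchmark_total, 2)

# Precision: share of the source's initial records retained at this stage.
precision <- round(100 * found / initial, 2)

data.frame(source = names(found), sensitivity, precision, row.names = NULL)
```

For example, the row labelled wos gives 39 / 79 = 49.37% sensitivity and 39 / 46 = 84.78% precision, matching the table.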

Exporting for further analysis

CiteSource can export deduplicated results as CSV, RIS, or BibTeX files, and reimport them to resume analysis later.

#export_csv(unique_citations, filename = "unique-by-source.csv", separate = "cite_source")
#export_ris(unique_citations, filename = "unique_citations.ris", source_field = "DB", label_field = "N1")
#export_bib(unique_citations, filename = "unique_citations.bib", include = c("sources", "labels", "strings"))
#reimport_csv("unique-by-source.csv")

In summary

CiteSource can evaluate the usefulness of different databases against a set of benchmark studies before screening begins. In this example, both PsycInfo and Web of Science made unique contributions to the benchmark set, and a substantial proportion of their records were unique. PubMed did not contribute any unique benchmark records and mostly overlapped with the other two databases, which suggests that it may not be an effective addition for this topic.
