Introduction to EvalTest

Introduction

The ‘EvalTest’ package provides a function that computes diagnostic performance indicators with their confidence intervals, and a ‘Shiny’ application for evaluating diagnostic test performance using data from laboratory or diagnostic research. The application supports both binary and continuous test variables. It computes the key performance indicators with their confidence intervals, plots Receiver Operating Characteristic (ROC) curves, determines the optimal cut-off threshold, displays the confusion matrix, and exports publication-ready plots. The package aims to make statistical methods for diagnostic test evaluation easier for healthcare professionals to apply.

Installation

You can install the development version of ‘EvalTest’ from GitHub as follows (if the ‘devtools’ package is not installed, install it first with install.packages("devtools")):

devtools::install_github("NassimAyad87/EvalTest", dependencies = TRUE)

Or from CRAN:

install.packages("EvalTest", dependencies = TRUE)

Compute diagnostic test indicators

After installing the package, you can load it:

library(EvalTest)

The function compute_indicators() computes sensitivity, specificity, predictive values, likelihood ratios, accuracy, and Youden’s index, with their confidence intervals, from the counts of a 2×2 table of diagnostic test results.

compute_indicators(tp, fp, fn, tn, prev, conf = 0.95)

Where:

tp: True positives
fp: False positives
fn: False negatives
tn: True negatives
prev: Prevalence of the disease in the population (numeric between 0 and 1)
conf: Confidence level (default 0.95)

It returns a list with all diagnostic indicators and confidence intervals.

Launching the application

You can launch the application with:

EvalTest::run_app()

This will open the ‘Shiny’ application in your default web browser or your RStudio viewer.

Using the application

The application is designed to be user-friendly and intuitive. Here are the steps to use it:

Before uploading your data, make sure that the test variable is in one column (either qualitative, coded 1/0, or quantitative), that the reference variable (disease status, coded 1/0) is in another column, and that the selected columns contain no missing values.
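A quick way to run these checks in R before saving the file is sketched below. `check_app_input()` is a hypothetical helper written for illustration and is not part of ‘EvalTest’. The column names `marker` and `status` are example names only.

```r
# Hypothetical helper, not part of 'EvalTest': checks the two columns
# required by the application and returns a (possibly empty) vector of problems.
check_app_input <- function(df, test_col, ref_col) {
  stopifnot(test_col %in% names(df), ref_col %in% names(df))
  test <- df[[test_col]]
  ref  <- df[[ref_col]]
  problems <- character(0)
  if (anyNA(test) || anyNA(ref)) {
    problems <- c(problems, "missing values in the selected columns")
  }
  if (!all(ref[!is.na(ref)] %in% c(0, 1))) {
    problems <- c(problems, "reference variable is not coded 1/0")
  }
  if (!is.numeric(test)) {
    problems <- c(problems, "test variable is not numeric (1/0 or quantitative)")
  }
  problems
}

# Example data: a quantitative marker and the disease status (1 = diseased)
dat <- data.frame(marker = c(2.1, 3.4, 5.0, 6.2, 1.8),
                  status = c(0, 0, 1, 1, 0))
check_app_input(dat, "marker", "status")  # character(0): no problem found

# The data can then be saved as .xlsx, for example with the 'writexl'
# package (an assumption; any tool that writes .xlsx files works):
# writexl::write_xlsx(dat, "my_data.xlsx")
```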

Upload your data in Excel format (.xlsx) by pressing the Browse button in the Data import and parameters setting panel.
Choose the type of test variable (Qualitative binary 1/0 or Quantitative).
Select the columns containing the test variable and the reference variable (disease status).
Enter the disease prevalence in the study population (a number between 0 and 1).

Run the analysis and explore the results in the different tabs.
You can download the ROC plot and the results tables for your report.

The screenshots below show the different tabs of the application. The ROC curve is displayed with its confidence interval, the optimal cut-off point of the test variable, the AUC with its confidence interval, and the projection of the best sensitivity and specificity according to the top-left method. The plot can be downloaded in PNG format.

The confusion matrix is obtained by dichotomizing the test variable into a binary variable (positive/negative test) at the optimal cut-off point. It shows the counts of true positives, false positives, true negatives, and false negatives, and can be downloaded as an Excel file.

All computed performance indicators are listed with their point estimates and confidence intervals. Wilson intervals are used for sensitivity, specificity, and accuracy. The intervals for the other indicators are described under Confidence intervals below.

Statistical formulas

The main diagnostic performance indicators computed by EvalTest are defined as follows:

Determining the optimal cut-off threshold (top-left method)

The top-left method is a common approach to selecting the optimal cut-off threshold from the ROC curve.
It identifies the point on the curve that is closest to the ideal point (0,1) of the ROC space (x-axis: 1 - Sp, y-axis: Se). This point corresponds to perfect sensitivity (100%) and perfect specificity (100%).

For each threshold \(t\), the Euclidean distance to the point (0,1) is calculated as:

\[ d(t) = \sqrt{(1 - \text{Se}(t))^2 + (1 - \text{Sp}(t))^2} \]

where:

\(\text{Se}(t)\) is the sensitivity at threshold \(t\),
\(\text{Sp}(t)\) is the specificity at threshold \(t\).

The optimal cut-off threshold is the value of \(t\) that minimizes this distance:

\[ t^{*} = \arg\min_{t} \; d(t) \]
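The criterion can be sketched in base R as follows. This illustrative implementation is not the package's own code. It assumes that higher marker values indicate disease and that a result is positive when the marker is at or above \(t\). The marker values are made up. For comparison, the ‘pROC’ package applies the same rule through coords(roc_obj, "best", best.method = "closest.topleft"), where roc_obj is a hypothetical name for an object returned by pROC::roc(). That function minimizes the squared distance, which gives the same optimum, and it uses midpoints between observed values as thresholds.

```r
# Illustrative sketch of the top-left criterion (not the package's code).
# Assumption: test positive when marker >= t, higher values = disease.
top_left_cutoff <- function(marker, status) {
  thresholds <- sort(unique(marker))
  se <- sapply(thresholds, function(t) mean(marker[status == 1] >= t))
  sp <- sapply(thresholds, function(t) mean(marker[status == 0] <  t))
  d  <- sqrt((1 - se)^2 + (1 - sp)^2)   # distance to the point (0, 1)
  best <- which.min(d)                   # first minimum in case of ties
  c(threshold = thresholds[best], Se = se[best], Sp = sp[best], d = d[best])
}

marker <- c(1.2, 2.5, 3.1, 3.8, 4.0, 4.6, 5.3, 6.1, 6.8, 7.5)
status <- c(0,   0,   0,   1,   0,   0,   1,   1,   1,   1)
top_left_cutoff(marker, status)
```

With these data, the smallest distance (0.2) is reached at \(t = 5.3\), where Se = 0.8 and Sp = 1.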

Estimates

Sensitivity (Se):

\[ Se = \frac{TP}{TP + FN} \]

Specificity (Sp):

\[ Sp = \frac{TN}{TN + FP} \]

Positive predictive value (PPV), where Prev is the disease prevalence supplied by the user (the prev argument):

\[ PPV = \frac{Se \times Prev}{(Se \times Prev) + (1 - Sp)(1 - Prev)} \]

Negative predictive value (NPV):

\[ NPV = \frac{Sp \times (1 - Prev)}{(1 - Se) \times Prev + Sp \times (1 - Prev)} \]

Likelihood ratios:

\[ LR^+ = \frac{Se}{1 - Sp}, \qquad LR^- = \frac{1 - Se}{Sp} \]

Accuracy (Acc):

\[ Acc = \frac{TP + TN}{TP + FP + FN + TN} \]

Youden’s index (J):

\[ J = Se + Sp - 1 \]
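The estimates above can be computed from the 2×2 counts with a few lines of base R. `point_estimates()` is a hypothetical helper used only for illustration. The counts and the prevalence of 0.1 are made-up example values.

```r
# Illustrative helper (not exported by 'EvalTest'): point estimates from the
# 2x2 counts, with PPV and NPV computed from the supplied prevalence 'prev'.
point_estimates <- function(tp, fp, fn, tn, prev) {
  se <- tp / (tp + fn)
  sp <- tn / (tn + fp)
  c(Se    = se,
    Sp    = sp,
    PPV   = se * prev / (se * prev + (1 - sp) * (1 - prev)),
    NPV   = sp * (1 - prev) / ((1 - se) * prev + sp * (1 - prev)),
    LRpos = se / (1 - sp),
    LRneg = (1 - se) / sp,
    Acc   = (tp + tn) / (tp + fp + fn + tn),
    J     = se + sp - 1)
}

round(point_estimates(tp = 90, fp = 20, fn = 10, tn = 180, prev = 0.1), 3)
```

With these counts, Se = Sp = 0.9, and with prev = 0.1 the PPV is 0.5. The sample proportion TP/(TP + FP) = 90/110 ≈ 0.82 would overstate the PPV here, because the sample (100 diseased out of 300) contains a much larger share of diseased subjects than a population with 10% prevalence.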

Confidence intervals

For proportions (Se, Sp, Acc), Wilson binomial confidence intervals are used:

Let:

x = number of “successes” (e.g. true positives for sensitivity, true negatives for specificity)
n = number of trials (e.g. TP + FN for sensitivity, TN + FP for specificity)
\(\hat{p} = x/n\) = observed proportion
\(z = z_{1-\alpha/2}\) = quantile of the standard normal distribution (1.96 for 95% CI)

The Wilson adjusted estimate is

\[ \hat{p}_W = \frac{\hat{p} + \tfrac{z^2}{2n}}{1 + \tfrac{z^2}{n}} \]

The half-width of the confidence interval is

\[ d = \frac{z}{1 + \tfrac{z^2}{n}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n} + \frac{z^2}{4n^2}} \]

Therefore,

\[ CI_{Wilson} = [\hat{p}_W - d,\; \hat{p}_W + d] \]

In practice, the function binom::binom.confint(x, n, methods = "wilson") is applied.
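The two formulas above translate directly into base R. `wilson_ci()` is an illustrative helper and is not a function exported by ‘EvalTest’. As a cross-check, base R's prop.test(x, n, correct = FALSE)$conf.int returns the same Wilson score interval.

```r
# Illustrative helper (not exported by 'EvalTest'): Wilson interval for x/n
wilson_ci <- function(x, n, conf = 0.95) {
  z     <- qnorm(1 - (1 - conf) / 2)                        # 1.96 for conf = 0.95
  p_hat <- x / n
  p_w   <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)          # adjusted centre
  d     <- z / (1 + z^2 / n) *
           sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) # half-width
  c(estimate = p_hat, lower = p_w - d, upper = p_w + d)
}

# Sensitivity with TP = 90 and FN = 10
ci <- wilson_ci(x = 90, n = 100)
ci

# Cross-check against base R's score interval (no continuity correction)
all.equal(unname(ci[c("lower", "upper")]),
          prop.test(90, 100, correct = FALSE)$conf.int[1:2])
```

The interval (about 0.826 to 0.945) is centred on \(\hat{p}_W\), not on \(\hat{p} = 0.9\). This shift toward 0.5 is what keeps the Wilson interval well-behaved near 0 and 1.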

PPV and NPV are monotonic (increasing) functions of sensitivity (Se) and specificity (Sp), so their variability is driven by the uncertainty in Se and Sp. Their confidence intervals are therefore obtained by propagating that uncertainty: the PPV and NPV formulas are evaluated at every combination of the confidence limits of Se and Sp, and the minimum and maximum of the resulting values are kept as the lower and upper limits. Se and Sp are estimated from independent samples (the diseased and non-diseased groups), so the probability that both intervals cover their true values is the product of their marginal coverage probabilities. Consequently, to obtain nominal 95% confidence intervals for PPV and NPV, the intervals for Se and Sp are computed at the 97.5% confidence level instead of reusing the 95% intervals computed above (since 0.975² ≈ 0.95). The confidence intervals for PPV and NPV are therefore calculated as follows:

\[ CI_{PPV} = \left[ \min_{Se,Sp} f(Se,Sp), \; \max_{Se,Sp} f(Se,Sp) \right] \]

where \(f(Se,Sp)\) is the PPV function given above (analogous for NPV).
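The procedure can be sketched as follows. The helpers are illustrative and not the package's own code. They repeat the Wilson formula rather than call ‘binom’, use the 97.5% level given in the text, and are applied to made-up counts and prevalence.

```r
# Illustrative helpers, not the package's own code.
# Wilson limits for x/n at confidence level 'conf'
wilson_limits <- function(x, n, conf) {
  z      <- qnorm(1 - (1 - conf) / 2)
  p_hat  <- x / n
  centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z / (1 + z^2 / n) * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
  c(centre - half, centre + half)
}

ppv <- function(se, sp, prev) se * prev / (se * prev + (1 - sp) * (1 - prev))
npv <- function(se, sp, prev) sp * (1 - prev) / ((1 - se) * prev + sp * (1 - prev))

# PPV/NPV limits: min and max over all combinations of the Se and Sp limits,
# each computed at the 97.5% level so that the joint coverage is about 95%
pv_ci <- function(tp, fp, fn, tn, prev, level = 0.975) {
  se_lim <- wilson_limits(tp, tp + fn, level)
  sp_lim <- wilson_limits(tn, tn + fp, level)
  grid   <- expand.grid(se = se_lim, sp = sp_lim)   # the 4 corner combinations
  out <- rbind(PPV = range(ppv(grid$se, grid$sp, prev)),
               NPV = range(npv(grid$se, grid$sp, prev)))
  colnames(out) <- c("lower", "upper")
  out
}

pv_ci(tp = 90, fp = 20, fn = 10, tn = 180, prev = 0.1)
```

Because PPV and NPV both increase with Se and Sp, the minimum always comes from the two lower limits and the maximum from the two upper limits. Evaluating all four combinations simply avoids relying on that property.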

For the likelihood ratios (LR⁺, LR⁻), 95% confidence intervals are computed on the log scale:

\[ CI_{LR} = \exp \left( \ln(LR) \pm 1.96 \times SE(\ln(LR)) \right) \]

The standard errors of ln(LR⁺) and ln(LR⁻) used in the package are estimated from the TP, FP, FN, and TN counts by the delta method:

\[ \operatorname{SE}\!\left[\ln\!\left(\mathrm{LR}^+\right)\right] = \sqrt{\frac{1-\mathrm{Se}}{TP} + \frac{\mathrm{Sp}}{FP}}, \]

\[ \operatorname{SE}\!\left[\ln\!\left(\mathrm{LR}^-\right)\right] = \sqrt{\frac{\mathrm{Se}}{FN} + \frac{1-\mathrm{Sp}}{TN}}. \]
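The log-scale intervals can be sketched in base R as below. `lr_ci()` is an illustrative helper, not the package's implementation, and the counts are made up. The standard errors are only defined when all four counts are positive.

```r
# Illustrative helper, not the package's own code. Requires TP, FP, FN, TN > 0
# (otherwise the delta-method standard errors are undefined).
lr_ci <- function(tp, fp, fn, tn, z = 1.96) {
  se <- tp / (tp + fn)
  sp <- tn / (tn + fp)
  lr_pos <- se / (1 - sp)
  lr_neg <- (1 - se) / sp
  se_log_pos <- sqrt((1 - se) / tp + sp / fp)   # SE[ln(LR+)]
  se_log_neg <- sqrt(se / fn + (1 - sp) / tn)   # SE[ln(LR-)]
  bounds <- c(lower = -1, upper = 1)
  rbind(LRpos = c(estimate = lr_pos, exp(log(lr_pos) + bounds * z * se_log_pos)),
        LRneg = c(estimate = lr_neg, exp(log(lr_neg) + bounds * z * se_log_neg)))
}

round(lr_ci(tp = 90, fp = 20, fn = 10, tn = 180), 3)
```

Because the interval is symmetric on the log scale, it is asymmetric on the original scale, and the point estimate is the geometric mean of the two limits.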

For Youden’s index, the standard error is approximated from the binomial variances of Se and Sp. These variances are summed because the two proportions are estimated from independent groups:

\[ SE(J) = \sqrt{\frac{Se(1-Se)}{TP+FN} + \frac{Sp(1-Sp)}{TN+FP}} \]

and the 95% CI is:

\[ CI_J = J \pm 1.96 \times SE(J) \]
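A minimal base-R version of this interval is shown below. `youden_ci()` is an illustrative helper, not the package's implementation, and the counts are made up.

```r
# Illustrative helper, not the package's own code: Wald-type interval for J
youden_ci <- function(tp, fp, fn, tn, z = 1.96) {
  se <- tp / (tp + fn)
  sp <- tn / (tn + fp)
  j  <- se + sp - 1
  se_j <- sqrt(se * (1 - se) / (tp + fn) + sp * (1 - sp) / (tn + fp))
  c(estimate = j, lower = j - z * se_j, upper = j + z * se_j)
}

round(youden_ci(tp = 90, fp = 20, fn = 10, tn = 180), 3)
```

With these counts, J = 0.8 and SE(J) = √(0.09/100 + 0.09/200) ≈ 0.037, which gives approximately [0.728, 0.872]. Because this is a Wald interval, it is not constrained to the range of J and can extend past 1 when J is close to 1 and the samples are small.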

Citation

If you use ‘EvalTest’ in your research, please cite it. The citation can be obtained by running the following in the R console:

citation("EvalTest")

Or cite it directly as:

Ayad N (2026). EvalTest: Tools for Evaluating Diagnostic Test Performance. R package version 1.0.6. https://CRAN.R-project.org/package=EvalTest

References

Baratloo A, Hosseini M, Negida A, El Ashal G. Part 1: Simple definition and calculation of accuracy, sensitivity and specificity. Emerg (Tehran). 2015;3:48-49.
Brown LD, Cai TT, DasGupta A. Interval Estimation for a Binomial Proportion. Statist. Sci. 2001;16:101–133.
Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ. 2004;329:168.
Habibzadeh F. Diagnostic tests performance indices: an overview. Biochem Med (Zagreb). 2025;35:01010.
Hassanzad M, Hajian-Tilaki K. Methods of determining optimal cut-point of diagnostic biomarkers with application of clinical data in ROC analysis: an update review. BMC Med Res Methodol. 2024;24:84.
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991;44:763–770.
Ying GS, Maguire MG, Glynn RJ, Rosner B. Calculating Sensitivity, Specificity, and Predictive Values for Correlated Eye Data. Invest Ophthalmol Vis Sci. 2020;61:29.
Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32-35.