The qDEA package implements quantile Data Envelopment Analysis (qDEA), an extension of traditional DEA that allows for a pre-specified proportion of observations to lie outside the production frontier. This approach provides robust efficiency estimation in the presence of outliers or measurement error.
Traditional DEA assumes all observations should be on or below the production frontier. In contrast, qDEA allows a proportion α (alpha) of observations to lie outside the frontier, making the method more robust to outliers while maintaining computational efficiency through linear programming.
The qDEA method is based on:
We’ll start with the simplest case using the CST11 dataset from Cooper, Seiford, and Tone (2006).
# Load example data
data(CST11)
head(CST11)
#> STORE EMPLOYEES SALES SALES_EJOR SALES_EJOR_APDX
#> 1 A 2 1 1 1
#> 2 B 3 3 4 5
#> 3 C 3 2 2 2
#> 4 D 4 3 3 3
#> 5 E 5 4 4 4
#> 6 F 5 2 2 2
# Prepare input and output matrices
X <- as.matrix(CST11$EMPLOYEES)
Y <- as.matrix(CST11$SALES_EJOR)
# Run output-oriented DEA with constant returns to scale
result_dea <- qDEA(X = X, Y = Y,
orient = "out",
RTS = "CRS",
qout = 0) # Standard DEA (no outliers allowed)
#> [1] " on dmu 1 of 8"
#> [1] " on dmu 8 of 8"
# View efficiency scores
result_dea$effvals
#> [1] 2.666667 1.000000 2.000000 1.777778 1.666667 3.333333 2.666667 2.133333Efficiency scores range from 0 to 1, where 1 indicates the unit is on the efficient frontier.
Now let’s allow one observation (12.5% of 8 observations) to be outside the frontier:
# Run qDEA allowing one outlier
qout <- 1/nrow(X) # Proportion = 1/8 = 0.125
result_qdea <- qDEA(X = X, Y = Y,
orient = "out",
RTS = "CRS",
qout = qout)
#> [1] " on dmu 1 of 8"
#> [1] " on dmu 8 of 8"
# Compare DEA and qDEA efficiency scores
comparison <- data.frame(
Store = CST11$STORE,
DEA = round(result_dea$effvals, 3),
qDEA = round(result_qdea$effvalsq, 3),
Difference = round(result_qdea$effvalsq - result_dea$effvals, 3)
)
print(comparison)
#> Store DEA qDEA Difference
#> 1 A 2.667 1.600 -1.067
#> 2 B 1.000 0.600 -0.400
#> 3 C 2.000 1.200 -0.800
#> 4 D 1.778 1.067 -0.711
#> 5 E 1.667 1.000 -0.667
#> 6 F 3.333 2.000 -1.333
#> 7 G 2.667 1.600 -1.067
#> 8 H 2.133 1.280 -0.853Notice how qDEA scores can exceed 1.0, as some units are now allowed to be “super-efficient” relative to the tighter frontier.
Let’s examine a more complex example with multiple inputs:
# Load two-input, one-output data
data(CST21)
head(CST21)
#> STORE EMPLOYEES FLOOR_AREA SALES
#> 1 A 4 3 1
#> 2 B 7 3 1
#> 3 C 8 1 1
#> 4 D 4 2 1
#> 5 E 2 4 1
#> 6 F 5 2 1
# Prepare matrices
X <- as.matrix(CST21[, c("EMPLOYEES", "FLOOR_AREA")])
Y <- as.matrix(CST21$SALES)
# Input-oriented DEA with VRS
result_vrs <- qDEA(X = X, Y = Y,
orient = "in",
RTS = "VRS",
qout = 1/nrow(X))
#> [1] " on dmu 1 of 9"
#> [1] " on dmu 9 of 9"
# Display results
data.frame(
Store = CST21$STORE,
Employees = CST21$EMPLOYEES,
Floor_Area = CST21$FLOOR_AREA,
Sales = CST21$SALES,
Efficiency = round(result_vrs$effvalsq, 3)
)
#> Store Employees Floor_Area Sales Efficiency
#> 1 A 4.0 3.0 1 0.941
#> 2 B 7.0 3.0 1 0.696
#> 3 C 8.0 1.0 1 2.000
#> 4 D 4.0 2.0 1 1.143
#> 5 E 2.0 4.0 1 2.000
#> 6 F 5.0 2.0 1 1.000
#> 7 G 6.0 4.0 1 0.667
#> 8 H 5.5 2.5 1 0.865
#> 9 I 6.0 2.5 1 0.821# Load one-input, two-output data
data(CST12)
head(CST12)
#> STORE EMPLOYEES CUSTOMERS SALES
#> 1 A 1 1 5
#> 2 B 1 2 7
#> 3 C 1 3 4
#> 4 D 1 4 3
#> 5 E 1 4 6
#> 6 F 1 5 5
# Prepare matrices
X <- as.matrix(CST12$EMPLOYEES)
Y <- as.matrix(CST12[, c("CUSTOMERS", "SALES")])
# Output-oriented qDEA
result_outputs <- qDEA(X = X, Y = Y,
orient = "out",
RTS = "CRS",
qout = 0.15) # Allow 15% outliers
#> [1] " on dmu 1 of 7"
#> [1] " on dmu 7 of 7"
# Results
data.frame(
Store = CST12$STORE,
Employees = CST12$EMPLOYEES,
Customers = CST12$CUSTOMERS,
Sales = CST12$SALES,
DEA_Eff = round(result_outputs$effvals, 3),
qDEA_Eff = round(result_outputs$effvalsq, 3)
)
#> Store Employees Customers Sales DEA_Eff qDEA_Eff
#> 1 A 1 1 5 1.400 1.200
#> 2 B 1 2 7 1.000 0.857
#> 3 C 1 3 4 1.429 1.389
#> 4 D 1 4 3 1.333 1.273
#> 5 E 1 4 6 1.000 0.962
#> 6 F 1 5 5 1.000 0.933
#> 7 G 1 6 2 1.000 0.833A realistic example using hospital data:
# Load hospital data
data(CST22)
head(CST22)
#> HOSPITAL DOCTORS NURSES OUT_PATIENTS IN_PATIENTS
#> 1 A 20 151 100 90
#> 2 B 19 131 150 50
#> 3 C 25 160 160 55
#> 4 D 27 168 180 72
#> 5 E 22 158 94 66
#> 6 F 55 255 230 90
# Prepare matrices
X <- as.matrix(CST22[, c("DOCTORS", "NURSES")])
Y <- as.matrix(CST22[, c("OUT_PATIENTS", "IN_PATIENTS")])
# Run qDEA with 10% outliers allowed
result_hospital <- qDEA(X = X, Y = Y,
orient = "in",
RTS = "VRS",
qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
# Create results table
hospital_results <- data.frame(
Hospital = CST22$HOSPITAL,
Doctors = CST22$DOCTORS,
Nurses = CST22$NURSES,
Out_Patients = CST22$OUT_PATIENTS,
In_Patients = CST22$IN_PATIENTS,
Efficiency = round(result_hospital$effvalsq, 3)
)
print(hospital_results)
#> Hospital Doctors Nurses Out_Patients In_Patients Efficiency
#> 1 A 20 151 100 90 1.423
#> 2 B 19 131 150 50 1.272
#> 3 C 25 160 160 55 1.000
#> 4 D 27 168 180 72 1.049
#> 5 E 22 158 94 66 0.956
#> 6 F 55 255 230 90 0.984
#> 7 G 33 235 220 88 1.001
#> 8 H 31 206 152 80 0.814
#> 9 I 30 244 190 100 1.000
#> 10 J 50 268 250 100 1.060
#> 11 K 53 306 260 147 NA
#> 12 L 38 284 250 120 1.263
# Identify efficient hospitals
efficient <- hospital_results$Hospital[hospital_results$Efficiency >= 0.99]
cat("\nEfficient hospitals:", paste(efficient, collapse = ", "))
#>
#> Efficient hospitals: A, B, C, D, G, I, J, NA, LThe qDEA package supports multiple orientations:
Minimize inputs while maintaining output levels:
result_in <- qDEA(X = X, Y = Y, orient = "in", RTS = "CRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
cat("Input-oriented efficiencies:\n")
#> Input-oriented efficiencies:
print(round(result_in$effvalsq, 3))
#> [1] 1.408 1.184 0.960 1.016 0.965 0.846 1.000 0.804 0.996 0.885 0.964 1.000Maximize outputs while maintaining input levels:
result_out <- qDEA(X = X, Y = Y, orient = "out", RTS = "CRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
cat("Output-oriented efficiencies:\n")
#> Output-oriented efficiencies:
print(round(result_out$effvalsq, 3))
#> [1] 0.710 0.844 1.042 0.984 1.036 1.181 1.000 1.243 1.004 1.131 1.038 1.000Minimize inputs and maximize outputs simultaneously:
result_graph <- qDEA(X = X, Y = Y, orient = "inout", RTS = "VRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
cat("Graph-oriented distances:\n")
#> Graph-oriented distances:
print(round(result_graph$distvalsq, 3))
#> [1] -0.186 -0.179 0.040 -0.020 0.000 0.007 0.000 0.103 0.000 -0.023
#> [11] -0.184 -0.067Compare different returns to scale assumptions:
# Constant Returns to Scale (CRS)
result_crs <- qDEA(X = X, Y = Y, orient = "in", RTS = "CRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
# Variable Returns to Scale (VRS)
result_vrs_comp <- qDEA(X = X, Y = Y, orient = "in", RTS = "VRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
# Decreasing Returns to Scale (DRS)
result_drs <- qDEA(X = X, Y = Y, orient = "in", RTS = "DRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
# Increasing Returns to Scale (IRS)
result_irs <- qDEA(X = X, Y = Y, orient = "in", RTS = "IRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
# Compare results
rts_comparison <- data.frame(
Hospital = CST22$HOSPITAL,
CRS = round(result_crs$effvalsq, 3),
VRS = round(result_vrs_comp$effvalsq, 3),
DRS = round(result_drs$effvalsq, 3),
IRS = round(result_irs$effvalsq, 3)
)
print(rts_comparison)
#> Hospital CRS VRS DRS IRS
#> 1 A 1.408 1.423 1.408 1.408
#> 2 B 1.184 1.272 1.184 1.184
#> 3 C 0.960 1.000 0.960 0.960
#> 4 D 1.016 1.049 1.049 1.049
#> 5 E 0.965 0.956 0.965 0.965
#> 6 F 0.846 0.984 0.984 0.984
#> 7 G 1.000 1.001 1.001 1.001
#> 8 H 0.804 0.814 0.814 0.814
#> 9 I 0.996 1.000 1.000 1.000
#> 10 J 0.885 1.060 1.060 1.060
#> 11 K 0.964 NA NA NA
#> 12 L 1.000 1.263 1.263 1.263VRS efficiency is always ≥ CRS efficiency, as VRS is a less restrictive assumption.
Bootstrap methods can correct for bias in efficiency estimates:
# Run qDEA with bootstrap (100 replications)
# Note: This takes longer to run
result_boot <- qDEA(X = X, Y = Y,
orient = "in",
RTS = "VRS",
qout = 0.10,
nboot = 100,
seedval = 12345)
# Access bias-corrected estimates
bias_corrected <- result_boot$BOOT_DATA$effvalsq.bc
# Compare original and bias-corrected
comparison_boot <- data.frame(
Hospital = CST22$HOSPITAL,
Original = round(result_boot$effvalsq, 3),
Bias_Corrected = round(bias_corrected, 3),
Bias = round(result_boot$effvalsq - bias_corrected, 3)
)
print(comparison_boot)Identify which units serve as benchmarks for inefficient units:
# Run qDEA to get peer information
data(CST11)
X <- as.matrix(CST11$EMPLOYEES)
Y <- as.matrix(CST11$SALES_EJOR)
result_peers <- qDEA(X = X, Y = Y,
orient = "out",
RTS = "VRS",
qout = 0.10)
#> [1] " on dmu 1 of 8"
#> [1] " on dmu 8 of 8"
# Access peer information
peers <- result_peers$PEER_DATA$PEERSq
print(peers)
#> dmu0 dmuz z
#> 1 1 1 1.0
#> 2 2 2 1.0
#> 3 3 2 1.0
#> 4 4 2 0.8
#> 5 4 8 0.2
#> 6 5 2 0.6
#> 7 5 8 0.4
#> 8 6 2 0.6
#> 9 6 8 0.4
#> 10 7 2 0.4
#> 11 7 8 0.6
#> 12 8 8 1.0The peers dataframe shows which efficient units (and with what weights) form the benchmark for each inefficient unit.
Calculate target input/output levels for inefficient units:
# Run with projection calculation
result_proj <- qDEA(X = X, Y = Y,
orient = "out",
RTS = "VRS",
qout = 0.10,
getproject = TRUE)
#> [1] " on dmu 1 of 8"
#> [1] " on dmu 8 of 8"
# Access projected values
X_target <- result_proj$PROJ_DATA$X0HATq
Y_target <- result_proj$PROJ_DATA$Y0HATq
# Compare actual vs. target
projection_comparison <- data.frame(
Store = CST11$STORE,
Actual_Input = X[,1],
Target_Input = round(X_target[,1], 1),
Actual_Output = Y[,1],
Target_Output = round(Y_target[,1], 1),
Efficiency = round(result_proj$effvalsq, 3)
)
print(projection_comparison)
#> Store Actual_Input Target_Input Actual_Output Target_Output Efficiency
#> 1 A 2 2 1 1.0 1.000
#> 2 B 3 3 4 4.0 1.000
#> 3 C 3 3 2 4.0 2.000
#> 4 D 4 4 3 4.2 1.400
#> 5 E 5 5 4 4.4 1.100
#> 6 F 5 5 2 4.4 2.200
#> 7 G 6 6 3 4.6 1.533
#> 8 H 8 8 5 5.0 1.000For more precise qDEA estimates, use iterative refinement:
# Run with multiple iterations
result_iter <- qDEA(X = X, Y = Y,
orient = "out",
RTS = "CRS",
qout = 0.15,
nqiter = 5) # Allow up to 5 iterations
#> [1] " on dmu 1 of 8"
#> [1] " on dmu 8 of 8"
# Check number of iterations used
cat("Iterations completed:", result_iter$LP_DATA$qiter, "\n")
#> Iterations completed: 1 1 1 1 1 1 1 1
# Check actual proportion of outliers
cat("Actual outlier proportion:", round(result_iter$LP_DATA$qhat, 4), "\n")
#> Actual outlier proportion: 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125The algorithm iteratively adjusts the frontier until the proportion of outliers converges to the specified qout value.
Higher qout values make the frontier more robust but may be too permissive.
When in doubt, start with VRS as it’s the most general assumption.
The main function qDEA() returns a list with several
components:
result <- qDEA(X = X, Y = Y, orient = "out", RTS = "CRS", qout = 0.10)
# Main efficiency scores
result$effvals # DEA efficiency scores
result$effvalsq # qDEA efficiency scores
result$distvals # DEA distance measures
result$distvalsq # qDEA distance measures
# Input data (for reference)
result$INPUT_DATA
# Bootstrap data (if nboot > 0)
result$BOOT_DATA
# Peer information
result$PEER_DATA
# Projected values (if getproject = TRUE)
result$PROJ_DATA
# LP solver information
result$LP_DATAHere’s a complete analysis workflow:
# Load data
data(CST22)
# Prepare data
X <- as.matrix(CST22[, c("DOCTORS", "NURSES")])
Y <- as.matrix(CST22[, c("OUT_PATIENTS", "IN_PATIENTS")])
# Run comprehensive analysis
result_complete <- qDEA(
X = X,
Y = Y,
orient = "in",
RTS = "VRS",
qout = 0.10,
nqiter = 3,
getproject = TRUE
)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
# Create comprehensive results table
complete_results <- data.frame(
Hospital = CST22$HOSPITAL,
Doctors = CST22$DOCTORS,
Nurses = CST22$NURSES,
Out_Patients = CST22$OUT_PATIENTS,
In_Patients = CST22$IN_PATIENTS,
DEA_Eff = round(result_complete$effvals, 3),
qDEA_Eff = round(result_complete$effvalsq, 3),
Target_Doctors = round(result_complete$PROJ_DATA$X0HATq[,1], 1),
Target_Nurses = round(result_complete$PROJ_DATA$X0HATq[,2], 1)
)
print(complete_results)
#> Hospital Doctors Nurses Out_Patients In_Patients DEA_Eff qDEA_Eff
#> 1 A 20 151 100 90 1.000 1.423
#> 2 B 19 131 150 50 1.000 1.272
#> 3 C 25 160 160 55 0.896 1.000
#> 4 D 27 168 180 72 1.000 1.049
#> 5 E 22 158 94 66 0.882 0.956
#> 6 F 55 255 230 90 0.939 0.984
#> 7 G 33 235 220 88 1.000 1.001
#> 8 H 31 206 152 80 0.799 0.814
#> 9 I 30 244 190 100 0.989 1.000
#> 10 J 50 268 250 100 1.000 1.060
#> 11 K 53 306 260 147 1.000 NA
#> 12 L 38 284 250 120 1.000 1.263
#> Target_Doctors Target_Nurses
#> 1 28.5 214.8
#> 2 24.2 166.6
#> 3 25.0 160.0
#> 4 28.3 176.2
#> 5 21.0 151.0
#> 6 54.1 250.9
#> 7 33.0 235.3
#> 8 25.2 167.6
#> 9 30.0 244.0
#> 10 53.0 284.0
#> 11 NA NA
#> 12 48.0 358.7
# Summary statistics
cat("\n=== Summary Statistics ===\n")
#>
#> === Summary Statistics ===
cat("Mean DEA efficiency:", round(mean(result_complete$effvals), 3), "\n")
#> Mean DEA efficiency: 0.959
cat("Mean qDEA efficiency:", round(mean(result_complete$effvalsq), 3), "\n")
#> Mean qDEA efficiency: NA
cat("Efficient units (qDEA ≥ 0.99):",
sum(result_complete$effvalsq >= 0.99), "out of", nrow(X), "\n")
#> Efficient units (qDEA ≥ 0.99): NA out of 12
# Potential input reductions
input_savings <- cbind(
X - result_complete$PROJ_DATA$X0HATq
)
colnames(input_savings) <- c("Doctor_Savings", "Nurse_Savings")
cat("\nTotal potential input reductions:\n")
#>
#> Total potential input reductions:
cat("Doctors:", round(sum(input_savings[,1]), 1), "\n")
#> Doctors: NA
cat("Nurses:", round(sum(input_savings[,2]), 1), "\n")
#> Nurses: NAHere’s how to visualize efficiency scores:
# Create efficiency comparison plot
data(CST22)
X <- as.matrix(CST22[, c("DOCTORS", "NURSES")])
Y <- as.matrix(CST22[, c("OUT_PATIENTS", "IN_PATIENTS")])
result_viz <- qDEA(X = X, Y = Y, orient = "in", RTS = "VRS", qout = 0.10)
#> [1] " on dmu 1 of 12"
#> [1] " on dmu 12 of 12"
# Prepare data for plotting
plot_data <- data.frame(
Hospital = CST22$HOSPITAL,
DEA = result_viz$effvals,
qDEA = result_viz$effvalsq
)
# Simple bar plot
barplot(
t(as.matrix(plot_data[, c("DEA", "qDEA")])),
beside = TRUE,
names.arg = plot_data$Hospital,
col = c("skyblue", "coral"),
main = "Hospital Efficiency: DEA vs qDEA",
ylab = "Efficiency Score",
xlab = "Hospital",
legend.text = c("DEA", "qDEA"),
args.legend = list(x = "topright")
)
abline(h = 1, lty = 2, col = "red")Cause: Dataset may be too small or all units are on the frontier.
Solution: Try a larger dataset or verify data quality.
Cause: Possible outliers or data quality issues.
Solution: - Increase qout to allow more outliers - Verify data for errors - Consider whether VRS is more appropriate than CRS
Cause: Large dataset or too many bootstrap replications.
Solution: - Start with fewer replications (nboot = 100) - Process a subset of units using dmulist parameter - Run on a more powerful computer
Cooper, W.W., Seiford, L.M., and Tone, K. (2006). Introduction to Data Envelopment Analysis and Its Uses. Springer, New York.
Atwood, J., and Shaik, S. (2020). Theory and Statistical Properties of Quantile Data Envelopment Analysis. European Journal of Operational Research, 286:649-661. https://doi.org/10.1016/j.ejor.2020.03.054
Atwood, J., and Shaik, S. (2018). Quantile DEA: Estimating qDEA-alpha Efficiency Estimates with Conventional Linear Programming. In Productivity and Inequality. Springer Press. https://doi.org/10.1007/978-3-319-68678-3_4
Simar, L., and Wilson, P.W. (2011). Two-stage DEA: caveat emptor. Journal of Productivity Analysis, 36:205-218.
For questions or issues with the qDEA package:
sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/Denver
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] qDEA_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.1 cli_3.6.5 knitr_1.51 rlang_1.1.7
#> [5] xfun_0.56 otel_0.2.0 highs_1.12.0-3 generics_0.1.4
#> [9] jsonlite_2.0.0 glue_1.8.0 backports_1.5.0 htmltools_0.5.9
#> [13] sass_0.4.10 rmarkdown_2.30 grid_4.5.2 tibble_3.3.1
#> [17] evaluate_1.0.5 jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.12
#> [21] lifecycle_1.0.5 compiler_4.5.2 dplyr_1.2.0 pkgconfig_2.0.3
#> [25] Rcpp_1.1.1 rstudioapi_0.18.0 lattice_0.22-9 digest_0.6.39
#> [29] R6_2.6.1 tidyselect_1.2.1 pillar_1.11.1 magrittr_2.0.4
#> [33] bslib_0.10.0 checkmate_2.3.4 Matrix_1.7-4 tools_4.5.2
#> [37] cachem_1.1.0