The Weighted Interval Score (WIS) is a probabilistic measure for evaluating forecast accuracy, balancing three key aspects of forecast quality: dispersion (the width of the prediction intervals), overprediction, and underprediction. WIS penalizes wide intervals and intervals that fail to include the observed outcome, while rewarding narrow, well-calibrated intervals. This makes WIS an interpretable and robust metric for assessing both forecast accuracy and uncertainty.
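Concretely, for a symmetric set of quantile levels (the bounds of central prediction intervals plus the median), the WIS can be computed as the mean, across quantile levels, of twice the pinball (quantile) loss. The helper below is a minimal illustrative sketch of this relationship; the function wis_from_quantiles is our own and not part of NobBS or scoringutils, which handles the actual calculation later in this vignette.
# Illustrative sketch only: WIS as the mean of twice the pinball loss
# across a symmetric set of quantile levels.
wis_from_quantiles <- function(quantile_levels, predicted, observed) {
pinball <- (as.numeric(observed < predicted) - quantile_levels) *
(predicted - observed)
mean(2 * pinball)
}
# Toy example: predictive quantiles 90, 100, 110 at levels 0.25, 0.5, 0.75,
# with an eventual count of 105.
wis_from_quantiles(c(0.25, 0.5, 0.75), c(90, 100, 110), observed = 105)
#> [1] 5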
As of version 1.1.0, the NobBS package can explicitly return quantile estimates when calling the NobBS and NobBS.strat functions, which can be used to calculate the WIS for the given nowcast. This vignette demonstrates how to calculate and compare the WIS for different nowcasting models using quantile estimates generated by the NobBS package, in combination with the scoringutils package (version 2.0.0 or later).
To illustrate this process, we run two nowcasting models (a Poisson model and a negative binomial model) for five nowcasting dates, using the dengue dataset provided in the NobBS package. To obtain quantile estimates, we set the desired quantiles in the new quantiles parameter within the specs list given in the call to the NobBS and NobBS.strat functions. By default, NobBS returns estimates for five quantiles (0.025, 0.25, 0.5, 0.75, 0.975), representing the median along with the bounds of the 50% and 95% prediction intervals. Here, we use a more detailed set of 23 quantile values to better capture the full predictive distribution, ensuring a more granular assessment of forecast uncertainty and improving the accuracy of the WIS calculation:
library(NobBS)
library(dplyr)
library(ggplot2)
library(scoringutils)
data(denguedat)
# Helper function to run nowcasting
run_nowcast <- function(data, now, win_size, specs) {
NobBS(data, now, units = "1 week", onset_date = "onset_week",
report_date = "report_week", moving_window = win_size, specs = specs)
}
quantiles <- c(0.025,seq(0.05,0.95,0.05),0.975)
q_len <- length(quantiles)
q_cols <- paste0('q_',quantiles)
# Poisson model specs
specs_poisson <- list(
dist=c("Poisson"),
quantiles=quantiles
)
# Negative binomial model specs
specs_nb <- list(
dist=c("NB"),
quantiles=quantiles
)
# Initialize lists to store nowcasts
nowcasts_pois <- list() # Nowcasts using Poisson model
nowcasts_nb <- list() # Nowcasts using negative binomial model
test_dates <- seq(as.Date("1994-08-29"),as.Date("1994-09-26"),by=7)
win_size <- 8
for (t in seq_along(test_dates)) {
now <- test_dates[t]
denguedat1 <- denguedat[denguedat$onset_week<=now,]
nowcasts_pois[[t]] <- run_nowcast(denguedat1, now, win_size, specs_poisson)
nowcasts_nb[[t]] <- run_nowcast(denguedat1, now, win_size, specs_nb)
}
Now let us plot and compare the estimates obtained from each of these nowcasts:
plot_estimates <- function(nowcast1, nowcast2, cases_per_date, now) {
# Ensure input data is valid
if (is.null(nowcast1$estimates) || is.null(nowcast2$estimates)) {
stop("Nowcast estimates are missing.")
}
# Extract estimates and credible intervals for the Poisson model nowcast
onset_dates1 <- nowcast1$estimates$onset_date
estimates1 <- nowcast1$estimates$estimate
lower1 <- nowcast1$estimates$q_0.025 # 2.5% quantile (lower bound of 95% CI)
upper1 <- nowcast1$estimates$q_0.975 # 97.5% quantile (upper bound of 95% CI)
# Extract estimates and credible intervals for the negative binomial model nowcast
onset_dates2 <- nowcast2$estimates$onset_date
estimates2 <- nowcast2$estimates$estimate
lower2 <- nowcast2$estimates$q_0.025
upper2 <- nowcast2$estimates$q_0.975
# Extract eventual case counts
case_dates <- cases_per_date$onset_week
case_counts <- cases_per_date$count
# Calculate plot range
min_val <- min(c(lower1, lower2, case_counts), na.rm = TRUE)
max_val <- max(c(upper1, upper2, case_counts), na.rm = TRUE) * 1.35
# Create the plot
plot(
onset_dates1, estimates1, col = 'blue', type = 'l',
xlab = 'Onset Date', ylab = 'Cases',
ylim = c(min_val, max_val), lwd = 2,
main = paste0('Incidence Estimates for ', now)
)
lines(onset_dates2, estimates2, col = 'red', lwd = 2)
# Add 95% PI shaded regions for both nowcasts
polygon(c(onset_dates1, rev(onset_dates1)), c(lower1, rev(upper1)),
col = rgb(0, 0, 1, 0.2), border = NA)
polygon(c(onset_dates2, rev(onset_dates2)), c(lower2, rev(upper2)),
col = rgb(1, 0, 0, 0.2), border = NA)
# Add true case counts as points
points(case_dates, case_counts, col = 'black', pch = 20)
# Add a legend
legend(
'topleft',
legend = c('Estimates Poisson model',
'Estimates negative binomial model',
'95% PI (Poisson model)',
'95% PI (negative binomial model)',
'Eventual cases'),
col = c('blue', 'red', rgb(0, 0, 1, 0.2), rgb(1, 0, 0, 0.2), 'black'),
lty = c(1, 1, NA, NA, NA), lwd = c(2, 2, NA, NA, NA),
pch = c(NA, NA, 15, 15, 20),
pt.cex = c(NA, NA, 1.5, 1.5, 1),
cex = 0.9
)
}
cases_per_week <- denguedat %>% group_by(onset_week) %>% summarize(count=n())
for (t in seq_along(test_dates)) {
now <- test_dates[t]
nowcast_pois <- nowcasts_pois[[t]]
nowcast_nb <- nowcasts_nb[[t]]
cases_per_week1 <- cases_per_week[cases_per_week$onset_week <= now, ]
plot_estimates(nowcast_pois, nowcast_nb, cases_per_week1, now)
}
As seen in these plots, the mean estimates of the two models are very similar, but the negative binomial model has wider prediction intervals, which appear to better capture the actual number of eventual cases.
Now, let us calculate the WIS for these nowcasting estimates. To do this, we first need to extract the quantile estimates from the nowcast results and use them to construct a data frame compatible with the scoringutils package. The quantile estimates can be found in the estimates data frame within the list returned by the NobBS and NobBS.strat functions. The following code iterates over the nowcasting results, extracts the quantile estimates, and combines them with the observed case counts into a unified data frame. From each nowcast we extract the estimates for three horizons: 0 (the current week), -1 (the previous week) and -2 (two weeks ago).
data <- data.frame(onset_week=as.Date(character()),
now=as.Date(character()),
horizon=numeric(),
quantile_level=numeric(),
predicted=numeric(),
observed=numeric(),
model=character())
cases_per_week <- denguedat %>% group_by(onset_week) %>% summarize(count=n())
horizons <- c(-2,-1,0)
for (t in seq_along(test_dates)) {
now <- test_dates[t]
nowcast_pois <- nowcasts_pois[[t]]
nowcast_nb <- nowcasts_nb[[t]]
for(h in horizons) {
date <- now + h*7
true_value <- cases_per_week[cases_per_week$onset_week==date,]$count
q_est_pois <- unname(unlist(nowcast_pois$estimates[nowcast_pois$estimates$onset_date==date,q_cols]))
q_est_nb <- unname(unlist(nowcast_nb$estimates[nowcast_nb$estimates$onset_date==date,q_cols]))
data_est <- data.frame(onset_week=rep(date,q_len*2),
now=rep(now,q_len*2),
horizon=rep(h,q_len*2),
quantile_level=rep(quantiles,2),
predicted=c(q_est_pois,q_est_nb),
observed=rep(true_value,q_len*2),
model=c(rep('Pois_model',q_len),rep('NB_model',q_len)))
data <- rbind(data,data_est)
}
}
print(head(data))
#> onset_week now horizon quantile_level predicted observed model
#> 1 1994-08-15 1994-08-29 -2 0.025 131 135 Pois_model
#> 2 1994-08-15 1994-08-29 -2 0.050 132 135 Pois_model
#> 3 1994-08-15 1994-08-29 -2 0.100 133 135 Pois_model
#> 4 1994-08-15 1994-08-29 -2 0.150 133 135 Pois_model
#> 5 1994-08-15 1994-08-29 -2 0.200 134 135 Pois_model
#> 6 1994-08-15 1994-08-29 -2 0.250 135 135 Pois_model
print(tail(data))
#> onset_week now horizon quantile_level predicted observed model
#> 625 1994-09-26 1994-09-26 0 0.750 196.00 210 NB_model
#> 626 1994-09-26 1994-09-26 0 0.800 204.00 210 NB_model
#> 627 1994-09-26 1994-09-26 0 0.850 215.15 210 NB_model
#> 628 1994-09-26 1994-09-26 0 0.900 230.00 210 NB_model
#> 629 1994-09-26 1994-09-26 0 0.950 259.05 210 NB_model
#> 630 1994-09-26 1994-09-26 0 0.975 289.00 210 NB_model
To help interpret this table, consider the first row as an example. The 0.025 quantile level means that 131 cases was the lower bound of the 95% prediction interval for the August 15, 1994 onset week, forecasted on August 29, 1994 (horizon = -2). Since the actual observed value was 135 cases, this lower bound appropriately fell below the eventual count, indicating that the forecast captured the lower tail of the distribution.
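Under the pinball-loss sketch shown earlier, the contribution of this single quantile to the WIS of this nowcast can be worked out directly (an illustrative calculation, assuming the 23 quantile levels defined above):
# Quantile score for the 0.025 quantile (predicted 131, observed 135),
# divided by the number of quantile levels.
2 * (as.numeric(135 < 131) - 0.025) * (131 - 135) / length(quantiles)
#> [1] 0.008695652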
Next, we utilize the scoringutils package to compute the WIS for the Poisson and negative binomial models using the constructed data frame.
nowcasts <- data %>%
scoringutils::as_forecast_quantile()
scores <- scoringutils::score(nowcasts,
get_metrics(nowcasts,select=c("wis","overprediction","underprediction","dispersion")))
print(head(scores))
#> onset_week now horizon model wis overprediction
#> <Date> <Date> <num> <char> <num> <num>
#> 1: 1994-08-15 1994-08-29 -2 Pois_model 1.259524 0.4761905
#> 2: 1994-08-15 1994-08-29 -2 NB_model 1.035714 0.1428571
#> 3: 1994-08-22 1994-08-29 -1 Pois_model 10.921429 8.5714286
#> 4: 1994-08-22 1994-08-29 -1 NB_model 5.976190 2.9523810
#> 5: 1994-08-29 1994-08-29 0 Pois_model 16.635714 9.8095238
#> 6: 1994-08-29 1994-08-29 0 NB_model 9.364286 1.4761905
#> underprediction dispersion
#> <num> <num>
#> 1: 0 0.7833333
#> 2: 0 0.8928571
#> 3: 0 2.3500000
#> 4: 0 3.0238095
#> 5: 0 6.8261905
#> 6: 0 7.8880952
To help interpret this table, consider the first row as an example. For the onset week of August 15, 1994, using the Poisson model, the nowcast made on August 29, 1994 (horizon = -2) resulted in a WIS of 1.259. The decomposition of WIS shows that 0.476 of the score came from overprediction, 0 from underprediction, and 0.783 from dispersion. This suggests that the model slightly overestimated case counts but had a reasonable spread in its prediction intervals.
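Because the WIS is the sum of these three components, a quick sanity check can be run on the scores object (a sketch using the columns shown above):
# The WIS should equal overprediction + underprediction + dispersion,
# up to floating-point error.
all.equal(scores$wis,
scores$overprediction + scores$underprediction + scores$dispersion)
#> [1] TRUE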
We can now summarize the results of the models according to the horizon:
scores_per_horizon <- scores %>%
summarise_scores(by = c("horizon", "model"))
print(scores_per_horizon)
#> horizon model wis overprediction underprediction dispersion
#> <num> <char> <num> <num> <num> <num>
#> 1: -2 Pois_model 1.845238 0.3333333 0.6666667 0.8452381
#> 2: -2 NB_model 1.695286 0.1619048 0.5533333 0.9800476
#> 3: -1 Pois_model 24.635238 4.8380952 17.6571429 2.1400000
#> 4: -1 NB_model 19.276190 2.6857143 13.0952381 3.4952381
#> 5: 0 Pois_model 24.351643 6.9238095 11.6238095 5.8040238
#> 6: 0 NB_model 20.149238 2.5714286 8.9904762 8.5873333
plot_wis(scores_per_horizon) +
facet_wrap(~ horizon, scales ='free') +
ggtitle('WIS per Horizon')
As we have seen above, the negative binomial model produces wider prediction intervals than the Poisson model, which is reflected in its higher dispersion scores. However, the negative binomial model also covers the eventual case counts better, which is reflected in its lower overprediction and underprediction scores. Overall, the WIS (the sum of these three components) of the negative binomial model is lower (better) than that of the Poisson model for all three horizons.
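To confirm this overall ranking, the scores can also be aggregated across all horizons and nowcasting dates by grouping on the model alone (a brief sketch reusing summarise_scores(); this would again show a lower average WIS for the negative binomial model):
# Summarize scores across all forecasts, grouped by model only
scores %>%
summarise_scores(by = "model")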
We can also compare the model results according to another grouping variable, such as the nowcasting date (the date on which the nowcast was performed):
scores_per_now_date <- scores %>%
summarise_scores(by = c("now", "model"))
print(scores_per_now_date)
#> now model wis overprediction underprediction dispersion
#> <Date> <char> <num> <num> <num> <num>
#> 1: 1994-08-29 Pois_model 9.605556 6.285714 0.000000 3.319841
#> 2: 1994-08-29 NB_model 5.458730 1.523810 0.000000 3.934921
#> 3: 1994-09-05 Pois_model 21.019841 0.000000 18.238095 2.781746
#> 4: 1994-09-05 NB_model 18.068254 0.000000 14.444444 3.623810
#> 5: 1994-09-12 Pois_model 11.625417 8.698413 0.000000 2.927004
#> 6: 1994-09-12 NB_model 8.753175 3.888889 0.000000 4.864286
#> 7: 1994-09-19 Pois_model 14.326984 5.174603 6.500000 2.652381
#> 8: 1994-09-19 NB_model 12.352698 3.619048 4.222222 4.511429
#> 9: 1994-09-26 Pois_model 28.142401 0.000000 25.174603 2.967798
#> 10: 1994-09-26 NB_model 23.901667 0.000000 19.065079 4.836587
plot_wis(scores_per_now_date) +
facet_wrap(~ now, scales ='free') +
ggtitle('WIS per Nowcasting Date')
We again see that, in this instance, the WIS of the negative binomial model is lower (better) than that of the Poisson model for each of these five nowcasting dates. This is a common finding, because surveillance data are typically overdispersed.
The WIS provides an effective framework for assessing the accuracy and uncertainty of probabilistic forecasts, making it an important tool for evaluating nowcasting models. This vignette demonstrated how version 1.1.0 of the NobBS package integrates with the scoringutils package to evaluate forecast accuracy using the WIS.