Calculating Weighted Interval Score for Nowcast Models

Introduction

The Weighted Interval Score (WIS) is a probabilistic measure for evaluating forecast accuracy that balances three key aspects of forecast quality: dispersion (the width of the prediction intervals), overprediction, and underprediction.

WIS penalizes intervals that are wide or that fail to include the observed outcome, while rewarding narrow, well-calibrated intervals. This makes WIS an interpretable and robust metric for assessing both forecast accuracy and uncertainty.
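
For reference, the WIS is commonly defined as a weighted average of interval scores over a set of central prediction intervals, together with the absolute error of the predictive median. A standard formulation (given here as background; the notation below is ours and is not produced by the package) is

$$IS_\alpha(F, y) = (u - l) + \frac{2}{\alpha}(l - y)\,\mathbf{1}(y < l) + \frac{2}{\alpha}(y - u)\,\mathbf{1}(y > u)$$

$$\mathrm{WIS}(F, y) = \frac{1}{K + 1/2}\left(\frac{1}{2}\,|y - m| + \sum_{k=1}^{K} \frac{\alpha_k}{2}\, IS_{\alpha_k}(F, y)\right)$$

where $y$ is the observed value, $m$ is the predictive median, $(l, u)$ are the bounds of the central $(1-\alpha_k)\times 100\%$ prediction interval, and $K$ is the number of intervals. Splitting each interval score into its width and penalty terms yields the dispersion, overprediction, and underprediction components reported later in this vignette.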

As of version 1.1.0, the NobBS package can explicitly return quantile estimates from the NobBS and NobBS.strat functions, which can be used to calculate the WIS for a given nowcast. This vignette demonstrates how to calculate and compare the WIS for different nowcasting models using quantile estimates generated by the NobBS package, in combination with the scoringutils package (version 2.0.0 or later).

Nowcasting Scenarios

To illustrate this process, we run two nowcasting models, a Poisson model and a negative binomial model, for five nowcasting dates, using the dengue dataset provided in the NobBS package. To obtain quantile estimates, we set the desired quantiles via the new quantiles parameter in the specs list passed to the NobBS and NobBS.strat functions. By default, NobBS returns estimates for five quantiles (0.025, 0.25, 0.5, 0.75, 0.975), representing the median along with the 50% and 95% prediction intervals. Here, we use a more detailed set of 21 quantile values to better capture the full predictive distribution, ensuring a more granular assessment of forecast uncertainty and improving the accuracy of the WIS calculation:

library(NobBS)
library(dplyr)
library(ggplot2)
library(scoringutils)

data(denguedat)

# Helper function to run nowcasting
run_nowcast <- function(data, now, win_size, specs) {
  NobBS(data, now, units = "1 week", onset_date = "onset_week",
        report_date = "report_week", moving_window = win_size, specs = specs)
}

quantiles <- c(0.025, seq(0.05, 0.95, 0.05), 0.975)
q_len <- length(quantiles)
q_cols <- paste0('q_', quantiles)

# Poisson model specs
specs_poisson <- list(
  dist = c("Poisson"),
  quantiles = quantiles
)

# Negative binomial model specs
specs_nb <- list(
  dist = c("NB"),
  quantiles = quantiles
)

# Initialize lists to store nowcasts
nowcasts_pois <- list()   # Nowcasts using poisson model
nowcasts_nb <- list()     # Nowcasts using negative binomial model

test_dates <- seq(as.Date("1994-08-29"), as.Date("1994-09-26"), by = 7)

win_size <- 8

for (t in seq_along(test_dates)) {
  now <- test_dates[t]
  denguedat1 <- denguedat[denguedat$onset_week<=now,]
  nowcasts_pois[[t]] <- run_nowcast(denguedat1, now, win_size, specs_poisson)
  nowcasts_nb[[t]] <- run_nowcast(denguedat1, now, win_size, specs_nb)
}

Comparing the Nowcast Estimates of the Poisson and Negative Binomial Models

Now let us plot and compare the estimates obtained from each of these nowcasts:


plot_estimates <- function(nowcast1, nowcast2, cases_per_date, now) {
  # Ensure input data is valid
  if (is.null(nowcast1$estimates) || is.null(nowcast2$estimates)) {
    stop("Nowcast estimates are missing.")
  }
  
  # Extract estimates and prediction intervals for the Poisson model nowcast
  onset_dates1 <- nowcast1$estimates$onset_date
  estimates1 <- nowcast1$estimates$estimate
  lower1 <- nowcast1$estimates$q_0.025  # 2.5% quantile (lower bound of 95% PI)
  upper1 <- nowcast1$estimates$q_0.975  # 97.5% quantile (upper bound of 95% PI)
  
  # Extract estimates and prediction intervals for the negative binomial model nowcast
  onset_dates2 <- nowcast2$estimates$onset_date
  estimates2 <- nowcast2$estimates$estimate
  lower2 <- nowcast2$estimates$q_0.025
  upper2 <- nowcast2$estimates$q_0.975
  
  # Extract eventual case counts
  case_dates <- cases_per_date$onset_week
  case_counts <- cases_per_date$count
  
  # Calculate plot range
  min_val <- min(c(lower1, lower2, case_counts), na.rm = TRUE)
  max_val <- max(c(upper1, upper2, case_counts), na.rm = TRUE) * 1.35
  
  # Create the plot
  plot(
    onset_dates1, estimates1, col = 'blue', type = 'l',
    xlab = 'Onset Date', ylab = 'Cases',
    ylim = c(min_val, max_val), lwd = 2,
    main = paste0('Incidence Estimates for ', now)
  )
  lines(onset_dates2, estimates2, col = 'red', lwd = 2)
  
  # Add 95% PI shaded regions for both nowcasts
  polygon(c(onset_dates1, rev(onset_dates1)), c(lower1, rev(upper1)), 
          col = rgb(0, 0, 1, 0.2), border = NA) 
  polygon(c(onset_dates2, rev(onset_dates2)), c(lower2, rev(upper2)), 
          col = rgb(1, 0, 0, 0.2), border = NA) 
  
  # Add true case counts as points
  points(case_dates, case_counts, col = 'black', pch = 20)
  
  # Add a legend
  legend(
    'topleft',
    legend = c('Estimates Poisson model', 
               'Estimates negative binomial model', 
               '95% PI (Poisson model)', 
               '95% PI (negative binomial model)',
               'Eventual cases'),
    col = c('blue', 'red', rgb(0, 0, 1, 0.2), rgb(1, 0, 0, 0.2), 'black'),
    lty = c(1, 1, NA, NA, NA), lwd = c(2, 2, NA, NA, NA), 
    pch = c(NA, NA, 15, 15, 20),
    pt.cex = c(NA, NA, 1.5, 1.5, 1), 
    cex = 0.9
  )
}

cases_per_week <- denguedat %>% group_by(onset_week) %>% summarize(count=n())

for (t in seq_along(test_dates)) {
  
  now <- test_dates[t]
  nowcast_pois <- nowcasts_pois[[t]]
  nowcast_nb <- nowcasts_nb[[t]]
  cases_per_week1 <- cases_per_week[cases_per_week$onset_week <= now, ]
  plot_estimates(nowcast_pois, nowcast_nb, cases_per_week1, now)
}

As seen in these plots, the mean estimates of the two models are very similar, but the negative binomial model has wider prediction intervals, which appear to better capture the actual number of eventual cases.

Now, let us calculate the WIS for these nowcasting estimates. To do this, we first need to extract the quantile estimates from the nowcast results and use them to construct a data frame compatible with the scoringutils package.

Extracting Quantile Estimates

The quantile estimates can be found in the estimates data frame within the list returned by the NobBS and NobBS.strat functions. The following code iterates over the nowcasting results, extracts the quantile estimates, and combines them with the observed case counts into a single data frame. From each nowcast we extract the estimates for three horizons: 0 (the current week), -1 (the previous week), and -2 (two weeks ago).


data <- data.frame(onset_week=as.Date(character()),
                   now=as.Date(character()),
                   horizon=numeric(),
                   quantile_level=numeric(),
                   predicted=numeric(),
                   observed=numeric(),
                   model=character())

cases_per_week <- denguedat %>% group_by(onset_week) %>% summarize(count=n())

horizons <- c(-2,-1,0)

for (t in seq_along(test_dates)) {
  now <- test_dates[t]
  nowcast_pois <- nowcasts_pois[[t]]
  nowcast_nb <- nowcasts_nb[[t]]
  for(h in horizons) {
    date <- now + h*7
    true_value <- cases_per_week[cases_per_week$onset_week==date,]$count
    q_est_pois <- unname(unlist(nowcast_pois$estimates[nowcast_pois$estimates$onset_date==date,q_cols]))
    q_est_nb <- unname(unlist(nowcast_nb$estimates[nowcast_nb$estimates$onset_date==date,q_cols]))
    data_est <- data.frame(onset_week=rep(date,q_len*2),
                           now=rep(now,q_len*2),
                           horizon=rep(h,q_len*2),
                           quantile_level=rep(quantiles,2),
                           predicted=c(q_est_pois,q_est_nb),
                           observed=rep(true_value,q_len*2),
                           model=c(rep('Pois_model',q_len),rep('NB_model',q_len)))
    data <- rbind(data,data_est)
  }
}

print(head(data))
#>   onset_week        now horizon quantile_level predicted observed      model
#> 1 1994-08-15 1994-08-29      -2          0.025       131      135 Pois_model
#> 2 1994-08-15 1994-08-29      -2          0.050       132      135 Pois_model
#> 3 1994-08-15 1994-08-29      -2          0.100       133      135 Pois_model
#> 4 1994-08-15 1994-08-29      -2          0.150       133      135 Pois_model
#> 5 1994-08-15 1994-08-29      -2          0.200       134      135 Pois_model
#> 6 1994-08-15 1994-08-29      -2          0.250       135      135 Pois_model
print(tail(data))
#>     onset_week        now horizon quantile_level predicted observed    model
#> 625 1994-09-26 1994-09-26       0          0.750    196.00      210 NB_model
#> 626 1994-09-26 1994-09-26       0          0.800    204.00      210 NB_model
#> 627 1994-09-26 1994-09-26       0          0.850    215.15      210 NB_model
#> 628 1994-09-26 1994-09-26       0          0.900    230.00      210 NB_model
#> 629 1994-09-26 1994-09-26       0          0.950    259.05      210 NB_model
#> 630 1994-09-26 1994-09-26       0          0.975    289.00      210 NB_model

To help interpret this table, consider the first row as an example. The 0.025 quantile level means that 131 cases was the lower bound of the 95% prediction interval for the onset week of August 15, 1994, forecasted on August 29, 1994 (horizon = -2). Since the actual observed value was 135 cases, the forecasted interval appropriately captured the lower tail of the distribution.

Calculating and Plotting the Weighted Interval Score

Next, we utilize the scoringutils package to compute the WIS for the Poisson and negative binomial models using the constructed data frame.

nowcasts <- data %>%
          scoringutils::as_forecast_quantile()
scores <- scoringutils::score(nowcasts, 
             get_metrics(nowcasts,select=c("wis","overprediction","underprediction","dispersion")))

print(head(scores))
#>    onset_week        now horizon      model       wis overprediction
#>        <Date>     <Date>   <num>     <char>     <num>          <num>
#> 1: 1994-08-15 1994-08-29      -2 Pois_model  1.259524      0.4761905
#> 2: 1994-08-15 1994-08-29      -2   NB_model  1.035714      0.1428571
#> 3: 1994-08-22 1994-08-29      -1 Pois_model 10.921429      8.5714286
#> 4: 1994-08-22 1994-08-29      -1   NB_model  5.976190      2.9523810
#> 5: 1994-08-29 1994-08-29       0 Pois_model 16.635714      9.8095238
#> 6: 1994-08-29 1994-08-29       0   NB_model  9.364286      1.4761905
#>    underprediction dispersion
#>              <num>      <num>
#> 1:               0  0.7833333
#> 2:               0  0.8928571
#> 3:               0  2.3500000
#> 4:               0  3.0238095
#> 5:               0  6.8261905
#> 6:               0  7.8880952

To help interpret this table, consider the first row as an example. For the onset week of August 15, 1994, using the Poisson model, the nowcast made on August 29, 1994 (horizon = -2) resulted in a WIS of 1.259. The decomposition of WIS shows that 0.476 of the score came from overprediction, 0 from underprediction, and 0.783 from dispersion. This suggests that the model slightly overestimated case counts but had a reasonable spread in its prediction intervals.
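
As a quick sanity check of this decomposition (a minimal sketch, assuming the scores object created above), the three components should add up to the overall WIS for every row:

# The WIS decomposition is additive:
# wis = overprediction + underprediction + dispersion
all.equal(scores$wis,
          scores$overprediction + scores$underprediction + scores$dispersion)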

We can now summarize the results of the models according to the horizon:

scores_per_horizon <- scores %>%
          summarise_scores(by = c("horizon", "model"))

print(scores_per_horizon)
#>    horizon      model       wis overprediction underprediction dispersion
#>      <num>     <char>     <num>          <num>           <num>      <num>
#> 1:      -2 Pois_model  1.845238      0.3333333       0.6666667  0.8452381
#> 2:      -2   NB_model  1.695286      0.1619048       0.5533333  0.9800476
#> 3:      -1 Pois_model 24.635238      4.8380952      17.6571429  2.1400000
#> 4:      -1   NB_model 19.276190      2.6857143      13.0952381  3.4952381
#> 5:       0 Pois_model 24.351643      6.9238095      11.6238095  5.8040238
#> 6:       0   NB_model 20.149238      2.5714286       8.9904762  8.5873333
plot_wis(scores_per_horizon) + 
  facet_wrap(~ horizon, scales ='free') +
  ggtitle('WIS per Horizon')

As we have seen above, the negative binomial model gives wider estimates than the Poisson model, which is reflected in higher dispersion scores. However, the negative binomial model has better coverage of the true data than the Poisson model, which is reflected in lower overprediction and underprediction scores. Overall, the WIS (the sum of these three components) of the negative binomial model is lower (better) than the WIS of the Poisson model, for all three horizons.

We can also compare the model results by another variable, such as the nowcasting date (the date on which the nowcast was performed):

scores_per_now_date <- scores %>%
          summarise_scores(by = c("now", "model"))

print(scores_per_now_date)
#>            now      model       wis overprediction underprediction dispersion
#>         <Date>     <char>     <num>          <num>           <num>      <num>
#>  1: 1994-08-29 Pois_model  9.605556       6.285714        0.000000   3.319841
#>  2: 1994-08-29   NB_model  5.458730       1.523810        0.000000   3.934921
#>  3: 1994-09-05 Pois_model 21.019841       0.000000       18.238095   2.781746
#>  4: 1994-09-05   NB_model 18.068254       0.000000       14.444444   3.623810
#>  5: 1994-09-12 Pois_model 11.625417       8.698413        0.000000   2.927004
#>  6: 1994-09-12   NB_model  8.753175       3.888889        0.000000   4.864286
#>  7: 1994-09-19 Pois_model 14.326984       5.174603        6.500000   2.652381
#>  8: 1994-09-19   NB_model 12.352698       3.619048        4.222222   4.511429
#>  9: 1994-09-26 Pois_model 28.142401       0.000000       25.174603   2.967798
#> 10: 1994-09-26   NB_model 23.901667       0.000000       19.065079   4.836587
plot_wis(scores_per_now_date) + 
  facet_wrap(~ now, scales ='free')  +
  ggtitle('WIS per Nowcasting Date')

We again see that, for each of these five nowcasting dates, the WIS of the negative binomial model is lower (better) than that of the Poisson model. This is a common finding, as surveillance data are typically over-dispersed.
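
As a rough, informal illustration of this over-dispersion (a sketch only; comparing the raw mean and variance of the weekly counts mixes the seasonal trend with week-to-week noise), we can check how far the variance of the weekly dengue counts exceeds their mean, since a Poisson model assumes the two are approximately equal:

# Informal over-dispersion check: a Poisson model implies the variance of the
# counts is roughly equal to their mean; a much larger variance suggests
# over-dispersion.
weekly_counts <- denguedat %>% group_by(onset_week) %>% summarize(count = n())
mean(weekly_counts$count)
var(weekly_counts$count)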

Conclusion

The WIS provides an effective framework for assessing the accuracy and uncertainty of probabilistic forecasts, making it an important tool for evaluating nowcasting models. This vignette demonstrated how version 1.1.0 of the NobBS package integrates with the scoringutils package to evaluate forecast accuracy using WIS.
