The day-of-the-week (DoW) effect in disease incidence case reporting arises from variations in diagnostic and reporting practices across different days of the week. For example, on weekends, there may be fewer diagnoses because more health care provider offices are closed, and also fewer reports received because of reduced laboratory staffing. If not properly accounted for, this effect can distort epidemic analyses of daily incidence data. This vignette demonstrates how version 1.1.0 of the NobBS package addresses this issue by incorporating the DoW effect into nowcasting models.
To address DoW effect, the NobBS
package now introduces
a new boolean parameter, add_dow_effect
, applicable to both
the NobBS
and NobBS.strat
functions. When this
parameter is set to TRUE, and assuming the given data are at daily
resolution (as otherwise the DoW effect is irrelevant), the statistical
model for expected case counts is enhanced to include this effect as
follows:
\[ \log(\lambda_{t,d}) = \alpha_t + \beta_d + \gamma X_t \]
\(\lambda_{t,d}\): The expected number of cases reported at time \(t\) with a delay of \(d\).
\(\alpha_t\): The baseline log incidence at time \(t\).
\(\beta_d\): The effect of the delay \(d\).
\(\gamma\): A vector of coefficients associated with the DoW covariates.
\(X_{t}\): A matrix of covariates for the DoW where each row corresponds to onset date \(t\) and contains ‘1’ in the column representing the DoW for that date and ‘0’ otherwise.
Our model incorporates the DoW effect by introducing DoW coefficients (\(\gamma\)), which adjust the baseline log incidence (\(\alpha_t\)) to account for systematic DoW variations in reported cases. Typically, diagnosis dates are used because of missingness in illness onset dates. When using diagnosis dates, the addition of \(\gamma X_{t}\) captures differences in diagnostic availability across the week - such as fewer tests conducted over weekends. However, if DoW effects also exist in reporting delays (e.g., fewer reports being processed on weekends, leading to backlogs on Mondays), our model accounts for those as well.
The DoW coefficients consist of seven values, each representing the effect of a specific DoW, from Monday to Sunday. The coefficient for Sunday (\(\gamma[7]\)) is fixed to zero as the reference day, which means that the values of the rest of the DoW coefficients represent the log-expected number of cases compared with the reference day. In other words, the value of \(\exp(\gamma)\) for each day from Monday to Saturday is the multiplicative effect for the expected number of cases in this day compared with Sunday. So, for example, if \(\exp(\gamma)\) for Monday is 1.2, it means we expect 20% more cases on a Monday compared with a Sunday, all other things being equal.
The priors for the DoW coefficients are modeled as a normal
distribution with a default mean of 0 (assuming no systemic DoW effect)
and a precision of 0.25 (corresponding to a standard deviation of \(\sqrt{(1/0.25}=2\)). Users can modify these
priors using the gamma.mean.prior
and
gamma.prec.prior
parameters inside the specs list, which is
provided to the NobBS
and NobBS.strat
functions. gamma.mean.prior
allows users to specify six
values, setting the mean priors for Monday through Saturday (Sunday is
the reference day). Similarly, gamma.prec.prior
defines six
precision values for Monday through Saturday, providing flexibility to
control the level of regularization or expected variability in the DoW
effects.
The following code demonstrates this feature using daily mpox data
from the 2022 NYC outbreak, which is now included in the
NobBS
package.
First, we run the nowcasting model for a given date (August 15, 2022), both with and without the DoW effect, and store the results for comparison. We use a 28-day window, allowing the estimation of the DoW effect to be based on four weeks of data. We apply the default priors for the mean DoW effect, meaning we assume no prior effect before incorporating the data.
library(NobBS)
library(dplyr)
library(ggplot2)
library(scoringutils)
data(mpoxdat)
win_size <- 28
now <- as.Date("2022-08-15")
test_dates <- seq(now, now, by = 1)
# Filter data for current "now" date
current_data <- mpoxdat[mpoxdat$dx_date <= now, ]
# Run nowcasts and store results
nowcast_without_dow <- NobBS(
current_data, now, units = "1 day",
onset_date = "dx_date", report_date = "dx_report_date",
moving_window = win_size,
specs=list(nAdapt=2000))
nowcast_with_dow <- NobBS(
current_data, now, units = "1 day",
onset_date = "dx_date", report_date = "dx_report_date",
moving_window = win_size,
specs=list(nAdapt=2000),
add_dow_cov = TRUE,
)
nowcasts_without_dow <- list()
nowcasts_with_dow <- list()
nowcasts_without_dow[[1]] <- nowcast_without_dow
nowcasts_with_dow[[1]] <- nowcast_with_dow
Now let us plot and compare the estimates obtained from these nowcasts:
plot_estimates <- function(nowcast1, nowcast2, cases_per_date, now) {
# Ensure input data is valid
if (is.null(nowcast1$estimates) || is.null(nowcast2$estimates)) {
stop("Nowcast estimates are missing.")
}
# Extract estimates and credible intervals for nowcast without DoW effect
onset_dates1 <- nowcast1$estimates$onset_date
estimates1 <- nowcast1$estimates$estimate
lower1 <- nowcast1$estimates$q_0.025 # 2.5% quantile (lower bound of 95% PI)
upper1 <- nowcast1$estimates$q_0.975 # 97.5% quantile (upper bound of 95% PI)
# Extract estimates and prediction intervals for nowcast with DoW effect
onset_dates2 <- nowcast2$estimates$onset_date
estimates2 <- nowcast2$estimates$estimate
lower2 <- nowcast2$estimates$q_0.025
upper2 <- nowcast2$estimates$q_0.975
# Extract eventual case counts
case_dates <- cases_per_date$dx_date
case_counts <- cases_per_date$count
# Calculate plot range
min_val <- min(c(lower1, lower2, case_counts), na.rm = TRUE)
max_val <- max(c(upper1, upper2, case_counts), na.rm = TRUE)
# Create the plot
plot(
onset_dates1, estimates1, col = 'blue', type = 'l',
xlab = 'Onset Date', ylab = 'Cases',
ylim = c(min_val, max_val), lwd = 2,
main = paste0('Incidence Estimates for ', weekdays(now), ' ', now)
)
lines(onset_dates2, estimates2, col = 'red', lwd = 2)
# Add 95% PI shaded regions for both nowcasts
polygon(c(onset_dates1, rev(onset_dates1)), c(lower1, rev(upper1)),
col = rgb(0, 0, 1, 0.2), border = NA)
polygon(c(onset_dates2, rev(onset_dates2)), c(lower2, rev(upper2)),
col = rgb(1, 0, 0, 0.2), border = NA)
# Add true case counts as points
points(case_dates, case_counts, col = 'black', pch = 20)
# Add a legend
legend(
'topleft',
legend = c('Estimates without DoW effect',
'Estimates with DoW effect',
'95% PI (No DoW Effect)',
'95% PI (With DoW Effect)',
'Eventual cases'),
col = c('blue', 'red', rgb(0, 0, 1, 0.2), rgb(1, 0, 0, 0.2), 'black'),
lty = c(1, 1, NA, NA, NA), lwd = c(2, 2, NA, NA, NA),
pch = c(NA, NA, 15, 15, 20),
pt.cex = c(NA, NA, 1.5, 1.5, 1),
cex = 0.9
)
}
# Calculate true case counts
current_data <- mpoxdat[mpoxdat$dx_date <= now, ]
cases_per_date <- current_data %>%
group_by(dx_date) %>%
summarize(count = n())
# plot a comparison of the nowcast estimates
plot_estimates(nowcast_without_dow, nowcast_with_dow, cases_per_date, now)
As can be seen in the plot, the estimates that incorporate the DoW effect demonstrate greater accuracy. Specifically, the model incorporating the DoW effect correctly estimates the down trend during the weekend prior to the nowcasting date (a Monday), whereas the model without the DoW effect does not catch this trend. Additionally, the 95% prediction intervals (PIs) for the nowcast incorporating the DoW effect are narrower compared to those without the DoW effect, suggesting reduced uncertainty in predictions.
Next, we plot the mean estimates of the multiplicative effect for each DoW relative to Sunday, as discussed earlier. We visualize the estimates together with error bars representing their 95% credible intervals:
# Specify correct ordering for days of the week
weekdays_order <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
# Function to extract DoW effect for a given nowcast (mean + 95% CI from posterior samples)
extract_dow_effect <- function(nowcast) {
gammas <- numeric(6) # Only 6 because Sunday is ref
gamma_lower <- numeric(6)
gamma_upper <- numeric(6)
for (i in 1:6) {
param_name <- paste0("gamma[", i, "]")
# Extract posterior samples
posterior_samples <- exp(nowcast$params.post[[param_name]])
# Compute mean and credible intervals
gammas[i] <- mean(posterior_samples) # Mean estimate
gamma_lower[i] <- quantile(posterior_samples, 0.025) # Lower 95% CI
gamma_upper[i] <- quantile(posterior_samples, 0.975) # Upper 95% CI
}
return(list(means = gammas, lower = gamma_lower, upper = gamma_upper))
}
dow_effects <- extract_dow_effect(nowcast_with_dow)
dow_df <- data.frame(
Day = factor(weekdays_order, levels = weekdays_order),
Mean = dow_effects$means,
Lower = dow_effects$lower,
Upper = dow_effects$upper
)
plot_title <- paste("Estimates of DoW effect for nowcast performed on ",now)
p <- ggplot(dow_df, aes(x = Day, y = Mean)) +
geom_point(size = 3, color = "blue") +
geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2, color = "black") +
labs(title = plot_title, y = "DoW Effect on Expected Cases\n(Compared to Sunday)\n", x = "") +
theme_minimal()
print(dow_df)
#> Day Mean Lower Upper
#> 1 Mon 2.383060 1.9126116 3.067711
#> 2 Tue 2.010833 1.6168230 2.620948
#> 3 Wed 2.004893 1.6126859 2.603417
#> 4 Thu 1.868695 1.4910448 2.420906
#> 5 Fri 1.817647 1.4452783 2.365701
#> 6 Sat 1.093266 0.8496911 1.442902
print(p)
The mean estimates of the DoW multiplicative effect for weekdays range from 2.4 (Monday) to 1.8 (Friday). The lower bound of the 95% credible interval for weekdays is well above 1, indicating a statistically significant difference from Sunday. In contrast, the mean estimated effect for Saturday is 1.1 and does not appear significantly different from Sunday, as its credible interval includes 1.
Now, assume we want to perform nowcasts for the period following
August 15, 2022, while incorporating our previously estimated DoW effect
from that date as priors. We can set the log of the estimated
multiplicative effect as the prior mean parameters
(gamma.mean.prior
) and derive the prior precision
parameters (gamma.prec.prior
) from the estimated 95%
credible intervals.
Below, we demonstrate this approach by performing nowcasts for the four days following August 15, 2022, once again comparing the results of a model that incorporates the DoW effect with one that does not. Here, we use a shorter, two-week window for the nowcasts, ensuring they are based on recent data, while leveraging our prior DoW effect estimates for added stability.
prior_mean <- log(dow_effects$means)
prior_sd <- (dow_effects$upper-dow_effects$lower)/(2*1.96)
prior_prec <- 1/(prior_sd^2)
test_dates <- seq(as.Date("2022-08-16"), as.Date("2022-08-19"), by = 1)
win_size <- 14
# Initialize lists to store nowcasts
nowcasts_without_dow <- list() # Nowcasts without DoW effect
nowcasts_with_dow <- list() # Nowcasts with DoW effect
# Loop through each "now" date and run nowcasting
for (t in seq_along(test_dates)) {
now <- test_dates[t]
# Filter data for current "now" date
current_data <- mpoxdat[mpoxdat$dx_date <= now, ]
# Run nowcasts and store results
nowcasts_without_dow[[t]] <- NobBS(
current_data, now, units = "1 day",
onset_date = "dx_date", report_date = "dx_report_date",
moving_window = win_size,
)
nowcasts_with_dow[[t]] <- NobBS(
current_data, now, units = "1 day",
onset_date = "dx_date", report_date = "dx_report_date",
moving_window = win_size,
specs=list(gamma.mean.prior=prior_mean,gamma.prec.prior=prior_prec),
add_dow_cov = TRUE
)
}
# Loop through each "now" date and plot a comparison of the nowcast estimates
# with and without the DoW effect
for (t in seq_along(test_dates)) {
now <- test_dates[t]
nowcast1 <- nowcasts_without_dow[[t]]
nowcast2 <- nowcasts_with_dow[[t]]
# Calculate true case counts
current_data <- mpoxdat[mpoxdat$dx_date <= now, ]
cases_per_date <- current_data %>%
group_by(dx_date) %>%
summarize(count = n())
plot_estimates(nowcast1, nowcast2, cases_per_date, now)
}
To evaluate the results of these nowcasts we calculate and plot the
Weighted Interval Score (WIS) for the two models (see “Calculating
Weighted Interval Score for Nowcast Models” vignette for more details
about WIS calculation with NobBS
package version
1.1.0):
quantiles <- c(0.025,0.25,0.5,0.75,0.975)
q_len <- length(quantiles)
q_cols <- paste0('q_',quantiles)
data <- data.frame(onset_week=as.Date(character()),
now=as.Date(character()),
horizon=numeric(),
quantile_level=numeric(),
predicted=numeric(),
observed=numeric(),
model=character())
cases_per_date <- mpoxdat %>%
group_by(dx_date) %>%
summarize(count = n())
horizons <- c(-5,-4,-3,-2,-1,0)
for (t in seq_along(test_dates)) {
now <- test_dates[t]
nowcast1 <- nowcasts_without_dow[[t]]
nowcast2 <- nowcasts_with_dow[[t]]
for(h in horizons) {
date <- now + h
true_value <- cases_per_date[cases_per_date$dx_date==date,]$count
q_est1 <- unname(unlist(nowcast1$estimates[nowcast1$estimates$onset_date==date,q_cols]))
q_est2 <- unname(unlist(nowcast2$estimates[nowcast2$estimates$onset_date==date,q_cols]))
data_est <- data.frame(onset_week=rep(date,q_len*2),
now=rep(now,q_len*2),
horizon=rep(h,q_len*2),
quantile_level=rep(quantiles,2),
predicted=c(q_est1,q_est2),
observed=rep(true_value,q_len*2),
model=c(rep('Without DoW',q_len),rep('With DoW',q_len)))
data <- rbind(data,data_est)
}
}
nowcasts <- data %>%
scoringutils::as_forecast_quantile()
scores <- scoringutils::score(nowcasts,
get_metrics(nowcasts,select=c("wis","overprediction","underprediction","dispersion")))
scores_per_model <- scores %>%
summarise_scores(by = c("model"))
print(scores_per_model)
#> model wis overprediction underprediction dispersion
#> <char> <num> <num> <num> <num>
#> 1: Without DoW 25.44004 21.52083 0 3.919208
#> 2: With DoW 21.80376 19.25000 0 2.553760
plot_wis(scores_per_model) +
ggtitle('WIS with and without DoW')
As seen from these results, the model incorporating the DoW effect has better (smaller) dispersion and better coverage (smaller penalty for overdispersion - both models overestimate the actual cases in these nowcasts), and as a result, its WIS score is better.
We can plot our prior and posterior estimates for the DoW effect for each of the nowcasting days
# Extract DoW effects for each nowcast
for (t in seq_along(test_dates)) {
now <- test_dates[t]
# Retrieve nowcast with DoW effect
nowcast_with_dow2 <- nowcasts_with_dow[[t]]
# Extract estimates
dow_effects2 <- extract_dow_effect(nowcast_with_dow2)
dow_df2 <- data.frame(
Day = factor(weekdays_order, levels = weekdays_order),
Mean = dow_effects2$means,
Lower = dow_effects2$lower,
Upper = dow_effects2$upper
)
# Add a Type column to distinguish priors and posteriors
dow_df$Type <- "Prior"
dow_df2$Type <- "Posterior"
# Combine the two datasets
combined_dow_df <- rbind(dow_df, dow_df2)
# Ensure the Type column is a factor with the desired order
combined_dow_df$Type <- factor(combined_dow_df$Type, levels = c("Prior", "Posterior"))
plot_title <- paste("Estimates of DoW effect for nowcast performed on", weekdays(now), now)
# Plot with dodged positioning for side-by-side visualization
p <- ggplot(combined_dow_df, aes(x = Day, y = Mean, color = Type, group = Type)) +
geom_point(position = position_dodge(width = 0.4), size = 3) +
geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2, position = position_dodge(width = 0.4)) +
labs(title = plot_title, y = "DoW Effect on Expected Cases\n(Compared to Sunday)\n", x = "") +
ylim(0.5, 4) +
theme_minimal() +
scale_color_manual(values = c("Prior" = "blue", "Posterior" = "red")) +
theme(legend.position = "top")
print(p)
}
As seen in these plots, the posterior estimates for the DoW effect have shifted slightly but remain largely consistent with expectations, indicating that the data has refined, rather than drastically contradicted, our priors. This outcome is expected given that the time period used for these later nowcasts overlaps or is close to the period from which the prior estimates were derived. However, as we conduct nowcasts further into the future, changes in laboratory practices or reporting patterns between weekdays and weekends could lead to a more pronounced shift in the DoW effect estimates. If such changes occur, it may be necessary to reevaluate the priors used in the model to ensure they remain appropriate.
This vignette demonstrates that accounting for the DoW effect is an
important enhancement when using daily incidence data in nowcasting
models. The NobBS
package version 1.1.0 supports this
feature, improving model accuracy by capturing systematic DoW variations
in case diagnosis or reporting patterns.
However, certain limitations should be considered when applying this approach:
Sparse input data can reduce the reliability of estimated DoW effects, particularly when individual days have very few reported cases.
Holidays and irregular reporting days can disrupt the typical DoW pattern, leading to misleading estimates if not explicitly accounted for (currently not supported by the package). Users should exercise caution when applying DoW adjustments in periods with irregular reporting.
To mitigate these challenges, one approach is to pre-estimate the DoW
effect during a period with stable reporting and sufficient data, then
use these estimates as priors when nowcasting during a more problematic
period (e.g., one affected by sparse data or holidays). Additionally,
users can increase the precision parameter
(gamma.prec.prior
) to very high values, effectively
treating the pre-estimated priors as nearly fixed parameters.