Title: Respiratory Viral Infection Forecast Reporting
Version: 1.0.3
Description: Tools for reporting and forecasting viral respiratory infections, using case surveillance data. Report generation tools for short-term forecasts, and validation metrics for an arbitrary number of customizable respiratory viruses. Estimation of the effective reproduction number is based on the 'EpiEstim' framework described in work by 'Cori' and colleagues (2013) <doi:10.1093/aje/kwt133>.
License: GPL (≥ 3)
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.3
Imports: data.table, dplyr, EpiEstim, ggplot2, incidence, lubridate, projections, purrr, rlang, tibble, stats, utils, tidyr, glue, mgcv, kableExtra, cowplot
VignetteBuilder: knitr
Depends: R (≥ 4.1.0)
URL: https://github.com/BCCDC-PHSA/ViroReportR
BugReports: https://github.com/BCCDC-PHSA/ViroReportR/issues
NeedsCompilation: no
Packaged: 2026-02-09 21:01:12 UTC; rebeca.falcao
Author: Mike Irvine [aut, cre, cph], Caesar Wong [aut], Nirupama Tamvada [aut], Rebeca Falcao [aut], Nelson Tang [aut]
Maintainer: Mike Irvine <mike.irvine@bccdc.ca>
Repository: CRAN
Date/Publication: 2026-02-12 08:10:21 UTC

ViroReportR: Respiratory Viral Infection Forecast Reporting

Description

Tools for reporting and forecasting viral respiratory infections, using case surveillance data. Report generation tools for short-term forecasts, and validation metrics for an arbitrary number of customizable respiratory viruses. Estimation of the effective reproduction number is based on the 'EpiEstim' framework described in work by 'Cori' and colleagues (2013) doi:10.1093/aje/kwt133.

Author(s)

Maintainer: Mike Irvine mike.irvine@bccdc.ca [copyright holder]

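Authors:

  • Caesar Wong

  • Nirupama Tamvada

  • Rebeca Falcao

  • Nelson Tang

See Also

Useful links:

  • https://github.com/BCCDC-PHSA/ViroReportR

  • Report bugs at https://github.com/BCCDC-PHSA/ViroReportR/issues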


Check Matching Names in Named Vectors

Description

Internal utility function to verify that all input named vectors share the same names.

Usage

check_match_names(...)

Arguments

...

Named numeric vectors to be compared.

Value

Throws an error if name mismatches are found; otherwise, returns nothing.
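As a rough illustration of the check described above, the following base R sketch compares the names of several named vectors. The helper name check_names_sketch is hypothetical and the package's internal implementation and error message may differ.

```r
# Hypothetical helper (not the package's internal code): stop if any of the
# supplied named vectors has a different set of names from the first one.
check_names_sketch <- function(...) {
  vecs <- list(...)
  ref <- sort(names(vecs[[1]]))
  for (v in vecs[-1]) {
    if (!identical(sort(names(v)), ref)) {
      stop("Named vectors do not share the same names.")
    }
  }
  invisible(NULL)
}

peaks      <- c(flu_a = 90, rsv = 110)
amplitudes <- c(rsv = 40, flu_a = 50)   # same names, different order
check_names_sketch(peaks, amplitudes)   # passes silently
```

In this sketch the order of the names is ignored; whether the package also requires matching order is not stated in the documentation.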


Clean and validate case count data for EpiEstim

Description

This function prepares case count data for use with EpiEstim by performing a series of validation and cleaning steps, listed under Details.

Usage

clean_sample_data(data, start_date)

Arguments

data

A data frame containing at least the columns "date" and "confirm". The "date" column should be of class Date, and "confirm" should be numeric.

start_date

A Date (or date-convertible string) indicating the starting date for analysis. Must exist within the "date" column.

Details

  1. Ensures that the input data frame has the required columns: "date" and "confirm".

  2. Confirms that the specified start_date exists in the data and filters the data to include only records on or after that date.

  3. Removes leading days before the first non-zero confirmed case.

  4. Verifies that the resulting dataset contains at least 14 valid days (as required for estimation).

This function is primarily intended as a preprocessing step for EpiEstim modeling. It combines validation checks for input structure and time coverage with minimal data cleaning logic to ensure robust downstream estimation.

Value

A cleaned data frame filtered from start_date, starting at the first date with non-zero confirmed cases, and containing at least 14 days of data.
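The four documented steps can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the package's code: the function name clean_sketch, the min_days argument and the error messages are all hypothetical.

```r
# Hypothetical re-implementation of the documented steps in base R;
# the package's own checks and error messages may differ.
clean_sketch <- function(data, start_date, min_days = 14) {
  start_date <- as.Date(start_date)
  # 1. Required columns
  stopifnot(all(c("date", "confirm") %in% names(data)))
  # 2. start_date must be present; keep records on or after it
  if (!start_date %in% data$date) stop("start_date not found in data$date")
  data <- data[data$date >= start_date, , drop = FALSE]
  data <- data[order(data$date), , drop = FALSE]
  # 3. Drop leading days before the first non-zero count
  first_nonzero <- which(data$confirm > 0)[1]
  if (is.na(first_nonzero)) stop("no non-zero counts after start_date")
  data <- data[first_nonzero:nrow(data), , drop = FALSE]
  # 4. Require at least `min_days` rows
  if (nrow(data) < min_days) stop("fewer than ", min_days, " days of data")
  data
}

toy <- data.frame(
  date    = as.Date("2024-04-01") + 0:19,
  confirm = c(0, 0, 0, 2, 5, 3, 4, 6, 8, 7, 9, 12, 10, 11, 14, 13, 15, 18, 17, 20)
)
cleaned <- clean_sketch(toy, "2024-04-01")
```

With this toy input the three leading zero-count days are dropped, so the cleaned series starts on 2024-04-04.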


Summarise a data frame d by groups along a variable

Description

Summarise a data frame d by groups along a variable

Usage

create_quantiles(d, ..., variable = NULL)

Arguments

d

tibble data frame

...

group_by variables

variable

string

Value

Data frame containing sample quantiles at probabilities 0.05, 0.25, 0.50, 0.75 and 0.95
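To make the returned quantiles concrete, here is a minimal base R sketch that computes the same five probabilities per group. The toy data, the grouping by date and the column names p05 to p95 are assumptions for illustration; the package function itself takes group_by variables through its ... argument.

```r
# Toy forecast samples: 200 simulated incidences for each of two dates.
set.seed(1)
samples <- data.frame(
  date      = rep(as.Date("2024-05-01") + 0:1, each = 200),
  incidence = rpois(400, lambda = 20)
)

probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)

# One row of quantiles per group (here: per date).
by_date <- split(samples$incidence, samples$date)
q_mat <- t(sapply(by_date, stats::quantile, probs = probs, names = FALSE))
colnames(q_mat) <- c("p05", "p25", "p50", "p75", "p95")  # assumed column names
quantile_df <- data.frame(date = as.Date(rownames(q_mat)), q_mat, row.names = NULL)
```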


Print out text output for ViroReportR report detailing current number of case visits, last value of Rt and corresponding intervals

Description

Print out text output for ViroReportR report detailing current number of case visits, last value of Rt and corresponding intervals

Usage

current_forecast_text(time_period_result, ...)

Arguments

time_period_result

output from forecast_time_period

...

optional arguments to be passed on to forecast_metrics

Value

current forecast metrics


fit_epiestim_model - Function to estimate the reproduction number of an epidemic

Description

A wrapper function for estimate_R from the EpiEstim library to estimate the reproduction number of epidemics to support short-term forecasts

Usage

fit_epiestim_model(
  data,
  window_size = 7L,
  type = NULL,
  mean_si = NULL,
  std_si = NULL,
  recon_opt = "match",
  method = "parametric_si",
  mean_prior = NULL,
  std_prior = NULL
)

Arguments

data

data frame containing two columns: date and confirm (number of cases)

window_size

Integer Length of the sliding windows used for R estimates.

type

character Specifies type of epidemic. Must be one of "flu_a", "flu_b", "rsv", "sars_cov2" or "custom"

mean_si

Numeric User specification of mean of parametric serial interval

std_si

Numeric User specification of standard deviation of parametric serial interval

recon_opt

Not implemented. One of "naive" or "match" to pass on to estimate_R (see help page)

method

One of "non_parametric_si", "parametric_si", "uncertain_si", "si_from_data" or "si_from_sample" to pass on to estimate_R (see help page)

mean_prior

Numeric positive number giving the mean of the common prior distribution for all reproduction numbers

std_prior

Numeric positive number giving the standard deviation of the common prior distribution for all reproduction numbers

Details

fit_epiestim_model currently supports the following epidemics: Influenza, RSV and COVID-19. The default serial intervals for the estimation of R were retrieved from Cowling et al., 2011, Vink et al., 2014 and Madewell et al., 2023 for Influenza A, Influenza B, RSV and COVID (BA.5 Omicron variant) respectively.

Value

Object of class estimate_R (see EpiEstim help page)
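The window_size argument controls the length of the sliding windows over which R is estimated. EpiEstim describes windows by start and end indices into the incidence series (t_start and t_end), with the first window starting at index 2. The sketch below shows how such indices could be built in base R; whether fit_epiestim_model constructs them in exactly this way is an assumption.

```r
n_obs <- 30L        # number of daily observations (toy value)
window_size <- 7L

# EpiEstim conditions on the first time step, so windows start at index 2.
t_start <- seq(2L, n_obs - window_size + 1L)
t_end   <- t_start + window_size - 1L
windows <- data.frame(t_start = t_start, t_end = t_end)
head(windows, 3)
```

Each row is one window of window_size consecutive days, and consecutive windows overlap by all but one day.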


Extract current forecast metrics: forecast prediction, percentile interval and Rt value

Description

Extract current forecast metrics: forecast prediction, percentile interval and Rt value

Usage

forecast_metrics(time_period_result, iter = 10)

Arguments

time_period_result

output from forecast_time_period

iter

number of MCMC iterations used to generate Rt posterior

Value

dataframe of current forecast metrics


Forecast daily epidemic cases using EpiEstim

Description

This function prepares epidemic data, estimates the reproduction number (R_t) using fit_epiestim_model, and produces short-term forecasts of daily confirmed cases with project_epiestim_model.

It removes early periods with no cases, checks data validity, optionally smooths the epidemic curve, and then generates forward projections of cases for a specified number of days.

Usage

generate_forecast(
  data,
  start_date,
  window_size = 7,
  n_days = 7,
  type = NULL,
  smooth_data = FALSE,
  smoothing_cutoff = 10,
  ...
)

Arguments

data

data frame Must contain two columns:

  • date: observation dates

  • confirm: daily confirmed cases

start_date

Date Date after which the epidemic is considered to have started. Data before this date is removed.

window_size

Integer Length of the sliding window (in days) used for reproduction number estimation. Default is 7.

n_days

Integer Number of future days to forecast. Default is 7.

type

character Type of epidemic. Must be one of "flu_a", "flu_b", "rsv", "sars_cov2", or "custom". Passed to fit_epiestim_model.

smooth_data

logical Whether to smooth the input daily case counts before estimation. Default is FALSE.

smoothing_cutoff

Integer Cutoff parameter for smoothing. Only used if smooth_data = TRUE. Default is 10.

...

Additional arguments passed to fit_epiestim_model.

Details

Value

A data frame of forecasted daily incidence with columns:

See Also

fit_epiestim_model for reproduction number estimation, project_epiestim_model for forward simulations.

Examples


# Create sample test rsv data
disease_type <- "rsv"
test_data <- simulate_data()
formatted_data <- get_aggregated_data(
  test_data,
  number_column = disease_type,
  date_column = "date",
  start_date = "2024-04-01",
  end_date = "2024-05-01"
)

# Run a 7 day forecast without smoothing (smooth_data = FALSE)
res_smooth <- generate_forecast(
  data = formatted_data,
  start_date = "2024-04-01",
  n_days = 7,
  type = "rsv",
  smooth_data = FALSE
)


Generate Viral Respiratory Forecast Report

Description

Generates a full-season forecast report for viral respiratory diseases as an HTML document.

Usage

generate_forecast_report(
  input_data_dir = NULL,
  output_dir = NULL,
  n_days = 7,
  validate_window_size = 7,
  smooth = FALSE,
  disease_season = NULL
)

Arguments

input_data_dir

Path to input CSV data. Must contain columns: date, confirm, disease_type. Allowed values for disease_type: "flu_a", "flu_b", "rsv", "sars_cov2", "custom".

output_dir

Path to output directory for the rendered HTML report.

n_days

Number of days ahead to forecast. Default is 7.

validate_window_size

The number of days between each validation window. Default is 7.

smooth

Logical indicating whether smoothing should be applied in the forecast. Default is FALSE.

disease_season

An optional named list specifying the seasonal date ranges for each disease. Each element should be either:

  • NULL (indicating no defined season), or

  • a two-date vector in "YYYY-MM-DD" format (e.g., c("2024-09-01", "2025-03-01")) defining the start and end of the season for that disease.

For example: disease_season = list( flu_a = c("2024-09-01", "2025-03-01"), rsv = c("2024-09-01", "2025-03-01"), sars_cov2 = NULL )

This will produce a report where influenza A and RSV seasons run from September 1, 2024 to March 1, 2025, while no season is defined for SARS-CoV-2.
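As a sketch of how such a season list could be applied to long-format input data, the base R example below filters each disease to its season and leaves diseases with a NULL season untouched. The helper filter_season is hypothetical, and treating both season bounds as inclusive is an assumption rather than documented behaviour.

```r
# Toy long-format data with the documented columns.
dat <- data.frame(
  date = rep(seq(as.Date("2024-08-01"), as.Date("2025-04-01"), by = "month"), 2),
  confirm = 1:18,
  disease_type = rep(c("flu_a", "sars_cov2"), each = 9)
)
disease_season <- list(
  flu_a = c("2024-09-01", "2025-03-01"),
  sars_cov2 = NULL
)

# Hypothetical helper: keep rows inside the season, or all rows if season is NULL.
filter_season <- function(df, season) {
  if (is.null(season)) return(df)
  season <- as.Date(season)
  df[df$date >= season[1] & df$date <= season[2], , drop = FALSE]
}

in_season <- lapply(split(dat, dat$disease_type), function(df) {
  filter_season(df, disease_season[[df$disease_type[1]]])
})
sapply(in_season, nrow)
```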

Value

Invisibly returns the path to the rendered HTML report.

Examples


data <- simulate_data(start_date = "2024-01-07") # series starts on a Sunday
diseases <- c("flu_a", "rsv", "sars_cov2")
data$date <- lubridate::ymd(data$date)
vri_data_list <- purrr::set_names( purrr::map2( rep(list(data), length(diseases)),
                                  diseases,
                                  ~ get_aggregated_data(.x, "date", .y)
                                 ),
                            diseases
)
# Save the simulated data
df <- purrr::imap_dfr(
vri_data_list,
\(df, disease) dplyr::mutate(df, disease_type = disease)
)
tmp_dir <- tempdir() # temporary directory for example for saving data
data_path <- file.path(tmp_dir, "simulated_data.csv")
write.csv(df, data_path, row.names = FALSE)

output_path <- tempdir() # output directory for report (temporary as example)
generate_forecast_report(input_data_dir = data_path,
                         output_dir = output_path,
                         n_days = 7,
                         validate_window_size = 7,
                         smooth = FALSE)


Validate forecast performance over multiple time windows

Description

This function performs rolling validation of short-term forecasts generated by EpiEstim or similar models. It divides the input time series into overlapping validation windows and repeatedly runs forecasts to assess model performance across different time segments.

Usage

generate_validation(
  data,
  start_date,
  validate_window_size = 7,
  window_size = 7,
  n_days = 7,
  type = NULL,
  smooth_data = FALSE,
  smoothing_cutoff = 10,
  ...
)

Arguments

data

A data frame containing at least the columns "date" and "confirm". The "date" column should be of class Date, and "confirm" should be numeric.

start_date

A Date (or date-convertible string) specifying the starting point for validation. Must exist in the "date" column.

validate_window_size

Integer. The number of days between each validation window (default: 7).

window_size

Integer. The sliding window size (in days) used by the forecasting model (default: 7).

n_days

Integer. The number of future days to forecast in each validation iteration (default: 7).

type

character Type of epidemic. Must be one of "flu_a", "flu_b", "rsv", "sars_cov2", or "custom". Passed to fit_epiestim_model.

smooth_data

Logical. Whether to smooth the input case counts prior to forecasting (default: FALSE).

smoothing_cutoff

Numeric. Threshold used for smoothing when smooth_data = TRUE (default: 10).

...

Additional arguments passed to generate_forecast().

Details

The validation procedure ensures that forecasts are evaluated under realistic temporal conditions. Starting from the earliest date, the function repeatedly:

  1. Takes a growing subset of data up to the current validation endpoint.

  2. Runs the forecast using generate_forecast().

  3. Moves the validation window forward by validate_window_size days.

This results in a set of forecasts that can be compared to observed data to evaluate predictive performance across time.
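The expanding-window procedure above can be sketched in base R by computing only the training and forecast date ranges for each iteration. The minimum training length of 14 days (min_train) is an assumption borrowed from clean_sample_data; the exact end points used by the package may differ.

```r
dates <- seq(as.Date("2024-10-16"), as.Date("2024-12-31"), by = "day")
validate_window_size <- 7
n_days <- 7
min_train <- 14   # assumed minimum training length, mirroring clean_sample_data

# End index of each growing training subset
train_ends <- seq(min_train, length(dates), by = validate_window_size)

windows <- lapply(train_ends, function(end) {
  list(
    train_from    = dates[1],
    train_to      = dates[end],
    forecast_from = dates[end] + 1,
    forecast_to   = dates[end] + n_days
  )
})
```

Every training subset starts at the same date and grows by validate_window_size days per iteration, which is what makes the windows overlap.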

Value

A list of forecast results, each element corresponding to one validation window. Each element contains the output returned by generate_forecast() for that particular window.

See Also

clean_sample_data(), generate_forecast()

Examples

data <- simulate_data()
formatted_data <- get_aggregated_data(data,"date", "flu_a", "2024-10-16", "2024-12-31")
start_date <- as.Date("2024-10-16")
validation_results <- generate_validation(formatted_data, start_date, type="flu_a")


Compute Forecast Validation Metrics (SMAPE & MASE)

Description

This function evaluates forecast accuracy across multiple validation runs by computing two key performance metrics: the symmetric mean absolute percentage error (SMAPE) and the mean absolute scaled error (MASE).

Usage

generate_validation_metric(data, validation_res)

Arguments

data

A data frame used in generate_validation(), containing the original training data for the model. It must include:

  • date: Dates of the observed case data (class Date).

  • confirm: Numeric values of observed confirmed cases.

validation_res

A list of forecast validation results, typically the output from generate_validation(). Each element should contain:

  • forecast_res_quantiles: A data frame with columns date and p50 (median forecasted values).

  • original_data: A data frame representing the training data used for that forecast, with a date column.

Details

For each forecast result, the function also reports the corresponding training and forecast periods. Computation stops once the forecast period reaches the maximum date in the model data.

The function automatically excludes forecasts extending beyond the latest date in the observed model data.
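For reference, one common way to compute SMAPE and MASE is sketched below in base R. Several variants of both metrics exist, so the exact formulas, scaling and zero handling used by generate_validation_metric are assumptions here; the function names smape and mase are illustrative.

```r
# Symmetric mean absolute percentage error, in percent (one common definition).
smape <- function(actual, forecast) {
  denom <- (abs(actual) + abs(forecast)) / 2
  100 * mean(ifelse(denom == 0, 0, abs(forecast - actual) / denom))
}

# Mean absolute scaled error: forecast MAE divided by the in-sample MAE of a
# one-step naive forecast on the training series.
mase <- function(actual, forecast, train) {
  mean(abs(forecast - actual)) / mean(abs(diff(train)))
}

train    <- c(10, 12, 15, 14, 18, 20, 23)
actual   <- c(25, 27, 30)
forecast <- c(24, 28, 33)
smape(actual, forecast)
mase(actual, forecast, train)
```

A MASE below 1 indicates that the forecast error is smaller than the average one-step change in the training data.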

Value

A tibble (data frame) with one row per forecast result and the following columns:

See Also

generate_validation(), generate_forecast()

Examples

data <- simulate_data()
formatted_data <- get_aggregated_data(data,"date", "flu_a", "2024-10-16", "2024-12-31")
start_date <- "2024-10-16"
validation_results <- generate_validation(formatted_data, start_date, type ="flu_a")
generate_validation_metric(formatted_data, validation_results)


Extract Aggregated Weekly Generic Data

Description

get_aggregated_data() performs data transformation in the following steps:

  1. Group the weekly or daily data by date.

  2. Aggregate the number of confirmed cases by either day or week.

  3. Select only the date and confirmed cases column.

  4. Filter the data by given start and end date

The input data frame generic_data must contain a date column and a case count column. These columns can have any names; pass the names to the function through the date_column and number_column parameters.

For weekly data, the date column is assumed to mark the start of the epiweek.

Usage

get_aggregated_data(
  generic_data,
  date_column,
  number_column,
  start_date = NULL,
  end_date = NULL,
  unit = "day"
)

Arguments

generic_data

the weekly generic data from get_data()

date_column

Name of the date column (string)

number_column

Name of the case count column (string)

start_date

start date string (e.g. '2022-01-01') (optional, default is NULL)

end_date

end date string (e.g. '2022-12-31') (optional, default is NULL)

unit

aggregation unit, either "day" (the default) or "week"

Value

aggregated daily or weekly data of the generic confirmed cases data (filtered by date if start_date or end_date is given)

date

Either day or week date

confirm

number of confirmed cases
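The transformation steps in the Description can be approximated in base R as follows. This is a sketch only: the function name aggregate_sketch is hypothetical, and labelling each week by its Sunday is an assumed convention for the start of the epiweek.

```r
raw <- data.frame(
  date  = as.Date("2024-01-07") + 0:13,   # two full weeks starting on a Sunday
  flu_a = 1:14
)

aggregate_sketch <- function(df, date_column, number_column,
                             start_date = NULL, end_date = NULL, unit = "day") {
  out <- data.frame(date = as.Date(df[[date_column]]), confirm = df[[number_column]])
  if (unit == "week") {
    # Label each row with the Sunday that starts its week (an assumed convention).
    out$date <- out$date - as.integer(format(out$date, "%w"))
  }
  # Sum counts per date (or per week start), keeping only date and confirm.
  out <- stats::aggregate(confirm ~ date, data = out, FUN = sum)
  if (!is.null(start_date)) out <- out[out$date >= as.Date(start_date), ]
  if (!is.null(end_date))   out <- out[out$date <= as.Date(end_date), ]
  out
}

weekly <- aggregate_sketch(raw, "date", "flu_a", unit = "week")
```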

Examples

sim_data <- simulate_data()
aggregated_data <- get_aggregated_data(
  sim_data,
  "date", "flu_a", "2024-10-16", "2024-12-31"
)

Compare Reproduction Number Estimates from Multiple EpiEstim Outputs

Description

[Deprecated]

This function is deprecated and may be removed in a future release.

This function creates a ggplot comparing estimated reproduction numbers (R) over time from multiple EpiEstim outputs. Each input should be a named object produced by EpiEstim containing an estimate_R$R data frame.

Usage

plot_R_fit_comparison(...)

Arguments

...

Named EpiEstim output objects produced by generate_forecast(). Each object should contain a data frame estimate_R$R with columns such as:

  • t_start, t_end: Time window indices

  • Median(R): Median estimated reproduction number

  • Quantile.0.025(R) / Quantile.0.975(R): 95% credible interval bounds

  • Quantile.0.25(R) / Quantile.0.75(R): 50% credible interval bounds

Value

A ggplot2 object showing median R estimates with 50% and 95% credible intervals, colored by input source (i.e., name).

See Also

generate_forecast(), EpiEstim::estimate_R()


Compare Forecast Outputs

Description

[Deprecated]

This function is deprecated and may be removed in a future release.

This function visualizes and compares multiple forecast outputs generated by generate_forecast(). It plots median predicted incidences with 80% and 90% predictive intervals.

Usage

plot_forecast_comparison(...)

Arguments

...

Named forecast result objects produced by generate_forecast(). Each object should contain a data frame forecast_res_quantiles with columns:

  • date: Forecast date

  • p10, p50, p90: 10th, median, and 90th quantile predictions

  • (Optional) other quantiles if available

Value

A ggplot2 object showing forecasted median incidences over time with shaded prediction intervals, colored by input source (i.e., model name).

See Also

generate_forecast(), ggplot2::ggplot()


Plot Mean Rt with time index (dates)

Description

Plot Mean Rt with time index (dates)

Usage

plot_rt(forecast_results)

Arguments

forecast_results

is the output of generate_forecast.

Value

Mean Rt with time index plot

Examples

# Create sample test rsv data
disease_type <- "rsv"
test_data <- simulate_data()
formatted_data <- get_aggregated_data(
  test_data,
  number_column = disease_type,
  date_column = "date",
  start_date = "2024-04-01",
  end_date = "2024-05-01"
)

# Run a 7 day forecast without smoothing (smooth_data = FALSE)
forecast_results <- generate_forecast(
  data = formatted_data,
  start_date = "2024-04-01",
  n_days = 7,
  type = "rsv",
  smooth_data = FALSE
)
plot_rt(forecast_results)

Plot a ribbon plot with each time horizon predictions against true values for validation

Description

Plot a ribbon plot with each time horizon predictions against true values for validation

Usage

plot_validation(data, validation_res, pred_plot = "ribbon")

Arguments

data

A data frame used in generate_validation(), containing the original training data used for model fitting. Must include:

  • date: Dates of the observed data (class Date).

  • confirm: Numeric values of confirmed cases.

validation_res

A list of forecast validation results, typically produced by generate_validation(). Each element should include:

  • forecast_res_quantiles: A data frame containing forecasted quantiles (p025, p10, p25, p50, p75, p90, p975, and date).

  • estimate_R: A list with estimated reproduction numbers (used for grouping).

  • smoothed_data: (optional) A data frame of smoothed observations, if smoothing was applied before forecasting.

pred_plot

either "ribbon" (the default) or "error_bar", to produce ribbon prediction plots or error-bar plots respectively

Value

error_bar validation plot or ribbon validation plot for a specific prediction horizon

Examples

data <- simulate_data()
formatted_data <- get_aggregated_data(data,"date", "flu_a", "2024-10-16", "2024-12-31")
start_date <- "2024-10-16"
validation_results <- generate_validation(formatted_data, start_date, type ="flu_a")
plot_validation(formatted_data, validation_results)


Extract daily forecast samples

Description

Function to produce short-term daily projections from objects of class estimate_R

Usage

project_epiestim_model(data, model_fit, n_days = 7, n_sim = 1000)

Arguments

data

data frame containing two columns: date and confirm (number of cases per day)

model_fit

Object of class estimate_R generated by running fit_epiestim_model

n_days

The number of days to run simulations for. Defaults to 7

n_sim

The number of epicurves to simulate. Defaults to 1000

Value

Data-frame of daily forecast samples from all simulations

date

date

incidence

projected number of daily confirmed cases

sim

simulation run number
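To illustrate the shape of the returned samples (date, incidence, sim), the following base R sketch simulates forward trajectories with a simple Poisson renewal process. It is not the algorithm used by the package or its dependencies; the constant reproduction number, the serial interval weights w and the forecast start date are all assumed toy values.

```r
set.seed(42)
incidence_hist <- c(5, 7, 9, 12, 15, 18, 22, 25, 30, 34)  # toy daily counts
r_value <- 1.2                                             # assumed constant R
# Assumed discretised serial interval weights for lags 1..5 (sum to 1).
w <- c(0.1, 0.3, 0.3, 0.2, 0.1)
n_days <- 7
n_sim <- 1000

simulate_one <- function() {
  inc <- incidence_hist
  for (d in seq_len(n_days)) {
    recent <- rev(tail(inc, length(w)))        # most recent day first
    inc <- c(inc, rpois(1, r_value * sum(recent * w)))
  }
  tail(inc, n_days)
}

sims <- replicate(n_sim, simulate_one())        # n_days x n_sim matrix
forecast_samples <- data.frame(
  date      = rep(as.Date("2024-05-01") + seq_len(n_days), times = n_sim),
  incidence = as.vector(sims),
  sim       = rep(seq_len(n_sim), each = n_days)
)
```

Each simulation contributes n_days rows, so the long-format result has n_days * n_sim rows.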


Simulate Daily Virus Incidence Data

Description

Generates simulated daily incidence data for specified respiratory viruses over a defined number of days. Each virus is modeled using a Gaussian-like curve, parameterized by peak day, amplitude, and scale.

Usage

simulate_data(
  days = 365,
  peaks = c(flu_a = 90, rsv = 110, sars_cov2 = 160),
  amplitudes = c(flu_a = 50, rsv = 40, sars_cov2 = 20),
  scales = c(flu_a = -0.004, rsv = -0.005, sars_cov2 = -0.001),
  time_offset = 0,
  noise_sd = 5,
  start_date = "2024-01-07"
)

Arguments

days

Integer. Number of days to simulate (default is 365).

peaks

Named numeric vector. Peak day for each virus (e.g., c("flu_a"=90,"rsv"=110,"sars_cov2"=160)).

amplitudes

Named numeric vector. Amplitude for each virus's peak (e.g., c("flu_a"=50,"rsv"=40,"sars_cov2"=20)).

scales

Named numeric vector. Scale controlling spread of the peak for each virus (e.g., c("flu_a"=-0.004,"rsv"=-0.005,"sars_cov2"=-0.001)).

time_offset

Integer. Number of days to offset the start of the simulation. Useful for testing data with larger values in the middle of a respiratory season.

noise_sd

Numeric or named numeric. Standard deviation of the Gaussian noise applied to each virus signal. Can either be a single value or a named vector with one value per virus, e.g., c("flu_a"=2,"rsv"=5,"sars_cov2"=7).

start_date

String. Start date of the simulated series (default is "2024-01-07").

Value

A data frame with daily simulated incidence counts for each virus, including a date column.
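The "Gaussian-like curve" parameterized by peak day, amplitude and scale can be sketched for a single virus as below. The functional form amplitude * exp(scale * (day - peak)^2) is an assumption consistent with the negative default scales; the rounding and truncation at zero are also assumptions, and the package's own simulation may differ.

```r
set.seed(123)
days <- 365
day_index <- seq_len(days)
peak <- 90; amplitude <- 50; scale <- -0.004; noise_sd <- 5

# Assumed functional form: amplitude * exp(scale * (day - peak)^2), plus noise,
# rounded and truncated at zero so the result looks like count data.
signal <- amplitude * exp(scale * (day_index - peak)^2)
flu_a  <- pmax(0, round(signal + rnorm(days, sd = noise_sd)))

sim <- data.frame(date = as.Date("2024-01-07") + day_index - 1, flu_a = flu_a)
```

A more negative scale gives a narrower peak, while the amplitude sets the height of the noise-free curve at the peak day.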

Examples

simulate_data()
simulate_data(days = 100, peaks = c(flu_a = 30), amplitudes = c(flu_a = 60),
scales = c(flu_a = -0.01), noise_sd = c(flu_a = 5))
