Help for package climatehealth

Title:

Statistical Tools for Modelling Climate-Health Impacts

Version:

1.0.1

Date:

2026-04-08

Description:

Tools for producing climate-health indicators and supporting official statistics from health and climate data. Implements analytical workflows for temperature-related mortality, wildfire smoke exposure, air pollution, suicides related to extreme heat, malaria, and diarrhoeal disease outcomes, with utilities for descriptive statistics, model validation, attributable fraction and attributable number estimation, relative risk estimation, minimum mortality temperature estimation, and plotting for reporting. These six indicators are endorsed by the United Nations Statistical Commission for inclusion in the Global Set of Environment and Climate Change Statistics. Implemented methods include distributed lag non-linear models (DLNM), quasi-Poisson time-series regression, case-crossover analysis, Bayesian spatio-temporal models using the Integrated Nested Laplace Approximation ('INLA'), and multivariate meta-analysis for sub-national estimates. The package is based on methods developed in the Standards for Official Statistics on Climate-Health Interactions (SOSCHI) project https://climate-health.officialstatistics.org. For methodologies, see Watkins et al. (2025) <doi:10.5281/zenodo.14865904>, Brown et al. (2024) <doi:10.5281/zenodo.14052183>, Pearce et al. (2024) <doi:10.5281/zenodo.14050224>, Byukusenge et al. (2025) <doi:10.5281/zenodo.15585042>, Dzakpa et al. (2025) <doi:10.5281/zenodo.14881886>, and Dzakpa et al. (2025) <doi:10.5281/zenodo.14871506>.

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.3

Imports:

car, data.table, dlnm, dplyr, Epi, forcats, exactextractr, ggplot2, ggtext, gnm, graphics, grDevices, gplots, lifecycle, lme4, lubridate, metafor, mgcv, mixmeta, ncdf4, patchwork, pkgbuild, purrr, raster, RColorBrewer, readr, readxl, reshape2, rlang, scales, sf, spdep, splines, stats, stringr, tibble, tidyr, tools, tseries, tsModel (≥ 0.6-2), utils, xfun, zoo

VignetteBuilder:

knitr

Suggests:

covr, knitr, rmarkdown, devtools, DT, htmltools, INLA, mockery, mvmeta, openxlsx, patrick, pkgload, stringdist, terra, testthat (≥ 3.2.1.1), withr

URL:

https://climate-health.officialstatistics.org

Additional_repositories:

https://inla.r-inla-download.org/R/stable/

Depends:

R (≥ 4.4.0)

Config/rcmdcheck/ignore-inconsequential-notes:

true

NeedsCompilation:

Packaged:

2026-04-08 16:53:32 UTC; omekek

Author:

Charlie Browning [aut], Kenechi Omeke [aut, cre], Etse Yawo Dzakpa [aut], Gladin Jose [aut], Matt Pearce [aut], Ellie Watkins [aut], Claire Hunt [aut], Beatrice Byukusenge [aut], Cassien Habyarimana [aut], Venuste Nyagahakwa [aut], Felix Scarbrough [aut], Treesa Shaji [aut], Bonnie Lewis [aut], Maquines Odhiambo Sewe [aut], Vijendra Ingole [aut], Sean Lovell [ctb], Antony Brown [ctb], Euan Soutter [ctb], Gillian Flower [ctb], David Furley [ctb], Joe Panes [ctb], Charlotte Romaniuk [ctb], Milly Powell [ctb], Wellcome [fnd], Office for National Statistics [cph] (SOSCHI Project)

Maintainer:

Kenechi Omeke <climate.health@ons.gov.uk>

Repository:

CRAN

Date/Publication:

2026-04-08 20:30:02 UTC

climatehealth: Statistical Tools for Modelling Climate-Health Impacts

Description

Overview

This package provides a suite of analysis functions for measuring the relationship between various climate factors (indicators) and health outcomes.

Included Indicators

Mortality attributable to high and low outdoor temperatures
Mortality attributable to wildfire-related PM2.5
Suicides attributable to extreme heat
Mortality attributable to short-term exposure to outdoor PM2.5
Diarrhea cases attributable to extreme temperatures and rainfall
Malaria cases attributable to extreme temperatures and rainfall

License

MIT

The full range of topics includes

Temperature-related health effects
Health effects of wildfires
Mental health
Health effects of air pollution
Water-borne diseases
Vector-borne diseases

Author(s)

Maintainer: Kenechi Omeke <climate.health@ons.gov.uk>

Authors:

Charlie Browning
Etse Yawo Dzakpa
Gladin Jose
Matt Pearce
Ellie Watkins
Claire Hunt
Beatrice Byukusenge
Cassien Habyarimana
Venuste Nyagahakwa
Felix Scarbrough
Treesa Shaji
Bonnie Lewis
Maquines Odhiambo Sewe
Vijendra Ingole

Other contributors:

Sean Lovell [contributor]
Antony Brown [contributor]
Euan Soutter [contributor]
Gillian Flower [contributor]
David Furley [contributor]
Joe Panes [contributor]
Charlotte Romaniuk [contributor]
Milly Powell [contributor]
Wellcome [funder]
Office for National Statistics (SOSCHI Project) [copyright holder]

English day of week names

Description

Provides consistent English day names regardless of system locale

Usage

.english_dow_names(day_numbers = NULL, short = FALSE)

Arguments

day_numbers

Optional vector of day numbers (1-7, where 1=Sunday)

short

Logical. Return abbreviated names? Default FALSE.

Value

Character vector of day names
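
A minimal sketch of the idea, assuming a fixed lookup vector is enough to make day names independent of the system locale. The helper name english_dow is hypothetical and this is not the package's internal code.

```r
# Hypothetical helper: locale-independent English day names (1 = Sunday).
english_dow <- function(day_numbers = NULL, short = FALSE) {
  full <- c("Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday")
  # Abbreviate to three letters when short = TRUE.
  names_out <- if (short) substr(full, 1, 3) else full
  # Return all seven names when no day numbers are supplied.
  if (is.null(day_numbers)) names_out else names_out[day_numbers]
}

english_dow(c(1, 7))
english_dow(2, short = TRUE)
```

Because the names come from a constant vector rather than from format() or weekdays(), the result does not depend on LC_TIME.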

English month names

Description

Provides consistent English month names regardless of system locale

Usage

.english_month_names(month_numbers = NULL, short = FALSE)

Arguments

month_numbers

Optional vector of month numbers (1-12) to return

short

Logical. Return abbreviated names? Default FALSE.

Value

Character vector of month names

Temporarily set English locale for date operations

Description

Temporarily sets the locale to English for date parsing and formatting

Usage

.with_english_locale(expr)

Arguments

expr

Expression to evaluate with English locale

Value

Result of the expression
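
One way such a wrapper could work is sketched below, assuming the portable "C" locale is an acceptable stand-in for an English locale. The function name with_c_time_locale is hypothetical; the package's own implementation may differ.

```r
# Hypothetical wrapper: evaluate an expression with LC_TIME set to "C",
# then restore the previous setting even if the expression fails.
with_c_time_locale <- function(expr) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  # expr is a lazy promise, so it is only evaluated here,
  # after the locale has been switched.
  force(expr)
}

day_name <- with_c_time_locale(format(as.Date("2024-01-01"), "%A"))
day_name
```

Restoring the locale in on.exit() keeps the change from leaking into the rest of the session.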

Raise a typed error with structured metadata

Description

Creates a classed condition that can be caught and inspected by the API layer. This is the base helper - prefer using specific helpers like abort_column_not_found() or abort_validation() when applicable.

Usage

abort_climate(message, type = "generic_error", ..., call = rlang::caller_env())

Arguments

message

Human-readable error message

type

Error type for classification. One of:

"validation_error": Data/parameter validation issues (HTTP 400)
"column_not_found": Missing column in dataset (HTTP 400)
"model_error": Statistical model failures (HTTP 422)
"generic_error": Unclassified errors (HTTP 500)

...

Additional metadata to include in the error (e.g., column = "tmean")

call

The call to include in the error (defaults to caller's call)

Value

Never returns; always raises an error.

Examples


# Basic usage
err <- tryCatch(
  abort_climate("Something went wrong", "generic_error"),
  error = identity
)
inherits(err, "climate_error")

# With metadata
err <- tryCatch(
  abort_climate(
    "Invalid lag value",
    "validation_error",
    param = "nlag",
    value = -1,
    expected = "non-negative integer"
  ),
  error = identity
)
err$type
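
The sketch below illustrates how an API layer might translate the error type into the HTTP status codes listed under the type argument. It uses base R only; raise_typed and status_for are hypothetical names, and the condition structure is an assumption rather than the package's actual internals.

```r
# Hypothetical stand-in for a typed error: a classed condition with metadata.
raise_typed <- function(message, type = "generic_error", ...) {
  cond <- structure(
    list(message = message, call = NULL, type = type, ...),
    class = c(type, "climate_error", "error", "condition")
  )
  stop(cond)
}

# Map the error type to the HTTP status documented for abort_climate().
status_for <- function(err) {
  codes <- c(validation_error = 400L, column_not_found = 400L,
             model_error = 422L, generic_error = 500L)
  type <- if (is.null(err$type)) "generic_error" else err$type
  if (type %in% names(codes)) codes[[type]] else 500L
}

err <- tryCatch(
  raise_typed("Invalid lag value", "validation_error", param = "nlag"),
  error = identity
)
status_for(err)
```

Errors without a type field, such as those raised by a plain stop(), fall back to 500 in this sketch.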

Raise a column-not-found error with available columns

Description

Use this when a required column is missing from a dataset. Includes fuzzy matching to suggest the closest available column name.

Usage

abort_column_not_found(
  column,
  available,
  dataset_name = "dataset",
  call = rlang::caller_env()
)

Arguments

column

The column name that was not found

available

Character vector of available column names

dataset_name

Optional name of the dataset for clearer messages

call

The call to include in the error

Value

Never returns; always raises an error.

Examples


data <- data.frame(temp = 1)
if (!("tmean" %in% colnames(data))) {
  err <- tryCatch(
    abort_column_not_found("tmean", colnames(data)),
    error = identity
  )
  err$suggestion
}
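
The suggestion could be produced by edit-distance matching, as in the sketch below. This assumes Levenshtein distance via utils::adist(); the matching method the package actually uses is not stated here, and suggest_column is a hypothetical name.

```r
# Hypothetical helper: pick the available column closest to the requested one.
suggest_column <- function(column, available) {
  # adist() returns a 1 x length(available) matrix of edit distances.
  d <- utils::adist(column, available, ignore.case = TRUE)
  available[which.min(d)]
}

suggest_column("tmean", c("tmeen", "deaths", "region"))
```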

Raise a model error (statistical/computational failures)

Description

Use this when statistical models fail to converge, produce singular matrices, or encounter other computational issues that aren't due to obvious user error.

Usage

abort_model_error(
  message,
  model_type = "unknown",
  ...,
  call = rlang::caller_env()
)

Arguments

message

Human-readable error message

model_type

Type of model that failed (e.g., "dlnm", "glm", "meta-analysis")

...

Additional diagnostic metadata

call

The call to include in the error

Value

Never returns; always raises an error.

Examples


tryCatch({
  stop("convergence failed")
}, error = function(e) {
  err <- tryCatch(
    abort_model_error(
    "Model failed to converge",
    model_type = "dlnm",
    original_error = conditionMessage(e)
    ),
    error = identity
  )
  inherits(err, "model_error")
})

Raise a validation error (data/parameter issues)

Description

Use this for general validation failures where the user's input or data doesn't meet requirements. For missing columns specifically, use abort_column_not_found().

Usage

abort_validation(message, ..., call = rlang::caller_env())

Arguments

message

Human-readable error message

...

Additional metadata (e.g., param = "nlag", value = -1)

call

The call to include in the error

Value

Never returns; always raises an error.

Examples


# Parameter validation
nlag <- -1
if (nlag < 0) {
  err <- tryCatch(
    abort_validation(
      "nlag must be >= 0",
      param = "nlag",
      value = nlag,
      expected = "non-negative integer"
    ),
    error = identity
  )
  inherits(err, "validation_error")
}

Aggregate air pollution results by month

Description

Aggregates daily analysis results to monthly summaries

Usage

aggregate_air_pollution_by_month(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE
)

Arguments

analysis_results

Results from analyze_air_pollution_daily

max_lag

Integer. Maximum lag used in analysis. Defaults to 14.

include_national

Logical. Whether to include national results. Default TRUE.

Value

Dataframe with monthly aggregates
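
As a rough illustration of daily-to-monthly aggregation, the sketch below sums a daily attributable number by calendar month in base R. The column names date and an are hypothetical and do not describe the structure returned by analyze_air_pollution_daily.

```r
# Hypothetical daily results: one attributable number (an) per day.
daily <- data.frame(
  date = as.Date(c("2022-01-30", "2022-01-31", "2022-02-01")),
  an = c(1.5, 2, 0.5)
)

# Derive a year-month key and sum the daily values within each month.
daily$month <- format(daily$date, "%Y-%m")
monthly <- aggregate(an ~ month, data = daily, FUN = sum)
monthly
```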

Aggregate air pollution results by region

Description

Aggregates daily analysis results to regional summaries

Usage

aggregate_air_pollution_by_region(analysis_results, max_lag = 14L)

Arguments

analysis_results

Results from analyze_air_pollution_daily

max_lag

Integer. Maximum lag used in analysis. Defaults to 14.

Value

Dataframe with regional aggregates

Aggregate air pollution results by year

Description

Aggregates daily analysis results to annual summaries

Usage

aggregate_air_pollution_by_year(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE
)

Arguments

analysis_results

Results from analyze_air_pollution_daily

max_lag

Integer. Maximum lag used in analysis. Defaults to 14.

include_national

Logical. Whether to include national results. Default TRUE.

Value

Dataframe with annual aggregates

Split dataframe into multiple dataframes, based on a column's value.

Description

Split dataframe into multiple dataframes, based on a column's value.

Usage

aggregate_by_column(df, column_name)

Arguments

df

The dataframe to aggregate.

column_name

The column to aggregate the data by.

Value

A list of dataframes, split up based on the value of column_name.
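
In base R the same kind of result can be obtained with split(), as sketched below. This is an illustration of the technique, not necessarily how aggregate_by_column is implemented.

```r
df <- data.frame(
  region = c("North", "South", "North"),
  deaths = c(3, 5, 2)
)

# One dataframe per distinct value of the chosen column.
by_region <- split(df, df[["region"]])
names(by_region)
nrow(by_region$North)
```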

Descriptive statistics

Description

Generates summary statistics for climate, environmental and health data

Usage

air_pollution_descriptive_stats(
  data,
  env_labels = c(pm25 = "PM2.5 (µg/m³)", tmax = "Max Temperature (°C)", precipitation
    = "Precipitation (mm)", humidity = "Humidity (%)", wind_speed = "Wind Speed (m/s)"),
  save_outputs = FALSE,
  output_dir = NULL,
  moving_average_window = 3L,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  detect_outliers = FALSE,
  calculate_rate = FALSE
)

Arguments

data

Dataframe containing a daily time series of climate, environmental and health data

env_labels

Named vector. Labels for environmental variables with units.

save_outputs

Logical. Whether to save outputs. Defaults to FALSE.

output_dir

Character. Directory to save descriptive statistics. Defaults to NULL.

moving_average_window

Numeric. Window size for moving average calculations. Defaults to 3 (3-day moving average).

plot_corr_matrix

Logical. Whether to plot correlation matrix. Defaults to FALSE.

correlation_method

Character. Correlation method. One of 'pearson', 'spearman', 'kendall'.

plot_dist

Logical. Whether to plot distribution histograms. Defaults to FALSE.

plot_na_counts

Logical. Whether to plot NA counts. Defaults to FALSE.

plot_scatter

Logical. Whether to plot scatter plots. Defaults to FALSE.

plot_box

Logical. Whether to plot box plots. Defaults to FALSE.

plot_seasonal

Logical. Whether to plot seasonal trends. Defaults to FALSE.

plot_regional

Logical. Whether to plot regional trends. Defaults to FALSE.

plot_total

Logical. Whether to plot total health outcomes per year. Defaults to FALSE.

detect_outliers

Logical. Whether to detect outliers. Defaults to FALSE.

calculate_rate

Logical. Whether to calculate rate per 100k people. Defaults to FALSE.

Value

Invisibly returns the national data with moving averages
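
The moving average and the rate per 100,000 can be sketched in base R as follows. The trailing (right-aligned) window is an assumption; the alignment used by the package is not documented here, and the numbers are made up.

```r
pm25 <- c(10, 12, 14, 16, 18)

# Trailing 3-day moving average: each value averages the current day
# and the two preceding days, so the first two entries are NA.
ma3 <- as.numeric(stats::filter(pm25, rep(1 / 3, 3), sides = 1))
ma3

# Rate per 100,000 people from counts and population.
deaths <- c(4, 6)
population <- c(500000, 250000)
rate_per_100k <- deaths / population * 1e5
rate_per_100k
```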

Comprehensive Air Pollution Analysis Pipeline

Description

Master function that runs the complete air pollution analysis including data loading, preprocessing (including lags), modeling, plotting, attribution calculations vs reference standards, power analysis and descriptive statistics

Usage

air_pollution_do_analysis(
  data_path,
  date_col = "date",
  region_col = "region",
  pm25_col = "pm25",
  deaths_col = "deaths",
  population_col = "population",
  humidity_col = "humidity",
  precipitation_col = "precipitation",
  tmax_col = "tmax",
  wind_speed_col = "wind_speed",
  categorical_others = NULL,
  continuous_others = NULL,
  Categorical_Others = NULL,
  Continuous_Others = NULL,
  max_lag = 14L,
  df_seasonal = 6,
  family = "quasipoisson",
  reference_standards = list(list(value = 15, name = "WHO")),
  output_dir = "air_pollution_results",
  save_outputs = TRUE,
  run_descriptive = TRUE,
  run_power = TRUE,
  moving_average_window = 3L,
  include_national = TRUE,
  years_filter = NULL,
  regions_filter = NULL,
  attr_thr = 95,
  plot_corr_matrix = TRUE,
  correlation_method = "pearson",
  plot_dist = TRUE,
  plot_na_counts = TRUE,
  plot_scatter = TRUE,
  plot_box = TRUE,
  plot_seasonal = TRUE,
  plot_regional = TRUE,
  plot_total = TRUE,
  detect_outliers = TRUE,
  calculate_rate = FALSE
)

Arguments

data_path

Character. Path to CSV data file

date_col

Character. Name of date column

region_col

Character. Name of region column

pm25_col

Character. Name of PM2.5 column

deaths_col

Character. Name of deaths column

population_col

Character. Name of the population column.

humidity_col

Character. Name of humidity column

precipitation_col

Character. Name of precipitation column

tmax_col

Character. Name of maximum temperature column

wind_speed_col

Character. Name of wind speed column

categorical_others

Optional character vector. Names of additional categorical variables.

continuous_others

Optional character vector. Names of additional continuous variables (e.g., "tmean")

Categorical_Others

Deprecated alias for categorical_others.

Continuous_Others

Deprecated alias for continuous_others.

max_lag

Integer. Maximum lag days. Defaults to 14.

df_seasonal

Integer. Degrees of freedom for seasonal spline. Default 6.

family

Character. Probability distribution for the outcome variable. Defaults to "quasipoisson".

reference_standards

List of reference standards, each a list with elements value (the PM2.5 reference value) and name (the name of the standard, e.g. "WHO")

output_dir

Directory to save outputs

save_outputs

Logical. Whether to save outputs

run_descriptive

Logical. Whether to run descriptive statistics

run_power

Logical. Whether to run power analysis

moving_average_window

Integer. Window for moving average in descriptive stats

include_national

Logical. Whether to include national results in plots. Default TRUE.

years_filter

Optional numeric vector of years to include (e.g., c(2020, 2021, 2022)). It is recommended to include at least 3 consecutive years so that the time series is long enough for analysis.

regions_filter

Optional character vector of regions to include

attr_thr

Numeric (0-100). Percentile threshold used in power analysis to evaluate attribution detectability. Default 95.

plot_corr_matrix

Logical. Plot correlation matrix. Default TRUE.

correlation_method

Character. Correlation method for the correlation matrix (e.g., "pearson", "spearman"). Default "pearson".

plot_dist

Logical. Plot distributions (hist/density) for key variables. Default TRUE.

plot_na_counts

Logical. Plot missingness/NA counts. Default TRUE.

plot_scatter

Logical. Plot scatter plots for key pairs. Default TRUE.

plot_box

Logical. Plot boxplots by region/season where applicable. Default TRUE.

plot_seasonal

Logical. Plot seasonal summaries. Default TRUE.

plot_regional

Logical. Plot regional summaries. Default TRUE.

plot_total

Logical. Plot overall totals where relevant. Default TRUE.

detect_outliers

Logical. Flag potential outliers in descriptive workflow. Default TRUE.

calculate_rate

Logical. Whether to calculate rate variables during descriptive stats (e.g., deaths per population). Default FALSE

Value

List containing:

data: Processed data with lag variables
meta_analysis: Meta-analysis results with AF/AN calculations
lag_analysis: Lag-specific analysis results
distributed_lag_analysis: Distributed lag model results (if requested)
plots: List of generated plots (forest, lags, distributed lags)
power_list: A list containing power information by area
exposure_response_plots: Exposure-response plots for each reference standard (if requested)
reference_specific_af_an: AF/AN calculations specific to each reference standard (if requested)
descriptive_stats: Summary statistics of key variables

Examples


example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 180),
  province = "Example Province",
  pm25 = stats::runif(180, 8, 35),
  deaths = stats::rpois(180, lambda = 5),
  population = 500000,
  humidity = stats::runif(180, 40, 90),
  precipitation = stats::runif(180, 0, 20),
  tmax = stats::runif(180, 18, 35),
  wind_speed = stats::runif(180, 1, 8)
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)

results <- air_pollution_do_analysis(
  data_path = example_path,
  date_col = "date",
  region_col = "province",
  pm25_col = "pm25",
  deaths_col = "deaths",
  population_col = "population",
  humidity_col = "humidity",
  precipitation_col = "precipitation",
  tmax_col = "tmax",
  wind_speed_col = "wind_speed",
  continuous_others = NULL,
  max_lag = 7L,
  df_seasonal = 4,
  family = "quasipoisson",
  reference_standards = list(list(value = 15, name = "WHO")),
  years_filter = NULL,
  regions_filter = NULL,
  include_national = FALSE,
  output_dir = tempdir(),
  save_outputs = FALSE,
  run_descriptive = FALSE,
  run_power = FALSE,
  moving_average_window = 3L,
  attr_thr = 95,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  detect_outliers = FALSE,
  calculate_rate = FALSE
)

Perform meta analysis with multiple lag structures

Description

Implements a distributed lag model. Individual lag coefficients and cumulative effects are extracted and then meta-analysed.

Usage

air_pollution_meta_analysis(
  data_with_lags,
  max_lag = 14L,
  df_seasonal = 6L,
  family = "quasipoisson"
)

Arguments

data_with_lags

Lagged data

max_lag

Integer. Maximum lag days. Defaults to 14

df_seasonal

Integer. Degrees of freedom for seasonal spline. Default 6.

family

Character string indicating the distribution family used in the GAM.

Value

Dataframe with lag-specific results at regional and national level
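
To make the inputs concrete, the sketch below builds lagged PM2.5 columns and fits a single-region quasi-Poisson model with a seasonal spline. It is a simplified illustration on simulated data: add_lags and the column names are hypothetical, a plain glm() stands in for whatever model the package fits, and no meta-analysis step is shown.

```r
set.seed(1)
n <- 200
dat <- data.frame(day = seq_len(n), pm25 = runif(n, 8, 35))
dat$deaths <- rpois(n, lambda = 5)

# Hypothetical helper: add columns var_lag1 ... var_lagK, shifted by k days.
add_lags <- function(data, var, max_lag) {
  for (k in seq_len(max_lag)) {
    data[[paste0(var, "_lag", k)]] <- c(rep(NA, k), head(data[[var]], -k))
  }
  data
}
dat <- add_lags(dat, "pm25", max_lag = 2)

# Quasi-Poisson regression with a natural spline of time for seasonality.
# Rows with NA lags are dropped by glm()'s default na.action.
fit <- glm(
  deaths ~ pm25 + pm25_lag1 + pm25_lag2 + splines::ns(day, df = 4),
  family = quasipoisson(), data = dat
)

# Relative risk per 10-unit increase in PM2.5 at each lag.
rr_per_10 <- exp(10 * coef(fit)[c("pm25", "pm25_lag1", "pm25_lag2")])
rr_per_10
```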

Air Pollution Power Calculation using Meta Results

Description

Produce a power statistic by region for PM2.5 attributable mortality using meta-analysis results

Usage

air_pollution_power_list(
  meta_results,
  data_with_lags,
  ref_pm25 = 15,
  attr_thr = 95,
  include_national = TRUE
)

Arguments

meta_results

Meta-analysis results from air_pollution_meta_analysis

data_with_lags

Lagged data frame

ref_pm25

Numeric. Reference PM2.5 value for attributable risk calculation

attr_thr

Integer. Percentile at which to define the PM2.5 threshold for calculating attributable risk. Defaults to 95.

include_national

Logical. Whether to include national level calculations. Defaults to TRUE.

Value

A list containing power information by region

FUNCTION FOR COMPUTING ATTRIBUTABLE MEASURES FROM DLNM

Description

A function to calculate attributable numbers and fractions derived from (c) Antonio Gasparrini 2015-2017. Modifications to produce daily values with confidence intervals.

Usage

an_attrdl(
  x,
  basis,
  cases,
  coef = NULL,
  vcov = NULL,
  model.link = NULL,
  dir = "back",
  tot = TRUE,
  cen,
  range = NULL,
  nsim = 5000
)

Arguments

x

AN EXPOSURE VECTOR OR (ONLY FOR dir="back") A MATRIX OF LAGGED EXPOSURES

basis

THE CROSS-BASIS COMPUTED FROM x

cases

THE CASES VECTOR OR (ONLY FOR dir="forw") THE MATRIX OF FUTURE CASES

coef

COEF FOR basis IF model IS NOT PROVIDED

vcov

VCOV FOR basis IF model IS NOT PROVIDED

model.link

LINK FUNCTION IF model IS NOT PROVIDED

dir

EITHER "back" OR "forw" FOR BACKWARD OR FORWARD PERSPECTIVES

tot

IF TRUE, THE TOTAL ATTRIBUTABLE RISK IS COMPUTED

cen

THE REFERENCE VALUE USED AS COUNTERFACTUAL SCENARIO

range

THE RANGE OF EXPOSURE. IF NULL, THE WHOLE RANGE IS USED

nsim

NUMBER OF SIMULATION SAMPLES

Value

Attributable Fraction
Attributable Fraction lower confidence intervals
Attributable Fraction upper confidence intervals
Attributable Numbers
Attributable Numbers lower confidence intervals
Attributable Numbers upper confidence intervals
Simulation matrix of attributable numbers
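
The simulation-based confidence intervals can be illustrated with a deliberately simplified case: a single log-linear coefficient instead of a full cross-basis. All numbers and names below are hypothetical, and the sketch only mirrors the general approach of drawing coefficients from their sampling distribution and recomputing the attributable fraction for each draw.

```r
set.seed(42)
beta <- 0.002           # hypothetical log-RR per unit of exposure
se <- 0.0005            # hypothetical standard error of beta
exposure <- c(20, 35, 50, 15)
cases <- c(10, 12, 9, 11)
cen <- 15               # counterfactual reference exposure

# Total attributable fraction for a given coefficient:
# daily AF = 1 - exp(-log RR), weighted by daily cases.
af_total <- function(b) {
  af_daily <- 1 - exp(-b * pmax(exposure - cen, 0))
  sum(af_daily * cases) / sum(cases)
}

point <- af_total(beta)

# Empirical 95% interval from simulated coefficients.
sims <- vapply(rnorm(5000, mean = beta, sd = se), af_total, numeric(1))
ci <- quantile(sims, c(0.025, 0.975))
point
ci
```

With a real cross-basis the draws would come from a multivariate normal using the coefficient vector and its covariance matrix.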

Calculate daily RR/AF/AN/AR for region-specific/national distributed lag effects for a chosen PM2.5 reference.

Description

Calculate daily RR/AF/AN/AR for region-specific/national distributed lag effects for a chosen PM2.5 reference.

Usage

analyze_air_pollution_daily(
  data_with_lags,
  meta_results,
  ref_pm25 = 15,
  ref_name = "WHO",
  max_lag = 14L
)

Arguments

data_with_lags

Dataset. Lagged data with lag variables.

meta_results

Dataset. Results from meta analysis.

ref_pm25

Numeric. PM2.5 reference value. Defaults to 15.

ref_name

Character. Reference body name. Defaults to "WHO".

max_lag

Integer. Maximum lag days. Defaults to 14.

Value

List with region-specific/national results for daily RR/AF/AN/AR

FUNCTION FOR COMPUTING ATTRIBUTABLE MEASURES FROM DLNM

Description

A function to calculate attributable numbers and fractions derived from (c) Antonio Gasparrini 2015-2017.

Usage

attrdl(
  x,
  basis,
  cases,
  model = NULL,
  coef = NULL,
  vcov = NULL,
  model.link = NULL,
  type = "af",
  dir = "back",
  tot = TRUE,
  cen,
  range = NULL,
  sim = FALSE,
  nsim = 5000
)

Arguments

x

AN EXPOSURE VECTOR OR (ONLY FOR dir="back") A MATRIX OF LAGGED EXPOSURES

basis

THE CROSS-BASIS COMPUTED FROM x

cases

THE CASES VECTOR OR (ONLY FOR dir="forw") THE MATRIX OF FUTURE CASES

model

THE FITTED MODEL

coef

COEF FOR basis IF model IS NOT PROVIDED

vcov

VCOV FOR basis IF model IS NOT PROVIDED

model.link

LINK FUNCTION IF model IS NOT PROVIDED

type

EITHER "an" OR "af" FOR ATTRIBUTABLE NUMBER OR FRACTION

dir

EITHER "back" OR "forw" FOR BACKWARD OR FORWARD PERSPECTIVES

tot

IF TRUE, THE TOTAL ATTRIBUTABLE RISK IS COMPUTED

cen

THE REFERENCE VALUE USED AS COUNTERFACTUAL SCENARIO

range

THE RANGE OF EXPOSURE. IF NULL, THE WHOLE RANGE IS USED

sim

IF SIMULATION SAMPLES SHOULD BE RETURNED. ONLY FOR tot=TRUE

nsim

NUMBER OF SIMULATION SAMPLES

Value

Attributable Numbers and Fractions

Calculate Attributable Metrics for Climate-Health Associations.

Description

Computes the attributable number, fraction, and rate of cases associated with specific exposure variables (e.g., temperature or rainfall) using fitted INLA models. The function estimates these metrics at the desired spatial aggregation level (country, region, or district) and optionally disaggregates by month or year.

Usage

attribution_calculation(
  data,
  param_term,
  model,
  level,
  param_threshold = 1,
  max_lag,
  nk,
  filter_year = NULL,
  group_by_year = FALSE,
  case_type,
  output_dir = NULL,
  save_csv = FALSE
)

Arguments

data

A data frame or list returned by the combine_health_climate_data() function, containing health outcome, population, and exposure data.

param_term

Character. The exposure variable term to evaluate (e.g., "tmax" for maximum temperature, "rainfall" for precipitation).

model

The fitted INLA model object returned by the run_inla_models() function.

level

Character. The spatial disaggregation level. Can take one of the following values: "country", "region", or "district".

param_threshold

Numeric. Threshold above which relative risks (RR) are considered attributable. Defaults to 1.

filter_year

Integer. The year to filter the data to. Defaults to NULL.

group_by_year

Logical. Whether to aggregate results by year (TRUE) or by year and month (FALSE). Defaults to FALSE.

case_type

Character. The type of disease that the case column refers to. Must be one of "diarrhea" or "malaria".

output_dir

Optional. Directory path to save the output metrics if save_csv = TRUE

save_csv

Logical. Whether to save the generated attribution metrics to file. Default is FALSE.

Value

A tibble containing the following columns:

Grouping variables depending on the level and group_by_year settings.
MRT: Minimum risk temperature (or equivalent reference exposure).
AR_Number, AR_Number_LCI, AR_Number_UCI: Estimated, lower, and upper bounds of the attributable number of cases.
AR_Fraction, AR_Fraction_LCI, AR_Fraction_UCI: Estimated, lower, and upper bounds of the attributable fraction (%).
AR_per_100k, AR_per_100k_LCI, AR_per_100k_UCI: Estimated, lower, and upper bounds of the attributable rate per 100,000 population.
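
The relationship between the three families of columns can be shown with a small worked example. The numbers are invented and the formulas are the usual definitions, assumed here rather than taken from the package source.

```r
an <- 120            # hypothetical attributable number of cases
cases <- 1500        # total observed cases in the same group
population <- 250000 # population of the same group

# Attributable fraction as a percentage of observed cases.
af_percent <- 100 * an / cases

# Attributable rate per 100,000 population.
ar_per_100k <- 1e5 * an / population

af_percent
ar_per_100k
```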

Generate a grid size for a certain number of plots.

Description

Generate a grid size for a certain number of plots.

Usage

calculate_air_pollution_grid_dims(n_plots)

Arguments

n_plots

The number of plots required for the grid.

Value

A list containing ncol and nrow values for the grid.
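
One common rule for choosing a near-square grid is sketched below. The rule and the name grid_dims are assumptions; the package may use a different layout heuristic.

```r
# Hypothetical helper: near-square grid that fits n_plots panels.
grid_dims <- function(n_plots) {
  ncol <- ceiling(sqrt(n_plots))
  nrow <- ceiling(n_plots / ncol)
  list(ncol = ncol, nrow = nrow)
}

grid_dims(5)
```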

Calculate attributable numbers and fraction of a given health outcome.

Description

Takes a calculated RR and upper and lower CIs, and applies these to the input data to calculate attributable fraction and attributable number, along with upper and lower CIs, for each day in the input data. Uses Lag 1 RR and lower/upper CIs.

Usage

calculate_daily_AF_AN(data, rr_data)

Arguments

data

Dataframe containing a daily time series of climate and health data that was used to obtain rr_data.

rr_data

Dataframe containing relative risk and confidence intervals, calculated from input data.

Value

A dataframe containing a daily time series of AF and AN, including upper and lower confidence intervals.
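
A minimal sketch of the arithmetic, assuming a log-linear exposure-response and the standard formula AF = (RR - 1) / RR. The values are hypothetical, and the exact formula used inside calculate_daily_AF_AN is not confirmed here.

```r
rr_per_unit <- 1.02        # hypothetical RR per scaled unit of exposure
exposure <- c(0, 1, 2.5)   # daily exposure in the same scaled units
deaths <- c(10, 12, 8)     # daily deaths

# Daily RR implied by each day's exposure level.
rr_day <- rr_per_unit^exposure

# Attributable fraction and attributable number for each day.
af <- (rr_day - 1) / rr_day
an <- af * deaths

data.frame(exposure, deaths, rr_day, af, an)
```

Applying the same steps to the lower and upper RR bounds gives the corresponding interval for AF and AN.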

QAIC calculation

Description

Computes the Quasi-Akaike Information Criterion (QAIC) for models, enabling model comparison

Usage

calculate_qaic(
  data,
  save_csv = FALSE,
  output_folder_path = NULL,
  print_results = FALSE
)

Arguments

data

Dataframe containing a daily time series of climate and health data from which to fit models.

save_csv

Bool. Whether or not to save the QAIC results to a CSV.

output_folder_path

String. Where to save the CSV file to (if save_csv == TRUE).

print_results

Logical. Whether or not to print model summaries and Pearson dispersion statistics. Defaults to FALSE.

Value

Dataframe containing QAIC results for each lag.
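
For reference, one widely used way to compute QAIC for a quasi-Poisson model is sketched below: the Poisson log-likelihood is evaluated at the fitted values and the parameter penalty is scaled by the estimated dispersion. This variant is an assumption; the package may use a different formulation, and the data are simulated.

```r
set.seed(7)
dat <- data.frame(x = runif(150, 0, 30))
dat$y <- rpois(150, lambda = exp(1 + 0.01 * dat$x))

fit <- glm(y ~ x, family = quasipoisson(), data = dat)

# QAIC = -2 * logLik(Poisson) + 2 * k * phi, with k the number of
# coefficients and phi the estimated dispersion.
qaic <- function(model) {
  phi <- summary(model)$dispersion
  loglik <- sum(dpois(model$y, model$fitted.values, log = TRUE))
  k <- summary(model)$df[3]
  -2 * loglik + 2 * k * phi
}

qaic(fit)
```

Lower values indicate a better trade-off between fit and complexity when comparing models fitted to the same data.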

Passes data to casecrossover_quasipoisson to calculate RR.

Description

Splits data by region if calc_relative_risk_by_region == TRUE. If TRUE, data for each individual region is passed to casecrossover_quasipoisson to calculate RR by region. If FALSE, RR is calculated for the entire dataset.

Usage

calculate_wildfire_rr_by_region(
  data,
  scale_factor_wildfire_pm,
  calc_relative_risk_by_region = FALSE,
  save_fig = FALSE,
  output_folder_path = NULL,
  print_model_summaries = FALSE
)

Arguments

data

Dataframe containing a daily time series of climate and health data from which to fit models.

scale_factor_wildfire_pm

Numeric. The value to divide the wildfire PM2.5 concentration variables by for alternative interpretation of outputs. Corresponds to the unit increase in wildfire PM2.5 to give the model estimates and relative risks (e.g. scale_factor = 10 corresponds to estimates and relative risks representing impacts of a 10 unit increase in wildfire PM2.5). Setting this parameter to 0 or 1 leaves the variable unscaled.

calc_relative_risk_by_region

Bool. Whether to calculate Relative Risk by region. Defaults to FALSE.

save_fig

Bool. Whether or not to save a figure showing residuals vs fitted values for each lag. Defaults to FALSE.

output_folder_path

String. Where to save the figure. Defaults to NULL.

print_model_summaries

Bool. Whether to print the model summaries to console. Defaults to FALSE.

Value

Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5. Split by region if calc_relative_risk_by_region set to TRUE.

Fit quasipoisson regression models for different lags using a time-stratified case-crossover approach.

Description

Fits quasipoisson regression models using gnm

Usage

casecrossover_quasipoisson(
  data,
  scale_factor_wildfire_pm = 10,
  wildfire_lag,
  save_fig = TRUE,
  output_folder_path = NULL,
  print_model_summaries = TRUE
)

Arguments

data

Dataframe containing a daily time series of climate and health data from which to fit models.

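scale_factor_wildfire_pm

Numeric. The value to divide the wildfire PM2.5 concentration variables by, so that model estimates and relative risks correspond to an increase of that many units in wildfire PM2.5. Defaults to 10.
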
save_fig

Bool. Whether or not to save a figure showing residuals vs fitted values for each lag. Defaults to TRUE.

output_folder_path

String. Where to save the figure. Defaults to NULL.

print_model_summaries

Bool. Whether to print the model summaries to console. Defaults to TRUE.

Value

Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5
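
The time-stratified design can be sketched with base R alone by including year, month and day-of-week strata as a factor in a quasi-Poisson glm(). This is a stand-in for the gnm-based fit described above, using simulated data and hypothetical column names; it is not the package's code.

```r
set.seed(3)
dates <- seq.Date(as.Date("2021-01-01"), by = "day", length.out = 365)
dat <- data.frame(date = dates, wf_pm25 = rexp(365, rate = 1 / 5))
dat$deaths <- rpois(365, lambda = 8)

# Time-stratified strata: same year, month and day of week.
lt <- as.POSIXlt(dat$date)
dat$stratum <- interaction(lt$year + 1900, lt$mon + 1, lt$wday, drop = TRUE)

# Scale exposure so the RR refers to a 10-unit increase in wildfire PM2.5.
scale_factor <- 10
dat$wf_scaled <- dat$wf_pm25 / scale_factor

fit <- glm(deaths ~ wf_scaled + stratum, family = quasipoisson(), data = dat)

# Relative risk and Wald 95% confidence interval for the exposure term.
est <- summary(fit)$coefficients["wf_scaled", ]
rr <- exp(est[["Estimate"]])
ci <- exp(est[["Estimate"]] + c(-1.96, 1.96) * est[["Std. Error"]])
c(rr = rr, lower = ci[1], upper = ci[2])
```

With gnm the strata would typically be passed through its eliminate argument instead of being estimated as explicit coefficients.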

Check multicollinearity using VIF and write the results to file

Description

This function runs check_diseases_vif(), reshapes the result into a tabular data frame, and optionally writes the table to VIF_results.csv.

Usage

check_and_write_vif(data, param_term, inla_param, case_type, output_dir = NULL)

Arguments

data

A data frame containing the disease outcome column, param_term, and the variables listed in inla_param.

param_term

Character vector of exposure variable term(s) to include in the VIF assessment.

inla_param

Character vector of additional model covariates to include in the VIF assessment.

case_type

Character. The type of disease that the case column refers to. Must be one of "diarrhea" or "malaria".

output_dir

Character. The output directory to save the VIF results to. Results are saved as VIF_results.csv. Defaults to NULL.

Value

A data frame with columns variable, VIF, and interpretation.

Check multicollinearity using VIF on model variables

Description

This function checks multicollinearity across the disease outcome, the exposure term(s) of interest, and the additional INLA covariates using a correlation-matrix-based variance inflation factor calculation.

Usage

check_diseases_vif(data, param_term, inla_param, case_type)

Arguments

data

A data frame containing the disease outcome column, param_term, and the variables listed in inla_param.

param_term

Character vector of exposure variable term(s) to include in the VIF assessment.

inla_param

Character vector of additional model covariates to include in the VIF assessment.

case_type

Character. The type of disease that the case column refers to. Must be one of "diarrhea" or "malaria".

Value

A list with:

variables: Character vector of variables used in the VIF calculation.
vif: Numeric vector of VIF values aligned to variables.
vif_interpretation: Character vector of qualitative VIF interpretations ("Low", "Moderate", "High", or "Not computed").
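
As a rough illustration of a correlation-matrix-based VIF, the sketch below takes the diagonal of the inverse correlation matrix, which equals 1 / (1 - R^2) for each variable regressed on the others. The helper name vif_from_correlation() and the toy data are assumptions made for this example; it is not the package's implementation and does not reproduce its interpretation labels.

```r
# Hypothetical helper for illustration only; not part of climatehealth.
vif_from_correlation <- function(df, vars) {
  # Correlation matrix of the selected columns (complete cases only)
  r <- stats::cor(df[, vars, drop = FALSE], use = "complete.obs")
  # The j-th diagonal element of the inverse correlation matrix is the VIF of variable j
  vif <- diag(solve(r))
  names(vif) <- vars
  vif
}

set.seed(1)
n <- 200
tmax <- rnorm(n, 30, 3)
toy <- data.frame(
  malaria = rpois(n, 20),
  tmax = tmax,
  tmin = tmax - 8 + rnorm(n, 0, 0.5), # deliberately collinear with tmax
  rainfall = rgamma(n, shape = 2, rate = 0.02)
)

vif <- vif_from_correlation(toy, c("malaria", "tmax", "tmin", "rainfall"))
print(round(vif, 2))
```

In this toy data tmax and tmin should show much larger values than rainfall, because tmin is constructed from tmax.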

Check if a dataframe is empty.

Description

Checks if a dataframe is empty, and raises an error if it is.

Usage

check_empty_dataframe(df)

Arguments

df

Dataframe. The dataframe to check.

Value

NULL. No return if the dataframe is not empty.

Check that a file exists at a passed path.

Description

Checks the files on disk to assert that the passed file is present.

Usage

check_file_exists(fpath, raise = TRUE)

Arguments

fpath

The filepath to check exists.

raise

Whether or not to raise an error if the file does not exist, Default: TRUE

Value

'exists'. Whether or not the file exists on disk.

Check that a file extension on a given path matches the expected.

Description

This function takes an expected file extension, and validates it against a user-inputted file path.

Usage

check_file_extension(fpath, expected_ext, param_nm = "fpath", raise = TRUE)

Arguments

fpath

The filepath.

expected_ext

The expected file extension.

param_nm

The parameter name that the filepath was passed to (for error raising), Default: 'fpath'

raise

Whether or not to raise an error, Default: TRUE

Value

Whether or not the passed file has a valid file extension.
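
The behaviour described above can be sketched in base R with tools::file_ext(). The function name has_expected_extension(), the case-insensitive comparison, and the error wording are assumptions for this example, not the package's code.

```r
# Hypothetical stand-in for illustration; not the package's implementation.
has_expected_extension <- function(fpath, expected_ext,
                                   param_nm = "fpath", raise = TRUE) {
  actual <- tolower(tools::file_ext(fpath))
  # Accept both "csv" and ".csv" as the expected extension (assumption)
  expected <- tolower(sub("^\\.", "", expected_ext))
  ok <- identical(actual, expected)
  if (!ok && raise) {
    stop(sprintf("'%s' must point to a .%s file, got '%s'.",
                 param_nm, expected, fpath), call. = FALSE)
  }
  ok
}

ok_csv <- has_expected_extension("data/health.csv", "csv")
bad_csv <- has_expected_extension("data/health.xlsx", "csv", raise = FALSE)
err <- tryCatch(
  has_expected_extension("data/health.xlsx", "csv"),
  error = function(e) conditionMessage(e)
)
```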

Check for Rtools Installation on Windows

Description

Verifies whether Rtools is installed and properly configured on a Windows system.

Usage

check_has_rtools()

Details

The function uses pkgbuild::check_build_tools(debug = TRUE) to test for the presence of Rtools and its integration with R. If Rtools is missing or misconfigured, the function throws an error with installation instructions.

Value

Returns TRUE invisibly if Rtools is detected and functional. Otherwise, throws an error.

Check variance inflation factors of predictor variables using a linear model

Description

Checks variance inflation factors of predictor variables using a linear model of the predictor variables on the health outcome. Prints the statistics if print_vif == TRUE. Raises a warning if the VIF for any variable is > 2.

Usage

check_wildfire_vif(
  data,
  predictors,
  save_csv = FALSE,
  output_folder_path = NULL,
  print_vif = FALSE
)

Arguments

data

Dataframe containing a daily time series of climate and health data.

predictors

Character vector with each of the predictors to include in the model. Must contain at least 2 variables.

save_csv

Bool. Whether or not to save the VIF results to a CSV.

output_folder_path

String. Where to save the CSV file to (if save_csv == TRUE).

print_vif

Bool, whether or not to print VIF for each predictor. Defaults to FALSE.

Value

Variance inflation factor statistics for each predictor variable.
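
A minimal base R sketch of the idea follows. For numeric predictors, the VIF of a predictor equals 1 / (1 - R^2) from regressing it on the remaining predictors, which is what the example computes. The helper vif_by_regression(), the column names, and the toy data are assumptions for illustration. The example reports flagged predictors with message() rather than raising a warning.

```r
# Hypothetical helper for illustration; not the package's implementation.
vif_by_regression <- function(data, predictors) {
  stopifnot(length(predictors) >= 2)
  vapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Auxiliary regression of one predictor on all the others
    fit <- stats::lm(stats::reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

set.seed(42)
n <- 365
temp <- rnorm(n, 15, 6)
daily <- data.frame(
  mean_PM = rexp(n, 1 / 5),
  tmean = temp,
  humidity = 70 - 1.5 * temp + rnorm(n, 0, 4) # built to correlate with tmean
)

vifs <- vif_by_regression(daily, c("mean_PM", "tmean", "humidity"))
flagged <- names(vifs)[vifs > 2] # threshold of 2 as stated in the description
if (length(flagged) > 0) {
  message("VIF > 2 for: ", paste(flagged, collapse = ", "))
}
```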

Read in and combine climate and health data

Description

Read and combine climate and health data prepared for the spatiotemporal and DLNM analysis.

Usage

combine_health_climate_data(
  health_data_path,
  climate_data_path,
  map_path,
  region_col,
  district_col,
  date_col,
  year_col,
  month_col,
  case_col,
  case_type,
  tot_pop_col,
  tmin_col,
  tmean_col,
  tmax_col,
  rainfall_col,
  r_humidity_col,
  geometry_col,
  runoff_col = NULL,
  ndvi_col = NULL,
  spi_col = NULL,
  max_lag,
  output_dir = NULL
)

Arguments

health_data_path

The path to the health data.

climate_data_path

The path to the climate data.

map_path

The path to the relevant map data.

region_col

Character. Name of the column in the dataframe that contains the region names.

district_col

Character. Name of the column in the dataframe that contains the district names.

date_col

Character. Name of the column in the dataframe that contains the date. Defaults to NULL.

year_col

Character. Name of the column in the dataframe that contains the Year.

month_col

Character. Name of the column in the dataframe that contains the Month.

case_col

Character. Name of the column in the dataframe that contains the disease cases to be considered.

case_type

Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'.

tot_pop_col

Character. Name of the column in the dataframe that contains the total population.

tmin_col

Character. Name of the column in the dataframe that contains the minimum temperature data.

tmean_col

Character. Name of the column in the dataframe that contains the average temperature.

tmax_col

Character. Name of the column in the dataframe that contains the maximum temperature.

rainfall_col

Character. Name of the column in the dataframe that contains the cumulative monthly rainfall.

r_humidity_col

Character. Name of the column in the dataframe that contains the relative humidity.

geometry_col

Character. Name of the geometry column in the shapefile (usually "geometry").

runoff_col

Character. Name of the column in the dataframe that contains the monthly runoff water data. Defaults to NULL.

ndvi_col

Character. Name of column containing the Normalized Difference Vegetation Index (ndvi) data. Defaults to NULL.

spi_col

Character. Name of the column in the dataframe that contains the standardized precipitation index. Defaults to NULL.

max_lag

Numeric. The maximum lag to consider for the delayed effect. It should be between 2 and 4. Defaults to 2.

output_dir

Path to folder where the processed map data should be saved. Defaults to NULL.

Value

A list of dataframes containing the map, nb.map, data, grid_data, summary

Deprecated alias for `run_descriptive_stats()`.

Description

Generic wrapper function to compute descriptive statistics and EDA outputs.

Usage

common_descriptive_stats(
  df_list,
  output_path,
  aggregation_column = NULL,
  population_col = NULL,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  plot_ma = FALSE,
  ma_days = 100,
  ma_sides = 1,
  timeseries_col = NULL,
  dependent_col,
  independent_cols,
  units = NULL,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  detect_outliers = FALSE,
  calculate_rate = FALSE
)

Arguments

df_list

List of dataframes. A list of input dataframes.

output_path

Character. The path to write outputs to.

aggregation_column

Character. The column to use for aggregating the dataset into smaller subsets of regions.

population_col

Character. The column containing the population.

plot_corr_matrix

Logical. Whether or not to plot correlation matrix.

correlation_method

Character. The correlation method. One of 'pearson', 'spearman', 'kendall'.

plot_dist

Logical. Whether or not to plot distribution histograms.

plot_ma

Logical. Whether to plot moving averages over a timeseries.

ma_days

Integer. The number of days to use for a moving average.

ma_sides

Integer. The number of sides to use for a moving average (1 or 2).

timeseries_col

Character. The column used as the timeseries for moving averages.

dependent_col

Character. The column in the data containing the dependent variable.

independent_cols

Character vector. The columns in the data containing the independent variables.

units

Named character vector. A named character vector of units for each variable.

plot_na_counts

Logical. Whether to plot NA counts.

plot_scatter

Logical. Whether to plot scatter plots.

plot_box

Logical. Whether to plot box plots.

plot_seasonal

Logical. Whether to plot seasonal plots.

plot_regional

Logical. Whether to plot regional plots.

plot_total

Logical. Whether to plot total health outcomes per year.

detect_outliers

Logical. Whether to output a table containing outlier information.

calculate_rate

Logical. Whether to calculate the rate of health outcomes per 100k people.

Value

Character vector. Backward-compatible output path format.

Deprecated. Use run_descriptive_stats() instead.

Deprecated alias for `run_descriptive_stats_api()`.

Description

Deprecated alias for run_descriptive_stats_api().

Usage

common_descriptive_stats_api(
  data,
  aggregation_column = NULL,
  population_col = NULL,
  dependent_col,
  independent_cols,
  units = NULL,
  plot_correlation = FALSE,
  plot_dist_hists = FALSE,
  plot_ma = FALSE,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  correlation_method = "pearson",
  ma_days = 100,
  ma_sides = 1,
  timeseries_col = NULL,
  detect_outliers = FALSE,
  calculate_rate = FALSE,
  output_path
)

Arguments

data

The dataset used for descriptive stats (as a vector).

aggregation_column

Character. The column to use for aggregating the dataset into smaller subsets.

population_col

Character. The column containing the population.

dependent_col

Character. The dependent column.

independent_cols

Character vector. The independent columns.

units

Named character vector. A named character vector of units for each variable.

plot_correlation

Logical. Whether to plot a correlation matrix.

plot_dist_hists

Logical. Whether to plot histograms showing column distributions.

plot_ma

Logical. Whether to plot moving averages over a timeseries.

plot_na_counts

Logical. Whether to plot counts of NAs in each column.

plot_scatter

Logical. Whether to plot the dependent column against the independent columns.

plot_box

Logical. Whether to generate box plots for selected columns.

plot_seasonal

Logical. Whether to plot seasonal trends of the variables in columns.

plot_regional

Logical. Whether to plot regional trends of the variables in columns.

plot_total

Logical. Whether to plot the total of the dependent column per year.

correlation_method

Character. The correlation method. One of 'pearson', 'spearman', 'kendall'.

ma_days

Integer. The number of days to use in moving average calculations.

ma_sides

Integer. The number of sides to use in moving average calculations (1 or 2).

timeseries_col

Character. The column used as the timeseries for moving averages.

detect_outliers

Logical. Whether to output a table of outliers.

calculate_rate

Logical. Whether to plot a rate based metric of the dependent column per year.

output_path

Character. The path to save outputs to.

Value

Character vector. Backward-compatible output path format.

Deprecated. Use run_descriptive_stats_api() instead.

Deprecated alias for `descriptive_stats_core()`.

Description

Deprecated. Use descriptive_stats_core() instead.

Usage

common_descriptive_stats_core(
  df,
  output_path,
  title,
  aggregation_column = NULL,
  population_col = NULL,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  dependent_col,
  independent_cols = c(),
  units = NULL,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  timeseries_col = "date",
  detect_outliers = FALSE,
  calculate_rate = FALSE
)

Create and plot the exposure-lag-response relationship (contour plot) at country, region, or district level for each disease case type (`diarrhea` and `malaria`).

Description

Generates a contour plot showing the exposure-lag-response relationship between an exposure (tmax or rainfall) and the disease case type.

Usage

contour_plot(
  data,
  param_term,
  model,
  level,
  max_lag,
  nk,
  case_type,
  filter_year = NULL,
  save_fig = FALSE,
  output_dir = NULL
)

Arguments

data

Data list from combine_health_climate_data() function.

param_term

A character vector or list containing parameter terms such as tmax (temperature exposure) and rainfall (rainfall exposure). Defaults to tmax.

model

The fitted model from the run_inla_models() function.

level

A character vector specifying the geographical disaggregation. Can take one of the following values: "country", "region", or "district".

case_type

Character. The type of disease that the case column refers to. Must be one of diarrhea or malaria.

filter_year

Integer. The year to filter the data to. Defaults to NULL.

save_fig

Boolean. Whether to save the outputted plot. Defaults to FALSE.

output_dir

The path to save the visualisation to. Defaults to NULL

Value

A contour plot at country, region, or district level.

Create lagged values for PM2.5 variable and average lag column.

Description

Creates new variables in a dataframe for lags and means over lag periods.

Usage

create_air_pollution_lags(data, max_lag = 14L)

Arguments

data

Dataframe from load_air_pollution_data() containing a daily time series of health and environmental data.

max_lag

Integer. The maximum lag days for outdoor PM2.5. Defaults to 14.

Value

Dataframe with added columns for lagged PM2.5 concentration.
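
The lagging step can be sketched in base R as below. The helper add_lags(), the name of the averaged column, and the toy data are assumptions for this example. The pm25_lag# naming follows the convention mentioned under fit_air_pollution_gam(). The sketch assumes rows are already sorted by date with no missing days.

```r
# Hypothetical helper for illustration; not the package's implementation.
add_lags <- function(data, value_col = "pm25", max_lag = 14L) {
  x <- data[[value_col]]
  n <- length(x)
  for (k in seq_len(max_lag)) {
    # Shift the series down by k days; the first k rows have no lagged value
    data[[paste0(value_col, "_lag", k)]] <- c(rep(NA_real_, k), x[seq_len(n - k)])
  }
  lag_cols <- c(value_col, paste0(value_col, "_lag", seq_len(max_lag)))
  # Mean over lag 0..max_lag; NA where any lag in the window is unavailable
  mean_col <- paste0(value_col, "_lag0_", max_lag, "_mean")
  data[[mean_col]] <- rowMeans(data[, lag_cols, drop = FALSE])
  data
}

toy <- data.frame(
  date = as.Date("2024-01-01") + 0:5,
  pm25 = c(10, 12, 8, 15, 20, 11)
)
lagged <- add_lags(toy, max_lag = 2L)
print(lagged)
```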

Create statistical summaries of columns in a dataframe.

Description

Create statistical summaries of columns in a dataframe.

Usage

create_column_summaries(df, independent_cols = NULL)

Arguments

df

Dataframe. Input data.

independent_cols

Character vector. The columns in the data containing the independent variables.

Value

Dataframe. Column summaries

Create a correlation matrix for columns in a dataframe.

Description

Create a correlation matrix for columns in a dataframe.

Usage

create_correlation_matrix(
  df,
  independent_cols = NULL,
  correlation_method = "pearson"
)

Arguments

df

Dataframe. The dataframe to use to create a correlation matrix.

independent_cols

Character vector. The columns in the data containing the independent variables.

correlation_method

string. The method to use for correlation calculations.

Value

Matrix. Correlation matrix for selected columns in the input dataset.

Generate a grid size for a certain number of plots.

Description

This function calculates the minimum grid size required to arrange a given number of plots on a figure. For example, 6 plots would require a 3x2 grid, whereas 7 would require a 3x3 grid, and so on.

Usage

create_grid(plot_count)

Arguments

plot_count

The number of plots required for the grid.

Value

A numeric vector: c(x, y), where x and y define the grid dimensions.
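
One way to reproduce the examples in the description (6 plots give 3x2, 7 plots give 3x3) is shown below. The function grid_dims() is a hypothetical sketch; the package may compute the dimensions differently.

```r
# Hypothetical sketch; consistent with the 6 -> 3x2 and 7 -> 3x3 examples.
grid_dims <- function(plot_count) {
  x <- ceiling(sqrt(plot_count)) # widest side of a near-square grid
  y <- ceiling(plot_count / x)   # fewest rows that still fit every plot
  c(x, y)
}

six <- grid_dims(6)
seven <- grid_dims(7)
print(six)
print(seven)
```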

Create indices for INLA models

Description

INLA models require region, district, and year indices. This function creates these indices from the dataset, using the number of districts (ndistrict) and regions (nregion).

Usage

create_inla_indices(data, case_type)

Arguments

data

Dataframe containing district_code, region_code, and year columns from the combine_health_climate_data() function.

case_type

Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'.

Value

The modified data with the created indices.

Generate lagged values for predictor (temperature) variables

Description

Generates new variables in a dataframe for lags and means over lag periods.

Usage

create_lagged_variables(data, wildfire_lag, temperature_lag)

Arguments

data

Dataframe containing a daily time series of climate and health data

wildfire_lag

Integer. The number of days for which to calculate the lags for wildfire PM2.5. Default is 3.

temperature_lag

Integer. The number of days for which to calculate the lags for temperature. Default is 1.

Value

Dataframe with added columns for lagged temperature and wildfire-related PM2.5 concentration

Create a summary of all NA values in a dataset.

Description

Create a summary of all NA values in a dataset.

Usage

create_na_summary(df, independent_cols = NULL)

Arguments

df

Dataframe. The input dataset.

independent_cols

Character vector. The columns in the data containing the independent variables.

Value

Dataframe. A summary of NA values in the dataset.

Generate splines for temperature variable

Description

Generates temperature splines for each region

Usage

create_temperature_splines(data, nlag = 0, degrees_freedom = 6)

Arguments

data

Dataframe containing a daily time series of climate and health data

nlag

Integer. The lag, in days, of the temperature variable from which to generate splines. Defaults to 0 (the unlagged temperature variable).

degrees_freedom

Integer. Degrees of freedom for the spline(s). Defaults to 6.

Value

Dataframe with additional column for temperature spline.
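
As a sketch of building a spline basis separately for each region, the example below uses a natural cubic spline with 6 degrees of freedom. The choice of splines::ns(), the column names region and tmean, and the toy data are assumptions; the spline type used by the package is not stated here.

```r
library(splines)

set.seed(7)
daily <- data.frame(
  region = rep(c("North", "South"), each = 100),
  tmean = c(rnorm(100, 10, 5), rnorm(100, 18, 4))
)

# One basis matrix per region, so knots follow each region's own temperatures
basis_by_region <- lapply(split(daily, daily$region), function(d) {
  ns(d$tmean, df = 6)
})

# Rows = days in the region, columns = spline basis functions
dims <- vapply(basis_by_region, dim, integer(2))
print(dims)
```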

Emit a consistent deprecation warning for descriptive stats wrappers.

Description

Emit a consistent deprecation warning for descriptive stats wrappers.

Usage

deprecate_descriptive_stats(old_fn, new_fn)

Arguments

old_fn

Character. Deprecated function name.

new_fn

Character. Replacement function name.

Value

None. Emits a warning.

Save descriptive statistics

Description

Generates summary statistics for climate and health data and saves them to the specified file path.

Usage

descriptive_stats(data, variables, bin_width = 5, output_dir = ".")

Arguments

data

Dataframe containing a daily time series of climate and health data

variables

Character or character vector with variable to produce summary statistics for. Must include at least 1 variable.

bin_width

Integer. Width of each bin in a histogram of the outcome variable. Defaults to 5.

output_dir

Character. The directory to output descriptive stats to. Must exist and will not be automatically created. Defaults to ".".

Value

Prints summary statistics and a histogram of the outcome variable.

Core Functionality for Producing Descriptive Statistics

Description

Core Functionality for Producing Descriptive Statistics

Usage

descriptive_stats_core(
  df,
  output_path,
  title,
  aggregation_column = NULL,
  population_col = NULL,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  dependent_col,
  independent_cols = c(),
  units = NULL,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  timeseries_col = "date",
  write_outlier_table = FALSE,
  calculate_rate = FALSE
)

Arguments

df

Dataframe. The input DataFrame.

output_path

Character. The path to write outputs to.

title

Character. The specific title for the subset of data being used.

aggregation_column

Character. Column to aggregate data by.

population_col

Character. Column containing population data.

plot_corr_matrix

Logical. Whether or not to plot correlation matrix.

correlation_method

Character. The correlation method. One of 'pearson', 'spearman', 'kendall'.

plot_dist

Logical. Whether or not to plot distribution histograms.

dependent_col

Character. The dependent column.

independent_cols

Character vector. The independent columns.

units

Named character vector. Units to use for plots (maps to columns parameter).

plot_na_counts

Logical. Whether to plot NA counts.

plot_scatter

Logical. Whether to plot scatter plots.

plot_box

Logical. Whether to plot box plots.

plot_seasonal

Logical. Whether to plot seasonal plots.

plot_regional

Logical. Whether to plot regional plots.

plot_total

Logical. Whether to plot total health outcomes per year.

timeseries_col

Character. Column containing timeseries data (e.g., date).

write_outlier_table

Logical. Whether to output a table containing outlier information.

calculate_rate

Logical. Whether to calculate the rate of health outcomes per 100k people.

Value

None. Outputs are written to files.

Detect Outliers Using the IQR Method

Description

Detect Outliers Using the IQR Method

Usage

detect_outliers(df, independent_cols = NULL)

Arguments

df

A data frame containing the data to check for outliers.

independent_cols

Character vector. The columns in the data containing the independent variables.

Value

Dataframe. Column summaries
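
The IQR rule can be sketched as follows. The helper flag_iqr_outliers() and the conventional multiplier of 1.5 are assumptions for this example; the multiplier used by the package is not documented here.

```r
# Hypothetical helper for illustration; not the package's implementation.
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  # Values beyond k * IQR from the quartiles are flagged
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

rainfall <- c(12, 15, 11, 14, 13, 16, 12, 95)
is_outlier <- flag_iqr_outliers(rainfall)
print(rainfall[is_outlier])
```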

Run Full Diarrhea-Climate Analysis Pipeline: calculate diarrhea cases attributable to extreme precipitation and extreme temperature

Description

The diarrhea_do_analysis function runs the complete analysis workflow by combining multiple functions to analyze the association between diarrhea cases and climate variables. It processes health, climate, and spatial data, fits models, generates plots, and calculates attributable risk.

Usage

diarrhea_do_analysis(
  health_data_path,
  climate_data_path,
  map_path,
  region_col,
  district_col,
  date_col = NULL,
  year_col,
  month_col,
  case_col,
  tot_pop_col,
  tmin_col,
  tmean_col,
  tmax_col,
  rainfall_col,
  r_humidity_col,
  runoff_col,
  geometry_col,
  spi_col = NULL,
  ndvi_col = NULL,
  max_lag = 2,
  nk = 2,
  basis_matrices_choices,
  inla_param,
  param_term,
  level,
  param_threshold = 1,
  filter_year = NULL,
  family = "nbinomial",
  group_by_year = FALSE,
  config = TRUE,
  save_csv = FALSE,
  save_model = TRUE,
  save_fig = FALSE,
  cumulative = FALSE,
  output_dir = NULL
)

Arguments

health_data_path

Character. Path to the processed health data file.

climate_data_path

Character. Path to the processed climate data file.

map_path

Character. Path to the spatial data file (e.g., shapefile).

region_col

Character. Column name for the region variable.

district_col

Character. Column name for the district variable.

date_col

Character (optional). Column name for the date variable. Defaults to NULL.

year_col

Character. Column name for the year variable.

month_col

Character. Column name for the month variable.

case_col

Character. Column name for diarrhea case counts.

tot_pop_col

Character. Column name for total population.

tmin_col

Character. Column name for minimum temperature.

tmean_col

Character. Column name for mean temperature.

tmax_col

Character. Column name for maximum temperature.

rainfall_col

Character. Column name for cumulative monthly rainfall.

r_humidity_col

Character. Column name for relative humidity.

runoff_col

Character. Column name for monthly runoff data.

geometry_col

Character. Column name of the geometry column in the shapefile (usually "geometry").

spi_col

Character (optional). Column name for the Standardized Precipitation Index (SPI). Defaults to NULL.

ndvi_col

Character (optional). Column name for the Normalized Difference Vegetation Index (NDVI). Defaults to NULL.

max_lag

Numeric. Maximum temporal lag to include in the distributed lag model (e.g., 2-4). Defaults to 2.

nk

Numeric. Number of internal knots for the natural spline of each predictor, controlling its flexibility: nk = 0 produces a linear effect with one basis column, nk = 1 generates a simple spline with two columns, nk = 2 yields a more flexible spline with three columns, and higher values of nk further increase flexibility but may also raise collinearity among spline terms. Defaults to 2.

basis_matrices_choices

Character vector. Specifies which climate variables to include in the basis matrix (e.g., c("tmax", "rainfall", "r_humidity")).

inla_param

Character vector. Specifies exposure variables included in the INLA model (e.g., c("tmin", "rainfall", "r_humidity")).

param_term

Character or vector. Exposure variable(s) of primary interest for relative risk and attribution (e.g., "tmax", "rainfall").

level

Character. Spatial disaggregation level; must be one of "country", "region", or "district".

param_threshold

Numeric. Threshold above which exposure is considered "attributable." Defaults to 1.

filter_year

Integer or vector (optional). Year(s) to filter the data by. Defaults to NULL.

family

Character. Probability distribution for the outcome variable. Options include "poisson" and "nbinomial" (negative binomial model). Defaults to "nbinomial".

group_by_year

Logical. Whether to group attributable metrics by year. Defaults to FALSE.

config

Logical. Whether to enable additional INLA model configurations. Defaults to TRUE.

save_csv

Logical. If TRUE, saves intermediate datasets to CSV. Defaults to FALSE.

save_model

Logical. If TRUE, saves fitted INLA model results. Defaults to TRUE.

save_fig

Logical. If TRUE, saves generated plots. Defaults to FALSE.

cumulative

Boolean. If TRUE, plots and saves the cumulative risk across all years for the specific exposure at region and district level. Defaults to FALSE.

output_dir

Character. Directory where output files (plots, datasets, maps) are saved. Defaults to NULL.

Value

A list containing:

Model output from INLA
Monthly random effects plot
Yearly random effects plot
Contour plot
Relative risk map
Relative risk plot
Attributable fraction and number summary
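
The relationship between nk and the number of spline basis columns described for the nk argument can be checked with splines::ns(). This is a sketch under assumptions: the helper basis_columns() is hypothetical, and placing internal knots at equally spaced quantiles is a choice made for this example rather than a documented detail of the package.

```r
library(splines)

set.seed(3)
tmax <- runif(120, 22, 36)

# Hypothetical helper: count basis columns for a given number of internal knots
basis_columns <- function(x, nk) {
  knots <- if (nk == 0) {
    NULL # no internal knots: a single linear basis column
  } else {
    stats::quantile(x, probs = seq_len(nk) / (nk + 1), names = FALSE)
  }
  ncol(ns(x, knots = knots))
}

cols <- vapply(0:3, function(nk) basis_columns(tmax, nk), integer(1))
names(cols) <- paste0("nk=", 0:3)
print(cols)
```

Each additional internal knot adds one column, which is why larger nk values increase flexibility and also the potential for collinearity among spline terms.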

Meta-analysis and BLUPs

Description

Run meta-analysis using temperature average and range as meta predictors. Then create the best linear unbiased predictions (BLUPs).

Usage

dlnm_meta_analysis(
  df_list,
  coef_,
  vcov_,
  save_csv = FALSE,
  output_folder_path = NULL
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

coef_

A matrix of coefficients for the reduced model.

vcov_

A list. Covariance matrices for each region for the reduced model.

save_csv

Boolean. Whether to save the results as a CSV. Defaults to FALSE.

output_folder_path

Path to folder where results should be saved. Defaults to NULL.

Value

mm: A model object. A multivariate meta-analysis model.
blup: A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each region.
meta_test_res: A dataframe of results from statistical tests on the meta model.

Define minimum mortality percentiles and temperatures

Description

Calculate the temperature at which there is minimum mortality risk using the product of the basis matrix and BLUPs.

Usage

dlnm_min_mortality_temp(
  df_list,
  var_fun = "bs",
  var_per = c(10, 75, 90),
  var_degree = 2,
  blup = NULL,
  coef_,
  meta_analysis = FALSE,
  outcome_type = c("temperature", "suicide")
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90).

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

blup

A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each region. Defaults to NULL.

coef_

A matrix of coefficients for the reduced model.

meta_analysis

Boolean. Whether to perform a meta-analysis. Defaults to FALSE.

outcome_type

Character. The indicator that the function is being used for. One of 'suicide' or 'temperature'. Defaults to c("temperature", "suicide")

Value

Percentiles and corresponding temperatures for each geography.
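
The sketch below illustrates locating the minimum of an exposure-response curve as the product of a basis matrix and a coefficient vector, evaluated over temperature percentiles. It uses splines::bs() with degree 2 and knots at the 10th, 75th and 90th percentiles to mirror the defaults above. The toy U-shaped curve, the coefficients obtained from lm() (standing in for model or BLUP coefficients), and the restriction to the 1st to 99th percentiles are all assumptions for this example.

```r
library(splines)

set.seed(11)
temp <- runif(1000, 0, 35)
var_per <- c(10, 75, 90)
knots <- stats::quantile(temp, var_per / 100, names = FALSE)
basis <- bs(temp, degree = 2, knots = knots)

# Toy log relative risk, U-shaped with its minimum at 18 degrees.
# The lm() coefficients stand in for fitted or BLUP coefficients.
log_rr <- 0.002 * (temp - 18)^2
coefs <- stats::coef(stats::lm(log_rr ~ basis))[-1]

# Evaluate the curve at the 1st to 99th temperature percentiles
perc <- 1:99
temp_at_perc <- stats::quantile(temp, perc / 100, names = FALSE)
curve <- as.numeric(predict(basis, temp_at_perc) %*% coefs)

min_perc <- perc[which.min(curve)]         # minimum-risk percentile
mmt <- temp_at_perc[which.min(curve)]      # corresponding temperature
print(c(percentile = min_perc, temperature = round(mmt, 1)))
```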

Create population totals

Description

Creates a list of population totals by year and region for use in the attributable rate calculations.

Usage

dlnm_pop_totals(df_list, country = "National", meta_analysis = FALSE)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

country

Character. Name of country for national level estimates. Defaults to 'National'

meta_analysis

Boolean. Whether to perform a meta-analysis. Defaults to FALSE.

Value

List of population totals by year and region

Power calculation

Description

Produce a power statistic by area for the attributable threshold and above as a reference.

Usage

dlnm_power_list(
  df_list,
  pred_list,
  minperc,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5,
  compute_low = TRUE
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

pred_list

A list containing predictions from the model by region.

minperc

Vector. Percentile of maximum outcome temperature for each region.

attr_thr_high

Numeric. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5.

attr_thr_low

Numeric. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5.

compute_low

Bool. Whether to compute power for the lower threshold. Defaults to TRUE.

Value

A list containing power information by area.

Run national predictions from meta analysis

Description

Use the meta analysis to create national level predictions

Usage

dlnm_predict_nat(
  df_list,
  var_fun = "bs",
  var_per = c(25, 50, 75),
  var_degree = 2,
  minpercreg,
  mmpredall,
  pred_list,
  country = "National"
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25, 50, 75).

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

minpercreg

Vector. Percentile of maximum suicide temperature for each region.

mmpredall

List of national coefficients and covariance matrices for the crosspred.

pred_list

A list containing predictions from the model by region.

country

Character. Name of country for national level estimates. Defaults to National.

Value

A list containing predictions by region.

Reduce to overall cumulative

Description

Reduce model to the overall cumulative association

Usage

dlnm_reduce_cumulative(
  df_list,
  var_per = c(25, 50, 75),
  var_degree = 2,
  cenper = NULL,
  cb_list,
  model_list
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25, 50, 75).

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

cenper

Integer. Percentile (0-100) used to calculate the centering value. Defaults to NULL.

cb_list

List of cross_basis matrices from create_crossbasis function.

model_list

List of models produced from DLNM analysis.

Value

coef_: A matrix of coefficients for the reduced model.
vcov_: A list. Covariance matrices for each region for the reduced model.

Produce variance inflation factor

Description

Produces variance inflation factor for the independent variables.

Usage

dlnm_vif(df_list, independent_cols = NULL)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

independent_cols

Additional independent variables to test in model validation. Defaults to NULL.

Value

A list. Variance inflation factors for each independent variable by region.

Enforce a file extension on a given path

Description

Ensures that the provided file path ends with the desired file extension.

Usage

enforce_file_extension(path, file_extension)

Arguments

path

Character. A file path.

file_extension

Character. The file extension to enforce on 'path'.

Value

Character. The path with the expected file extension.

Extract metadata from a climate_error

Description

Extracts the structured metadata from a typed climate error for use in API responses or logging.

Usage

extract_error_metadata(error)

Arguments

error

A climate_error condition object

Value

A list containing the error metadata (type, column, available, etc.)
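
The general pattern of attaching structured metadata to a classed condition and reading it back can be sketched in base R as below. The constructor new_climate_error(), the accessor get_metadata(), the metadata field layout, and the example values are all hypothetical; the internal structure of the package's climate_error objects is not documented here.

```r
# Hypothetical constructor for illustration; not the package's implementation.
new_climate_error <- function(message, type, ...) {
  structure(
    class = c("climate_error", "error", "condition"),
    list(message = message, call = NULL, metadata = list(type = type, ...))
  )
}

# Hypothetical accessor mirroring the idea of extracting metadata from the error
get_metadata <- function(error) {
  stopifnot(inherits(error, "climate_error"))
  error$metadata
}

caught <- tryCatch(
  stop(new_climate_error(
    "Column 'deaths' not found.",
    type = "column_not_found",
    column = "deaths",
    available = c("date", "tmean", "cases")
  )),
  climate_error = function(e) e
)

meta <- get_metadata(caught)
str(meta)
```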

Extract mean wildfire PM2.5 values for shapefile regions from NetCDF file

Description

Takes a NetCDF file of gridded wildfire data and shapefile for geographical regions and extracts mean values for each shapefile region.

Information on NetCDF files: https://climatedataguide.ucar.edu/climate-tools/NetCDF

We use a daily time series of gridded wildfire-related PM2.5 concentration from the Finnish Meteorological Institute's SILAM-CTM model. This is available open-source: https://doi.org/10.57707/fmi-b2share.d1cac971b3224d438d5304e945e9f16c.

Usage

extract_means_for_geography(
  ncdf_path,
  shp_path,
  region_col = "region",
  output_value_col = "mean_PM"
)

Arguments

ncdf_path

Path to a NetCDF file

shp_path

Path to a shapefile .shp of the geographical boundaries for which to extract mean values of wildfire-related PM2.5

region_col

Character. The name of the column containing region data in the shapefile. Defaults to 'region'.

output_value_col

Character. The name of the value column to include in the output. Defaults to 'mean_PM'.

Value

Dataframe containing a daily time series with mean wildfire-related PM2.5 values for each region

Fit GAM model

Description

Fit a generalized additive model (mgcv::gam) including pm25 and its lagged variables (pm25_lag1, ..., pm25_lagN)

Usage

fit_air_pollution_gam(
  data_with_lags,
  max_lag = 14L,
  df_seasonal = 6L,
  family = "quasipoisson"
)

Arguments

data_with_lags

data.frame or tibble containing the outcome, confounders and pm25 lag variables.

max_lag

integer. Maximum lag to include. Defaults to 14.

df_seasonal

integer. Degrees of freedom for seasonal spline. Default 6.

family

character or family object passed to mgcv::gam. Default "quasipoisson".

Value

A list with components:

model: the fitted mgcv::gam object (or NULL if fit failed)
coef_table: data.frame with columns: lag (0 for pm25, 1..N for pm25_lag#, and "0-N" for cumulative), pm25_variable, coef, se, ci.lb, ci.ub
vcov_used_for_cumulative: logical; TRUE if vcov() was used to compute cumulative SE
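
Examples

The cumulative ("0-N") effect is the sum of the lag coefficients, and its standard error needs the full covariance matrix, not just the individual standard errors. The sketch below shows that calculation on simulated data. It uses stats::glm as a stand-in for mgcv::gam to stay in base R, and cumulative_effect_sketch is a hypothetical helper.

```r
# Hypothetical helper: sum selected coefficients and derive the SE from vcov().
cumulative_effect_sketch <- function(model, terms) {
  b <- stats::coef(model)[terms]
  V <- stats::vcov(model)[terms, terms, drop = FALSE]
  est <- sum(b)
  se <- sqrt(sum(V))  # 1' V 1: variances plus all pairwise covariances
  c(coef = est, se = se, ci.lb = est - 1.96 * se, ci.ub = est + 1.96 * se)
}

set.seed(7)
n <- 300
pm25 <- rgamma(n + 2, shape = 4, rate = 0.4)
toy <- data.frame(
  pm25 = pm25[3:(n + 2)],
  pm25_lag1 = pm25[2:(n + 1)],
  pm25_lag2 = pm25[1:n]
)
toy$deaths <- rpois(n, exp(3 + 0.004 * (toy$pm25 + toy$pm25_lag1 + toy$pm25_lag2)))

fit <- stats::glm(deaths ~ pm25 + pm25_lag1 + pm25_lag2,
                  family = quasipoisson(), data = toy)
lag_terms <- c("pm25", "pm25_lag1", "pm25_lag2")
cum <- cumulative_effect_sketch(fit, lag_terms)
```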

Calculate p-values for Wald test

Description

Calculate p-values for an explanatory variable.

Usage

fwald(mm, var)

Arguments

mm

A model object. A multivariate meta-analysis model.

var

A character. The name of the variable in the meta-model to calculate p-values for.

Value

A number. The p-value of the explanatory variable.
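
Examples

A multivariate Wald test compares b' V^-1 b against a chi-squared distribution, where b holds the coefficients that belong to the variable and V is their covariance matrix. The sketch below applies this to a simple lm fit as a stand-in for a meta-analysis model. wald_p_sketch is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper: Wald test for all coefficients whose name contains `var`.
wald_p_sketch <- function(model, var) {
  idx <- grep(var, names(stats::coef(model)), fixed = TRUE)
  b <- stats::coef(model)[idx]
  V <- stats::vcov(model)[idx, idx, drop = FALSE]
  stat <- drop(t(b) %*% solve(V) %*% b)
  stats::pchisq(stat, df = length(b), lower.tail = FALSE)
}

set.seed(42)
toy <- data.frame(avg_temp = rnorm(100), noise = rnorm(100))
toy$y <- 2 * toy$avg_temp + rnorm(100)
fit <- stats::lm(y ~ avg_temp + noise, data = toy)

p_temp <- wald_p_sketch(fit, "avg_temp")   # strong true effect
p_noise <- wald_p_sketch(fit, "noise")     # no true effect
```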

Generate a run id for descriptive statistics output folders.

Description

Generate a run id for descriptive statistics output folders.

Usage

generate_descriptive_stats_run_id()

Value

Character. Run id in the format YYYYmmdd_HHMMSS_NNNN.
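
Examples

An identifier in the YYYYmmdd_HHMMSS_NNNN format can be assembled from a timestamp and a four-digit suffix. The sketch below assumes that NNNN is a random number between 0000 and 9999; the source does not say how the package derives it. run_id_sketch is a hypothetical name.

```r
# Hypothetical helper, not the package function.
run_id_sketch <- function(now = Sys.time()) {
  stamp <- format(now, "%Y%m%d_%H%M%S")
  # Assumption: the suffix is a random integer, zero-padded to four digits.
  suffix <- sprintf("%04d", sample.int(10000L, 1L) - 1L)
  paste(stamp, suffix, sep = "_")
}

run_id <- run_id_sketch()
```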

Generate Relative Risk Estimates by Region

Description

Computes relative risk estimates for wildfire-specific PM2.5 exposure across regions as PM values change.

Usage

generate_rr_pm_by_region(
  data,
  relative_risk_overall,
  scale_factor_wildfire_pm,
  wildfire_lag = 0,
  pm_vals = NULL
)

Arguments

data

Data frame containing a daily time series of mean_PM values, either from the original input csv file or produced after merging wildfire data with the initial csv file.

relative_risk_overall

Data frame containing relative risk estimates and confidence intervals for wildfire-related PM2.5 exposure at different lags. Must include columns: 'lag', 'relative_risk', 'ci_lower', and 'ci_upper'.

scale_factor_wildfire_pm

Numeric. Scaling factor used to normalize PM2.5 values to the unit of exposure used in the original relative risk estimate.

wildfire_lag

Integer. Lag day to filter from the input data for extrapolation. Defaults to 0.

pm_vals

Numeric vector. PM2.5 concentrations over which to compute relative risk. Defaults to a sequence from 0 to the maximum observed wildfire-related PM2.5 in dataset, max(mean_PM).

Value

A data frame with relative risk estimates for each region and PM value.

Relative risk estimates across PM2.5 concentrations for a specified lag.

Description

Computes relative risk and confidence intervals across a range of PM2.5 concentrations for a specified wildfire-related lag, using log-linear extrapolation from a reference estimate.

Usage

generate_rr_pm_overall(
  data,
  relative_risk_overall,
  scale_factor_wildfire_pm,
  wildfire_lag = 0,
  pm_vals = NULL
)

Arguments

data

Data frame containing a daily time series of mean_PM values, either from the original input csv file or produced after merging wildfire data with the initial csv file.

relative_risk_overall

Data frame containing relative risk estimates and confidence intervals for wildfire-related PM2.5 exposure at different lags. Must include columns: 'lag', 'relative_risk', 'ci_lower', and 'ci_upper'.

scale_factor_wildfire_pm

Numeric. Scaling factor used to normalize PM2.5 values to the unit of exposure used in the original relative risk estimate.

wildfire_lag

Integer. Lag day to filter from the input data for extrapolation. Defaults to 0.

pm_vals

Numeric vector. PM2.5 concentrations over which to compute relative risk. Defaults to a sequence from 0 to the maximum observed wildfire-related PM2.5 in dataset, max(mean_PM).

Value

A data frame with columns: 'pm_levels', 'relative_risk', 'ci_lower', and 'ci_upper', representing estimated relative risk and 95% confidence intervals across the specified PM2.5 levels.
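
Examples

Under a log-linear exposure-response assumption, a relative risk reported per scale_factor units of PM2.5 extends to a concentration x as exp(log(RR) * x / scale_factor). The sketch below applies that formula to made-up reference values. It is an illustration of the idea rather than the package's code, and rr_loglinear_sketch is a hypothetical helper.

```r
# Hypothetical helper: scale a reference RR (and its CI) to a range of PM2.5 levels.
rr_loglinear_sketch <- function(rr_ref, ci_lower, ci_upper, scale_factor, pm_vals) {
  scale_rr <- function(rr) exp(log(rr) * pm_vals / scale_factor)
  data.frame(
    pm_levels = pm_vals,
    relative_risk = scale_rr(rr_ref),
    ci_lower = scale_rr(ci_lower),
    ci_upper = scale_rr(ci_upper)
  )
}

# Illustrative values only: RR of 1.02 (1.01, 1.03) per 10 units of PM2.5.
rr_curve <- rr_loglinear_sketch(1.02, 1.01, 1.03,
                                scale_factor = 10,
                                pm_vals = seq(0, 30, by = 10))
```

At a concentration of zero the relative risk is 1, and at the reference increment it equals the reference estimate.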

Generate an RGB colour value with alpha from a hex value.

Description

Generate an RGB colour value with alpha from a hex value.

Usage

get_alpha_colour(hex, alpha)

Arguments

hex

The hex code of the colour to convert.

alpha

The alpha of the converted colour (ranging from 0-1).

Value

The converted RGB colour.
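
Examples

A hex colour can be given transparency with grDevices::col2rgb() and grDevices::rgb(). The sketch below is one way to do it, not necessarily how the package does. alpha_colour_sketch is a hypothetical name.

```r
# Hypothetical helper, not the package function.
alpha_colour_sketch <- function(hex, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  channels <- grDevices::col2rgb(hex) / 255  # 3 x 1 matrix scaled to [0, 1]
  grDevices::rgb(channels[1, ], channels[2, ], channels[3, ], alpha = alpha)
}

shade <- alpha_colour_sketch("#1B9E77", 0.3)  # semi-transparent version
opaque <- alpha_colour_sketch("#FF0000", 1)
```

The result is a "#RRGGBBAA" string that base graphics and ggplot2 accept as a colour.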

Create lagged columns and provide the mean value.

Description

Creates new columns containing lagged values over n rows and determines the mean of the lagged columns.

Usage

get_lags_and_means(data, lagcol, nlags)

Arguments

data

Dataframe containing a daily time series of climate and health data

lagcol

Character. The column to lag.

nlags

Character. How many rows to obtain a lag from.

Value

Dataframe with added columns for lagged values and mean(s) of those lags.
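
Examples

Lagging a daily series by k rows shifts each value down by k positions and fills the top with NA. The sketch below adds lag columns and the mean across lag 0 to lag n in base R. The output column names are assumptions, and lags_and_means_sketch is a hypothetical helper.

```r
# Hypothetical helper; column naming is an assumption.
lags_and_means_sketch <- function(data, lagcol, nlags) {
  x <- data[[lagcol]]
  n <- length(x)
  for (k in seq_len(nlags)) {
    # Shift down by k rows, padding the start with NA.
    data[[paste0(lagcol, "_lag", k)]] <- c(rep(NA, k), x)[seq_len(n)]
  }
  lag_cols <- paste0(lagcol, "_lag", seq_len(nlags))
  data[[paste0(lagcol, "_lag0_", nlags, "_mean")]] <-
    rowMeans(data[, c(lagcol, lag_cols), drop = FALSE])
  data
}

toy <- data.frame(
  date = as.Date("2024-01-01") + 0:4,
  mean_PM = c(2, 4, 6, 8, 10)
)
lagged <- lags_and_means_sketch(toy, "mean_PM", 2)
```

Rows without a complete lag history get NA for the mean, so the first nlags rows would normally be dropped before modelling.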

A function to predict relative risk at country, region, and district level

Description

Produces cumulative relative risk at country, region and district level from analysis.

Usage

get_predictions(data, param_term, max_lag, nk, model, level, case_type)

Arguments

data

Data list from combine_health_climate_data() function.

param_term

A character vector or list containing parameter terms such as tmax (maximum temperature) and rainfall (rainfall exposure).

model

The fitted model from run_inla_models() function.

level

Character. The spatial disaggregation level. Can take one of the following values: country, region, or district.

case_type

Character. The type of disease that the case column refers to. Must be one of diarrhea or malaria.

Value

A dataframe containing cumulative relative risk at the chosen level.

Process data for national analysis

Description

Aggregate to national data and run crossbasis.

Usage

hc_add_national_data(
  df_list,
  pop_list,
  var_fun = "bs",
  var_per = c(10, 75, 90),
  var_degree = 2,
  lagn = 21,
  lagnk = 3,
  country = "National",
  cb_list,
  mm,
  minpercgeog_
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

pop_list

List of population totals by year and geography.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90).

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

lagn

Integer. Number of days in the lag period. Defaults to 21. (see dlnm::crossbasis).

lagnk

Integer. Number of knots in lag function. Defaults to 3. (see dlnm::logknots).

country

Character. Name of country for national level estimates. Defaults to 'National'.

cb_list

A list of cross-basis matrices by geography.

mm

A model object. A multivariate meta-analysis model.

minpercgeog_

Vector. Percentile of minimum mortality temperature for each geography.

Value

df_list List. A list of data frames for each geography and national level.
cb_list List. A list of cross-basis matrices by geography and national level.
minpercgeog_ Vector. Percentile of minimum mortality temperature for each geography and national level.
mmpredall List. A list of national coefficients and covariance matrices.

Run ADF test and produce PACF plots for each model combination

Description

Run augmented Dickey-Fuller test for stationarity of dependent variable and produce a partial autocorrelation function plot of residuals for each model combination.

Usage

hc_adf(df_list)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

Value

'adf_list'. List of ADF results for each geography.

Estimate attributable numbers

Description

Estimate attributable numbers and confidence intervals for each geography using Monte Carlo simulations.

Usage

hc_attr(
  df_list,
  cb_list,
  pred_list,
  minpercgeog_,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

cb_list

A list of cross-basis matrices by geography.

pred_list

A list containing predictions from the model by geography.

minpercgeog_

Vector. Percentile of minimum mortality temperature for each geography.

attr_thr_high

Numeric. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5.

attr_thr_low

Numeric. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5.

Value

'attr_list'. A list containing attributable numbers per geography.
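
Examples

For a day with relative risk RR, the attributable fraction is 1 - 1/RR and the attributable number is that fraction times the observed count. The sketch below computes a point estimate and a Monte Carlo interval from made-up daily values. It draws each day's log relative risk independently, which is a simplification; a full analysis would typically simulate the model coefficients jointly. attr_mc_sketch is a hypothetical helper.

```r
# Hypothetical helper: attributable number with a simulation-based interval.
attr_mc_sketch <- function(deaths, log_rr, log_rr_se, nsim = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  an_point <- sum(deaths * (1 - exp(-log_rr)))
  sims <- replicate(nsim, {
    draw <- stats::rnorm(length(log_rr), mean = log_rr, sd = log_rr_se)
    sum(deaths * (1 - exp(-draw)))
  })
  ci <- stats::quantile(sims, c(0.025, 0.975), names = FALSE)
  c(an = an_point, an_lower = ci[1], an_upper = ci[2],
    af = an_point / sum(deaths))
}

# Illustrative inputs: three days of deaths with their estimated RRs.
attr_est <- attr_mc_sketch(
  deaths = c(100, 120, 90),
  log_rr = log(c(1.00, 1.10, 1.25)),
  log_rr_se = c(0.01, 0.02, 0.03),
  seed = 123
)
```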

Create attributable estimates tables

Description

Aggregate tables of attributable numbers, rates and fractions for total, yearly and monthly by geography and national level.

Usage

hc_attr_tables(attr_list, country = "National", meta_analysis = FALSE)

Arguments

attr_list

A list containing attributable numbers per geography.

country

Character. Name of country for national level estimates. Defaults to 'National'.

meta_analysis

Boolean. Whether to perform a meta-analysis. Defaults to FALSE.

Value

res_attr_tot Dataframe. Total attributable fractions, numbers and rates for each geography over the whole time series.
attr_yr_list List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by geography.
attr_mth_list List. Dataframes containing total attributable fractions, numbers and rates by calendar month and geography.

Create cross-basis matrix

Description

Creates a cross-basis matrix of the lag-response and exposure-response functions, for each geography.

Usage

hc_create_crossbasis(
  df_list,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(10, 75, 90),
  lagn = 21,
  lagnk = 3,
  dfseas = 8
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10,75,90).

lagn

Integer. Number of days in the lag period. Defaults to 21. (see dlnm::crossbasis).

lagnk

Integer. Number of knots in lag function. Defaults to 3. (see dlnm::logknots).

dfseas

Integer. Degrees of freedom for seasonality. Defaults to 8.

Value

'cb_list'. A list of cross-basis matrices by geography.
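
Examples

The arguments above map onto dlnm::crossbasis() and dlnm::logknots(). The sketch below builds a single cross-basis from a simulated temperature series using the documented defaults. It assumes that var_per percentiles are converted to knot temperatures with quantile(); that step and the simulated series are illustrative, not taken from the package source. The code only runs if dlnm is installed.

```r
if (requireNamespace("dlnm", quietly = TRUE)) {
  set.seed(11)
  # Two years of simulated daily mean temperature with a seasonal cycle.
  temps <- 15 + 8 * sin(2 * pi * (1:730) / 365) + rnorm(730, sd = 2)

  cb <- dlnm::crossbasis(
    temps,
    lag = 21,                                           # lagn
    argvar = list(
      fun = "bs",                                       # var_fun
      degree = 2,                                       # var_degree
      knots = stats::quantile(temps, c(10, 75, 90) / 100)  # var_per
    ),
    arglag = list(knots = dlnm::logknots(21, 3))        # lagnk
  )
  print(dim(cb))
}
```

The first 21 rows of the cross-basis are NA because a full lag history is not yet available for those days.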

Produce check results of model combinations

Description

Runs every combination of model based on user-selected additional independent variables and returns model diagnostic checks for each.

Usage

hc_model_combo_res(df_list, cb_list, independent_cols = NULL, dfseas = 8)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

cb_list

List of cross-basis matrices from hc_create_crossbasis function.

independent_cols

Character/list. Additional independent variables to test in model validation as confounders. Defaults to NULL.

dfseas

Integer. Degrees of freedom for seasonality. Defaults to 8.

Value

qaic_results A dataframe of QAIC and dispersion metrics for each model combination.
residuals_list List. Residuals for each model combination.
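
Examples

Quasi-Poisson models have no true likelihood, so QAIC is commonly computed from the Poisson log-likelihood at the fitted values, with the parameter penalty scaled by the estimated dispersion. The sketch below shows that calculation for two candidate models on simulated data. qaic_sketch and the toy columns are hypothetical, and the package may compute QAIC differently.

```r
# Hypothetical helper: QAIC = -2 * Poisson log-likelihood + 2 * dispersion * k.
qaic_sketch <- function(model) {
  loglik <- sum(stats::dpois(model$y, model$fitted.values, log = TRUE))
  phi <- summary(model)$dispersion
  k <- length(stats::coef(model))
  -2 * loglik + 2 * phi * k
}

set.seed(3)
toy <- data.frame(tmean = rnorm(365, 15, 6), humidity = runif(365, 40, 90))
toy$deaths <- rpois(365, exp(2.5 + 0.02 * toy$tmean))

# Candidate models: without and with the extra confounder.
candidates <- list(character(0), "humidity")
qaics <- vapply(candidates, function(extra) {
  f <- stats::reformulate(c("tmean", extra), response = "deaths")
  qaic_sketch(stats::glm(f, family = quasipoisson(), data = toy))
}, numeric(1))
```

Lower QAIC indicates a better trade-off between fit and complexity among the candidates.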

Model Validation Assessment

Description

Produces results on QAIC for each model combination, variance inflation factor for each independent variable, ADF test for stationarity, and plots for residuals to assess the models.

Usage

hc_model_validation(
  df_list,
  cb_list,
  independent_cols = NULL,
  dfseas = 8,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  seed = NULL
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

cb_list

List of cross-basis matrices from hc_create_crossbasis function.

independent_cols

Character/list. Additional independent variables to test in model validation as confounders. Defaults to NULL.

dfseas

Integer. Degrees of freedom for seasonality. Defaults to 8.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

save_csv

Boolean. Whether to save the results as a CSV. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

qaic_results A dataframe of QAIC and dispersion metrics for each model combination and geography.
qaic_summary A dataframe with the mean QAIC and dispersion metrics for each model combination.
vif_results A dataframe of variance inflation factors for each independent variable by geography.
vif_summary A dataframe with the mean variance inflation factors for each independent variable.
adf_results A dataframe of ADF test results for each geography.

Plot attributable fractions by calendar month - low temperatures

Description

Plot attributable fractions grouped over the whole time series by calendar month to explore seasonality.

Usage

hc_plot_af_cold_monthly(
  attr_mth_list,
  df_list,
  country = "National",
  attr_thr_low = 2.5,
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

country

Character. Name of country for national level estimates. Defaults to 'National'.

attr_thr_low

Numeric. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of attributable fractions by calendar month per geography.

Plot attributable fractions for cold by year

Description

Plot attributable fractions by year and geography with confidence intervals.

Usage

hc_plot_af_cold_yearly(
  attr_yr_list,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of yearly attributable fractions per geography.

Plot attributable fractions by calendar month - high temperatures

Description

Plot attributable fractions grouped over the whole time series by calendar month to explore seasonality.

Usage

hc_plot_af_heat_monthly(
  attr_mth_list,
  df_list,
  country = "National",
  attr_thr_high = 97.5,
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

country

Character. Name of country for national level estimates. Defaults to 'National'.

attr_thr_high

Numeric. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of attributable fractions by calendar month per geography.

Plot attributable fractions for heat by year

Description

Plot attributable fractions by year and geography with confidence intervals.

Usage

hc_plot_af_heat_yearly(
  attr_yr_list,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of yearly attributable fractions per geography.

Plot attributable rates by calendar month - low temperatures

Description

Plot attributable rates grouped over the whole time series by calendar month to explore seasonality.

Usage

hc_plot_ar_cold_monthly(
  attr_mth_list,
  df_list,
  country = "National",
  attr_thr_low = 2.5,
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

country

Character. Name of country for national level estimates. Defaults to 'National'.

attr_thr_low

Numeric. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of attributable rates by calendar month per geography.

Plot attributable rates by year - low temperatures

Description

Plot attributable rates by year and geography with confidence intervals.

Usage

hc_plot_ar_cold_yearly(
  attr_yr_list,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of yearly attributable rates per geography.

Plot attributable rates by calendar month - high temperatures

Description

Plot attributable rates grouped over the whole time series by calendar month to explore seasonality.

Usage

hc_plot_ar_heat_monthly(
  attr_mth_list,
  df_list,
  country = "National",
  attr_thr_high = 97.5,
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

country

Character. Name of country for national level estimates. Defaults to 'National'.

attr_thr_high

Numeric. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of attributable rates by calendar month per geography.

Plot attributable rates by year - high temperatures

Description

Plot attributable rates by year and geography with confidence intervals.

Usage

hc_plot_ar_heat_yearly(
  attr_yr_list,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of yearly attributable rates per geography.

Plot total attributable fractions and rates - low temperatures

Description

Plot total attributable fractions and rates over the whole time series by geography.

Usage

hc_plot_attr_cold_totals(
  df_list,
  res_attr_tot,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

res_attr_tot

Matrix containing total attributable fractions, numbers and rates for each geography over the whole time series.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of total attributable fractions and rates by geography.

Plot total attributable fractions and rates - high temperatures

Description

Plot total attributable fractions and rates over the whole time series by geography.

Usage

hc_plot_attr_heat_totals(
  df_list,
  res_attr_tot,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

res_attr_tot

Matrix containing total attributable fractions, numbers and rates for each geography over the whole time series.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of total attributable fractions and rates by geography.

Plot statistical power for temperature mortality analysis

Description

Plots the power statistic for each reference temperature at and above the attributable risk threshold for each geography.

Usage

hc_plot_power(
  power_list_high,
  power_list_low,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

power_list_high

List. A list containing power information for high temperatures by geography.

power_list_low

List. A list containing power information for low temperatures by geography.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Character. Path to folder where plots should be saved. Defaults to NULL.

country

Character. The name of the country for national level estimates. Defaults to 'National'.

Value

Plots of power by temperature for the attributable thresholds and above for each geography.

Plot results of relative risk analysis

Description

Plots cumulative lag exposure-response function with histogram of temperature distribution for each geography.

Usage

hc_plot_rr(
  df_list,
  pred_list,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5,
  minpercgeog_,
  country = "National",
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

pred_list

A list containing predictions from the model by geography.

attr_thr_high

Numeric. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5.

attr_thr_low

Numeric. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5.

minpercgeog_

Vector. Percentile of minimum mortality temperature for each geography.

country

Character. Name of country for national level estimates. Defaults to 'National'.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of cumulative lag exposure-response function with histogram of temperature distribution for each geography.

Run predictions from model

Description

Use model to run predictions. Predictions can be produced for a single input geography, or multiple disaggregated geographies.

Usage

hc_predict_subnat(
  df_list,
  var_fun = "bs",
  var_per = c(10, 75, 90),
  var_degree = 2,
  mintempgeog_,
  blup,
  coef_,
  vcov_,
  meta_analysis = FALSE
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90).

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

mintempgeog_

Vector. Percentile of minimum mortality temperature for each geography.

blup

A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each geography.

coef_

A matrix of coefficients for the reduced model.

vcov_

A list. Covariance matrices for each geography for the reduced model.

meta_analysis

Boolean. Whether to perform a meta-analysis. Defaults to FALSE.

Value

'pred_list'. A list containing predictions by geography.

Define and run quasi-Poisson regression with DLNM

Description

Fits a quasi-Poisson case-crossover model with a distributed lag non-linear model term.

Usage

hc_quasipoisson_dlnm(df_list, control_cols = NULL, cb_list, dfseas = 8)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

control_cols

List. Confounders to include in the final model adjustment. Defaults to NULL.

cb_list

List of cross-basis matrices from hc_create_crossbasis function.

dfseas

Integer. Degrees of freedom for seasonality. Defaults to 8.

Value

'model_list'. List containing models by geography.
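
Examples

A quasi-Poisson time-series regression usually combines the exposure term with a smooth function of time for seasonality and trend. The sketch below fits such a model to simulated data in base R. It uses a plain linear temperature term where a cross-basis would normally go, and it assumes dfseas means degrees of freedom per year; neither choice is confirmed by the source.

```r
set.seed(5)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 730)
toy <- data.frame(
  date = dates,
  time = seq_along(dates),
  dow = factor(weekdays(dates)),
  tmean = 15 + 8 * sin(2 * pi * seq_along(dates) / 365) + rnorm(730, sd = 2)
)
toy$deaths <- rpois(730, exp(3 + 0.01 * toy$tmean))

dfseas <- 8
n_years <- length(unique(format(toy$date, "%Y")))

# Assumption: the seasonal spline gets dfseas degrees of freedom per year.
qp_fit <- stats::glm(
  deaths ~ tmean + splines::ns(time, df = dfseas * n_years) + dow,
  family = quasipoisson(),
  data = toy
)
dispersion <- summary(qp_fit)$dispersion
```

A dispersion estimate well above 1 would indicate overdispersion relative to a Poisson model.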

Read temperature-related mortality indicator data

Description

Reads in data and geography names for analysis from a CSV file.

Usage

hc_read_data(
  input_csv_path,
  dependent_col,
  date_col,
  region_col,
  temperature_col,
  population_col
)

Arguments

input_csv_path

Path to a CSV containing a daily time series of health outcome and climate data per geography.

dependent_col

Character. Name of the column in the dataframe containing the dependent health outcome variable e.g. deaths.

date_col

Character. Name of the column in the dataframe containing the date.

region_col

Character. Name of the column in the dataframe that contains the geography name(s).

temperature_col

Character. Name of the column in the dataframe that contains the temperature values.

population_col

Character. Name of the column in the dataframe that contains the population estimate per geography.

Value

'df_list'. A list of dataframes for each geography with formatted and renamed columns.
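
Examples

The read step amounts to loading the CSV, renaming the user-supplied columns to standard names, and splitting by geography. The sketch below does this in base R with an inline CSV. The standardised names on the right of rename_map are assumptions for illustration; the names the package actually uses are not given in this section.

```r
csv_text <- "day,area,deaths_all,tavg,pop
2024-01-01,North,12,3.1,50000
2024-01-01,South,20,6.4,80000
2024-01-02,North,9,2.7,50000
2024-01-02,South,17,5.9,80000"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical mapping from input column names to standardised names.
rename_map <- c(day = "date", area = "region", deaths_all = "dependent",
                tavg = "temp", pop = "population")
names(raw) <- unname(rename_map[names(raw)])
raw$date <- as.Date(raw$date)

# One dataframe per geography.
df_list_sketch <- split(raw, raw$region)
```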

Produce cumulative relative risk results of analysis

Description

Produces cumulative relative risk and confidence intervals from analysis.

Usage

hc_rr_results(
  pred_list,
  df_list,
  minpercgeog_,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5
)

Arguments

pred_list

A list containing predictions from the model by geography.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular geography.

minpercgeog_

Vector. Percentile of minimum mortality temperature for each geography.

attr_thr_high

Numeric. Percentile at which to define the upper temperature threshold for calculating attributable risk. Defaults to 97.5.

attr_thr_low

Numeric. Percentile at which to define the lower temperature threshold for calculating attributable risk. Defaults to 2.5.

Value

'rr_results'. Dataframe containing cumulative relative risk and confidence intervals from analysis.

Save results of analysis

Description

Saves a CSV file of cumulative relative risk and confidence intervals.

Usage

hc_save_results(
  rr_results,
  res_attr_tot,
  attr_yr_list,
  attr_mth_list,
  power_list_high,
  power_list_low,
  output_folder_path = NULL
)

Arguments

rr_results

Dataframe containing cumulative relative risk and confidence intervals from analysis.

res_attr_tot

Matrix containing total attributable fractions, numbers and rates for each geography over the whole time series.

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by geography.

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and geography.

output_folder_path

Path to folder where results should be saved. Defaults to NULL.

Install the INLA Package from Its Official Repository

Description

This function installs the INLA package from its official repository at https://inla.r-inla-download.org/R/stable/. On Windows, it checks whether Rtools is available and installs the official binary package directly.

Usage

install_INLA(os = .Platform$OS.type)

Arguments

os

The current operating system. Defaults to .Platform$OS.type.

Details

On Windows systems, the function verifies that Rtools is installed using pkgbuild::has_build_tools(). If Rtools is missing, it displays a warning and aborts the installation. The function then installs the matching Windows binary package from the official INLA repository.

On non-Windows systems, the package is installed normally from the repository.

Value

Invisibly returns NULL. The function is called for its side effect.

Examples

## Not run: 
install_INLA()

## End(Not run)

Install the terra Package from the CRAN Archive

Description

This function installs the terra package at version 1.8-60 from the CRAN archive.

Usage

install_terra(os = .Platform$OS.type)

Arguments

os

The current operating system. Defaults to .Platform$OS.type.

Value

Invisibly returns NULL. The function is called for its side effect.

Examples

## Not run: 
install_terra()

## End(Not run)

Check if an error is a climate_error

Description

Utility function to check if a caught condition is a typed climate error.

Usage

is_climate_error(error)

Arguments

error

A condition object

Value

TRUE if the error inherits from "climate_error", FALSE otherwise.

Examples


tryCatch({
  stop("example error")
}, error = function(e) {
  if (is_climate_error(e)) {
    # Handle structured error
  } else {
    # Handle untyped error
  }
})

Join monthly PM2.5 estimates with attributable risk data by region and time

Description

Aggregates PM2.5 data to monthly averages by region and joins it with attributable risk data using year, month, and region as keys.

Usage

join_ar_and_pm_monthly(pm_data, an_ar_data)

Arguments

pm_data

A data frame with columns: year, month, region, mean_PM. Represents monthly PM2.5 estimates.

an_ar_data

A data frame with columns: year, month, region. Represents attributable risk or fraction data to be joined with PM2.5 estimates.

Value

A data frame with monthly average PM2.5 values joined to attributable risk data.
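
The aggregate-then-join step can be sketched in base R as follows. The toy data and the attributable_number column are hypothetical, and the package's own implementation may differ.

```r
# Toy inputs; key column names follow the argument descriptions above.
pm_data <- data.frame(
  year = c(2020, 2020, 2020, 2020),
  month = c(1, 1, 2, 2),
  region = c("A", "A", "A", "A"),
  mean_PM = c(10, 14, 8, 12)
)
an_ar_data <- data.frame(
  year = c(2020, 2020),
  month = c(1, 2),
  region = c("A", "A"),
  attributable_number = c(5, 3)  # hypothetical column
)

# Average PM2.5 within each year-month-region cell.
pm_monthly <- aggregate(mean_PM ~ year + month + region,
                        data = pm_data, FUN = mean)

# Join on the three shared keys.
joined <- merge(an_ar_data, pm_monthly, by = c("year", "month", "region"))
```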

Join health and climate data

Description

Joins a daily time series of wildfire PM2.5 data with a daily time series of health data.

Usage

join_health_and_climate_data(
  climate_data,
  health_data,
  region_col = "region",
  date_col = "date",
  exposure_col = "mean_PM"
)

Arguments

climate_data

Dataframe containing a daily time series of climate data, which may be disaggregated by region.

health_data

Character. Path to a CSV file containing a daily time series of data for a particular health outcome, which may be disaggregated by region.

region_col

Character. Name of the region column in both datasets. Defaults to 'region'.

date_col

Character. Name of the date column in both datasets. Defaults to 'date'.

exposure_col

Character. Name of the column in the climate data containing the exposure variable (e.g., PM2.5) in kilograms. Defaults to 'mean_PM'.

Value

Dataframe containing a daily time series of the joined climate and health data.

Append the units to a column label.

Description

Append the units to a column label.

Usage

label_with_unit(col, units)

Arguments

col

Character. The column name.

units

Named Character vector. A vector of units (str) that map to columns.

Value

The new column label, with units appended if col is named in units.
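
A minimal sketch of the lookup described above. The "name (unit)" output format and the function name label_with_unit_sketch are assumptions; the package may format labels differently.

```r
# Hypothetical re-implementation; the "name (unit)" format is an assumption.
label_with_unit_sketch <- function(col, units) {
  if (col %in% names(units)) {
    paste0(col, " (", units[[col]], ")")
  } else {
    col  # no unit known for this column, so the label is returned as given
  }
}

units <- c(tmax = "degrees C", rainfall = "mm")
label_with_unit_sketch("tmax", units)    # "tmax (degrees C)"
label_with_unit_sketch("deaths", units)  # "deaths"
```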

Read in climate, environmental and health data and rename columns

Description

Reads in a CSV file containing a daily time series of climate, environmental and health data and renames the columns to standardised names. This function also creates year, month, day, and day of week columns derived from the date.

Usage

load_air_pollution_data(
  data_path,
  date_col = "date",
  region_col = "region",
  pm25_col = "pm25",
  deaths_col = "deaths",
  population_col = "population",
  humidity_col = "humidity",
  precipitation_col = "precipitation",
  tmax_col = "tmax",
  wind_speed_col = "wind_speed",
  categorical_others = NULL,
  continuous_others = NULL
)

Arguments

data_path

Path to a CSV file containing a daily time series of data.

date_col

Character. Name of date column in the dataframe with format YYYY-MM-DD. Defaults to "date".

region_col

Character. Name of region column in the dataframe. Defaults to "region".

pm25_col

Character. Name of PM2.5 column in the dataframe. Defaults to "pm25".

deaths_col

Character. Name of all-cause mortality column in the dataframe (note that the deaths_col variable has value 1 for each recorded death). Defaults to "deaths".

population_col

Character. Name of population column in the dataframe. This is REQUIRED for calculating Attributable Rate (AR). Defaults to "population".

humidity_col

Character. Name of humidity column in the dataframe. Defaults to "humidity".

precipitation_col

Character. Name of precipitation column in the dataframe. Defaults to "precipitation".

tmax_col

Character. Name of maximum temperature column in the dataframe. Defaults to "tmax".

wind_speed_col

Character. Name of wind speed column in the dataframe. Defaults to "wind_speed".

categorical_others

Optional. Character vector of additional categorical variables (e.g., "sex", "age_group"). Defaults to NULL.

continuous_others

Optional. Character vector of additional continuous variables (e.g., "tmean"). Defaults to NULL.

Value

Dataframe with formatted data and standardised column names.
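
The derived time columns can be illustrated with base R. This is a sketch only; the output column names (year, month, day, dow) and the weekday coding are assumptions, not the package's confirmed output.

```r
df <- data.frame(date = c("2024-01-01", "2024-02-15"), deaths = c(3, 5))

# Parse the YYYY-MM-DD strings, then derive the time columns.
df$date <- as.Date(df$date, format = "%Y-%m-%d")
df$year <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))
df$day <- as.integer(format(df$date, "%d"))
df$dow <- as.POSIXlt(df$date)$wday  # 0 = Sunday, ..., 6 = Saturday
```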

Read in and format climate data

Description

Read in a monthly time series of climate data, rename columns and create lag variables for spatiotemporal and DLNM analysis. The climate data should start one year before the first year of the health data so that the lag variables can be calculated.

Usage

load_and_process_climatedata(
  climate_data_path,
  district_col,
  year_col,
  month_col,
  tmin_col,
  tmean_col,
  tmax_col,
  rainfall_col,
  r_humidity_col,
  runoff_col = NULL,
  ndvi_col = NULL,
  spi_col = NULL,
  max_lag
)

Arguments

climate_data_path

Path to a csv file containing a monthly time series of data for climate variables, which may be disaggregated by district.

district_col

Character. Name of the column in the dataframe that contains the region names.

year_col

Character. Name of the column in the dataframe that contains the Year.

month_col

Character. Name of the column in the dataframe that contains the month.

tmin_col

Character. Name of the column in the dataframe that contains the minimum temperature data.

tmean_col

Character. Name of the column in the dataframe that contains the average temperature.

tmax_col

Character. Name of the column in the dataframe that contains the maximum temperature.

rainfall_col

Character. Name of the column in the dataframe that contains the cumulative monthly rainfall.

r_humidity_col

Character. Name of the column in the dataframe that contains the relative humidity.

runoff_col

Character. Name of the column in the dataframe that contains the monthly runoff water data. Defaults to NULL.

ndvi_col

Character. Name of column containing the Normalized Difference Vegetation Index (ndvi) data. Defaults to NULL.

spi_col

Character. Name of the column in the dataframe that contains the standardized precipitation index. Defaults to NULL.

max_lag

Numeric. Maximum lag to be considered for the delayed effect. It should be between 2 and 4.

Value

Climate dataframe with formatted and renamed columns, plus the lag variables.
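
The lag variables can be pictured as the same series shifted by one or more months within each district. The sketch below uses base R; the column names tmax_lag1 and tmax_lag2 and the helper lag_vec are hypothetical. The leading NA values show why the climate series needs to start before the health series.

```r
climate <- data.frame(
  district = rep(c("D1", "D2"), each = 4),
  year = 2020,
  month = rep(1:4, times = 2),
  tmax = c(30, 31, 32, 33, 25, 26, 27, 28)
)
climate <- climate[order(climate$district, climate$year, climate$month), ]

# Shift a vector down by k positions, padding the start with NA.
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

max_lag <- 2
for (k in seq_len(max_lag)) {
  # ave() applies the shift separately within each district.
  climate[[paste0("tmax_lag", k)]] <- ave(
    climate$tmax, climate$district,
    FUN = function(x) lag_vec(x, k)
  )
}
```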

Read in and format health data - diseases cases type

Description

Read in a csv file containing a monthly time series of health outcomes and population data. Renames columns and creates time variables for spatiotemporal analysis.

Usage

load_and_process_data(
  health_data_path,
  region_col,
  district_col,
  date_col = NULL,
  year_col = NULL,
  month_col = NULL,
  case_col,
  case_type,
  tot_pop_col
)

Arguments

health_data_path

Path to a csv file containing a monthly time series of data for the health outcome case type, which may be disaggregated by age group (under-five or over-five cases), and by region and district.

region_col

Character. Name of the column in the dataframe that contains the region names.

district_col

Character. Name of the column in the dataframe that contains the district names.

date_col

Character. Name of the column in the dataframe that contains the date. Defaults to NULL.

year_col

Character. Name of the column in the dataframe that contains the year. Defaults to NULL.

month_col

Character. Name of the column in the dataframe that contains the month. Defaults to NULL.

case_col

Character. Name of the column in the dataframe that contains the disease cases to be considered.

case_type

Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'.

tot_pop_col

Character. Name of the column in the dataframe that contains the total population.

Value

A dataframe with formatted and renamed columns.

Read in and format country map data

Description

Read in a shape file, rename columns and create the adjacency matrix for spatiotemporal analysis.

Usage

load_and_process_map(
  map_path,
  region_col,
  district_col,
  geometry_col,
  output_dir = NULL
)

Arguments

map_path

The path to the country's geographic data (shape file "sf" data).

region_col

Character. The region column in the dataframe.

district_col

Character. The district column in the dataframe.

geometry_col

Character. The geometry column in the dataframe.

output_dir

Character. The path to which the processed adjacency (neighbouring) matrix and the map graph are written. Defaults to NULL.

Value

'map' The processed map
'nb.map'
'graph_file'
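
For orientation, an adjacency matrix records which districts share a border. The sketch below builds one from a toy neighbour list in base R; it does not read a shapefile, and the object names are hypothetical. In a real workflow the neighbour list would be derived from the polygon geometry.

```r
# Toy neighbour list for three districts in a row: 1 - 2 - 3.
nb <- list(c(2L), c(1L, 3L), c(2L))

n <- length(nb)
adj <- matrix(0L, nrow = n, ncol = n)
for (i in seq_len(n)) {
  adj[i, nb[[i]]] <- 1L  # mark each neighbour of district i
}
adj
```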

Load wildfire and health data

Description

Loads a dataframe containing a daily time series of health and climate data, which may be disaggregated by region.

Usage

load_wildfire_data(
  health_path,
  ncdf_path,
  shp_path,
  join_wildfire_data = TRUE,
  date_col,
  region_col,
  shape_region_col = NULL,
  mean_temperature_col,
  health_outcome_col,
  population_col = NULL,
  rh_col = NULL,
  wind_speed_col = NULL,
  pm_2_5_col = NULL
)

Arguments

health_path

Path to a CSV file containing a daily time series of data for a particular health outcome, which may be disaggregated by region. If this does not include a column with wildfire-related PM2.5, use join_wildfire_data = TRUE to join these data.

ncdf_path

Path to a NetCDF file containing a daily time series of gridded wildfire-related PM2.5 concentration data.

shp_path

Path to a shapefile (.shp) of the geographical boundaries for which to extract mean values of wildfire-related PM2.5.

join_wildfire_data

Boolean. If TRUE, a daily time series of wildfire-related PM2.5 concentration is joined to the health data. If FALSE, the data set is loaded without any additional joins.

date_col

Character. Name of the column in the dataframe that contains the date.

region_col

Character. Name of the column in the dataframe that contains the region names.

shape_region_col

Character. Name of the column in the shapefile dataframe that contains the region names. Defaults to NULL.

mean_temperature_col

Character. Name of the column in the dataframe that contains the mean temperature.

health_outcome_col

Character. Name of the column in the dataframe that contains the health outcome count (e.g. number of deaths, hospital admissions).

population_col

Character. Name of the column in the dataframe that contains the population data. Defaults to NULL. If omitted, a pop column is used when present.

rh_col

Character. Name of the column in the dataframe that contains daily relative humidity values. Defaults to NULL.

wind_speed_col

Character. Name of the column in the dataframe that contains the daily wind speed values. Defaults to NULL.

pm_2_5_col

Character. The name of the column containing PM2.5 values in micrograms. This is only required if health data isn't joined. Defaults to NULL.

Value

Dataframe containing a daily time series of climate and health data.

Run Full Malaria-Climate Analysis Pipeline: malaria cases attributable to extreme rainfall and extreme temperature

Description

The malaria_do_analysis() function executes the complete workflow for analyzing the association between malaria cases and climate variables. It integrates health, climate, and spatial data; fits spatio-temporal models using INLA; and generates a suite of diagnostic and inferential outputs, including plots and attributable risk estimates.

Usage

malaria_do_analysis(
  health_data_path,
  climate_data_path,
  map_path,
  region_col,
  district_col,
  date_col = NULL,
  year_col,
  month_col,
  case_col,
  tot_pop_col,
  tmin_col,
  tmean_col,
  tmax_col,
  rainfall_col,
  r_humidity_col,
  runoff_col,
  geometry_col,
  spi_col = NULL,
  ndvi_col = NULL,
  max_lag = 2,
  nk = 2,
  basis_matrices_choices,
  inla_param,
  param_term,
  level,
  param_threshold = 1,
  filter_year = NULL,
  family = "nbinomial",
  group_by_year = FALSE,
  cumulative = FALSE,
  config = FALSE,
  save_csv = FALSE,
  save_model = FALSE,
  save_fig = FALSE,
  output_dir = NULL
)

Arguments

health_data_path

Character. Path to the processed health data file.

climate_data_path

Character. Path to the processed climate data file.

map_path

Character. Path to the spatial data file (e.g., shapefile).

region_col

Character. Column name for the region variable.

district_col

Character. Column name for the district variable.

date_col

Character (optional). Column name for the date variable. Defaults to NULL.

year_col

Character. Column name for the year variable.

month_col

Character. Column name for the month variable.

case_col

Character. Column name for malaria case counts.

tot_pop_col

Character. Column name for total population.

tmin_col

Character. Column name for minimum temperature.

tmean_col

Character. Column name for mean temperature.

tmax_col

Character. Column name for maximum temperature.

rainfall_col

Character. Column name for cumulative monthly rainfall.

r_humidity_col

Character. Column name for relative humidity.

runoff_col

Character. Column name for monthly runoff data.

geometry_col

Character. Column name of the geometry column in the shapefile (usually "geometry").

spi_col

Character (optional). Column name for the Standardized Precipitation Index (SPI). Defaults to NULL.

ndvi_col

Character (optional). Column name for the Normalized Difference Vegetation Index (NDVI). Defaults to NULL.

max_lag

Numeric. Maximum temporal lag to include in the distributed lag model (e.g., 2-4). Defaults to 2.

nk

basis_matrices_choices

Character vector. Specifies which climate variables to include in the basis matrix (e.g., c("tmax", "rainfall", "r_humidity")).

inla_param

Character vector. Specifies exposure variables included in the INLA model (e.g., c("tmin", "rainfall", "r_humidity")).

param_term

Character or vector. Exposure variable(s) of primary interest for relative risk and attribution (e.g., "tmax", "rainfall").

level

Character. Spatial disaggregation level; must be one of "country", "region", or "district".

param_threshold

Numeric. Threshold above which exposure is considered "attributable." Defaults to 1.

filter_year

Integer or vector (optional). Year(s) to filter the data by. Defaults to NULL.

family

Character. Probability distribution for the outcome variable. Options include "poisson" and "nbinomial" for a negative binomial model. Defaults to "nbinomial".

group_by_year

Logical. Whether to group attributable metrics by year. Defaults to FALSE.

cumulative

Boolean. If TRUE, plot and save the cumulative risk across all years for the specific exposure at region and district level. Defaults to FALSE.

config

Logical. Whether to enable additional INLA model configurations. Defaults to FALSE.

save_csv

Logical. If TRUE, saves intermediate datasets to CSV. Defaults to FALSE.

save_model

Logical. If TRUE, saves fitted INLA model results. Defaults to FALSE.

save_fig

Logical. If TRUE, saves generated plots. Defaults to FALSE.

output_dir

Character. Directory where output files (plots, datasets, maps) are saved. Defaults to NULL.

Value

A named list containing:

inla_result - Fitted INLA model object and summaries.
plot_malaria, plot_tmax, plot_rainfall - Exploratory time-series plots.
reff_plot_monthly - Monthly random effects plot.
reff_plot_yearly - Yearly spatial random effects plot.
contour_plot - Exposure-response contour plot.
rr_map_plot - Spatial relative risk map.
rr_plot, rr_df - Relative risk plot and associated data.
attr_frac_num - Attributable risk summary table.
plot_AR_num, plot_AR_frac, plot_AR_per_100k - Plots of attributable number, fraction, and rate.

Process data for national analysis

Description

Aggregate to national data and run crossbasis

Usage

mh_add_national_data(
  df_list,
  pop_list,
  var_fun = "bs",
  var_per = c(25, 50, 75),
  var_degree = 2,
  lag_fun = "strata",
  lag_breaks = 1,
  lag_days = 2,
  country = "National",
  cb_list,
  mm,
  minpercreg
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

pop_list

List of population totals by year and region.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75).

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

lag_fun

Character. Exposure function for arglag (see dlnm::crossbasis). Defaults to 'strata'.

lag_breaks

Integer. Internal cut-off point defining the strata for arglag (see dlnm::crossbasis). Defaults to 1.

lag_days

Integer. Maximum lag. Defaults to 2. (see dlnm::crossbasis).

country

Character. Name of country for national level estimates. Defaults to 'National'.

cb_list

A list of cross-basis matrices by region.

mm

A model object. A multivariate meta-analysis model.

minpercreg

Vector. Percentile of minimum suicide temperature for each region.

Value

df_list List. A list of data frames for each region and nation.
cb_list List. A list of cross-basis matrices by region and nation.
minpercreg Vector. Percentile of minimum suicide temperature for each region and nation.
mmpredall List. A list of national coefficients and covariance matrices.

Estimate attributable numbers

Description

Estimate attributable numbers for each region and confidence intervals using Monte Carlo simulations.

Usage

mh_attr(df_list, cb_list, pred_list, minpercreg, attr_thr = 97.5)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

cb_list

A list of cross-basis matrices by region.

pred_list

A list containing predictions from the model by region.

minpercreg

Vector. Percentile of minimum suicide temperature for each region.

attr_thr

Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5.

Value

A list containing attributable numbers per region
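
The arithmetic behind an attributable number and its Monte Carlo interval can be sketched as follows. This uses the standard attributable fraction formula AF = (RR - 1) / RR with hypothetical values for the log relative risk, its standard error and the case count; the package's own procedure works on the full cross-basis and may differ in detail.

```r
set.seed(1)
beta <- log(1.25)  # hypothetical log relative risk above the threshold
se <- 0.05         # hypothetical standard error of beta
cases <- 100       # hypothetical cases observed on the exposed days

af <- 1 - exp(-beta)  # attributable fraction, equal to (RR - 1) / RR
an <- af * cases      # attributable number

# Monte Carlo interval: resample beta from its approximate normal
# distribution and recompute the attributable number each time.
beta_sim <- rnorm(1000, mean = beta, sd = se)
an_sim <- (1 - exp(-beta_sim)) * cases
an_ci <- quantile(an_sim, c(0.025, 0.975))
```

With a relative risk of 1.25, the attributable fraction is 0.2, so 20 of the 100 cases are attributed to the exposure.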

Create attributable estimates tables

Description

Aggregates tables of attributable numbers, rates and fractions (total, yearly and monthly) by region and nation.

Usage

mh_attr_tables(attr_list, country = "National", meta_analysis = FALSE)

Arguments

attr_list

A list containing attributable numbers per region.

country

Character. Name of country for national level estimates. Defaults to 'National'.

meta_analysis

Boolean. Whether to perform a meta-analysis. Defaults to FALSE.

Value

res_attr_tot Dataframe. Total attributable fractions, numbers and rates for each area over the whole time series.
attr_yr_list List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by area.
attr_mth_list List. Dataframes containing total attributable fractions, numbers and rates by calendar month and area.

Quasi-Poisson Case-Crossover model with DLNM

Description

Fits a quasi-Poisson case-crossover model with a distributed lag non-linear model.

Usage

mh_casecrossover_dlnm(df_list, control_cols = NULL, cb_list)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

control_cols

A list of confounders to include in the final model adjustment. Defaults to NULL if none.

cb_list

List of cross-basis matrices from the create_crossbasis function.

Value

List containing models by region
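
A time-stratified case-crossover design can be fitted as a quasi-Poisson regression with one indicator per stratum. The sketch below uses simulated data and base R glm(), with a linear temperature term standing in for the cross-basis; the year-month-weekday strata and all object names are assumptions, and the package may fit the model differently (for example by conditioning the strata out rather than estimating them).

```r
set.seed(42)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
doy <- as.numeric(format(dates, "%j"))
temp <- 15 + 10 * sin(2 * pi * doy / 366) + rnorm(length(dates))
cases <- rpois(length(dates), lambda = exp(1 + 0.02 * temp))

# Time-stratified design: year x month x day-of-week strata.
stratum <- factor(paste(format(dates, "%Y"), format(dates, "%m"),
                        as.POSIXlt(dates)$wday, sep = ":"))

# Stratum indicators enter as fixed effects, so temperature is compared
# only between days in the same stratum.
fit <- glm(cases ~ temp + stratum, family = quasipoisson())
coef(fit)[["temp"]]
```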

Create cross-basis matrix

Description

Creates a cross-basis matrix for each region

Usage

mh_create_crossbasis(
  df_list,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(25, 50, 75),
  lag_fun = "strata",
  lag_breaks = 1,
  lag_days = 2
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75).

lag_fun

Character. Exposure function for arglag (see dlnm::crossbasis). Defaults to 'strata'.

lag_breaks

Integer. Internal cut-off point defining the strata for arglag (see dlnm::crossbasis). Defaults to 1.

lag_days

Integer. Maximum lag. Defaults to 2. (see dlnm::crossbasis).

Value

A list of cross-basis matrices by region
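
For a single region, the default arguments above correspond roughly to the following dlnm::crossbasis() call. This is a sketch on simulated temperatures; how the package maps its arguments onto crossbasis() is an assumption. The dlnm call is guarded so that the knot calculation still runs when dlnm is not installed.

```r
set.seed(1)
temp <- rnorm(365, mean = 15, sd = 5)

# Internal knots at the 25th, 50th and 75th percentiles (var_per).
var_per <- c(25, 50, 75)
knots <- quantile(temp, var_per / 100)

if (requireNamespace("dlnm", quietly = TRUE)) {
  cb <- dlnm::crossbasis(
    temp,
    lag = 2,                                               # lag_days
    argvar = list(fun = "bs", knots = knots, degree = 2),  # var_fun, var_degree
    arglag = list(fun = "strata", breaks = 1)              # lag_fun, lag_breaks
  )
}
```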

Produce check results of model combinations

Description

Runs every model combination based on user-selected additional independent variables and returns model diagnostic checks for each.

Usage

mh_model_combo_res(df_list, cb_list, independent_cols = NULL)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

cb_list

List of cross-basis matrices from the create_crossbasis function.

independent_cols

Additional independent variables to test in model validation as confounders.

Value

qaic_results A dataframe of QAIC and dispersion metrics for each model combination.
residuals_list A list. Residuals for each model combination.
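
The enumeration of model combinations and a QAIC calculation can be sketched in base R as follows. The data are simulated, the confounder names are hypothetical, and the QAIC shown is one common definition (Poisson log-likelihood scaled by the estimated dispersion); the package's exact formula is not stated here and may differ.

```r
independent_cols <- c("humidity", "wind_speed")  # hypothetical confounders

# All subsets of the candidate variables, including the empty set.
combos <- list(character(0))
for (m in seq_along(independent_cols)) {
  combos <- c(combos, combn(independent_cols, m, simplify = FALSE))
}

# One common QAIC form for a quasi-Poisson fit.
qaic <- function(model) {
  phi <- summary(model)$dispersion
  loglik <- sum(dpois(model$y, model$fitted.values, log = TRUE))
  k <- length(coef(model))
  -2 * loglik / phi + 2 * k
}

set.seed(7)
d <- data.frame(y = rpois(200, 5), temp = rnorm(200),
                humidity = rnorm(200), wind_speed = rnorm(200))

# Fit one model per combination and collect its QAIC.
res <- sapply(combos, function(vars) {
  f <- reformulate(c("temp", vars), response = "y")
  qaic(glm(f, family = quasipoisson(), data = d))
})
```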

Model Validation Assessment

Description

Produces results on QAIC for each model combination, variance inflation factor for each independent variable, and plots for residuals to assess the models

Usage

mh_model_validation(
  df_list,
  cb_list,
  independent_cols = NULL,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  seed = NULL
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

cb_list

List of cross-basis matrices from the create_crossbasis function.

independent_cols

Additional independent variables to test in model validation as confounders.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

save_csv

Boolean. Whether to save the results as a CSV. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

qaic_results A dataframe of QAIC and dispersion metrics for each model combination and geography.
qaic_summary A dataframe with the mean QAIC and dispersion metrics for each model combination.
vif_results A dataframe. Variance inflation factors for each independent variable by region.
vif_summary A dataframe with the mean variance inflation factors for each independent variable.

Plot attributable fractions by calendar month

Description

Plot attributable fractions grouped over the whole time series by calendar month to explore seasonality.

Usage

mh_plot_af_monthly(
  attr_mth_list,
  df_list,
  country = "National",
  attr_thr = 97.5,
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and area.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

country

Character. Name of country for national level estimates. Defaults to 'National'.

attr_thr

Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of attributable fractions by calendar month per area

Plot attributable fractions by year

Description

Plot attributable fractions by year and area with confidence intervals

Usage

mh_plot_af_yearly(
  attr_yr_list,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by area

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of yearly attributable fractions per area

Plot attributable rates by calendar month

Description

Plot attributable rates grouped over the whole time series by calendar month to explore seasonality.

Usage

mh_plot_ar_monthly(
  attr_mth_list,
  df_list,
  country = "National",
  attr_thr = 97.5,
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and area.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

country

Character. Name of country for national level estimates. Defaults to 'National'.

attr_thr

Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of attributable rates by calendar month per area

Plot attributable rates by year

Description

Plot attributable rates by year and area with confidence intervals

Usage

mh_plot_ar_yearly(
  attr_yr_list,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by area

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of yearly attributable rates per area

Plot total attributable fractions and rates

Description

Plot total attributable fractions and rates over the whole time series by area.

Usage

mh_plot_attr_totals(
  df_list,
  res_attr_tot,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

res_attr_tot

Matrix containing total attributable fractions, numbers and rates for each area over the whole time series.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of total attributable fractions and rates by area

Plot power

Description

Plots the power statistic for each reference temperature at and above the attributable risk threshold for each area.

Usage

mh_plot_power(
  power_list,
  save_fig = FALSE,
  output_folder_path = NULL,
  country = "National"
)

Arguments

power_list

A list containing power information by area.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

country

Character. Name of country for national level estimates. Defaults to 'National'.

Value

Plots of power by temperature for the attributable threshold and above for each area.

Plot results of relative risk analysis - Mental Health

Description

Plots cumulative lag exposure-response function with histogram of temperature distribution for each region

Usage

mh_plot_rr(
  df_list,
  pred_list,
  attr_thr = 97.5,
  minpercreg,
  country = "National",
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

pred_list

A list containing predictions from the model by region.

attr_thr

Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5.

minpercreg

Vector. Percentile of minimum suicide temperature for each area.

country

Character. Name of country for national level estimates. Defaults to 'National'.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

Value

Plots of cumulative lag exposure-response function with histogram of temperature distribution for each region

Run regional predictions from model

Description

Use model to run regional predictions

Usage

mh_predict_reg(
  df_list,
  var_fun = "bs",
  var_per = c(25, 50, 75),
  var_degree = 2,
  minpercreg,
  blup,
  coef_,
  vcov_,
  meta_analysis = FALSE
)

Arguments

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75).

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

minpercreg

Vector. Percentile of minimum suicide temperature for each region.

blup

A list. BLUP (best linear unbiased predictions) from the meta-analysis model for each region.

coef_

A matrix of coefficients for the reduced model.

vcov_

A list. Covariance matrices for each region for the reduced model.

meta_analysis

Boolean. Whether to perform a meta-analysis. Defaults to FALSE.

Value

A list containing predictions by region

Read in and format data - Mental Health

Description

Reads in a CSV file containing a daily time series of health and climate data, renames columns and creates strata for case-crossover analysis.

Usage

mh_read_and_format_data(
  data_path,
  date_col,
  region_col = NULL,
  temperature_col,
  health_outcome_col,
  population_col
)

Arguments

data_path

Path to a csv file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by region.

date_col

Character. Name of the column in the dataframe that contains the date.

region_col

Character. Name of the column in the dataframe that contains the region names. Defaults to NULL.

temperature_col

Character. Name of the column in the dataframe that contains the temperature column.

health_outcome_col

Character. Name of the column in the dataframe that contains the health outcome count column (e.g. number of deaths, hospital admissions).

population_col

Character. Name of the column in the dataframe that contains the population estimates.

Value

A list of dataframes with formatted and renamed columns.

Produce cumulative relative risk results of analysis

Description

Produces cumulative relative risk and confidence intervals from analysis.

Usage

mh_rr_results(pred_list, df_list, attr_thr = 97.5, minpercreg)

Arguments

pred_list

A list containing predictions from the model by region.

df_list

A list of dataframes containing daily timeseries data for a health outcome and climate variables which may be disaggregated by a particular region.

attr_thr

Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5.

minpercreg

Vector. Percentile of minimum suicide temperature for each area.

Value

Dataframe containing cumulative relative risk and confidence intervals from analysis.
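
Two quantities recur in these results: the temperature at the attr_thr percentile, and a relative risk with its confidence interval. A minimal sketch on simulated temperatures follows; the log relative risk and its standard error are hypothetical values, not package output.

```r
set.seed(3)
temp <- rnorm(1000, mean = 18, sd = 6)

# Temperature at the attributable-risk threshold percentile.
attr_thr <- 97.5
threshold_temp <- quantile(temp, attr_thr / 100)

# Relative risk and 95% confidence interval from a log-scale estimate.
beta <- 0.18  # hypothetical cumulative log relative risk
se <- 0.06    # hypothetical standard error
rr <- exp(beta)
rr_ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)
```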

Save results of analysis - Mental Health

Description

Saves a CSV file of cumulative relative risk and confidence intervals.

Usage

mh_save_results(
  rr_results,
  res_attr_tot,
  attr_yr_list,
  attr_mth_list,
  power_list,
  output_folder_path = NULL
)

Arguments

rr_results

Dataframe containing cumulative relative risk and confidence intervals from analysis.

res_attr_tot

Matrix containing total attributable fractions, numbers and rates for each area over the whole time series.

attr_yr_list

A list of matrices containing yearly estimates of attributable fractions, numbers and rates by area

attr_mth_list

A list of data frames containing total attributable fractions, numbers and rates by calendar month and area.

power_list

A list containing power information by area.

output_folder_path

Path to folder where results should be saved. Defaults to NULL.

Plot relative risk results by region (if available).

Description

Plots relative risk and confidence intervals for each lag value of wildfire-related PM2.5.

Usage

plot_RR(
  rr_data,
  wildfire_lag,
  by_region = FALSE,
  save_fig = FALSE,
  output_folder_path = NULL
)

Arguments

rr_data

Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5

wildfire_lag

Integer. The maximum number of days for which to plot the lags for wildfire PM2.5. Defaults to 3.

by_region

Boolean. Whether to plot relative risk (RR) by region. Defaults to FALSE.

save_fig

Boolean. Whether to save the generated plot. Defaults to FALSE.

output_folder_path

Path to folder where plots should be saved.

Value

Plot of relative risk and confidence intervals for each lag of wildfire-related PM2.5

Core functionality for plotting results of relative risk analysis.

Description

Plots relative risk and confidence intervals for each lag value of wildfire-related PM2.5.

Usage

plot_RR_core(
  rr_data,
  save_fig = FALSE,
  wildfire_lag,
  output_folder_path = NULL,
  region_name = "All regions",
  ylims = NULL
)

Arguments

rr_data

Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

wildfire_lag

Integer. The maximum number of days for which to plot the lags for wildfire PM2.5. Defaults to 3.

output_folder_path

Path to folder where plots should be saved. Defaults to NULL.

region_name

Character. The name of the region. Defaults to 'All regions'.

Value

Plot of relative risk and confidence intervals for each lag of wildfire-related PM2.5.

Plots attributable fractions and CI across years by regions

Description

Generates a PDF containing one or more plots of average attributable fractions over time. If by_region is TRUE, the function creates separate plots for each region. All plots are saved to a single PDF file named "aggregated_AF_by_region.pdf" in the specified output_dir.

Usage

plot_aggregated_AF(data, by_region = FALSE, output_dir = ".")

Arguments

data

A data frame containing annual attributable fraction estimates. Must include columns: year, average_attributable_fraction, lower_ci_attributable_fraction, upper_ci_attributable_fraction. If by_region is TRUE, must also include region.

by_region

Logical. If TRUE, plots are generated per region using region. Defaults to FALSE.

output_dir

Character. Directory path where the PDF file will be saved. Must exist. Defaults to ".".

Value

No return value. A PDF file is created.
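
The column requirements above can be checked before plotting. The following is a minimal sketch with made-up values; the data frame af_data and the helper variables are hypothetical, and the final call is left commented out because it depends on the package being installed.

```r
# Hypothetical annual estimates for two regions (values are made up).
af_data <- data.frame(
  year = rep(2019:2021, times = 2),
  region = rep(c("North", "South"), each = 3),
  average_attributable_fraction = c(0.020, 0.030, 0.025, 0.010, 0.015, 0.020),
  lower_ci_attributable_fraction = c(0.010, 0.020, 0.015, 0.005, 0.010, 0.010),
  upper_ci_attributable_fraction = c(0.030, 0.040, 0.035, 0.015, 0.020, 0.030)
)

# Columns listed in the data argument; region is only needed when by_region is TRUE.
required <- c("year", "average_attributable_fraction",
              "lower_ci_attributable_fraction", "upper_ci_attributable_fraction")
by_region <- TRUE
if (by_region) required <- c(required, "region")

missing_cols <- setdiff(required, names(af_data))
out_dir <- tempdir()  # output_dir must already exist

inputs_ok <- length(missing_cols) == 0 && dir.exists(out_dir)

# If inputs_ok is TRUE, a call might look like:
# plot_aggregated_AF(af_data, by_region = TRUE, output_dir = out_dir)
```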

Create a plot of aggregated annual attributable fractions with CI

Description

Aggregates annual average attributable fraction estimates and generates a ggplot showing the central estimate and CI.

Usage

plot_aggregated_AF_core(data, region_name = NULL)

Arguments

data

A data frame with columns: year, average_attributable_fraction, lower_ci_attributable_fraction, and upper_ci_attributable_fraction.

region_name

Optional character string used to label the plot title with a region name. Defaults to NULL.

Value

A ggplot object showing annual attributable fractions with confidence intervals.

Combined AN and AR plots by region

Description

Creates both Attributable Number (AN) and Attributable Rate (AR) bar charts by region in a single function call.

Usage

plot_air_pollution_an_ar_by_region(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE,
  output_dir = NULL,
  save_plot = FALSE
)

Arguments

analysis_results

Results from analyze_air_pollution_daily

max_lag

Integer. Maximum lag. Defaults to 14.

include_national

Logical. Whether to include national results. Defaults to TRUE.

output_dir

Character. Directory in which to save the plot. Defaults to NULL.

save_plot

Logical. Whether to save the plot. Defaults to FALSE.

Value

List with two ggplot objects: an_plot and ar_plot

Plot the AN and AR by year

Description

Creates both Attributable Number (AN) and Attributable Rate (AR) plots aggregated by year in a single function call.

Usage

plot_air_pollution_an_ar_by_year(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE,
  output_dir = NULL,
  save_plot = FALSE
)

Arguments

analysis_results

Results from analyze_air_pollution_daily

max_lag

Integer. Maximum lag. Defaults to 14.

include_national

Logical. Whether to include national results. Defaults to TRUE.

output_dir

Character. Directory in which to save the plot. Defaults to NULL.

save_plot

Logical. Whether to save the plot. Defaults to FALSE.

Value

List with two ggplot objects: an_plot and ar_plot

Combined Monthly Time Series Plots of AN and AR

Description

Creates both Attributable Number (AN) and Attributable Rate (AR) monthly time series plots in a single function call.

Usage

plot_air_pollution_an_ar_monthly(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE,
  output_dir = NULL,
  save_plot = FALSE
)

Arguments

analysis_results

Results from analyze_air_pollution_daily

max_lag

Integer. Maximum lag used in analysis. Defaults to 14.

include_national

Logical. Whether to include national results. Defaults to TRUE.

output_dir

Character. Directory in which to save the plot. Defaults to NULL.

save_plot

Logical. Whether to save the plot. Defaults to FALSE.

Value

List with two ggplot objects: an_plot and ar_plot

Plot exposure-response relationship with confidence intervals by region

Description

Creates faceted exposure-response plots showing RR with confidence intervals across PM2.5 concentrations for each region.

Usage

plot_air_pollution_exposure_response(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE,
  ref_pm25 = 15,
  output_dir = NULL,
  save_plot = FALSE
)

Arguments

analysis_results

Processed results with RR/AF/AN/AR with lag variables

include_national

Logical. Whether to include national results. Default TRUE.

ref_pm25

Numeric. Reference PM2.5 value to highlight.

output_dir

Character. Directory to save plot.

save_plot

Logical. Whether to save the plot.

Value

ggplot object

Plot Relative Risk (RR) by lag

Description

Plot Relative Risk (RR) by lag

Usage

plot_air_pollution_forest_by_lag(
  analysis_results,
  max_lag = 14L,
  output_dir = NULL,
  save_plot = FALSE
)

Arguments

analysis_results

Processed results with RR/AF/AN/AR with lag variables

max_lag

Integer. Maximum lag days. Defaults to 14.

output_dir

Character. Directory to save plot. Defaults to NULL.

save_plot

Logical. Whether to save the plot. Defaults to FALSE.

Value

ggplot object

Plot forest plot for PM2.5 effects by region

Description

Plot forest plot for PM2.5 effects by region

Usage

plot_air_pollution_forest_by_region(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE,
  output_dir = NULL,
  save_plot = FALSE
)

Arguments

analysis_results

Processed results with RR/AF/AN/AR with lag variables

max_lag

Integer. The maximum lag days for outdoor PM2.5. Defaults to 14.

include_national

Logical. Whether to include national results. Default TRUE.

output_dir

Character. Directory to save plot. Defaults to NULL.

save_plot

Logical. Whether to save the plot. Defaults to FALSE.

Value

ggplot object

Plot histograms for AN and AR by month

Description

Creates histogram plots for Attributable Number (AN) and Attributable Rate (AR) aggregated by month, with connecting lines.

Usage

plot_air_pollution_monthly_histograms(
  analysis_results,
  max_lag = 14L,
  include_national = TRUE,
  output_dir = NULL,
  save_plot = FALSE
)

Arguments

analysis_results

Processed results with RR/AF/AN/AR with lag variables

include_national

Logical. Whether to include national results. Default TRUE.

output_dir

Character. Directory to save plots.

save_plot

Logical. Whether to save the plots.

Value

List with ggplot objects

Plot Power vs PM2.5 Concentration

Description

Plots the power statistic for each reference PM2.5 at and above the attributable risk threshold for each region.

Usage

plot_air_pollution_power(
  power_list,
  output_dir = NULL,
  save_plot = FALSE,
  ref_name = "WHO",
  include_national = TRUE
)

Arguments

power_list

A list containing power information by region.

output_dir

Character. Directory to save plot. Defaults to NULL.

save_plot

Logical. Whether to save the plot. Defaults to FALSE.

ref_name

Character. Reference standard name for plot title.

include_national

Logical. Whether to include national level in the plot. Defaults to TRUE.

Value

Invisible list of plot information

Plot Total Attributable Number by Region

Description

Aggregates wildfire smoke-related PM2.5 attributable numbers by region and creates a bar plot showing the total attributable number of deaths per region.

Usage

plot_an_by_region(data, output_dir = ".")

Arguments

data

A data frame containing columns:

region: Region names.
total_attributable_number: Numeric values of attributable numbers.

output_dir

A character string specifying the directory where the plot will be saved. Defaults to the current working directory (".").

Value

A ggplot object representing the bar plot.
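
The aggregation step described above can be sketched in base R as follows. The input values are invented, and summing with aggregate() is an assumption about how totals per region might be formed rather than a statement of the package's implementation.

```r
# Hypothetical input with the two documented columns (values are made up).
an_data <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  total_attributable_number = c(12, 8, 5, 7, 3)
)

# Sum attributable numbers within each region.
an_by_region <- aggregate(
  total_attributable_number ~ region,
  data = an_data,
  FUN = sum
)
```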

Plot Attributable Risk by Region

Description

Aggregates wildfire-specific PM2.5 attributable risk (deaths per 100k) by region and creates a bar plot showing the mean attributable risk per region.

Usage

plot_ar_by_region(data, output_dir = ".")

Arguments

data

A data frame containing columns:

region: Region names.
deaths_per_100k: Numeric values of deaths per 100k population.
lower_ci_deaths_per_100k: Lower bound of confidence interval.
upper_ci_deaths_per_100k: Upper bound of confidence interval.

output_dir

A character string specifying the directory where the plot will be saved. Defaults to the current working directory (".").

Value

A ggplot object representing the bar plot.

Plot monthly deaths and PM2.5 concentrations with dual y-axes

Description

Aggregates data by month and creates a dual-axis plot showing average deaths per 100,000 and mean PM2.5 concentrations.

Usage

plot_ar_pm_monthly(data, save_outputs = FALSE, output_dir = NULL)

Arguments

data

A data frame with columns: month, deaths_per_100k, and monthly_avg_pm25. Month names must match month.abb.

save_outputs

Logical. If TRUE, saves the plot as PNG and the aggregated data as CSV. Defaults to FALSE.

output_dir

Character. Directory path where outputs are saved if save_outputs is TRUE. Must exist. Defaults to NULL.

Value

No return value. Generates a plot and optionally saves files.
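
Because month names must match month.abb, it can help to derive them from dates rather than typing them by hand. The sketch below uses invented values and base R only; the monthly averaging shown is an assumption for illustration and may not match the package's own aggregation.

```r
# Hypothetical daily records (values are made up).
dates <- as.Date(c("2023-01-15", "2023-01-20", "2023-02-10", "2023-12-01"))

daily <- data.frame(
  # Index month.abb by month number so labels do not depend on the locale.
  month = month.abb[as.integer(format(dates, "%m"))],
  deaths_per_100k = c(1.0, 3.0, 2.0, 4.0),
  monthly_avg_pm25 = c(10, 10, 8, 12)
)

# Average both series within each month.
monthly <- aggregate(
  cbind(deaths_per_100k, monthly_avg_pm25) ~ month,
  data = daily,
  FUN = mean
)
```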

Plot Attributable Health Metrics Across Spatial and Temporal Levels

Description

Visualizes attributable health metrics (e.g., attributable number, fraction, or rate) derived from attribution_calculation() across different spatial scales and time periods. The function automatically adapts plots to the selected spatial level (country, region, or district) and handles both single- and multi-year visualizations. It supports faceted, grouped, or aggregated visualizations and can optionally save output plots as PDF files.

Usage

plot_attribution_metric(
  attr_data,
  level = c("country", "region", "district"),
  metrics = c("AR_Number", "AR_Fraction", "AR_per_100k"),
  filter_year = NULL,
  param_term,
  case_type,
  save_fig = FALSE,
  output_dir = NULL
)

Arguments

attr_data

A data frame or tibble containing attribution results, typically generated by the attribution_calculation() function. Must include relevant columns such as year, region, district, AR_Number, AR_Fraction, and AR_per_100k.

level

Character. The spatial level for plotting. One of "country", "region", or "district". Determines the type and granularity of plots.

metrics

Character vector specifying which metrics to plot. Options include "AR_Number", "AR_Fraction", and "AR_per_100k". Multiple metrics can be plotted.

filter_year

Optional integer or vector of integers to restrict the plots to specific years. Defaults to NULL (all available years are included).

param_term

Character. The exposure variable term to evaluate (e.g., "tmax" for maximum temperature, "rainfall" for precipitation). Used for labeling the plots.

case_type

Character. The type of disease that the case column refers to (e.g., "malaria" or "diarrhea"). Used in titles and y-axis labels.

save_fig

Logical. If TRUE, saves all generated plots as PDF files to the specified directory. Defaults to FALSE.

output_dir

Optional string. Directory path where output PDF files will be saved when save_fig = TRUE. If the directory does not exist, it will be created automatically.

Details

This function produces publication-ready plots of attributable metrics:

Country level: Time series line plots with 95% confidence ribbons.
Region/District level (no filter): Horizontal bar plots showing aggregated metrics, grouped by administrative unit.
Region/District level (multi-year): Grouped bar plots comparing metrics across years.

The function automatically adjusts y-axis limits, formats numeric labels with commas, and includes optional text annotations (e.g., showing both attributable numbers and fractions). When save_fig = TRUE, one PDF file is created per metric and spatial level, and each file may contain multiple pages if many regions or districts are present.

Value

A named list of ggplot or patchwork plot objects, grouped by metric. Each element corresponds to one metric ("AR_Number", "AR_Fraction", "AR_per_100k") and may include one or more plots, depending on the level and year filters.
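
The level argument is declared with a vector of choices, which in R is commonly resolved with match.arg(). The helper below is hypothetical and only mirrors the signature shown in Usage; whether the package resolves level in exactly this way is an assumption.

```r
# Hypothetical helper mirroring the documented `level` signature.
resolve_level <- function(level = c("country", "region", "district")) {
  # match.arg() picks the first choice when nothing is supplied
  # and raises an error for values outside the allowed set.
  match.arg(level)
}

default_level <- resolve_level()
chosen_level <- resolve_level("district")

# An unsupported value produces an error rather than a silent fallback.
bad_level <- tryCatch(resolve_level("city"), error = function(e) "error")
```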

Plot Average Monthly Attributable Health Metrics with Climate Overlays

Description

Visualizes average monthly attributable health metrics (e.g., attributable number, fraction, or rate) derived from attribution analyses across different spatial scales. The function automatically adapts plots to the selected spatial level (country, region, or district) and summarizes seasonal patterns using monthly aggregation. Optionally, corresponding monthly climate variables can be overlaid on a secondary axis to support joint interpretation of health impacts and climate seasonality.

Usage

plot_avg_monthly(
  attr_data,
  level = c("country", "region", "district"),
  metrics = c("AR_Number", "AR_per_100k", "AR_Fraction"),
  c_data,
  param_term,
  case_type,
  filter_year = NULL,
  save_fig = FALSE,
  output_dir = NULL
)

Arguments

attr_data

A data frame or tibble containing attributable health metrics, typically generated by an attribution workflow. Must include at least month and the selected metric column (AR_Number, AR_Fraction, or AR_per_100k), as well as spatial identifiers (region or district) when applicable.

level

Character. The spatial level for plotting. One of "country", "region", or "district". Determines whether a single national plot or multiple subnational plots are produced.

metrics

Character. The attributable metrics to visualize. One or more of "AR_Number", "AR_Fraction", or "AR_per_100k". Controls aggregation rules, axis labeling, and numeric formatting.

c_data

A data frame containing monthly climate variables corresponding to the same spatial and temporal resolution as attr_data. When provided together with param_term, climate information is overlaid on a secondary y-axis.

param_term

Character string specifying the climate exposure variable (e.g., "tmax" for maximum temperature or "rainfall" for precipitation). Used for climate extraction and axis labeling.

filter_year

Optional integer or vector of integers to restrict the analysis to specific years prior to monthly aggregation. Defaults to NULL, in which case all available years are included.

save_fig

Logical. If TRUE, saves the generated plots as a PDF file. Defaults to FALSE.

output_dir

Optional character string specifying the directory where output PDF files will be saved when save_fig = TRUE. The directory is created automatically if it does not exist.

Details

This function produces publication-ready visualizations of average monthly attributable health metrics:

Country level: A single bar plot summarizing national average monthly attribution patterns.
Region/District level: One bar plot per administrative unit, showing average monthly attribution, with automatic pagination when many units are present.
Climate overlay (optional): Monthly climate exposure plotted as a line on a secondary y-axis to facilitate comparison with seasonal health impacts.

Metric-specific aggregation rules (sum or mean) and numeric formatting are applied automatically. Axis limits and breaks are dynamically adjusted to improve readability. When save_fig = TRUE, a single PDF file is created per metric and spatial level, with multiple pages used for region- or district-level outputs when necessary.

Value

A named list of ggplot objects. Each element corresponds to the country or an individual region or district and contains a monthly attribution plot. The list is returned invisibly when plots are saved to file.

Plot a grid of box plots for multiple numeric variables

Description

Plot a grid of box plots for multiple numeric variables

Usage

plot_boxplots(
  df,
  columns = NULL,
  select_numeric = FALSE,
  title = "Boxplots",
  ylabs = NULL,
  save_plot = FALSE,
  output_path = NULL
)

Arguments

df

The dataframe containing the data

columns

A character vector of numeric column names to plot

select_numeric

If TRUE, all numeric columns in df will be selected for plotting.

title

The overall title for the plot

ylabs

A character vector of y-axis labels (e.g., with units) corresponding to the columns.

save_plot

Whether to save the plot as a PDF

output_path

The file path to save the PDF (if save_plot is TRUE)
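
To illustrate what selecting all numeric columns amounts to, the base R sketch below picks them out of a small made-up data frame. It is an assumption about the general approach, not the package's code.

```r
# Hypothetical data with one character column and two numeric columns.
df <- data.frame(
  region = c("A", "B"),
  tmean = c(12.1, 15.3),
  deaths = c(3L, 5L)
)

# Keep the names of columns for which is.numeric() is TRUE.
numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
```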

Plot a correlation matrix, including a heatmap.

Description

Plot a correlation matrix, including a heatmap.

Usage

plot_correlation_matrix(matrix_, title, output_path)

Arguments

matrix_

The matrix to plot.

title

The title for the correlation matrix.

output_path

The path to output the plot to.
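
A matrix suitable for the matrix_ argument could be produced with stats::cor(). The sketch below uses invented values; the choice of pairwise-complete observations is an assumption and is not taken from the package.

```r
# Hypothetical numeric variables (values are made up).
df <- data.frame(
  tmean = c(10, 12, 15, 18, 20),
  deaths = c(5, 6, 8, 9, 11),
  humidity = c(80, 75, 70, 60, 55)
)

# Pearson correlations between every pair of columns.
corr <- cor(df, method = "pearson", use = "pairwise.complete.obs")
```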

Plot histograms of column distributions.

Description

Plot histograms of column distributions.

Usage

plot_distributions(
  df,
  columns,
  title,
  xlabs = NULL,
  save_hists = FALSE,
  output_path = NULL
)

Arguments

df

The dataframe containing the data.

columns

The columns to plot distributions for.

title

The title of your plot.

xlabs

A character vector of x-axis labels (e.g., with units) corresponding to the columns.

save_hists

Whether to save the histograms to file.

output_path

The path to save your distributions to.

Plot Time Series of Health and Climate Variables

Description

Generate time series plots for combined health and climate data prepared for spatiotemporal and DLNM analysis. Supports aggregation at the country, region, or district level.

Usage

plot_health_climate_timeseries(
  data,
  param_term,
  level = "country",
  filter_year = NULL,
  case_type,
  save_fig = FALSE,
  output_dir = NULL
)

Arguments

data

A data frame containing the combined health and climate data.

param_term

Character. The variable to plot (e.g., tmax, tmean, tmin, Malaria). Use "all" to include all available variables.

level

Character. Aggregation level: one of "country", "region", or "district". Defaults to "country".

filter_year

Optional numeric vector to filter data by year(s). Defaults to NULL.

case_type

Character. The type of disease that the case column refers to. Must be one of 'diarrhea' or 'malaria'.

save_fig

Boolean. Whether to save the figure as a PDF. Defaults to FALSE.

output_dir

Character. Directory path to save the figure. Defaults to NULL.

Value

A ggplot object.

Visualise monthly random effects for selected INLA model

Description

Generates and saves a plot of monthly random effects for different regions, visualizing their contribution to Malaria Incidence Rate.

Usage

plot_monthly_random_effects(
  combined_data,
  model,
  save_fig = FALSE,
  output_dir = NULL
)

Arguments

combined_data

Data list from combine_health_climate_data() function.

model

The fitted model object.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_dir

Character. The path to save the visualisation to. Defaults to NULL.

Value

The monthly random effects plot.

Plot the moving average of a column.

Description

Plot the moving average of a column.

Usage

plot_moving_average(
  df,
  time_col,
  value_col,
  ma_days,
  ma_sides,
  title,
  save_plot = FALSE,
  output_path = "",
  units = NULL
)

Arguments

df

The dataframe containing the raw data.

time_col

The column name of the column containing the timeseries.

value_col

The column name of the column containing the value.

ma_days

The number of days to use for MA calculations.

ma_sides

The number of sides to use for MA calculations (1 or 2).

title

The title for your plot.

save_plot

Whether or not to save the plot.

output_path

The path to output the plot to.

units

A named character vector of units for each variable.
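
The ma_days and ma_sides arguments map naturally onto stats::filter(), where sides = 1 gives a trailing average and sides = 2 a centred one. The function below is a hypothetical sketch of that idea; the package may compute its moving average differently.

```r
# Hypothetical helper: equal-weight moving average over ma_days observations.
moving_average_sketch <- function(x, ma_days, ma_sides = 1) {
  weights <- rep(1 / ma_days, ma_days)
  # sides = 1 uses past values only; sides = 2 centres the window.
  as.numeric(stats::filter(x, weights, sides = ma_sides))
}

x <- c(1, 2, 3, 4, 5)
trailing <- moving_average_sketch(x, ma_days = 3, ma_sides = 1)
centred <- moving_average_sketch(x, ma_days = 3, ma_sides = 2)
```

Positions without a full window are returned as NA, so the trailing version starts with NA values and the centred version has NA at both ends.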

Plot the rate of a dependent variable per 100,000 population per year.

Description

Plot the rate of a dependent variable per 100,000 population per year.

Usage

plot_rate_overall(
  df,
  dependent_col,
  population_col,
  date_col,
  title,
  save_rate = FALSE,
  output_path = NULL
)

Arguments

df

The dataframe containing the data.

dependent_col

The name of the column representing the dependent variable.

population_col

The name of the column representing the population.

date_col

The name of the column containing date values.

title

Character. The specific title for the subset of data being used.

save_rate

Whether to save the plot as a PDF.

output_path

The file path to save the plot if save_rate is TRUE.
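
A rate per 100,000 population per year can be sketched as yearly events divided by yearly population, multiplied by 100,000. The example below uses invented values and assumes the yearly population is the mean of the recorded values; the package may aggregate population differently.

```r
# Hypothetical records (values are made up).
df <- data.frame(
  date = as.Date(c("2020-03-01", "2020-09-01", "2021-03-01", "2021-09-01")),
  deaths = c(10, 30, 20, 20),
  population = c(100000, 100000, 200000, 200000)
)
df$year <- as.integer(format(df$date, "%Y"))

# Total events and average population within each year.
yearly_deaths <- tapply(df$deaths, df$year, sum)
yearly_pop <- tapply(df$population, df$year, mean)

rate_per_100k <- yearly_deaths / yearly_pop * 1e5
```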

Plot regional trends of climate and health outcomes.

Description

Plot regional trends of climate and health outcomes.

Usage

plot_regional_trends(
  df,
  region_col,
  outcome_cols,
  title = "Regional Averages",
  ylabs = NULL,
  save_plot = FALSE,
  output_path = ""
)

Arguments

df

The dataframe containing the raw data.

region_col

The name of the column containing regions.

outcome_cols

Character Vector. The names of the outcome columns to analyse.

title

The title of your plot.

ylabs

A character vector of y-axis labels (e.g., with units) corresponding to the columns.

save_plot

Whether or not to save the plot.

output_path

The path to output the plot to.

Plot relative risk at country, region, and district level

Description

Plots the relative risk of malaria cases by maximum temperature and cumulative rainfall at country, region, and district level.

Usage

plot_relative_risk(
  data,
  model,
  param_term,
  max_lag,
  nk,
  level,
  case_type,
  filter_year = NULL,
  output_dir = NULL,
  save_csv = FALSE,
  save_fig = FALSE
)

Arguments

data

Data list from combine_health_climate_data() function.

model

The fitted model from run_inla_models() function.

param_term

A character vector or list containing parameter terms such as tmax (temperature exposure) and rainfall (rainfall exposure). Defaults to tmax.

level

A character vector specifying the geographical disaggregation. Can take one of the following values: country, region, or district. Defaults to country.

case_type

Character. The type of disease that the case column refers to. Must be one of "diarrhea" or "malaria".

filter_year

Integer. The year to filter the data to, so that the plot covers a single year. Defaults to NULL, in which case all years in the dataset are grouped together.

output_dir

Character. The path where the PDF file will be saved. Defaults to NULL.

save_csv

Boolean. If TRUE, saves the RR data to the specified directory. Defaults to FALSE.

save_fig

Boolean. If TRUE, saves the plot to the specified directory. Defaults to FALSE.

Value

Relative risk plot at country, region, and district levels.

Plot relative risk by PM2.5 levels for all regions and individually

Description

Generates one or more plots showing relative risk estimates across PM2.5 levels. If multiple regions are present, plots are created per region and for all regions combined. Optionally saves the output as a PDF.

Usage

plot_rr_by_pm(data, save_fig = FALSE, output_dir = NULL)

Arguments

data

A data frame with columns: pm_levels, relative_risk, ci_lower, ci_upper, and region.

save_fig

Logical. If TRUE, saves the plot(s) as a PDF file in output_dir. Defaults to FALSE.

output_dir

Character. Directory path where the PDF file will be saved if save_fig is TRUE. Must exist. Defaults to NULL.

Value

No return value. Generates one or more plots and optionally saves them to disk.

Create a relative risk plot across PM2.5 levels for a single region

Description

Generates a ggplot showing relative risk estimates and confidence intervals across PM2.5 levels for a given region.

Usage

plot_rr_by_pm_core(data, region_name = "All Regions", ylims = c(-2, 2))

Arguments

data

A data frame with columns: pm_levels, relative_risk, ci_lower, and ci_upper.

region_name

Optional character string used to label the plot title with a region name. Defaults to "All Regions".

ylims

Numeric vector of length 2 specifying y-axis limits. Defaults to c(-2, 2).

Value

A ggplot object showing relative risk and CI.

Plot Relative Risk Map at sub-national Level

Description

Generates a map of the relative risk of disease cases associated with climate hazards, including extreme temperature and cumulative rainfall, at a specified geographical level (district or region).

Usage

plot_rr_map(
  combined_data,
  model,
  param_term = "tmax",
  max_lag,
  nk,
  level = "district",
  case_type,
  filter_year = NULL,
  output_dir = NULL,
  save_fig = FALSE,
  save_csv = FALSE,
  cumulative = FALSE
)

Arguments

combined_data

A list returned from the combine_health_climate_data() function. This list should include both the health-climate data and the map data.

model

The fitted model object returned from the run_inla_models() function.

param_term

A character vector or list specifying the climate parameters (e.g., tmax for maximum temperature, rainfall for precipitation) to include in the map. Defaults to tmax.

level

A character string indicating the spatial aggregation level. Options are region or district. Defaults to district.

case_type

Character. The type of disease that the case column refers to. Must be one of diarrhea or malaria.

filter_year

Integer. The year to filter the data to. Defaults to NULL.

output_dir

Character. The directory path where the output PDF file should be saved. Defaults to NULL.

save_fig

Boolean. If TRUE, saves the plot to the specified directory. Defaults to FALSE.

cumulative

Boolean. If TRUE, plots and saves the cumulative risk across all years for the specified exposure at region and district level. Defaults to FALSE.

Value

Relative risk map at the chosen level.

Plot a grid of scatter graphs comparing one column to various others.

Description

Plot a grid of scatter graphs comparing one column to various others.

Usage

plot_scatter_grid(
  df,
  main_col,
  comparison_cols,
  title,
  save_scatters = FALSE,
  output_path = "",
  units = NULL
)

Arguments

df

The dataframe containing the raw data.

main_col

The main column to compare with all other columns.

comparison_cols

The columns to compare with.

title

The title of your plot.

save_scatters

Whether or not to save the plot.

output_path

The path to output the plot to.

units

A named character vector of units for each variable.

Plot seasonal trends of a health outcome and climate by month.

Description

Plot seasonal trends of a health outcome and climate by month.

Usage

plot_seasonal_trends(
  df,
  date_col,
  outcome_cols,
  title = "Seasonal Averages",
  ylabs = NULL,
  save_plot = FALSE,
  output_path = ""
)

Arguments

df

The dataframe containing the raw data.

date_col

The name of the column containing date values.

outcome_cols

Character Vector. The names of the outcome columns to analyse.

title

The title of your plot.

ylabs

A character vector of y-axis labels (e.g., with units) corresponding to the columns.

save_plot

Whether or not to save the plot.

output_path

The path to output the plot to.

Plot the total of selected variables per year.

Description

Plot the total of selected variables per year.

Usage

plot_total_variables_by_year(
  df,
  date_col,
  variables,
  title,
  save_total = FALSE,
  output_path = ""
)

Arguments

df

A dataframe containing the data.

date_col

The name of the column containing date values.

variables

Column names to be summed and plotted.

title

Character. The specific title for the subset of data being used.

save_total

If TRUE, saves each plot as a PDF.

output_path

The file path for saving plots.

Value

Plots are printed to the console or saved as PDF files.

Visualize yearly spatial random effect of the disease incidence rate (DIR).

Description

Generates and saves plots of the yearly spatial random effect of the disease incidence rate at district level.

Usage

plot_yearly_spatial_random_effect(
  combined_data,
  model,
  case_type,
  save_fig = FALSE,
  output_dir = NULL
)

Arguments

combined_data

Data list from combine_health_climate_data() function.

model

The fitted model from run_inla_models() function.

case_type

Character. The type of disease that the case column refers to. Must be one of diarrhea or malaria.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

output_dir

Character. The path to save the fitted model results to. Defaults to NULL.

Value

A plot of the yearly spatial random effect for the disease incidence rate.

Normalise descriptive stats data input to combined and regional dataframes.

Description

Normalise descriptive stats data input to combined and regional dataframes.

Usage

prepare_descriptive_input(
  data,
  aggregation_column = NULL,
  timeseries_col = NULL
)

Arguments

data

Dataframe or list of dataframes.

aggregation_column

Character. Region column for splitting dataframes.

timeseries_col

Character. Date column for timeseries analysis.

Value

A list with combined_df and region_df_list.
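
The shape of the returned list can be pictured with base R's split(). This is a hypothetical sketch using made-up data; only the element names combined_df and region_df_list come from the documentation above, and the package's actual normalisation steps may do more.

```r
# Hypothetical combined data (values are made up).
df <- data.frame(
  region = c("North", "South", "North"),
  date = as.Date(c("2022-01-01", "2022-01-01", "2022-01-02")),
  deaths = c(1, 2, 3)
)
aggregation_column <- "region"

# One combined dataframe plus one dataframe per region.
prepared <- list(
  combined_df = df,
  region_df_list = split(df, df[[aggregation_column]])
)
```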

Validate and prepare base output directory for descriptive stats.

Description

Validate and prepare base output directory for descriptive stats.

Usage

prepare_descriptive_output_dir(output_path, create_base_dir = FALSE)

Arguments

output_path

Character. Base output path.

create_base_dir

Logical. Whether to create a missing base directory.

Value

Character. Validated output path.

Raise an Error if a Parameter's Value is NULL

Description

Raise an Error if a Parameter's Value is NULL

Usage

raise_if_null(param_nm, value)

Arguments

param_nm

Character. The parameter name.

value

Any. The value of the parameter.

Value

None. Stops execution if value is NULL.
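
A guard of this kind can be written in a few lines. The function below is a hypothetical equivalent for illustration; its name and error message are assumptions and do not reproduce the package's wording.

```r
# Hypothetical equivalent of a NULL guard for function parameters.
raise_if_null_sketch <- function(param_nm, value) {
  if (is.null(value)) {
    stop(sprintf("Parameter '%s' must not be NULL.", param_nm), call. = FALSE)
  }
  invisible(NULL)
}

# Capture the error message instead of halting the session.
msg <- tryCatch(
  raise_if_null_sketch("output_dir", NULL),
  error = function(e) conditionMessage(e)
)
```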

Read in and format health data for the wildfire analysis

Description

Reads in a CSV file containing a daily time series of health and climate data and renames columns to standard names. Creates day-of-week, month, and year columns derived from the date.

Usage

read_and_format_data(
  health_path,
  date_col,
  mean_temperature_col,
  health_outcome_col,
  population_col = NULL,
  region_col = NULL,
  rh_col = NULL,
  wind_speed_col = NULL
)

Arguments

health_path

Path to a CSV file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by region.

date_col

Character. Name of the column in the dataframe that contains the date. Date column should be in YYYY-MM-DD or YYYY/MM/DD format.

mean_temperature_col

Character. Name of the column in the dataframe that contains the daily mean temperature.

health_outcome_col

Character. Name of the column in the dataframe that contains the daily health outcome count (e.g. number of deaths, hospital admissions).

population_col

Character. Name of the column in the dataframe that contains the population data. Defaults to NULL. If omitted, a pop column is used when present.

region_col

Character. Name of the column in the dataframe that contains the region names. Defaults to NULL.

rh_col

Character. Name of the column in the dataframe that contains daily relative humidity values. Defaults to NULL.

wind_speed_col

Character. Name of the column in the dataframe that contains daily wind speed. Defaults to NULL.

Value

Dataframe with formatted and renamed columns
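
The date handling described above can be illustrated with base R. This is a sketch under assumptions: the toy column names and the way the two accepted date formats are normalised are illustrative choices, not the package's code.

```r
# Toy daily series; column names here are illustrative assumptions.
df <- data.frame(
  date = c("2024-01-01", "2024/02/15", "2024-03-31"),
  deaths = c(3, 5, 2)
)

# Accept both YYYY-MM-DD and YYYY/MM/DD by normalising the separator first.
df$date <- as.Date(gsub("/", "-", df$date), format = "%Y-%m-%d")

# Derive calendar fields from the parsed date.
df$year <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))
df$dow <- format(df$date, "%a")  # abbreviated weekday name, locale-dependent
```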

Read a csv file into memory as a data frame.

Description

Read a csv file into memory as a data frame.

Usage

read_input_data(input_csv_path)

Arguments

input_csv_path

The path to the csv to read as a dataframe.

Value

A dataframe containing the data from the csv.

Examples

input_csv_path <- "directory/file_name.csv"

Reformat a dataframe using various different cleaning techniques.

Description

Take a dataframe, and apply various different cleaning methods to it in order to prepare the data for use with a climate indicator.

Usage

reformat_data(df, reformat_date = TRUE, fill_na = c(), year_from_date = TRUE)

Arguments

df

The dataframe to apply cleaning/reformatting to.

reformat_date

Whether or not to reformat the date column to the Date datatype.

fill_na

A vector of column names to fill NA values in (fills with 0).

year_from_date

Derive a new column 'year' from the date column.

Value

The cleaned/reformatted data frame.
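
A minimal base R sketch of the three cleaning steps named in the arguments (date conversion, filling NA values with 0 in selected columns, and deriving a year column). The toy data and column names are assumptions for illustration; the package's own implementation may differ.

```r
df <- data.frame(
  date = c("2023-12-31", "2024-01-01", "2024-01-02"),
  cases = c(4, NA, 7),
  rainfall = c(NA, 1.2, 0.0)
)
fill_na <- c("cases")  # only these columns have NA replaced with 0

# Step 1: convert the date column to the Date datatype.
df$date <- as.Date(df$date)

# Step 2: fill NA with 0 in the requested columns only.
for (col in fill_na) {
  df[[col]][is.na(df[[col]])] <- 0
}

# Step 3: derive a 'year' column from the date.
df$year <- as.integer(format(df$date, "%Y"))
```

Columns not listed in fill_na (rainfall here) keep their NA values, so missing exposure data are not silently turned into zeros.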

Run generic descriptive statistics and EDA outputs for indicator datasets.

Description

Run generic descriptive statistics and EDA outputs for indicator datasets.

Usage

run_descriptive_stats(
  data,
  output_path,
  aggregation_column = NULL,
  population_col = NULL,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  plot_ma = FALSE,
  ma_days = 100,
  ma_sides = 1,
  timeseries_col = NULL,
  dependent_col,
  independent_cols,
  units = NULL,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  detect_outliers = FALSE,
  calculate_rate = FALSE,
  run_id = NULL,
  create_base_dir = FALSE
)

Arguments

data

Dataframe or named list of dataframes. If a dataframe is provided and aggregation_column is passed, data are split by that column.

output_path

Character. Base output directory.

aggregation_column

Character. Column used to aggregate/split data by region.

population_col

Character. The column containing population data.

plot_corr_matrix

Logical. Whether to plot correlation matrix.

correlation_method

Character. Correlation method. One of 'pearson', 'spearman', 'kendall'.

plot_dist

Logical. Whether to plot distribution histograms.

plot_ma

Logical. Whether to plot moving averages over a timeseries.

ma_days

Integer. Number of days to use for moving average.

ma_sides

Integer. Sides to use for moving average (1 or 2).

timeseries_col

Character. Timeseries column used for moving averages and time-based plots.

dependent_col

Character. Dependent variable column.

independent_cols

Character vector. Independent variable columns.

units

Named character vector. Units for variables.

plot_na_counts

Logical. Whether to plot NA counts.

plot_scatter

Logical. Whether to plot scatter plots.

plot_box

Logical. Whether to plot box plots.

plot_seasonal

Logical. Whether to plot seasonal trends.

plot_regional

Logical. Whether to plot regional trends.

plot_total

Logical. Whether to plot total health outcomes by year.

detect_outliers

Logical. Whether to output an outlier table.

calculate_rate

Logical. Whether to plot annual rates per 100k.

run_id

Character. Optional run id. If NULL, a timestamped id is generated.

create_base_dir

Logical. Whether to create output_path if missing.

Value

A list with base_output_path, run_id, run_output_path, and region_output_paths.

Examples


df <- data.frame(
  date = as.Date("2024-01-01") + 0:29,
  region = rep(c("A", "B"), each = 15),
  outcome = sample(1:20, 30, replace = TRUE),
  temp = rnorm(30, 25, 3)
)

run_descriptive_stats(
  data = df,
  output_path = tempdir(),
  aggregation_column = "region",
  dependent_col = "outcome",
  independent_cols = c("temp"),
  timeseries_col = "date",
  run_id = NULL
)
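
The ma_days and ma_sides arguments of run_descriptive_stats() describe a moving average window and its alignment. The sketch below shows the difference between one-sided and two-sided windows using stats::filter(); whether the package computes its moving averages this way is an assumption.

```r
x <- c(2, 4, 6, 8, 10, 12)
k <- 3  # window length, playing the role of ma_days

# sides = 1: trailing window (current value and the previous k - 1 values)
ma_trailing <- stats::filter(x, rep(1 / k, k), sides = 1)

# sides = 2: window centred on the current value
ma_centred <- stats::filter(x, rep(1 / k, k), sides = 2)
```

A one-sided window leaves the first k - 1 positions as NA, while a centred window leaves NA at both ends of the series.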

Create descriptive statistics via API-friendly inputs.

Description

Create descriptive statistics via API-friendly inputs.

Usage

run_descriptive_stats_api(
  data,
  output_path,
  aggregation_column = NULL,
  population_col = NULL,
  dependent_col,
  independent_cols,
  units = NULL,
  plot_corr_matrix = FALSE,
  plot_dist = FALSE,
  plot_ma = FALSE,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  correlation_method = "pearson",
  ma_days = 100,
  ma_sides = 1,
  timeseries_col = NULL,
  detect_outliers = FALSE,
  calculate_rate = FALSE,
  run_id = NULL,
  create_base_dir = TRUE
)

Arguments

data

The dataset for descriptive stats (list-like object or CSV path).

output_path

Character. Base output directory.

aggregation_column

Character. Column used to aggregate/split data by region.

population_col

Character. The column containing the population.

dependent_col

Character. The dependent column.

independent_cols

Character vector. The independent columns.

units

Named character vector. Units for each variable.

plot_corr_matrix

Logical. Whether to plot a correlation matrix.

plot_dist

Logical. Whether to plot histograms.

plot_ma

Logical. Whether to plot moving averages over a timeseries.

plot_na_counts

Logical. Whether to plot counts of NAs in each column.

plot_scatter

Logical. Whether to plot dependent vs independent columns.

plot_box

Logical. Whether to generate box plots for selected columns.

plot_seasonal

Logical. Whether to plot seasonal trends.

plot_regional

Logical. Whether to plot regional trends.

plot_total

Logical. Whether to plot total dependent values per year.

correlation_method

Character. Correlation method. One of 'pearson', 'spearman', 'kendall'.

ma_days

Integer. Number of days used in moving average calculations.

ma_sides

Integer. Number of sides used in moving average calculations (1 or 2).

timeseries_col

Character. Timeseries column.

detect_outliers

Logical. Whether to output an outlier table.

calculate_rate

Logical. Whether to plot annual rates per 100k.

run_id

Character. Optional run id.

create_base_dir

Logical. Whether to create output_path if missing. Defaults to TRUE.

Value

A list with base_output_path, run_id, run_output_path, and region_output_paths.

Examples


run_descriptive_stats_api(
  data = list(
    date = as.character(as.Date("2024-01-01") + 0:29),
    region = rep(c("A", "B"), each = 15),
    outcome = sample(1:20, 30, replace = TRUE),
    temp = rnorm(30, 25, 3)
  ),
  output_path = tempdir(),
  aggregation_column = "region",
  dependent_col = "outcome",
  independent_cols = c("temp"),
  timeseries_col = "date",
  plot_corr_matrix = TRUE
)

Run models of increasing complexity in INLA: Fit a baseline model including spatiotemporal random effects.

Description

Creates and fits multiple INLA (Integrated Nested Laplace Approximation) models to the dataset, evaluates them using DIC (Deviance Information Criterion), and identifies the best-fitting model.

Usage

run_inla_models(
  combined_data,
  basis_matrices_choices,
  inla_param,
  max_lag,
  nk,
  case_type,
  output_dir = NULL,
  save_model = FALSE,
  family = "nbinomial",
  config = FALSE
)

Arguments

combined_data

A dataframe resulting from combine_health_climate_data() function.

basis_matrices_choices

A character vector specifying the basis matrix parameters to be included in the model. Possible values are tmax and rainfall.

inla_param

A character vector specifying the confounding exposures to be included in the model. Possible values are tmax, tmin, rainfall, r_humidity, runoff, ndvi, etc.

case_type

Character. The type of disease that the case column refers to. Must be one of diarrhea or malaria.

output_dir

Character. The path to save model output to. Defaults to NULL.

save_model

Boolean. Whether to save the results as a CSV. Defaults to FALSE.

family

Character. The probability distribution for the response variable, e.g. poisson, or nbinomial for a negative binomial distribution. Defaults to nbinomial.

config

Boolean. Enable additional model configurations. Defaults to FALSE.

Value

A list containing the model, baseline_model, and the dic_table.

Save air pollution plot with standardized dimensions

Description

Save air pollution plot with standardized dimensions

Usage

save_air_pollution_plot(plot_object, output_dir, filename)

Arguments

plot_object

ggplot or grob object to save

output_dir

Character. Directory to save plot.

filename

Character. Name of the file (with or without the .png extension).

Value

Invisibly returns the output path

Save results of wildfire related analysis

Description

Saves a CSV file of relative risk and confidence intervals for each lag value of wildfire-related PM2.5. Optionally also saves results of attributable numbers/fractions.

Usage

save_wildfire_results(
  rr_results,
  an_ar_results = NULL,
  annual_af_an_results = NULL,
  output_folder_path
)

Arguments

rr_results

Dataframe of relative risk and confidence intervals for each lag of wildfire-related PM2.5.

an_ar_results

Dataframe containing attributable number/fraction results. Defaults to NULL.

output_folder_path

Path to folder where results should be saved.

Create a cross-basis matrix set for DLNM analysis

Description

Generates cross-basis matrices for lagged climate variables in a dataset, for use in Distributed Lag Nonlinear Models (DLNM).

Usage

set_cross_basis(data, max_lag = 2, nk = 2)

Arguments

data

A dataset returned from combine_health_climate_data(), including lagged variables like tmax_lag1, tmin_lag1, etc.

max_lag

Integer. The maximum lag to be considered for the delay effect. It should be between 2 and 4. Defaults to 2.

nk

Value

A list of cross-basis matrices including the basis matrix for maximum temperature, minimum temperature, cumulative rainfall, and relative humidity.

Suggest a column name based on fuzzy matching

Description

Uses Jaro-Winkler distance to find the closest match to a misspelled or incorrect column name.

Usage

suggest_column_match(input, available, threshold = 0.3)

Arguments

input

The column name that was not found

available

Character vector of available column names

threshold

Maximum distance threshold (0-1). Lower = stricter matching.

Value

The best matching column name, or NULL if no good match found.
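
The thresholded "closest match" logic can be sketched in base R. Note the swap: base R has no Jaro-Winkler implementation, so this sketch uses normalised Levenshtein distance from utils::adist() as a stand-in. The helper name suggest_match_sketch is hypothetical and its scores will differ from the package's.

```r
# Hypothetical helper; uses normalised Levenshtein distance (utils::adist)
# as a stand-in for Jaro-Winkler, so scores differ from the package's.
suggest_match_sketch <- function(input, available, threshold = 0.3) {
  if (length(available) == 0L) return(NULL)
  d <- as.vector(utils::adist(tolower(input), tolower(available)))
  # Scale edit distance to the 0-1 range by the longer string's length.
  d <- d / pmax(nchar(input), nchar(available))
  best <- which.min(d)
  if (d[best] <= threshold) available[best] else NULL
}

cols <- c("temperature", "date", "region")
close_match <- suggest_match_sketch("temprature", cols)
no_match <- suggest_match_sketch("xyz", cols)
```

As in the documented function, a lower threshold makes matching stricter, and NULL signals that no candidate was close enough to suggest.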

Full analysis pipeline for the suicides and extreme heat indicator

Description

Runs the full pipeline to analyse the impact of extreme heat on suicides using a time-stratified case-crossover approach with a distributed lag non-linear model. This function generates the relative risk of the suicide-temperature association as well as the numbers, rates and fractions of suicides attributable to temperatures above a specified threshold. Model validation statistics are also provided.

Usage

suicides_heat_do_analysis(
  data_path,
  date_col,
  region_col = NULL,
  temperature_col,
  health_outcome_col,
  population_col,
  country = "National",
  meta_analysis = FALSE,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(25, 50, 75),
  lag_fun = "strata",
  lag_breaks = 1,
  lag_days = 2,
  independent_cols = NULL,
  control_cols = NULL,
  cenper = 50,
  attr_thr = 97.5,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  seed = NULL
)

Arguments

data_path

Path to a csv file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by region.

date_col

Character. Name of the column in the dataframe that contains the date.

region_col

Character. Name of the column in the dataframe that contains the region names. Defaults to NULL.

temperature_col

Character. Name of the column in the dataframe that contains the temperature values.

health_outcome_col

Character. Name of the column in the dataframe that contains the health outcome count (e.g. number of deaths, hospital admissions).

population_col

Character. Name of the column in the dataframe that contains the population estimate.

country

Character. Name of country for national level estimates.

meta_analysis

Boolean. Whether to perform a meta-analysis.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(25,50,75).

lag_fun

Character. Exposure function for arglag (see dlnm::crossbasis). Defaults to 'strata'.

lag_breaks

Integer. Internal cut-off point defining the strata for arglag (see dlnm::crossbasis). Defaults to 1.

lag_days

Integer. Maximum lag (see dlnm::crossbasis). Defaults to 2.

independent_cols

Additional independent variables to test in model validation

control_cols

A list of confounders to include in the final model adjustment. Defaults to NULL if none.

cenper

Integer. Percentile (0-100) used to calculate the centering value. Defaults to 50.

attr_thr

Numeric. Percentile at which to define the temperature threshold for calculating attributable risk. Defaults to 97.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

save_csv

Boolean. Whether to save the results as a CSV. Defaults to FALSE.

output_folder_path

Path to folder where plots and/or CSV should be saved. Defaults to NULL.

seed

Optional integer random seed used when sampling residuals for model validation plots. Defaults to NULL.

Details

This analysis pipeline requires a daily time series of temperature and suicide deaths with population values as a minimum. This is then processed using a conditional Poisson case-crossover analysis with distributed lag non-linear model and optional meta-analysis. Meta-analysis is recommended if the input data is disaggregated by area.

The model parameters have default values based on existing studies, and keeping them is recommended. If desired, they can be adjusted for sensitivity analysis.
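
To make the default exposure and lag settings concrete, the sketch below builds a cross-basis with dlnm::crossbasis() using the documented defaults (var_fun = "bs", var_degree = 2, var_per = c(25, 50, 75), lag_fun = "strata", lag_breaks = 1, lag_days = 2). The synthetic temperature series and the mapping of var_per to knots via quantile() are assumptions for illustration; the pipeline's internal construction is not shown in this manual.

```r
library(dlnm)

set.seed(42)
temp <- runif(365, 5, 30)  # synthetic daily mean temperature

# Assumption: var_per percentiles are converted to knot positions with
# quantile(); the package may do this differently.
cb <- crossbasis(
  temp,
  lag = 2,
  argvar = list(fun = "bs", degree = 2,
                knots = quantile(temp, c(25, 50, 75) / 100)),
  arglag = list(fun = "strata", breaks = 1)
)

dim(cb)  # rows = days, columns = exposure basis x lag basis terms
```

Changing var_per, var_degree or lag_breaks changes the number of columns in the cross-basis, which is one way to see how much flexibility a sensitivity analysis adds to the model.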

Model validation testing is provided as a standard output from the pipeline so a user can assess the quality of the model. If a user has additional independent variables these can be specified as independent_cols and assessed within different model combinations in the outputs of this testing. These can be added in the final model via control_cols.

For attributable deaths the default is to use extreme heat as a threshold, defined as the 97.5th percentile of temperature over the corresponding time period for each geography. This can be adjusted if desired, following review of the relative risk association between temperature and suicides, using attr_thr.

Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying published methodology document (doi:10.5281/zenodo.14050224).

Value

qaic_results A dataframe of QAIC and dispersion metrics for each model combination and geography.
qaic_summary A dataframe with the mean QAIC and dispersion metrics for each model combination.
vif_results A dataframe. Variance inflation factors for each independent variable by region.
vif_summary A dataframe with the mean variance inflation factors for each independent variable.
meta_test_res A dataframe of results from statistical tests on the meta model.
power_list A list containing power information by area.
rr_results Dataframe containing cumulative relative risk and confidence intervals from analysis.
res_attr_tot Dataframe. Total attributable fractions, numbers and rates for each area over the whole time series.
attr_yr_list List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by area.
attr_mth_list List. Dataframes containing total attributable fractions, numbers and rates by calendar month and area.

References

Pearce M, Watkins E, Glickman M, Lewis B, Ingole V. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Suicides attributed to extreme heat: methodology. Zenodo; 2024. Available from: doi:10.5281/zenodo.14050224
Gasparrini A, Guo Y, Hashizume M, Lavigne E, Zanobetti A, Schwartz J, et al. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. Lancet. 2015 Jul;386(9991):369-75. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0140673614621140
Kim Y, Kim H, Gasparrini A, Armstrong B, Honda Y, Chung Y, et al. Suicide and Ambient Temperature: A Multi-Country Multi-City Study. Environ Health Perspect. 2019 Nov;127(11):1-10. Available from: https://pubmed.ncbi.nlm.nih.gov/31769300/
Gasparrini A, Armstrong B. Reducing and meta-analysing estimates from distributed lag non-linear models. BMC Med Res Methodol. 2013 Jan 9;13:1. Available from: doi:10.1186/1471-2288-13-1
Gasparrini A, Armstrong B, Kenward MG. Multivariate meta-analysis for non-linear and other multi-parameter associations. Stat Med. 2012 Dec 20;31(29):3821-39. Available from: doi:10.1002/sim.5471
Sera F, Armstrong B, Blangiardo M, Gasparrini A. An extended mixed-effects framework for meta-analysis. Stat Med. 2019 Dec 20;38(29):5429-44. Available from: doi:10.1002/sim.8362
Gasparrini A, Leone M. Attributable risk from distributed lag models. BMC Med Res Methodol. 2014 Dec 23;14(1):55. Available from: https://link.springer.com/article/10.1186/1471-2288-14-55

Examples


example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 365),
  region = "Example Region",
  tmean = stats::runif(365, 5, 30),
  suicides = stats::rpois(365, lambda = 2),
  pop = 250000
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)

suicides_heat_do_analysis(
  data_path = example_path,
  date_col = "date",
  region_col = "region",
  temperature_col = "tmean",
  health_outcome_col = "suicides",
  population_col = "pop",
  country = "Example Region",
  meta_analysis = FALSE,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(25, 50, 75),
  lag_fun = "strata",
  lag_breaks = 1,
  lag_days = 2,
  independent_cols = NULL,
  control_cols = NULL,
  cenper = 50,
  attr_thr = 97.5,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = tempdir()
)

Summarise AF and AN numbers by region and year

Description

Takes daily data with attributable fraction and attributable number and summarises by year and region.

Usage

summarise_AF_AN(data, monthly = TRUE)

Arguments

data

Dataframe containing daily data including calculated AF and AN.

monthly

Bool. Whether to summarise by month as well as year and region. Defaults to TRUE.

Value

Dataframe containing summarised AF and AN data, by year, region and optionally month (if monthly == TRUE).
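
A base R sketch of this kind of summary. The column names (deaths, an, af) and the choice to recompute the annual attributable fraction as total attributable number divided by total deaths are assumptions for illustration, not a description of the package's internals.

```r
# Toy daily data with an attributable number (an) per day; names are assumed.
daily <- data.frame(
  region = c("A", "A", "A", "B", "B", "B"),
  year = c(2023, 2023, 2024, 2023, 2024, 2024),
  deaths = c(10, 20, 30, 5, 15, 10),
  an = c(1, 3, 3, 0.5, 1.5, 2)
)

# Sum deaths and attributable numbers within each region-year.
annual <- aggregate(cbind(deaths, an) ~ region + year, data = daily, FUN = sum)

# Assumed definition: annual AF = attributable deaths / total deaths.
annual$af <- annual$an / annual$deaths
```

Adding month to the right-hand side of the formula would give the monthly breakdown in the same way.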

Full analysis for the 'mortality attributable to high and low temperatures' indicator

Description

Runs the full methodology to analyse the impact of high and low temperatures on mortality using a quasi-Poisson time series approach with a distributed lag non-linear model. This function generates the relative risk of the temperature-mortality association as well as attributable numbers, rates and fractions of mortalities to specified temperature thresholds for high and low temperatures. Model validation statistics are also provided.

Usage

temp_mortality_do_analysis(
  data_path,
  date_col,
  region_col,
  temperature_col,
  dependent_col,
  population_col,
  country = "National",
  independent_cols = NULL,
  control_cols = NULL,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(10, 75, 90),
  lagn = 21,
  lagnk = 3,
  dfseas = 8,
  meta_analysis = FALSE,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  seed = NULL
)

Arguments

data_path

Path to a csv file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by geography.

date_col

Character. Name of the column in the dataframe containing the date.

region_col

Character. Name of the column in the dataframe that contains the geography name(s).

temperature_col

Character. Name of the column in the dataframe that contains the temperature column.

dependent_col

Character. Name of the column in the dataframe containing the dependent health outcome variable e.g. deaths.

population_col

Character. Name of the column in the dataframe that contains the population estimate per geography.

country

Character. Name of country for national-level estimates. Defaults to 'National'.

independent_cols

List. Additional independent variables to test in model validation as confounders. Defaults to NULL.

control_cols

List. Confounders to include in the final model adjustment. Defaults to NULL.

var_fun

Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.

var_degree

Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).

var_per

Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90).

lagn

Integer. Number of days in the lag period. Defaults to 21. (see dlnm::crossbasis).

lagnk

Integer. Number of knots in lag function. Defaults to 3. (see dlnm::logknots).

dfseas

Integer. Degrees of freedom for seasonality. Defaults to 8.

meta_analysis

Boolean. Whether to perform a meta-analysis. Defaults to FALSE.

attr_thr_high

Integer. Percentile at which to define the high temperature threshold for calculating attributable risk. Defaults to 97.5.

attr_thr_low

Integer. Percentile at which to define the low temperature threshold for calculating attributable risk. Defaults to 2.5.

save_fig

Boolean. Whether to save the plot as an output. Defaults to FALSE.

save_csv

Boolean. Whether to save the results as a CSV. Defaults to FALSE.

output_folder_path

Path to folder where plots and/or CSV should be saved. Defaults to NULL.

seed

Optional integer random seed used when sampling residuals for model validation plots. Defaults to NULL.

Details

This analysis requires a daily time series of temperature and death counts with population values as a minimum. This is then processed using a quasi-Poisson time series regression analysis with a distributed lag non-linear model and optional meta-analysis. Meta-analysis is recommended if the input data is disaggregated by area.

The model parameters have default values based on existing studies, and keeping them is recommended. If desired, they can be adjusted where appropriate for the user's context.

For attributable deaths the default is to use a high temperature threshold, defined as the 97.5th percentile of the temperature distribution over the full time period for each geography. The low temperature threshold is similarly defined at the 2.5th percentile. These can be adjusted using attr_thr_high or attr_thr_low if desired, following review of the relative risk association between temperature and mortality.
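
The percentile-based thresholds can be illustrated with stats::quantile(). This is a sketch on synthetic data for a single geography; how the package applies the thresholds inside the attributable-risk calculation is not shown here.

```r
set.seed(1)
temp <- runif(1000, -2, 32)  # synthetic daily temperatures for one geography

attr_thr_high <- 97.5
attr_thr_low <- 2.5

# Convert percentiles to temperature thresholds for this geography.
thr_high <- quantile(temp, attr_thr_high / 100, names = FALSE)
thr_low <- quantile(temp, attr_thr_low / 100, names = FALSE)

# Flag the days that would count as high- or low-temperature days.
is_hot <- temp > thr_high
is_cold <- temp < thr_low
```

Because the thresholds are computed per geography, the same percentile corresponds to different absolute temperatures in warmer and cooler areas.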

Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying published methodology document (doi:10.5281/zenodo.14865904).

Value

qaic_results Dataframe. QAIC and dispersion metrics for each model combination and geography.
qaic_summary Dataframe. Mean QAIC and dispersion metrics for each model combination.
vif_results Dataframe. Variance inflation factors for each independent variable by geography.
vif_summary Dataframe. Mean variance inflation factors for each independent variable.
adf_results Dataframe. ADF test results for each geography.
power_list List. Power information by area.
rr_results Dataframe containing cumulative relative risk and confidence intervals from analysis.
res_attr_tot Dataframe. Total attributable fractions, numbers and rates for each area over the whole time series.
attr_yr_list List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by area.
attr_mth_list List. Dataframes containing total attributable fractions, numbers and rates by calendar month and area.

References

Watkins E, Hunt C, Lewis B, Ingole V, Glickman M. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Mortality attributed to high and low temperatures: methodology. Zenodo; 2026. Available from: doi:10.5281/zenodo.14865904
Gasparrini A, Guo Y, Hashizume M, Lavigne E, Zanobetti A, Schwartz J, et al. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. Lancet. 2015 Jul;386(9991):369-75. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0140673614621140
Gasparrini A, Armstrong B. Reducing and meta-analysing estimates from distributed lag non-linear models. BMC Medical Research Methodology. 2013 Jan 9;13:1. Available from: doi:10.1186/1471-2288-13-1
Gasparrini A, Armstrong B, Kenward MG. Multivariate meta-analysis for non-linear and other multi-parameter associations. Statistics in Medicine. 2012 Dec 20;31(29):3821-39. Available from: doi:10.1002/sim.5471

Examples


example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 365),
  region = "Example Region",
  tmean = stats::runif(365, -2, 32),
  deaths = stats::rpois(365, lambda = 8),
  pop = 500000
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)

temp_mortality_do_analysis(
  data_path = example_path,
  date_col = "date",
  temperature_col = "tmean",
  dependent_col = "deaths",
  population_col = "pop",
  region_col = "region",
  country = "Example Region",
  meta_analysis = FALSE,
  independent_cols = NULL,
  control_cols = NULL,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(10, 75, 90),
  lagn = 7,
  lagnk = 2,
  dfseas = 4,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = tempdir()
)

Stratify data by time period

Description

Adds columns for strata for each region:year:month:dayofweek and for the total counts of a health outcome across days in each stratum.

Usage

time_stratify(data)

Arguments

data

Dataframe containing a daily time series of climate and health data. Assumes that 'data' has a 'month', 'year', 'dow' and 'region' column.

Value

Dataframe with additional columns for stratum (region:year:month:dayofweek) and for the total counts of a health outcome across days in each stratum.
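
A minimal base R sketch of time stratification as described above. The outcome column name (deaths) and the output column names (stratum, stratum_total) are assumptions for illustration.

```r
# Four full weeks of toy daily data for one region.
df <- data.frame(
  region = "A",
  date = as.Date("2024-01-01") + 0:27,
  deaths = rep(1:7, 4)
)
df$year <- format(df$date, "%Y")
df$month <- format(df$date, "%m")
df$dow <- format(df$date, "%u")  # ISO weekday number, locale-independent

# One stratum per region:year:month:dayofweek combination.
df$stratum <- interaction(df$region, df$year, df$month, df$dow,
                          drop = TRUE, sep = ":")

# Total outcome count across all days sharing the same stratum.
df$stratum_total <- ave(df$deaths, df$stratum, FUN = sum)
```

In a time-stratified case-crossover design, days within the same stratum act as each other's controls, so the stratum total is the quantity the daily counts are conditioned on.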

Ensure that the `case_type` parameter is valid

Description

Ensures that the case_type parameter is either malaria or diarrhea to comply with supported indicators.

Usage

validate_case_type(case_type)

Arguments

case_type

Character. The value of the case_type parameter.

Value

Character. The case_type value converted to lower case.
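
The validation can be sketched as follows. The helper name check_case_type and the error message are hypothetical; only the two allowed values and the lower-case return come from the documentation above.

```r
# Hypothetical stand-in for illustration; not the package's code.
check_case_type <- function(case_type) {
  allowed <- c("malaria", "diarrhea")
  value <- tolower(case_type)
  if (length(value) != 1L || !value %in% allowed) {
    stop("case_type must be one of: ", paste(allowed, collapse = ", "))
  }
  value
}

validated <- check_case_type("Malaria")
```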

Preflight validation for descriptive statistics columns based on enabled features.

Description

Preflight validation for descriptive statistics columns based on enabled features.

Usage

validate_descriptive_columns(
  df,
  context = "dataset",
  dependent_col,
  independent_cols,
  aggregation_column = NULL,
  population_col = NULL,
  timeseries_col = NULL,
  plot_corr_matrix = FALSE,
  plot_dist = FALSE,
  plot_ma = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  write_outlier_table = FALSE,
  calculate_rate = FALSE,
  is_full_dataset = FALSE
)

Arguments

df

Dataframe. Dataset to validate.

context

Character. Context label for error messages.

dependent_col

Character. Dependent column.

independent_cols

Character vector. Independent columns.

aggregation_column

Character. Region aggregation column.

population_col

Character. Population column.

timeseries_col

Character. Timeseries column.

plot_corr_matrix

Logical. Correlation matrix toggle.

plot_dist

Logical. Distribution plot toggle.

plot_ma

Logical. Moving average toggle.

plot_scatter

Logical. Scatter plot toggle.

plot_box

Logical. Boxplot toggle.

plot_seasonal

Logical. Seasonal plot toggle.

plot_regional

Logical. Regional plot toggle.

plot_total

Logical. Total-by-year plot toggle.

write_outlier_table

Logical. Outlier table toggle.

calculate_rate

Logical. Rate plot toggle.

is_full_dataset

Logical. Whether this dataset is the full combined dataset.

Value

None. Stops execution if required columns/params are missing.

Full analysis pipeline for the impact of wildfire-related PM2.5 on a health outcome

Description

Runs the full pipeline to analyse the impact of wildfire-related PM2.5 on a health outcome using a time-stratified case-crossover approach with a conditional quasi-Poisson regression model. This function generates the relative risk of the health outcome (e.g. mortality) associated with wildfire-related PM2.5 as well as attributable numbers, rates and fractions of the health outcome. Model validation statistics are also provided.

Usage

wildfire_do_analysis(
  health_path,
  join_wildfire_data = FALSE,
  ncdf_path = NULL,
  shp_path = NULL,
  date_col,
  region_col,
  shape_region_col = NULL,
  mean_temperature_col,
  health_outcome_col,
  population_col = NULL,
  rh_col = NULL,
  wind_speed_col = NULL,
  pm_2_5_col = NULL,
  wildfire_lag = 3,
  temperature_lag = 1,
  spline_temperature_lag = 0,
  spline_temperature_degrees_freedom = 6,
  predictors_vif = NULL,
  calc_relative_risk_by_region = FALSE,
  scale_factor_wildfire_pm = 10,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  create_run_subdir = FALSE,
  print_vif = FALSE,
  print_model_summaries = FALSE
)

Arguments

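health_path

Path to a CSV file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by region.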

join_wildfire_data

Boolean. If TRUE, a daily time series of wildfire-related PM2.5 concentration is joined to the health data. If FALSE, the data set is loaded without any additional joins. Defaults to FALSE.

ncdf_path

Path to a NetCDF file containing a daily time series of gridded wildfire-related PM2.5 concentration data.

shp_path

Path to a shapefile (.shp) of the geographical boundaries for which to extract mean values of wildfire-related PM2.5.

date_col

Character. Name of the column in the dataframe that contains the date.

region_col

Character. Name of the column in the dataframe that contains the region names.

shape_region_col

Character. Name of the column in the shapefile dataframe that contains the region names.

mean_temperature_col

Character. Name of the column in the dataframe that contains the mean temperature.

health_outcome_col

Character. Name of the column in the dataframe that contains the health outcome count (e.g. number of deaths, hospital admissions).

population_col

Character. Name of the column in the dataframe that contains the population data. Defaults to NULL. This is only required when requesting region-level AF/AN outputs and no pop column is already present in the input data.

rh_col

Character. Name of the column containing relative humidity values. Defaults to NULL.

wind_speed_col

Character. Name of the column containing wind speed. Defaults to NULL.

pm_2_5_col

Character. The name of the column containing PM2.5 values in micrograms. This is only required if health data isn't joined. Defaults to NULL.

wildfire_lag

Integer. The number of days for which to calculate the lags for wildfire PM2.5. Default is 3.

temperature_lag

Integer. The number of days for which to calculate the lags for temperature. Default is 1.

spline_temperature_lag

Integer. The number of days of lag in the temperature variable from which to generate splines. Default is 0 (unlagged temperature variable).

spline_temperature_degrees_freedom

Integer. Degrees of freedom for the spline(s).

predictors_vif

Character vector with each of the predictors to include in the model. Must contain at least 2 variables. Defaults to NULL.

calc_relative_risk_by_region

Boolean. Whether to calculate relative risk by region. Defaults to FALSE.

scale_factor_wildfire_pm

save_fig

Boolean. Whether to save the plot as an output.

save_csv

Boolean. Whether to save the results as a CSV file.

output_folder_path

Path. Path to folder where plots and/or CSV should be saved.

create_run_subdir

Boolean. If TRUE, create a timestamped subdirectory under output_folder_path for this run's outputs. Defaults to FALSE.

print_vif

Boolean. Whether to print the VIF (variance inflation factor) for each predictor. Defaults to FALSE.

print_model_summaries

Boolean. Whether to print the model summaries to the console. Defaults to FALSE.

Details

This analysis pipeline requires, as a minimum, a daily time series of mean wildfire PM2.5, mean temperature and a health outcome (for example all-cause mortality, respiratory or cardiovascular outcomes, or hospital admissions), together with population values. The data are analysed using a time-stratified case-crossover design fitted with conditional Poisson regression, with optional meta-analysis. Meta-analysis is recommended if the input data are disaggregated by area.
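As a rough illustration of the time-stratified case-crossover idea, the sketch below builds year, month and weekday strata and fits a quasi-Poisson model with stratum indicators in base R. A Poisson model with stratum indicators gives the same point estimates as a conditional Poisson fit. The simulated data, the column names and the use of `glm()` are assumptions made for this example; the package's internal implementation may differ.

```r
# Simulated daily series for one region (illustrative values only).
set.seed(1)
n <- 365
dat <- data.frame(
  date  = seq.Date(as.Date("2021-01-01"), by = "day", length.out = n),
  tmean = runif(n, 10, 35),
  pm    = runif(n, 0, 25)
)
dat$death <- rpois(n, lambda = exp(1.2 + 0.004 * dat$pm))

# Time-stratified design: each stratum is a year-month-weekday cell, so a day
# is only compared with the same weekday in the same calendar month.
lt <- as.POSIXlt(dat$date)
dat$stratum <- interaction(lt$year + 1900, lt$mon + 1, lt$wday, drop = TRUE)

# Stratum indicators enter as fixed effects; quasi-Poisson allows overdispersion.
fit <- glm(death ~ pm + tmean + stratum, family = quasipoisson(), data = dat)

# Relative risk per 10-unit increase in PM2.5 with a Wald 95% interval.
beta <- coef(fit)[["pm"]]
se   <- sqrt(vcov(fit)["pm", "pm"])
rr   <- exp(10 * c(estimate = beta, lower = beta - 1.96 * se, upper = beta + 1.96 * se))
print(rr)
```

With many strata, a dedicated conditional fit (for example through the gnm package listed in Imports) avoids estimating one coefficient per stratum, which is why the conditional formulation is usually preferred in practice.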

The model parameters have default values, which are based on existing studies and are recommended for most uses. They can, however, be adjusted for sensitivity analysis.
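To make the lag and spline parameters more concrete, the following sketch shows one way lagged exposure columns and a natural spline basis for temperature could be built in base R. The helpers `lag_by()` and `add_lags()` are hypothetical names introduced for this example, and the assumption that a lag of k produces columns for lags 1 to k is an interpretation, not a statement of what the package does internally.

```r
library(splines)

set.seed(2)
dat <- data.frame(
  date   = rep(seq.Date(as.Date("2021-01-01"), by = "day", length.out = 10), times = 2),
  region = rep(c("A", "B"), each = 10),
  tmean  = runif(20, 10, 35),
  pm     = runif(20, 0, 25)
)

# Hypothetical helper: shift a vector back by k days, padding the start with NA.
lag_by <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

# Hypothetical helper: add lag columns 1..max_lag, computed within each region
# so that values never leak from one region's series into another.
add_lags <- function(data, var, max_lag, group = "region") {
  data <- data[order(data[[group]], data$date), ]
  for (k in seq_len(max_lag)) {
    data[[paste0(var, "_lag", k)]] <-
      ave(data[[var]], data[[group]], FUN = function(x) lag_by(x, k))
  }
  data
}

dat <- add_lags(dat, "pm", max_lag = 3)     # analogous to wildfire_lag = 3
dat <- add_lags(dat, "tmean", max_lag = 1)  # analogous to temperature_lag = 1

# Natural cubic spline basis for unlagged temperature with 4 degrees of freedom,
# analogous to spline_temperature_lag = 0 and spline_temperature_degrees_freedom = 4.
temp_basis <- ns(dat$tmean, df = 4)
dim(temp_basis)
```

A sensitivity analysis would then refit the model with different values of `max_lag` or `df` and compare the resulting estimates.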

Model validation testing is provided as a standard output from the pipeline so that users can assess the quality of the model. Users can also include additional independent variables, such as relative humidity or wind speed, directly in the model.
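One of the validation outputs is the variance inflation factor (VIF). As a minimal sketch of what a VIF measures, the code below computes it in base R from the definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The function `vif_sketch()` and the simulated predictors are assumptions for illustration; the package's own calculation may differ.

```r
set.seed(3)
n <- 200
tmean <- runif(n, 10, 35)
preds <- data.frame(
  pm    = runif(n, 0, 25),
  tmean = tmean,
  # Relative humidity is simulated to be correlated with temperature.
  rh    = 80 - 1.2 * tmean + rnorm(n, sd = 5)
)

# Hypothetical helper: VIF for each column of a data frame of predictors.
vif_sketch <- function(predictors) {
  vapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

vifs <- vif_sketch(preds)
print(round(vifs, 2))
```

Here `pm` is independent of the other predictors, so its VIF stays close to 1, while `tmean` and `rh` inflate each other. At least two predictors are needed, since a single predictor has nothing to be regressed on.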

Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying publication, doi:10.5281/zenodo.14052184.

Value

rr_results A dataframe with relative risk estimates and confidence intervals for each region.
rr_pm A dataframe of relative risk estimates for wildfire-specific PM2.5 exposure across regions as PM values change.
af_an_results A dataframe containing attributable fractions, attributable numbers and deaths per 100k population for each region.
annual_af_an_results A dataframe containing annual attributable numbers and fractions for each region.
calculate_qaic A dataframe of QAIC and dispersion metrics for each model combination and geography.
check_wildfire_vif A dataframe containing variance inflation factors for each independent variable by region.
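To show how relative risk, attributable fraction (AF) and attributable number (AN) relate to one another, the sketch below applies the standard formulas AF = (RR - 1) / RR and AN = AF x cases to simulated daily data. The coefficient, the population and the column names are assumptions for illustration only, and the exact calculation used by the package may differ.

```r
# Assumed log-linear coefficient: RR of 1.02 per 10-unit increase in PM2.5.
beta <- log(1.02) / 10
population <- 400000

set.seed(4)
daily <- data.frame(
  year  = rep(c(2020, 2021), each = 100),
  death = rpois(200, lambda = 4),
  pm    = runif(200, 0, 25)
)

# Daily relative risk against zero exposure, then daily AF and AN.
daily$rr <- exp(beta * daily$pm)
daily$af <- (daily$rr - 1) / daily$rr
daily$an <- daily$af * daily$death

# Whole-period summary: total AN, overall AF and AN per 100k population.
overall <- data.frame(
  an          = sum(daily$an),
  af          = sum(daily$an) / sum(daily$death),
  an_per_100k = sum(daily$an) / population * 1e5
)

# Annual summary: sum AN and deaths by year, then recompute the AF.
annual <- aggregate(cbind(an, death) ~ year, data = daily, FUN = sum)
annual$af <- annual$an / annual$death

print(overall)
print(annual)
```

The overall AF is computed as total AN divided by total deaths rather than as a mean of daily AFs, so days with more deaths carry more weight.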

References

Brown A, Soutter E, Ingole V. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Wildfires: introduction. Zenodo; 2024. Available from: https://zenodo.org/records/14052184
Hänninen R, Sofiev M, Uppstu A, Kouznetsov R. Daily surface concentration of fire related PM2.5 for 2003-2023, modelled by SILAM CTM when using the MODIS satellite data for the fire radiative power. Finnish Meteorological Institute; 2024. Available from: doi:10.57707/fmi-b2share.d1cac971b3224d438d5304e945e9f16c
GADM. Database for Global Administrative Areas. Available from: https://gadm.org/download_country.html
Tobias A, Kim Y, Madaniyazi L. Time-stratified case-crossover studies for aggregated data in environmental epidemiology: a tutorial. Int J Epidemiol. 2024;53(2). Available from: doi:10.1093/ije/dyae020
Wu Y, Li S, Guo Y. Space-Time-Stratified Case-Crossover Design in Environmental Epidemiology Study. Heal Data Sci. 2021. Available from: doi:10.34133/2021/9870798

Examples


example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 180),
  region = "Example Region",
  death = stats::rpois(180, lambda = 4),
  population = 400000,
  tmean = stats::runif(180, 10, 35),
  mean_PM = stats::runif(180, 0, 25)
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)

# Run the pipeline on the pre-joined CSV (no NetCDF or shapefile needed).
wildfire_do_analysis(
  health_path = example_path,
  join_wildfire_data = FALSE,
  ncdf_path = NULL,
  shp_path = NULL,
  date_col = "date",
  region_col = "region",
  shape_region_col = NULL,
  mean_temperature_col = "tmean",
  health_outcome_col = "death",
  population_col = "population",
  rh_col = NULL,
  wind_speed_col = NULL,
  pm_2_5_col = "mean_PM",  # must match the column name in example_data exactly
  wildfire_lag = 3,
  temperature_lag = 1,
  spline_temperature_lag = 0,
  spline_temperature_degrees_freedom = 4,
  predictors_vif = NULL,
  calc_relative_risk_by_region = FALSE,
  scale_factor_wildfire_pm = 10,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = tempdir(),
  create_run_subdir = FALSE,
  print_vif = FALSE,
  print_model_summaries = FALSE
)

Run plotting code inside a safely managed PDF device.

Description

Run plotting code inside a safely managed PDF device.

Usage

with_pdf_device(output_path, width = 14, height = 8, context = "plot", plot_fn)

Arguments

output_path

Character. Output path for the PDF file.

width

Numeric. PDF width in inches.

height

Numeric. PDF height in inches.

context

Character. Context label used in error messages.

plot_fn

Function. Plotting function to execute.

Value

None. Writes a PDF and closes device safely.
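A common way to manage a graphics device safely in R is to register its closure with `on.exit()`, so that the device is closed even if the plotting code fails. The sketch below illustrates that pattern under the assumption that this is broadly what "safely managed" means here. `safe_pdf_sketch()` is a hypothetical stand-in, not the package's `with_pdf_device()`, and it omits the `context` argument.

```r
# Hypothetical stand-in illustrating the on.exit() pattern for PDF devices.
safe_pdf_sketch <- function(output_path, width = 14, height = 8, plot_fn) {
  grDevices::pdf(output_path, width = width, height = height)
  dev_id <- grDevices::dev.cur()
  # Close this specific device when the function exits, even after an error.
  on.exit(
    if (dev_id %in% grDevices::dev.list()) grDevices::dev.off(dev_id),
    add = TRUE
  )
  plot_fn()
  invisible(NULL)
}

# Normal use: the PDF is written and the device is closed afterwards.
out <- tempfile(fileext = ".pdf")
safe_pdf_sketch(out, plot_fn = function() plot(1:10, main = "demo"))
pdf_written <- file.exists(out)

# Failure case: the plotting function errors, but no device is left open.
devices_before <- length(grDevices::dev.list())
try(
  safe_pdf_sketch(tempfile(fileext = ".pdf"), plot_fn = function() stop("plot failed")),
  silent = TRUE
)
devices_after <- length(grDevices::dev.list())
```

Closing the device by its identifier rather than calling `dev.off()` with no argument avoids closing an unrelated device if the plotting function happens to open another one.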
