Type: Package
Title: Join Gridded Weather Data to Event Tables
Version: 0.2.0
URL: https://github.com/hauae/weatherjoin
BugReports: https://github.com/hauae/weatherjoin/issues
Description: High-level tools to attach gridded weather data from the NASA POWER Project to event-based datasets. The package plans efficient spatio-temporal API calls via the 'nasapower' R package, caches downloaded segments locally, and joins weather variables back to the input table using exact or rolling joins. This package is not affiliated with or endorsed by NASA.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.3
Imports: data.table, jsonlite
Suggests: nasapower, digest, fst, anytime, testthat (≥ 3.0.0), knitr, rmarkdown, withr
Depends: R (≥ 4.1.0)
Config/testthat/edition: 3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-01-25 13:52:17 UTC; 00758120
Author: Przemek Dolowy [aut, cre] (affiliation: Harper Adams University)
Maintainer: Przemek Dolowy <pdolowy@harper-adams.ac.uk>
Repository: CRAN
Date/Publication: 2026-01-29 18:50:01 UTC

weatherjoin: Join Gridded Weather Data to Event Tables

Description

High-level tools to attach gridded weather data from the NASA POWER project to event-based datasets. The package plans efficient spatio-temporal API calls, caches downloaded segments locally, and joins weather variables back to the input table using exact or rolling joins. NASA POWER data are retrieved via the 'nasapower' R package. This package is not affiliated with or endorsed by NASA.

Author(s)

Maintainer: Przemek Dolowy pdolowy@harper-adams.ac.uk (Harper Adams University)

See Also

Useful links:

  https://github.com/hauae/weatherjoin
  Report bugs at https://github.com/hauae/weatherjoin/issues

Join weather back to events (supports rolling joins for hourly data)

Description

Join weather back to events (supports rolling joins for hourly data)

Usage

.attach_weather(
  x,
  weather,
  params,
  tz = "UTC",
  roll = c("nearest", "last", "none"),
  roll_max_hours = NULL,
  coord_digits = 5
)
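To illustrate what the coord_digits argument is for, here is a minimal base R sketch of an exact join, assuming that events and weather are matched on coordinates rounded to coord_digits plus the UTC timestamp. The function exact_join_sketch() and the weather columns lat/lon are hypothetical; rep_lat, rep_lon and timestamp_utc are the column names used by .call_plan() and .build_time(). The package itself works with data.table and also supports rolling joins.

```r
# Hypothetical sketch: exact join on rounded coordinates + UTC time.
# Rounding makes tiny floating-point differences in coordinates harmless.
exact_join_sketch <- function(x, weather, params, coord_digits = 5) {
  key <- function(lat, lon, t) {
    paste(round(lat, coord_digits), round(lon, coord_digits),
          as.numeric(t), sep = "|")
  }
  kx <- key(x$rep_lat, x$rep_lon, x$timestamp_utc)
  kw <- key(weather$lat, weather$lon, weather$timestamp_utc)
  idx <- match(kx, kw)  # NA where no weather row matches
  for (p in params) x[[p]] <- weather[[p]][idx]
  x
}

# Toy data for illustration only
t0 <- as.POSIXct("2024-06-01 12:00:00", tz = "UTC")
x <- data.frame(id = 1:2, rep_lat = c(52.7790000001, 51.5),
                rep_lon = c(-2.427, -0.12), timestamp_utc = c(t0, t0))
weather <- data.frame(lat = 52.779, lon = -2.427, timestamp_utc = t0,
                      T2M = 15.2)
res <- exact_join_sketch(x, weather, "T2M")
res$T2M
```

Events without a matching weather row receive NA, consistent with the documented behaviour of join_weather().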

Build standard time keys used by weatherjoin

Description

Build standard time keys used by weatherjoin

Usage

.build_time(DT, time, tz = "UTC", time_api_resolved = c("daily", "hourly"))

Arguments

DT

data.table with input data.

time

User time= specification (single column or multiple columns).

tz

Time zone used to parse or construct timestamps (default "UTC").

time_api_resolved

"daily" or "hourly", already resolved from the user setting or guessed from the input.

Value

DT with timestamp_utc (POSIXct) and t_utc (numeric seconds) columns added.
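A minimal base R sketch of the two columns described above, assuming that Date input is placed at a fixed dummy hour (12, the dummy_hour default used elsewhere in this manual) and that t_utc counts seconds since 1970-01-01 UTC. build_time_sketch() is hypothetical and handles only Date, POSIXct or character input; the package itself operates on data.table objects.

```r
# Hypothetical sketch: add timestamp_utc (POSIXct, UTC) and t_utc
# (numeric seconds since 1970-01-01 UTC) to a data.frame.
build_time_sketch <- function(df, time_col, tz = "UTC", dummy_hour = 12L) {
  raw <- df[[time_col]]
  if (inherits(raw, "Date")) {
    # Daily input: place each date at a fixed hour in the given time zone
    ts <- as.POSIXct(paste(format(raw), sprintf("%02d:00:00", dummy_hour)),
                     tz = tz)
  } else {
    # Character or POSIXct input, interpreted in `tz`
    ts <- as.POSIXct(raw, tz = tz)
  }
  attr(ts, "tzone") <- "UTC"  # same instants, displayed in UTC
  df$timestamp_utc <- ts
  df$t_utc <- as.numeric(ts)
  df
}

events <- data.frame(day = as.Date(c("2024-06-01", "2024-06-02")))
out <- build_time_sketch(events, "day")
out$t_utc
```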


Check cache coverage for planned calls

Description

Internal helper. Determines which planned provider calls are satisfied by existing cache entries and which must be fetched.

Usage

.cache_check(
  calls,
  time_api,
  params,
  site_elevation_col = "site_elevation",
  settings,
  cache_dir,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  cache_max_age_days = 30,
  refresh = c("if_missing", "if_stale", "always"),
  match_mode = c("cover", "exact"),
  param_match = c("superset", "exact")
)
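One plausible reading of match_mode and param_match, sketched in base R for a single cached segment: "cover" accepts a segment whose time range contains the requested range, "exact" requires identical bounds, and "superset" accepts a segment holding at least the requested parameters. The function and its list fields (start, end, params) are hypothetical, and a real cache key would presumably also include location, elevation, time_api and community.

```r
# Hypothetical sketch: does one cached segment satisfy one requested call?
cache_satisfies_sketch <- function(req, seg,
                                   match_mode = c("cover", "exact"),
                                   param_match = c("superset", "exact")) {
  match_mode <- match.arg(match_mode)
  param_match <- match.arg(param_match)
  time_ok <- if (match_mode == "cover") {
    seg$start <= req$start && seg$end >= req$end   # segment contains request
  } else {
    seg$start == req$start && seg$end == req$end   # identical bounds only
  }
  params_ok <- if (param_match == "superset") {
    all(req$params %in% seg$params)                # cached params may be extra
  } else {
    setequal(req$params, seg$params)
  }
  time_ok && params_ok
}

req <- list(start = as.Date("2024-06-01"), end = as.Date("2024-06-30"),
            params = "T2M")
seg <- list(start = as.Date("2024-01-01"), end = as.Date("2024-12-31"),
            params = c("T2M", "RH2M"))
cache_satisfies_sketch(req, seg)
```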

Plan provider calls: for each loc_id, split by time sparsity

Description

Plan provider calls: for each loc_id, split by time sparsity

Usage

.call_plan(
  x,
  time_col = "timestamp_utc",
  loc_id_col = "loc_id",
  rep_lat_col = "rep_lat",
  rep_lon_col = "rep_lon",
  tz = "UTC"
)

Placeholder elevation lookup

Description

Placeholder elevation lookup

Usage

.elev_lookup(lon, lat, method = c("constant"), constant = 100, ...)
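A minimal sketch of a "constant" elevation lookup combined with the optional elev_fun(lon, lat, ...) hook described under join_weather(), assuming that the constant is used as the fallback wherever elev_fun returns NA. elev_lookup_sketch() is hypothetical; .elev_lookup() itself takes no elev_fun argument.

```r
# Hypothetical sketch: constant elevation, optionally overridden by elev_fun.
elev_lookup_sketch <- function(lon, lat, constant = 100, elev_fun = NULL,
                               ...) {
  elev <- rep(as.numeric(constant), length(lon))
  if (is.function(elev_fun)) {
    got <- as.numeric(elev_fun(lon, lat, ...))
    ok <- !is.na(got)
    elev[ok] <- got[ok]   # keep the constant where elev_fun gives NA
  }
  elev
}

# A toy elev_fun that only knows the second point
f <- function(lon, lat, ...) c(NA, 35)
elev_lookup_sketch(c(-2.43, -0.12), c(52.78, 51.5), elev_fun = f)
```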

Fetch NASA POWER for planned calls

Description

Fetch NASA POWER for planned calls

Usage

.fetch_power(
  calls_to_fetch,
  time_api,
  params,
  community = "ag",
  time_standard = "UTC",
  settings = list(),
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  dummy_hour = 12L,
  verbose = FALSE,
  ...
)

Multi-column time input path: map time columns to roles

Description

Multi-column time input path: map time columns to roles

Usage

.map_time_columns(time_cols, names_x)

Arguments

time_cols

Character vector of column names supplied by the user via time=.

names_x

Names of the input table.

Value

A list with mode ("ymd" or "ydoy") and role names: year, month, day, hour (optional), doy (optional).
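A sketch of role mapping, assuming roles are recognised from case-insensitive column names such as those in the c("YEAR","MO","DY","HR") example under join_weather(). The name patterns and map_time_columns_sketch() are hypothetical; the package's actual rules are not documented here.

```r
# Hypothetical sketch: assign time roles to user columns by name pattern.
map_time_columns_sketch <- function(time_cols, names_x) {
  missing_cols <- setdiff(time_cols, names_x)
  if (length(missing_cols)) {
    stop("time columns not found: ", paste(missing_cols, collapse = ", "))
  }
  key <- toupper(time_cols)
  pick_col <- function(pattern) {
    hit <- time_cols[grepl(pattern, key)]
    if (length(hit)) hit[1] else NULL
  }
  roles <- list(
    year  = pick_col("^(YEAR|YR|Y)$"),
    month = pick_col("^(MONTH|MO|MM|M)$"),
    day   = pick_col("^(DAY|DY|DD|D)$"),
    doy   = pick_col("^(DOY|JDAY)$"),
    hour  = pick_col("^(HOUR|HR|H)$")   # optional
  )
  # A day-of-year column switches to "ydoy" mode
  mode <- if (!is.null(roles$doy)) "ydoy" else "ymd"
  c(list(mode = mode), roles)
}

r <- map_time_columns_sketch(c("YEAR", "MO", "DY", "HR"),
                             c("id", "YEAR", "MO", "DY", "HR"))
r$mode
```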


Normalize POWER output time columns to timestamp_utc (UTC)

Description

Normalize POWER output time columns to timestamp_utc (UTC)

Usage

.normalize_power_time(
  w,
  time_api = c("hourly", "daily"),
  tz = "UTC",
  dummy_hour = 12L
)
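A minimal sketch of time normalisation, assuming the provider output has YEAR, MO, DY (and, for hourly data, HR) columns, as in the c("YEAR","MO","DY","HR") example under join_weather(). Daily rows have no hour, so the fixed dummy_hour is used. normalize_time_sketch() and the toy values are illustrative only; the real column layout returned by nasapower may differ.

```r
# Hypothetical sketch: build timestamp_utc from assumed YEAR/MO/DY(/HR).
normalize_time_sketch <- function(w, time_api = c("hourly", "daily"),
                                  dummy_hour = 12L) {
  time_api <- match.arg(time_api)
  hr <- if (time_api == "hourly") w$HR else rep(dummy_hour, nrow(w))
  s <- sprintf("%04d-%02d-%02d %02d:00:00",
               as.integer(w$YEAR), as.integer(w$MO), as.integer(w$DY),
               as.integer(hr))
  w$timestamp_utc <- as.POSIXct(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  w
}

w <- data.frame(YEAR = 2024, MO = 6, DY = 1:2, T2M = c(15.2, 16.8))
res <- normalize_time_sketch(w, "daily")
res$timestamp_utc
```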

Resolve time_api based on user choice and input resolution

Description

Resolve time_api based on user choice and input resolution

Usage

.resolve_time_api(
  dt,
  time_api = c("guess", "hourly", "daily"),
  input_res = c("hourly", "daily"),
  tz = "UTC",
  dummy_hour = 12L
)
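The rules stated for time_api under join_weather() can be sketched as follows: "guess" follows the input resolution, "hourly" needs time-of-day information, and "daily" is always possible. resolve_time_api_sketch() is hypothetical and ignores the dt, tz and dummy_hour arguments of the real helper.

```r
# Hypothetical sketch of the documented time_api resolution rules.
resolve_time_api_sketch <- function(time_api = c("guess", "hourly", "daily"),
                                    input_res = c("hourly", "daily")) {
  time_api <- match.arg(time_api)
  input_res <- match.arg(input_res)
  if (time_api == "guess") return(input_res)
  if (time_api == "hourly" && input_res == "daily") {
    stop("time_api = \"hourly\" needs time-of-day information in the input")
  }
  time_api  # "daily" on hourly input: times are later reduced to dates
}

resolve_time_api_sketch("guess", "daily")
```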

Spatial planning: map points to representative locations

Description

Spatial planning: map points to representative locations

Usage

.spatial_plan(
  x,
  spatial_mode = c("cluster", "exact", "by_group"),
  lat_col = "lat",
  lon_col = "lon",
  group_col = NULL,
  rep_method = c("median", "centroid"),
  cluster_radius_m = 250,
  keep_diag = TRUE,
  check_range = TRUE,
  coord_digits = 5L
)
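To make cluster_radius_m concrete, here is a greedy clustering sketch in base R, assuming great-circle (haversine) distances and a median representative per cluster (matching the rep_method = "median" default). The greedy seeding strategy is an assumption and may not be how the package clusters; both functions are hypothetical.

```r
# Haversine great-circle distance in meters (mean Earth radius 6371008.8 m)
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Greedy sketch: each unassigned point seeds a cluster that absorbs every
# unassigned point within radius_m of the seed.
cluster_points_sketch <- function(lat, lon, radius_m = 250) {
  loc_id <- rep(NA_integer_, length(lat))
  next_id <- 0L
  for (i in seq_along(lat)) {
    if (!is.na(loc_id[i])) next
    next_id <- next_id + 1L
    near <- is.na(loc_id) & haversine_m(lat[i], lon[i], lat, lon) <= radius_m
    loc_id[near] <- next_id
  }
  rep_lat <- ave(lat, loc_id, FUN = median)  # median representative
  rep_lon <- ave(lon, loc_id, FUN = median)
  data.frame(lat, lon, loc_id, rep_lat, rep_lon)
}

res <- cluster_points_sketch(c(52.7790, 52.7795, 51.5000),
                             c(-2.4270, -2.4270, -0.1200))
res
```

The first two points are about 56 m apart and share one representative location, so they would need only one provider call.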

Split sparse time points into segments using a gap penalty (hours)

Description

Split sparse time points into segments using a gap penalty (hours)

Usage

.split_time_ranges(times_utc)
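A simplified sketch of splitting sparse times into segments, assuming the gap penalty acts as a plain threshold: a new segment starts whenever consecutive times are more than penalty_hours apart. The default of 168 mirrors the weatherjoin.split_penalty_hours value in the withr example in this manual; the package's actual penalty rule may weigh gaps differently, and split_time_ranges_sketch() is hypothetical.

```r
# Hypothetical sketch: gap-threshold splitting of sorted UTC times.
split_time_ranges_sketch <- function(times_utc, penalty_hours = 168) {
  t <- sort(unique(as.numeric(times_utc)))
  gaps_h <- diff(t) / 3600
  # Segment id increments after every gap longer than the penalty
  seg <- cumsum(c(1L, as.integer(gaps_h > penalty_hours)))
  parts <- split(t, seg)
  data.frame(
    start = .POSIXct(unname(vapply(parts, min, numeric(1))), tz = "UTC"),
    end   = .POSIXct(unname(vapply(parts, max, numeric(1))), tz = "UTC"),
    n     = unname(lengths(parts))
  )
}

times <- as.POSIXct(c("2024-01-01 00:00", "2024-01-01 06:00",
                      "2024-03-01 00:00"), tz = "UTC")
res <- split_time_ranges_sketch(times)
res
```

Each row would correspond to one provider call, so two short requests replace one request spanning two months.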

Single-column time input path: validate and normalize a time column

Description

Single-column time input path: validate and normalize a time column

Usage

.validate_single_time(
  raw,
  tz = "UTC",
  dummy_hour = 12L,
  time_api_resolved = c("daily", "hourly"),
  time_col = "<time>",
  max_examples = 5L
)

Multi-column time input path: validate time components and build a Date safely

Description

Multi-column time input path: validate time components and build a Date safely

Usage

.validate_time_components(
  y,
  m = NULL,
  d = NULL,
  doy = NULL,
  h = NULL,
  mode = c("ymd", "ydoy"),
  time_api_resolved = c("daily", "hourly"),
  time_cols = character(),
  max_examples = 5L
)

Arguments

y, m, d

Integer-ish vectors of year, month and day. y is required in both modes; m and d are used when mode = "ymd".

doy

Integer-ish vector of day-of-year values (for mode = "ydoy").

h

Optional integer-ish vector of hours.

mode

"ymd" (year, month, day) or "ydoy" (year, day of year).

time_api_resolved

"hourly" or "daily"; used to check whether an hour component is required.

time_cols

Character vector of user-specified columns for error context.

max_examples

How many bad examples to show in error messages.

Value

A list with date (Date) and hour (integer, possibly NA if missing and not allowed).
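"Build Date safely" can be illustrated with a base R sketch that returns NA, rather than silently rolling over, for impossible combinations such as 30 February or day 366 of a non-leap year. build_date_sketch() is hypothetical and omits the hour handling and error messages of the real helper.

```r
# Hypothetical sketch: Dates from year/month/day or year/day-of-year.
build_date_sketch <- function(y, m = NULL, d = NULL, doy = NULL,
                              mode = c("ymd", "ydoy")) {
  mode <- match.arg(mode)
  y <- as.integer(y)
  if (mode == "ymd") {
    s <- sprintf("%04d-%02d-%02d", y, as.integer(m), as.integer(d))
    # With an explicit format, invalid dates such as Feb 30 become NA
    out <- as.Date(s, format = "%Y-%m-%d")
  } else {
    doy <- as.integer(doy)
    jan1 <- as.Date(sprintf("%04d-01-01", y), format = "%Y-%m-%d")
    out <- jan1 + (doy - 1L)
    # Reject day-of-year values below 1 or beyond the end of that year
    bad <- is.na(out) | doy < 1L | format(out, "%Y") != sprintf("%04d", y)
    out[which(bad)] <- NA
  }
  out
}

build_date_sketch(c(2024, 2023), doy = c(60, 366), mode = "ydoy")
```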


Internal: load required packages (used for interactive sourcing too)

Description

Internal: load required packages (used for interactive sourcing too)

Usage

.wj_load(pkgs = c("data.table"), attach = FALSE, quiet = TRUE)

Get weatherjoin option with default

Description

Get weatherjoin option with default

Usage

.wj_opt(name, default)
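A minimal sketch, assuming options follow the "weatherjoin." prefix seen in the withr example in this manual (weatherjoin.split_penalty_hours, weatherjoin.max_parts). wj_opt_sketch() is hypothetical; getOption() is base R.

```r
# Hypothetical sketch: read "weatherjoin.<name>", falling back to `default`.
wj_opt_sketch <- function(name, default) {
  getOption(paste0("weatherjoin.", name), default)
}

old <- options(weatherjoin.max_parts = 25)
wj_opt_sketch("max_parts", 10)
options(old)  # restore previous options
```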

Join gridded weather data to an event table

Description

Attach gridded weather variables from NASA POWER to rows of an event table. The function:

Usage

join_weather(
  x,
  params,
  time,
  lat_col = "lat",
  lon_col = "lon",
  time_api = c("guess", "hourly", "daily"),
  tz = "UTC",
  roll = c("nearest", "last", "none"),
  roll_max_hours = NULL,
  spatial_mode = c("cluster", "exact", "by_group"),
  group_col = NULL,
  cluster_radius_m = 250,
  site_elevation = c("constant", "auto"),
  elev_constant = 100,
  elev_fun = NULL,
  community = "ag",
  cache_scope = c("user", "project"),
  cache_dir = NULL,
  verbose = FALSE,
  ...
)

Arguments

x

A data.frame/data.table with event rows.

params

Character vector of NASA POWER parameter codes (e.g. "T2M").

time

A single column name containing time (POSIXct/Date/character/numeric) OR a character vector of column names used to assemble a timestamp (e.g. c("YEAR","MO","DY","HR")).

lat_col, lon_col

Column names for latitude and longitude (decimal degrees).

time_api

One of "guess", "hourly", "daily". "guess" (the default) infers the resolution from the input. If "daily" is chosen while the input contains time-of-day information, timestamps are reduced to their dates (stored with a fixed hour). If "hourly" is chosen but the input has no time-of-day information, an error is raised.

tz

Time zone used to interpret/construct input timestamps (default "UTC"). Weather is requested from NASA POWER in UTC.

roll

Join behaviour when matching timestamps: "nearest" (default, recommended), "last", or "none" (exact). Rolling is applied when joining hourly weather to event times.

roll_max_hours

Maximum allowed time distance (hours) for a rolling match. If NULL, a safe default is used: 1 hour for hourly joins and 24 hours for daily joins.

spatial_mode

How to reduce many points to representative locations before calling POWER: "cluster" (default), "exact", or "by_group". Clustering keeps the number of provider calls from growing with every distinct coordinate and suits POWER's coarse spatial resolution.

group_col

Grouping column used when spatial_mode="by_group".

cluster_radius_m

Clustering radius in meters when spatial_mode="cluster".

site_elevation

Elevation strategy for POWER calls: "constant" or "auto". Elevation is resolved for representative locations and becomes part of the cache identity.

elev_constant

Constant elevation (meters) used when site_elevation="constant" and as a fallback for "auto".

elev_fun

Optional function with signature function(lon, lat, ...) returning elevations (meters) for representative points.

community

Passed to nasapower::get_power() (e.g. "ag").

cache_scope

Where the cache is stored by default: "user" or "project".

cache_dir

Optional explicit cache directory. If NULL, determined by cache_scope.

verbose

If TRUE, print progress messages.

...

Passed through to nasapower::get_power().
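The roll and roll_max_hours arguments can be illustrated with a base R sketch: "nearest" takes the closest weather time, "last" takes the latest weather time at or before the event, and matches farther than roll_max_hours become NA. The package imports data.table, whose rolling joins presumably do this work; roll_match_sketch() is hypothetical and only shows the semantics.

```r
# Hypothetical sketch: rolling match of event times to weather times.
roll_match_sketch <- function(event_t, weather_t,
                              roll = c("nearest", "last"), max_hours = 1) {
  roll <- match.arg(roll)
  ev <- as.numeric(event_t)
  wt <- sort(as.numeric(weather_t))
  i <- findInterval(ev, wt)  # index of last weather time <= event
  prev <- ifelse(i >= 1, wt[pmax(i, 1)], NA)
  nxt <- ifelse(i < length(wt), wt[pmin(i + 1, length(wt))], NA)
  pick <- if (roll == "last") prev else {
    d_prev <- abs(ev - prev)
    d_next <- abs(nxt - ev)
    # Ties go to the earlier weather time
    ifelse(is.na(d_next) | (!is.na(d_prev) & d_prev <= d_next), prev, nxt)
  }
  pick[which(abs(ev - pick) > max_hours * 3600)] <- NA  # too far away
  .POSIXct(pick, tz = "UTC")
}

base <- as.POSIXct("2024-06-01 00:00:00", tz = "UTC")
w <- base + 3600 * 0:2             # hourly weather 00:00 to 02:00
e <- base + c(40, 70, 300) * 60    # events at 00:40, 01:10, 05:00
roll_match_sketch(e, w, "nearest")
```

With the default of 1 hour for hourly joins, the 05:00 event gets no weather because the nearest hourly value is 3 hours away.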

Value

A data.table with weather columns appended. Rows with missing/invalid inputs keep their original values and receive NA weather.

See Also

wj_cache_list, wj_cache_clear, weatherjoin_options


weatherjoin options

Description

Most users will not need to change package options. Advanced configuration can be controlled via options().

Details

Cache policy

Time splitting and call planning

These options control how sparse time series are split into separate provider calls. They are primarily performance controls: poorly tuned values do not change the returned weather values, only how much data is downloaded and cached.

Time construction

Diagnostics

Use withr to change options temporarily, for example inside a function or test:

withr::local_options(list(
  weatherjoin.split_penalty_hours = 168,
  weatherjoin.max_parts = 25
))

Clear cached weather data

Description

Deletes cached files and (optionally) removes rows from the cache index.

Usage

wj_cache_clear(
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  filter = NULL,
  keep_index = FALSE,
  dry_run = FALSE,
  verbose = TRUE
)

Arguments

cache_dir

Optional explicit cache directory.

cache_scope

Where the cache is stored by default: "user" or "project".

pkg

Package name used for the "user" cache scope.

filter

Optional expression evaluated within the cache index to select entries to remove.

keep_index

If TRUE, index rows for deleted files are kept (useful for debugging). Default FALSE.

dry_run

If TRUE, prints what would be deleted but does not delete.

verbose

If TRUE, prints progress.

Value

Invisibly returns the rows selected for deletion.
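How an "expression evaluated within the cache index" can select rows is sketched below in base R, assuming the expression is passed unquoted and evaluated with the index columns in scope. The index columns (file, time_api) and select_entries_sketch() are hypothetical; wj_cache_list() shows the real index.

```r
# Hypothetical sketch: evaluate an unquoted filter inside a cache index.
select_entries_sketch <- function(index, filter = NULL) {
  e <- substitute(filter)
  if (is.null(e)) return(index)              # no filter: select everything
  keep <- eval(e, index, parent.frame())     # columns visible as variables
  index[!is.na(keep) & keep, , drop = FALSE]
}

idx <- data.frame(file = c("a.fst", "b.fst", "c.fst"),
                  time_api = c("hourly", "daily", "hourly"))
select_entries_sketch(idx, time_api == "hourly")
```

With the package installed, wj_cache_clear(dry_run = TRUE) is a cautious first step, since per the arguments above it only reports what would be deleted.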


List cached weather segments

Description

Returns the cache index (one row per cached segment).

Usage

wj_cache_list(
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin"
)

Arguments

cache_dir

Optional explicit cache directory.

cache_scope

Where the cache is stored by default: "user" or "project".

pkg

Package name used for the "user" cache scope.

Value

A data.table index of cached segments.


Upgrade cache index schema

Description

Ensures the cache index contains required columns and correct types.

Usage

wj_cache_upgrade_index(
  cache_dir = NULL,
  cache_scope = c("user", "project"),
  pkg = "weatherjoin",
  verbose = TRUE
)

Arguments

cache_dir

Optional explicit cache directory.

cache_scope

Where the cache is stored by default: "user" or "project".

pkg

Package name used for the "user" cache scope.

verbose

If TRUE, prints progress.

Value

The upgraded cache index.
