Title: Download Infectious Disease Data from 'SurvStat' (Robert Koch Institute)
Version: 0.1.2
Description: Provides an interface to the 'SurvStat' web service from the Robert Koch Institute (https://tools.rki.de/SurvStat/SurvStatWebService.svc) allowing downloads of disease time series stratified by pathogen type and subtype, age, and geography from notifiable disease reports in Germany.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.3.9007
Suggests: knitr, rmarkdown, ggplot2, testthat
VignetteBuilder: knitr
Imports: dplyr, magrittr, xml2, stringr, tibble, httr, curl, whisker, fs, purrr, tidyr, cli, locfit, rlang, sf
Depends: R (≥ 3.5)
LazyData: true
Language: en-GB
LazyDataCompression: xz
URL: https://bristol-vaccine-centre.github.io/rsurvstat/index.html, https://github.com/bristol-vaccine-centre/rsurvstat, https://bristol-vaccine-centre.github.io/rsurvstat/
BugReports: https://github.com/bristol-vaccine-centre/rsurvstat/issues
Config/Needs/build: terminological/pkgtools, robchallen/roxygen2
NeedsCompilation: no
Packaged: 2026-01-12 13:18:52 UTC; vp22681
Author: Robert Challen [aut, cre], Bristol Vaccine Centre [fnd, cph]
Maintainer: Robert Challen <rob.challen@bristol.ac.uk>
Repository: CRAN
Date/Publication: 2026-01-17 11:50:02 UTC

Survstat option accessor

Description

Survstat options are values that may have children.

Usage

## S3 method for class 'survstat_option'
x$y

Arguments

x

the options

y

the item

Value

the value of the list item or an error if it does not exist
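
A minimal sketch of how a strict `$` accessor of this kind could be written in base R. The class name `strict_option` and the example keys are hypothetical and are not taken from the package; the package's own method may differ in detail.

```r
# Hypothetical class: `$` returns the child item or fails loudly,
# instead of silently returning NULL as a plain list would.
`$.strict_option` <- function(x, y) {
  children <- unclass(x)            # drop the class to avoid recursive dispatch
  if (!y %in% names(children)) {
    stop(sprintf("no such item: `%s`", y), call. = FALSE)
  }
  children[[y]]
}

opts <- structure(
  list(influenza = "flu-key", measles = "measles-key"),
  class = "strict_option"
)

found <- opts$influenza
# A misspelt name raises an error rather than returning NULL:
missing_msg <- tryCatch(opts$infleunza, error = function(e) conditionMessage(e))
```

The benefit of this design is that a typo in a key is caught at the point of access rather than being passed on as NULL to a later query.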


Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).


Support for auto suggests on survstat_options

Description

Support for auto suggests on survstat_options

Usage

## S3 method for class 'survstat_option'
.DollarNames(x, pattern)

Arguments

x

a survstat_option

pattern

a matching pattern

Value

the names of the children
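
As an illustration of how auto-suggest support works in general, the sketch below defines a `.DollarNames` method for a hypothetical class `demo_option` (not the package's class). `utils::.DollarNames()` is the generic that R's completion code consults after `$`.

```r
# Hypothetical class: return the child names that match the typed pattern.
.DollarNames.demo_option <- function(x, pattern = "") {
  grep(pattern, names(unclass(x)), value = TRUE)
}

demo <- structure(
  list(alpha = 1, alphabet = 2, beta = 3),
  class = "demo_option"
)

# What a completion engine would be offered after typing `demo$al`:
suggestions <- utils::.DollarNames(demo, pattern = "^al")
```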


Check for supported curl version

Description

Check for supported curl version

Usage

.check_curl()

Value

a boolean, possibly accompanied by a warning

Unit tests


.check_curl()

Convert a nested dataframe to a multilevel list

Description

Convert a nested dataframe to a multilevel list

Usage

.df_to_list_of_lists(df, ...)

Arguments

df

a nested dataframe

...

Named arguments passed on to .transpose

x

a data.frame or row_list

.fix

collapse or expand names in redundant multi-level row_lists. Either FALSE or a string to join or split the names of the multi-level list by

...

not used

Value

a list of lists

Unit tests



iris_list = .df_to_list_of_lists(datasets::iris)
# TODO: iris_list has lost Petal.Length as it is interpreting Petal.Width as
# nested item and it overwrites Petal.Length rather than merging with it.

testthat::expect_equal(
  iris_list[[1]]$Species,
  iris$Species[[1]]
)

mtcars_nest = datasets::mtcars %>%
  dplyr::mutate(name = rownames(.)) %>%
  tidyr::nest(details = -c(cyl,gear))

mtcars_list = mtcars_nest %>% .df_to_list_of_lists()

mtcars_unnest = mtcars_list %>% .list_of_lists_to_df()

testthat::expect_equal(
  mtcars_list[[1]]$details[[1]]$name,
  mtcars_nest$details[[1]]$name[[1]]
)


Convert a multilevel list to a nested dataframe

Description

Convert a multilevel list to a nested dataframe

Usage

.list_of_lists_to_df(lst, ...)

Arguments

lst

a multilevel list

...

Named arguments passed on to .transpose

x

a data.frame or row_list

.fix

collapse or expand names in redundant multi-level row_lists. Either FALSE or a string to join or split the names of the multi-level list by

...

not used

Value

a dataframe with each sublist nested as a dataframe

Unit tests


iris_list = .df_to_list_of_lists(iris, .fix=FALSE)
iris2 = .list_of_lists_to_df(iris_list, .fix=FALSE)

testthat::expect_equal(datasets::iris, as.data.frame(iris2))

mtcars_nest = datasets::mtcars %>%
  dplyr::mutate(name = rownames(.)) %>%
  tidyr::nest(details = -c(cyl,gear))

mtcars_list = mtcars_nest %>% .df_to_list_of_lists()

mtcars_nest2 = mtcars_list %>% .list_of_lists_to_df()

testthat::expect_equal(
  mtcars_nest2$details[[2]],
  mtcars_nest$details[[2]]
)

# test unequal length vector column is mapped to list of vectors
# and multiply named nests are treated as rows
testlist = list(
   row = list(a=1:5, b="x"),
   row = list(a=2:4, b="y"),
   row = list(a=3, b="z")
)
testdf = testlist %>% .list_of_lists_to_df()
testthat::expect_equal(testdf$b, c("x", "y", "z"))
testthat::expect_equal(testdf$a[[2]], 2:4)


Transform a nested dataframe to / from a row by row list

Description

Data frames are column lists, which may contain nested dataframes. This function transforms a data frame into a row based list of named sub lists, each with one entry per dataframe column (a row_list). Alternatively, it converts a row_list back into a nested data frame.

Usage

.transpose(x, ..., .fix = ".")

Arguments

x

a data.frame or row_list

...

not used

.fix

collapse or expand names in redundant multi-level row_lists. Either FALSE or a string to join or split the names of the multi-level list by

Value

either a dataframe or a list of class row_list representing the dataframe as a list of named lists.
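
The column list versus row list distinction can be sketched in base R for a flat (non-nested) data frame. This only illustrates the idea and is not the package's implementation; it ignores nesting and the .fix option.

```r
df <- data.frame(
  id = 1:3,
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Column-wise to row-wise: one named list per row.
row_list <- lapply(
  seq_len(nrow(df)),
  function(i) as.list(df[i, , drop = FALSE])
)

# Row-wise back to column-wise: bind the single-row data frames together.
round_trip <- do.call(
  rbind,
  lapply(row_list, as.data.frame, stringsAsFactors = FALSE)
)
```

Here `row_list[[2]]` is `list(id = 2L, label = "b")`, and `round_trip` has the same columns and values as `df`.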

Unit tests



# create a test nested data frame:

mtcars_nest = datasets::mtcars 
  dplyr::mutate(name = rownames(.)) 
  tidyr::nest(by_carb = -c(cyl,gear,carb)) 
  tidyr::nest(by_cyl_and_gear = -c(cyl,gear))

mtcars_list = mtcars_nest 

mtcars_nest2 = mtcars_list 

testthat::expect_equal(mtcars_nest, mtcars_nest2)

Tree printing method for list objects. This is an interactive function.

Description

Tree printing method for list objects. This is an interactive function.

Usage

.tree(x, max_levels = 6, ..., verbose = TRUE)

Arguments

x

A list

max_levels

The maximum number of levels to show

...

Additional arguments:

  • max_width the number of items horizontally to show before truncating.

  • max_length the number of items vertically to show before truncating.

  • others are passed to format(...)

verbose

print output to the console (the default)

Value

The hierarchy as a string, called for side effects


A Berlin outline sf map

Description

A Berlin outline sf map

Usage

data(BerlinMap)

Format

A sf dataframe containing the following columns:

1 rows


The CountyKey71Map dataset

Description

This matches the CountyKey71 dimension in SurvStat. These are the 400 Stadtkreise and Landkreise administrative regions in Germany, with the Berlin Kreis (Id: 11000) replaced by its 12 boroughs (Bezirke). The boroughs have sequential Ids from [11001] to [11012].

Usage

data(CountyKey71Map)

Format

A sf dataframe containing the following columns:

Any grouping allowed.

411 rows


The FedStateKey71Map dataset.

Description

This matches the FedStateKey71 dimension in SurvStat. These are the 16 federal states in Germany.

Usage

data(FedStateKey71Map)

Format

A sf dataframe containing the following columns:

16 rows


The NutsKey71Map dataset

Description

This matches the NutsKey71 dimension in SurvStat. These are the 38 NUTS2 level administrative regions in Germany.

Usage

data(NutsKey71Map)

Format

A sf dataframe containing the following columns:

38 rows


SurvStat age group list

Description

Usage

age_groups

Format

An object of class list of length 8.

References

https://survstat.rki.de/Content/Query/Create.aspx


Delete all cached SurvStat requests

Description

This function is only intended to be used interactively. The cache can be controlled with set_cache_settings()

Usage

cache_clear(confirm = utils::askYesNo("Are you sure?"))

Arguments

confirm

can be set to TRUE to make the function non-interactive.

Value

nothing. called for side effects

Examples

cache_clear( confirm = interactive() )

Commands supported by the SurvStat service

Description

Not all services support all 3 methods.

The 3 different resolution levels of the geospatial data

Usage

commands

return_measures

geography_resolution

Format

An object of class list of length 8.

An object of class list of length 3.

An object of class list of length 3.

References

https://survstat.rki.de/Content/Query/Create.aspx



Data sources in the SurvStat service

Description

Data sources in the SurvStat service

Usage

cubes

Format

An object of class list of length 3.

References

https://survstat.rki.de/Content/Query/Create.aspx


SurvStat disease list

Description

Supported diseases:

Usage

diseases

Format

An object of class list of length 121.

References

https://survstat.rki.de/Content/Query/Create.aspx


Infer and fit a population model from SurvStat output

Description

SurvStat can be queried for count or incidence. By combining these two metrics, queried across the whole range of disease notifications for any given year, we can infer the stratified population size that SurvStat uses to calculate its incidence. This is then modelled with a local polynomial over time, which allows weekly population denominators to be filled in.

Usage

fit_population(count_df, .progress = TRUE)

infer_population(
  age_group = NULL,
  geography = NULL,
  years = NULL,
  .progress = TRUE
)

Arguments

count_df

a dataframe from the output of get_timeseries() or get_snapshot()

.progress

by default a progress bar is shown, which may be important if many downloads are needed to fulfil the request. It can be disabled by setting this to FALSE here.

age_group

(optional) the age group of interest as a SurvStat key, see rsurvstat::age_groups for a list of valid options.

geography

(optional) one of "state", "nuts", or "county" to define the resolution of the query. Unlike get_timeseries(), this does not accept a sf map or a subset of one.

years

(optional) a vector of years to limit the response to. This may be useful to limit the size of returned pages in the event the SurvStat service hits a data transfer limit.

Value

the count_df dataframe with an additional population column

a dataframe with geography, age grouping, year and population columns
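
As a rough illustration of the inference described above: if incidence is reported per 100,000 population, a count and an incidence for the same stratum and year imply a denominator of count / incidence * 100,000. The sketch below uses made-up numbers and base R only. Linear interpolation with stats::approx() is a simple stand-in for the local polynomial fit, and none of the variable names are taken from the package.

```r
# Illustrative (invented) yearly figures for a single stratum.
yearly <- data.frame(
  year = 2020:2023,
  count = c(500, 1010, 1530, 2060),
  incidence = c(50, 100, 150, 200)    # assumed to be per 100,000 per year
)

# Implied population denominator for each year.
yearly$population <- yearly$count / yearly$incidence * 1e5

# Stand-in for the smooth model over time: interpolate to a mid-year point.
mid_2020 <- stats::approx(
  x = yearly$year,
  y = yearly$population,
  xout = 2020.5
)$y
```

With these numbers the implied populations are 1,000,000, 1,010,000, 1,020,000 and 1,030,000, and the interpolated mid-2020 value is 1,005,000.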

Functions

Examples



# snapshot:
get_snapshot(
  disease = diseases$`COVID-19`,
  geography = "state",
  season=2024
) %>%
fit_population() %>%
dplyr::glimpse()

# timeseries
# A weekly population estimate is inferred from the yearly data:
get_timeseries(
  diseases$`COVID-19`,
  measure = "Count",
  age_group = age_groups$children_coarse
) %>%
fit_population() %>%
dplyr::glimpse()



infer_population(years=2020:2025) %>% dplyr::glimpse()


Retrieve data from the SurvStat web service relating to a single time period.

Description

This function gets a snapshot of disease count or incidence data from the Robert Koch Institute SurvStat web service, based on either a whole epidemiological season or an individual week within a season. Seasons are whole years starting either at the beginning of the calendar year, at week 27 or at week 40.

Usage

get_snapshot(
  disease = NULL,
  measure = c("Count", "Incidence"),
  ...,
  season,
  season_week = NULL,
  season_start = 1,
  age_group = NULL,
  age_range = c(0, Inf),
  disease_subtype = FALSE,
  geography = NULL,
  .progress = TRUE
)

Arguments

disease

the disease of interest as a SurvStat key, see rsurvstat::diseases for a current list of these. This is technically optional, and if omitted the counts of all diseases will be returned. Keys are the same as the options in the SurvStat user interface. IfSG and state variants of diseases are counts that are reported directly to the Robert Koch Institute or indirectly via state departments.

measure

one of "Count" (default) or "Incidence" per 100,000 per week or year depending on the context.

...

not used, must be empty.

season

the start year of the season in which the snapshot is taken

season_week

the start week within the season of the snapshot. If missing then the whole season is used

season_start

the week of the calendar year in which the season starts. This can be one of 1, 27 or 40.

age_group

(optional) the age group of interest as a SurvStat key, see rsurvstat::age_groups for a list of valid options.

age_range

(optional) a length 2 vector with the minimum and maximum ages to consider

disease_subtype

if TRUE the returned count will be broken down by disease or pathogen subtype (assuming disease was provided).

geography

(optional) a geographical breakdown. This can be given as a character, in which case it must be one of state, nuts, or county, specifying the 16 region FedStateKey71Map, 38 region NutsKey71Map, or 411 region CountyKey71Map data respectively. Alternatively it can be given as a sf dataframe subsetting one of these maps, in which case only that subset of regions will be returned.

.progress

by default a progress bar is shown, which may be important if many downloads are needed to fulfil the request. It can be disabled by setting this to FALSE here.

Details

The snapshot can be stratified by any combination of age, geography, disease, and disease subtype. Queries to SurvStat are cached and paged, but multidimensional extracts can still require a large number of downloads.

Value

a data frame with at least year (the start of the epidemiological season) and start_week (the calendar week in which the epidemiological season starts), and one of count or incidence columns. Most likely it will also have disease_name and disease_code columns, and some of age_name, age_code, age_low, age_high, geo_code, geo_name, disease_subtype_code, disease_subtype_name depending on options.

Examples


get_snapshot(
  diseases$`COVID-19`,
  measure = "Count",
  season = 2024,
  age_group = age_groups$children_coarse
)

get_snapshot(
  diseases$`COVID-19`,
  measure = "Count",
  age_group = age_groups$children_coarse,
  season = 2024,
  geography = rsurvstat::FedStateKey71Map[1:10,]
)


Retrieve time series data from the SurvStat web service.

Description

This function gets a weekly timeseries of disease count or incidence data from the Robert Koch Institute SurvStat web service. The timeseries can be stratified by any combination of age, geography, disease, and disease subtype. Queries to SurvStat are cached and paged, but multidimensional extracts can still require a large number of downloads.

Usage

get_timeseries(
  disease = NULL,
  measure = c("Count", "Incidence"),
  ...,
  age_group = NULL,
  age_range = c(0, Inf),
  disease_subtype = FALSE,
  years = NULL,
  geography = NULL,
  trim_zeros = c("leading", "both", "none"),
  .progress = TRUE
)

Arguments

disease

the disease of interest as a SurvStat key, see rsurvstat::diseases for a current list of these. This is technically optional, and if omitted the counts of all diseases will be returned. Keys are the same as the options in the SurvStat user interface. IfSG and state variants of diseases are counts that are reported directly to the Robert Koch Institute or indirectly via state departments.

measure

one of "Count" (default) or "Incidence" per 100,000 per week or year depending on the context.

...

not used, must be empty.

age_group

(optional) the age group of interest as a SurvStat key, see rsurvstat::age_groups for a list of valid options.

age_range

(optional) a length 2 vector with the minimum and maximum ages to consider

disease_subtype

if TRUE the returned count will be broken down by disease or pathogen subtype (assuming disease was provided).

years

(optional) a vector of years to limit the response to. This may be useful to limit the size of returned pages in the event the SurvStat service hits a data transfer limit.

geography

(optional) a geographical breakdown. This can be given as a character, in which case it must be one of state, nuts, or county, specifying the 16 region FedStateKey71Map, 38 region NutsKey71Map, or 411 region CountyKey71Map data respectively. Alternatively it can be given as a sf dataframe subsetting one of these maps, in which case only that subset of regions will be returned.

trim_zeros

whether to remove zero counts from the ends of the timeseries. Either "both" (from start and end), "leading" (from start only, the default) or "none".

.progress

by default a progress bar is shown, which may be important if many downloads are needed to fulfil the request. It can be disabled by setting this to FALSE here.

Value

a data frame with at least date (weekly), and one of count or incidence columns. Most likely it will also have disease_name and disease_code columns, and some of age_name, age_code, age_low, age_high, geo_code, geo_name, disease_subtype_code, disease_subtype_name depending on options. The dataframe will be grouped to make sure each group contains a single timeseries.
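
The three trim_zeros options can be pictured with a small base R helper acting on a plain vector of counts. The function name `trim_zero_counts` is hypothetical, and the handling of an all-zero series (returned unchanged here) is an assumption rather than documented package behaviour.

```r
# Hypothetical helper: drop zero counts from the start (and optionally the
# end) of a series, keeping zeros that lie between non-zero values.
trim_zero_counts <- function(counts, trim = c("leading", "both", "none")) {
  trim <- match.arg(trim)
  nonzero <- which(counts != 0)
  # Assumption: nothing is trimmed when the series has no non-zero values.
  if (trim == "none" || length(nonzero) == 0) {
    return(counts)
  }
  first <- min(nonzero)
  last <- if (trim == "both") max(nonzero) else length(counts)
  counts[first:last]
}

weekly <- c(0, 0, 3, 0, 5, 0, 0)
leading <- trim_zero_counts(weekly)            # 3 0 5 0 0
both <- trim_zero_counts(weekly, "both")       # 3 0 5
none <- trim_zero_counts(weekly, "none")       # unchanged
```

Trimming only leading zeros keeps a run of recent zero weeks visible, which distinguishes "no cases reported lately" from "no data".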

Examples


# age stratified
get_timeseries(
  diseases$`COVID-19`,
  measure = "Count",
  age_group = age_groups$children_coarse
) %>% dplyr::glimpse()

# geographic
get_timeseries(
  diseases$`COVID-19`,
  measure = "Count",
  geography = "state"
) %>% dplyr::glimpse()

# disease stratified, subset of years:
get_timeseries(
  measure = "Count",
  years = 2024
) %>% dplyr::glimpse()


Languages supported by the SurvStat service

Description

Languages supported by the SurvStat service

Usage

languages

Format

An object of class list of length 2.

References

https://survstat.rki.de/Content/Query/Create.aspx


Path to user cache directory

Description

This function uses R_USER_CACHE_DIR if set. Otherwise, it follows platform conventions. Typical user cache directories are:

Usage

rappdirs_user_cache_dir(
  appname = NULL,
  appauthor = appname,
  version = NULL,
  opinion = TRUE,
  expand = TRUE,
  os = NULL
)

Arguments

appname

is the name of application. If NULL, just the system directory is returned.

appauthor

(only required and used on Windows) is the name of the app author or distributing body for this application. Typically it is the owning company name. This falls back to app name.

version

is an optional version path element to append to the path. You might want to use this if you want multiple versions of your app to be able to run independently. If used, this would typically be "<major>.<minor>". Only applied when app name is not NULL.

opinion

(logical) Use FALSE to disable the appending of Cache on Windows. See discussion below.

expand

If TRUE (the default) will expand the R_LIBS specifiers with their equivalents. See R_LIBS() for a list of all possible specifiers.

os

Operating system whose conventions are used to construct the requested directory. Possible values are "win", "mac", "unix". If NULL (the default) then the current OS will be used.

Opinion

On Windows the only suggestion in the MSDN docs is that local settings go in the CSIDL_LOCAL_APPDATA directory. This is identical to the non-roaming app data dir. But apps typically put cache data somewhere under this directory so rappdirs_user_cache_dir() appends Cache to the CSIDL_LOCAL_APPDATA value, unless opinion = FALSE.

Unit tests


rappdirs_user_cache_dir("rappdirs")

See Also

tempdir() for a non-persistent temporary directory.


Set options for the rsurvstat cache

Description

By default, successful requests to SurvStat are cached for 7 days to prevent repeated querying of the service. The cache is stored in the usual R package cache location by default (e.g. "~/.cache/rsurvstat" on mac / linux). Caching can be switched off altogether.

Usage

set_cache_settings(..., active = NULL, dir = NULL, stale = NULL)

Arguments

...

you can also submit the settings as a named list.

active

boolean (optional), set to FALSE to disable caching

dir

file path (optional), the location of the cache

stale

numeric (optional), the number of days before a cached item is considered out of date

Value

the old cache settings as a list
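
To make the stale setting concrete, the sketch below shows one way a cached file could be judged out of date, by comparing its modification time with a threshold in days. This is an assumption about the general technique, not a description of how the package tracks cache age, and `is_stale` is a hypothetical helper.

```r
# Hypothetical helper: TRUE if the file was last modified more than
# `stale_days` days ago.
is_stale <- function(path, stale_days = 7) {
  age_days <- as.numeric(
    difftime(Sys.time(), file.mtime(path), units = "days")
  )
  age_days > stale_days
}

cached <- tempfile()
writeLines("cached response", cached)
fresh_is_stale <- is_stale(cached)      # just written, so not stale

# Pretend the file was written 8 days ago.
Sys.setFileTime(cached, Sys.time() - 8 * 24 * 3600)
old_is_stale <- is_stale(cached)        # older than 7 days, so stale

unlink(cached)
```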

Examples

old_settings = set_cache_settings(active = FALSE)
set_cache_settings(old_settings)
