| Title: | Access Brazilian Public Health Data |
| Version: | 0.1.1 |
| Description: | Provides easy access to Brazilian public health data from multiple sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases by Telephone Survey), PNS (National Health Survey), SIM (Mortality Information System), SINASC (Live Birth Information System), and other health information systems. Data is downloaded from the Brazilian Ministry of Health VIGITEL repository https://svs.aids.gov.br/download/Vigitel/. Data is returned in tidy format following tidyverse conventions. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.2.0) |
| Imports: | tibble, dplyr, readxl, curl, cli, rlang, stringr, janitor, purrr |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, furrr, future, arrow |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| URL: | https://github.com/SidneyBissoli/healthbR |
| BugReports: | https://github.com/SidneyBissoli/healthbR/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-02-04 02:06:15 UTC; SIDNEY |
| Author: | Sidney Bissoli |
| Maintainer: | Sidney Bissoli <sbissoli76@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-04 08:20:36 UTC |
healthbR: Access Brazilian Public Health Data
Description
Provides easy access to Brazilian public health data from multiple sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases by Telephone Survey), PNS (National Health Survey), SIM (Mortality Information System), SINASC (Live Birth Information System), and other health information systems. Data is downloaded from the Brazilian Ministry of Health VIGITEL repository https://svs.aids.gov.br/download/Vigitel/. Data is returned in tidy format following tidyverse conventions.
Author(s)
Maintainer: Sidney Bissoli sbissoli76@gmail.com (ORCID)
See Also
Useful links:
- https://github.com/SidneyBissoli/healthbR
- Report bugs at https://github.com/SidneyBissoli/healthbR/issues
Check arrow availability and stop with informative message
Description
Checks whether the arrow package is installed and stops with an informative message if it is not.
Usage
check_arrow(feature = "Parquet file support")
Arguments
feature |
Character describing what feature requires arrow |
Value
NULL (invisibly). Stops with an error if arrow is not available.
Check if arrow package is available
Description
Check if arrow package is available
Usage
has_arrow()
Value
TRUE if arrow is available, FALSE otherwise
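Availability checks like the two above are commonly built on requireNamespace(). The sketch below shows that general pattern. It is an assumption about the approach, not the package's source, and the helper names has_pkg() and check_pkg() are hypothetical.

```r
# Hypothetical helpers illustrating the pattern; not healthbR's implementation.
has_pkg <- function(pkg) {
  # requireNamespace() reports availability without attaching the package
  requireNamespace(pkg, quietly = TRUE)
}

check_pkg <- function(pkg, feature) {
  if (!has_pkg(pkg)) {
    stop(
      sprintf(
        "%s requires the '%s' package. Install it with install.packages(\"%s\").",
        feature, pkg, pkg
      ),
      call. = FALSE
    )
  }
  invisible(NULL)
}

# stats ships with R, so this check passes silently
check_pkg("stats", feature = "Demo feature")
```

Keeping the logical test and the stopping behaviour in separate functions lets callers either branch on availability or fail early, whichever suits the call site.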
List Available Data Sources
Description
Returns information about all data sources available in healthbR.
Usage
list_sources()
Value
A tibble with columns:
- source: Source code (e.g., "vigitel", "sim")
- name: Full name of the data source
- description: Brief description
- years: Range of available years
- status: Implementation status ("available", "planned")
Examples
list_sources()
Utility Functions for healthbR
Description
Utility Functions for healthbR
Get VIGITEL base URL
Description
Get VIGITEL base URL
Usage
vigitel_base_url()
Value
Character string with base URL
Get VIGITEL cache directory
Description
Get VIGITEL cache directory
Usage
vigitel_cache_dir(cache_dir = NULL)
Arguments
cache_dir |
Optional custom cache directory. If NULL, uses default user cache directory. |
Value
Path to cache directory
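As a rough illustration of how a cache_dir argument of this kind can be resolved, the sketch below falls back to a per-user cache path when cache_dir is NULL. The function name resolve_cache_dir() is hypothetical, and the use of tools::R_user_dir() is an assumption; the package's actual default location may differ.

```r
# Hypothetical sketch; healthbR's real default cache location may differ.
resolve_cache_dir <- function(cache_dir = NULL) {
  if (is.null(cache_dir)) {
    # tools::R_user_dir() (R >= 4.0.0) returns a per-user, per-package path
    cache_dir <- tools::R_user_dir("healthbR", which = "cache")
  }
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  cache_dir
}

# A custom directory under tempdir() leaves nothing behind after the session
demo_dir <- resolve_cache_dir(file.path(tempdir(), "vigitel-demo"))
demo_dir
```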
Get VIGITEL cache status
Description
Shows which years are cached and the size of each cached file.
Usage
vigitel_cache_status(cache_dir = NULL)
Arguments
cache_dir |
Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory. |
Value
A tibble with cache information
Examples
# check cache status
vigitel_cache_status()
Clear VIGITEL cache
Description
Removes all cached VIGITEL data files (Excel and Parquet).
Usage
vigitel_clear_cache(keep_parquet = FALSE, cache_dir = NULL)
Arguments
keep_parquet |
Logical. If TRUE, keep Parquet files and only remove Excel files. Default is FALSE (remove all). |
cache_dir |
Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory. |
Value
NULL (invisibly)
Examples
# remove all cached files from default cache
vigitel_clear_cache()
Convert Excel file to Parquet format
Description
Convert Excel file to Parquet format
Usage
vigitel_convert_to_parquet(year, force = FALSE, cache_dir = NULL)
Arguments
year |
Integer year |
force |
Logical. If TRUE, reconvert even if the Parquet file already exists. |
cache_dir |
Optional custom cache directory |
Value
Path to parquet file (invisibly)
Load VIGITEL microdata
Description
Downloads (if necessary) and loads VIGITEL survey microdata into R. Data is automatically converted to Parquet format for faster subsequent loading. The data includes survey weights for proper statistical analysis.
Usage
vigitel_data(
year,
vars = NULL,
force_download = FALSE,
parallel = TRUE,
lazy = FALSE,
cache_dir = NULL
)
Arguments
year |
Year(s) of the survey. Can be an integer, a character value, a vector of years, or "all". |
vars |
Character vector. Variable names to select, or NULL for all variables. Default is NULL. |
force_download |
Logical. If TRUE, re-download and reconvert data. Default is FALSE. |
parallel |
Logical. If TRUE, download and process multiple years in parallel. Default is TRUE; parallel processing only applies when multiple years are requested. |
lazy |
Logical. If TRUE, return an Arrow Dataset for lazy evaluation
instead of loading all data into memory. Useful for filtering large
datasets before collecting. Use collect() to retrieve the results as a tibble. |
cache_dir |
Character. Optional custom cache directory. If NULL (default),
uses the standard user cache directory. Use tempdir() to avoid leaving files on the system. |
Details
On first access, data is downloaded from the Ministry of Health and converted to Parquet format. Subsequent loads read directly from the Parquet file, which is significantly faster.
The arrow package is required for Parquet file support. If not
installed, an informative error message will be shown with installation
instructions.
For parallel downloads, the function uses the furrr and future
packages if installed. Install them with install.packages(c("furrr", "future"))
to enable parallel processing. The number of workers is automatically set
based on available CPU cores. If these packages are not installed, processing
falls back to sequential mode.
When lazy = TRUE, the function returns an Arrow Dataset that supports
dplyr operations (filter, select, mutate, etc.) without loading data into
memory. This is useful for working with large datasets or when you only
need a subset of the data. Call collect() to retrieve the results
as a tibble.
The VIGITEL survey uses complex sampling weights. For proper statistical
analysis, use survey packages like survey or srvyr.
The weight variable is named pesorake.
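To illustrate why the weight matters, the sketch below compares an unweighted and a weighted proportion on a small made-up data frame. All values and the fumante column are invented for illustration; only the name pesorake comes from this documentation. This gives weighted point estimates only. Standard errors and confidence intervals still need a survey package such as survey or srvyr.

```r
# Made-up data; only the column name pesorake comes from the documentation.
toy <- data.frame(
  fumante  = c(1, 0, 0, 1, 0),           # hypothetical 0/1 indicator
  pesorake = c(0.5, 2.0, 1.5, 0.5, 0.5)  # survey weights
)

# Treats every respondent equally
unweighted <- mean(toy$fumante)

# Respondents count in proportion to their weight
weighted <- weighted.mean(toy$fumante, w = toy$pesorake)

c(unweighted = unweighted, weighted = weighted)
```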
Value
A tibble with the VIGITEL microdata. When multiple years are
requested, a year column is added to identify the source year.
If lazy = TRUE, returns an Arrow Dataset that can be queried
with dplyr verbs before calling collect().
Examples
# single year (uses tempdir to avoid leaving files on system)
df <- vigitel_data(2023, cache_dir = tempdir())
# specific variables
df <- vigitel_data(2023, vars = c("cidade", "sexo", "idade", "pesorake"),
cache_dir = tempdir())
Load single year of VIGITEL data
Description
Load single year of VIGITEL data
Usage
vigitel_data_single(
year,
vars = NULL,
force_download = FALSE,
lazy = FALSE,
cache_dir = NULL
)
Arguments
year |
Integer year |
vars |
Character vector of variables or NULL |
force_download |
Logical |
lazy |
Logical. If TRUE, return Arrow object for lazy evaluation. |
cache_dir |
Optional custom cache directory |
Value
A tibble or Arrow Table (if lazy = TRUE)
Get VIGITEL variable dictionary
Description
Returns the data dictionary with variable descriptions, labels, and coding information for VIGITEL surveys.
Usage
vigitel_dictionary(force_download = FALSE, cache_dir = NULL)
Arguments
force_download |
Logical. If TRUE, re-download the dictionary. |
cache_dir |
Character. Optional custom cache directory. If NULL (default),
uses the standard user cache directory. Use tempdir() to avoid leaving files on the system. |
Value
A tibble with variable metadata
Examples
# get the dictionary (uses tempdir to avoid leaving files)
dict <- vigitel_dictionary(cache_dir = tempdir())
# view column names
names(dict)
Download VIGITEL microdata for a specific year
Description
Downloads the VIGITEL survey microdata file from the Ministry of Health website. Files are cached locally to avoid repeated downloads.
Usage
vigitel_download(year, force = FALSE, cache_dir = NULL)
Arguments
year |
Integer. Year of the survey (use vigitel_years() to list the available years). |
force |
Logical. If TRUE, re-download even if file exists in cache. Default is FALSE. |
cache_dir |
Character. Optional custom cache directory. If NULL (default),
uses the standard user cache directory. Use tempdir() to avoid leaving files on the system. |
Value
Path to the downloaded file (invisibly)
Examples
# download 2023 data (uses tempdir to avoid leaving files)
vigitel_download(2023, cache_dir = tempdir())
Download VIGITEL data dictionary
Description
Downloads the official VIGITEL data dictionary from the Ministry of Health.
Usage
vigitel_download_dictionary(force = FALSE, cache_dir = NULL)
Arguments
force |
Logical. If TRUE, re-download even if cached. |
cache_dir |
Optional custom cache directory |
Value
Path to the downloaded file (invisibly)
Get path to Excel file for a specific year
Description
Get path to Excel file for a specific year
Usage
vigitel_excel_path(year, cache_dir = NULL)
Arguments
year |
Integer year |
cache_dir |
Optional custom cache directory |
Value
Path to excel file
Build VIGITEL file URL for a specific year
Description
Build VIGITEL file URL for a specific year
Usage
vigitel_file_url(year)
Arguments
year |
Integer year |
Value
Character string with file URL
Get VIGITEL survey information
Description
Returns metadata about the VIGITEL survey.
Usage
vigitel_info()
Value
A list with survey information
Examples
vigitel_info()
Get path to Parquet file for a specific year
Description
Get path to Parquet file for a specific year
Usage
vigitel_parquet_path(year, cache_dir = NULL)
Arguments
year |
Integer year |
cache_dir |
Optional custom cache directory |
Value
Path to parquet file
Parse year argument
Description
Converts various year input formats to integer vector.
Usage
vigitel_parse_years(year)
Arguments
year |
Year specification (integer, character, vector, or "all") |
Value
Integer vector of years
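The normalization described above can be sketched roughly as follows. This is a hypothetical illustration, not the package's source: the function name parse_years_sketch(), the validation messages, the reading of "all" as every available year, and the placeholder range of available years are all assumptions.

```r
# Hypothetical sketch of year normalization; not healthbR's implementation.
parse_years_sketch <- function(year, available) {
  # Assumption: "all" expands to every available year
  if (is.character(year) && length(year) == 1 && tolower(year) == "all") {
    return(as.integer(available))
  }
  # Accept numbers or numeric-like text such as "2021"
  years <- suppressWarnings(as.integer(year))
  if (anyNA(years)) {
    stop("`year` must be numeric, numeric-like text, or \"all\".", call. = FALSE)
  }
  unknown <- setdiff(years, available)
  if (length(unknown) > 0) {
    stop("Years not available: ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  sort(unique(years))
}

available <- 2021:2023  # placeholder range, for illustration only
parse_years_sketch("2021", available)
parse_years_sketch(c(2023, 2021), available)
parse_years_sketch("all", available)
```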
List VIGITEL variables
Description
Returns a character vector of variable names available in a VIGITEL survey year.
Usage
vigitel_variables(year, cache_dir = NULL)
Arguments
year |
Integer. Year of the survey. |
cache_dir |
Character. Optional custom cache directory. If NULL (default),
uses the standard user cache directory. Use tempdir() to avoid leaving files on the system. |
Value
A character vector of variable names
Examples
# list variables for 2023 (uses tempdir to avoid leaving files)
vigitel_variables(2023, cache_dir = tempdir())
List available VIGITEL survey years
Description
Returns a vector of years for which VIGITEL microdata is available for download from the Ministry of Health website.
Usage
vigitel_years()
Value
An integer vector of available years
Examples
vigitel_years()