The dataset R Package

rhub devel-version Codecov test coverage Project Status: Active CRAN_Status_Badge CRAN_time_from_release Status at rOpenSci Software Peer Review DOI dataobservatory

Overview

The dataset package helps you create semantically rich, machine-readable, and interoperable datasets in R. It introduces S3 classes that extend data frames, vectors, and bibliographic entries with formal metadata structures inspired by:

The goal is to preserve metadata when reusing statistical and repository datasets, improve interoperability, and make it easy to turn tidy data frames into web-ready, publishable datasets that comply with ISO and W3C standards.

Installation

You can install the latest released version of dataset from CRAN with:

install.packages("dataset")

To install the development version from GitHub with pak or remotes:

# install.packages("pak")
pak::pak("dataobservatory-eu/dataset")

# install.packages("remotes")
remotes::install_github("dataobservatory-eu/dataset")

Minimal Example

library(dataset)
df <- dataset_df(
  country = defined(
    c("AD", "LI"),
    label = "Country",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(c(3897, 7365),
    label = "GDP",
    unit = "million euros"
  ),
  dataset_bibentry = dublincore(
    title = "GDP Dataset",
    creator = person("Jane", "Doe", role = "aut"),
    publisher = "Small Repository"
  )
)
print(df)
#> Doe (2025): GDP Dataset [dataset]
#>   rowid country   gdp 
#>   <chr> <chr>   <dbl>
#> 1 obs1  AD       3897
#> 2 obs2  LI       7365

Export as RDF triples:

dataset_to_triples(df, format = "nt")
#> [1] "<http://example.com/dataset#obsobs1> <http://example.com/prop/country> <https://www.geonames.org/countries/AD/> ."
#> [2] "<http://example.com/dataset#obsobs2> <http://example.com/prop/country> <https://www.geonames.org/countries/LI/> ."
#> [3] "<http://example.com/dataset#obsobs1> <http://example.com/prop/gdp> \"3897\"^^<xsd:decimal> ."                     
#> [4] "<http://example.com/dataset#obsobs2> <http://example.com/prop/gdp> \"7365\"^^<xsd:decimal> ."

Retain automatically recorded provenance:

provenance(df)
#> [1] "<http://example.com/dataset_prov.nt> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Bundle> ."                  
#> [2] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Entity> ."                         
#> [3] "<http://example.com/dataset#> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#DataSet> ."                 
#> [4] "_:doejane <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Agent> ."                                              
#> [5] "<https://doi.org/10.32614/CRAN.package.dataset> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#SoftwareAgent> ."
#> [6] "<http://example.com/creation> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/prov#Activity> ."                       
#> [7] "<http://example.com/creation> <http://www.w3.org/ns/prov#generatedAtTime> \"2025-11-16T08:47:24Z\"^^<xsd:dateTime> ."

Contributing

We welcome contributions and discussion!

Code of Conduct

This project follows the rOpenSci Code of Conduct. By participating, you are expected to uphold these guidelines.

mirror server hosted at Truenetwork, Russian Federation.