Data Package

Peter Desmet

Data Package is a simple container format to describe a coherent collection of data (a dataset), including its contributors, licenses, etc.

In this document we use the terms “package” for Data Package, “resource” for Data Resource, “dialect” for Table Dialect, and “schema” for Table Schema.

General implementation

Frictionless supports reading, manipulating and writing packages. Much of its functionality is focused on manipulating resources (see vignette("data-resource")).

Read

read_package() reads a package from a datapackage.json file (local path or URL):

library(frictionless)
file <- system.file("extdata", "datapackage.json", package = "frictionless")
package <- read_package(file)

print.datapackage() prints a human-readable summary of a package:

package
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
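A common next step is to list the resources in the package and load one of them with resources() and read_resource() (see vignette("data-resource")). This is a minimal sketch. It assumes that the example package shipped with frictionless describes deployments as a local CSV file, so that no network access is needed:

```r
library(frictionless)

# Read the example package that ships with frictionless
file <- system.file("extdata", "datapackage.json", package = "frictionless")
package <- read_package(file)

# Names of the resources described in the package
resource_names <- resources(package)
resource_names

# Load the data of one resource as a data frame
deployments <- read_resource(package, "deployments")
nrow(deployments)
```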

Manipulate

A package is a list containing all the properties present in the datapackage.json file (e.g. name, id). Frictionless adds the custom property "directory" to support reading data (it is removed again when writing to disk) and adds the class "datapackage" to support printing and checking:

attributes(package)
#> $names
#> [1] "name"      "id"        "created"   "image"     "licenses"  "temporal" 
#> [7] "resources" "directory"
#> 
#> $class
#> [1] "datapackage" "list"
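Because a package is an ordinary list, you can inspect its properties with base R. The sketch below reads the bundled example package again. For a package read from a local file, directory should point to the folder that holds datapackage.json, which is how relative resource paths get resolved (treat the exact value as an implementation detail):

```r
library(frictionless)

file <- system.file("extdata", "datapackage.json", package = "frictionless")
package <- read_package(file)

# Properties are ordinary list elements
names(package)

# Custom property added by frictionless to resolve relative data paths
package$directory

# The extra class is what print.datapackage() and check_package() rely on
inherits(package, "datapackage")
```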

create_package() creates a package from scratch or from an existing package. It adds the required properties and class if those are missing:

# From scratch
create_package()
#> A Data Package with 0 resources.
#> Use `unclass()` to print the Data Package as a list.

# From an existing package
create_package(package)
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
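create_package() also accepts a plain list, for example metadata you assembled yourself. A minimal sketch, where the list and its name and title values are purely illustrative:

```r
library(frictionless)

# A plain list with some metadata, but no resources and no datapackage class
descriptor <- list(name = "my-package", title = "My package")

# create_package() adds the missing resources property and the class
my_package <- create_package(descriptor)
inherits(my_package, "datapackage")
length(my_package$resources)
my_package$title
```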

check_package() checks if a package contains the required properties and class:

invalid_package <- example_package()
invalid_package$resources <- NULL
check_package(invalid_package)
#> Error in `check_package()`:
#> ! `package` must be a Data Package object.
#> ✖ `package` is missing a resources property or it is not a list.
#> ℹ Create a valid Data Package object with `read_package()` or
#>   `create_package()`.
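Because check_package() signals an error, you can wrap it with tryCatch() to get a TRUE/FALSE answer instead. In the sketch below, is_valid_package() is a hypothetical helper and not part of frictionless. The sketch also shows that create_package() makes the broken package valid again by adding an empty resources property. Note that the three original resource descriptions are not recovered:

```r
library(frictionless)

# Hypothetical helper: TRUE if check_package() passes, FALSE if it errors
is_valid_package <- function(x) {
  tryCatch({
    check_package(x)
    TRUE
  }, error = function(e) FALSE)
}

invalid_package <- example_package()
invalid_package$resources <- NULL
is_valid_package(invalid_package)

# create_package() adds back the missing (now empty) resources property
fixed_package <- create_package(invalid_package)
is_valid_package(fixed_package)
length(fixed_package$resources)
```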

You can manipulate the package list, but frictionless does not provide dedicated functions to do that. Use {purrr} or base R instead (see vignette("frictionless")). Note however that some functions (e.g. unclass() or append()) remove the custom class, creating an invalid package. You can fix this by calling create_package() on your package.
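The difference between class-preserving and class-dropping operations can be seen in a short base R sketch. Assignment with `$<-` keeps the datapackage class, unclass() drops it, and create_package() restores it. The title value is only an example:

```r
library(frictionless)

package <- example_package()

# Base R assignment with `$<-` keeps the datapackage class
package$title <- "Example package"
inherits(package, "datapackage")

# unclass() drops the class, so the result is no longer a valid package ...
plain <- unclass(package)
inherits(plain, "datapackage")

# ... until create_package() adds the class again
restored <- create_package(plain)
inherits(restored, "datapackage")
```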

Most functions take package as their first argument and return a package, so you can chain them with a pipe:

library(dplyr) # Or library(magrittr)
my_package <-
  create_package() %>%
  add_resource(resource_name = "iris", data = iris) %>%
  append(c("title" = "my_package"), after = 0) %>%
  create_package() # To add the datapackage class again
my_package
#> A Data Package with 1 resource:
#> • iris
#> Use `unclass()` to print the Data Package as a list.
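The same pipeline should also work with R's native pipe `|>`, which requires R 4.1 or later and no extra package. It inserts the left-hand side as the first argument, just like `%>%` does here. A sketch:

```r
library(frictionless)

my_package <-
  create_package() |>
  add_resource(resource_name = "iris", data = iris) |>
  append(c("title" = "my_package"), after = 0) |>
  create_package() # append() drops the datapackage class, so add it again

my_package
```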

Write

write_package() writes a package to disk as a datapackage.json file. For some resources, it also writes the data files. See the function documentation and vignette("data-resource") for details.
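A round trip through write_package() and read_package() can be sketched as follows. The output folder name is hypothetical. The sketch also uses jsonlite, which is assumed to be installed alongside frictionless, to confirm that the custom directory property is not written to datapackage.json:

```r
library(frictionless)

# Build a small package from a data frame
my_package <- create_package() |>
  add_resource(resource_name = "iris", data = iris)

# Write it to a (hypothetical) folder in the temporary directory
out_dir <- file.path(tempdir(), "iris-package")
dir.create(out_dir, showWarnings = FALSE)
write_package(my_package, directory = out_dir)

# The descriptor is written as datapackage.json, without "directory"
descriptor_file <- file.path(out_dir, "datapackage.json")
descriptor <- jsonlite::read_json(descriptor_file)
is.null(descriptor$directory)

# Reading it back gives a package with the same resource
reread <- read_package(descriptor_file)
resources(reread)
```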

Properties implementation

resources

resources is required. It is used by resources() and many other functions. check_package() returns an error if it is missing.
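In the package list, resources is itself a list of resource descriptors, each with a name. The sketch below collects these names with base R and compares them with the output of resources():

```r
library(frictionless)

package <- example_package()

# One descriptor (a list) per resource
length(package$resources)
resource_names <- vapply(package$resources, function(r) r$name, character(1))

# resources() returns the same names
identical(resource_names, resources(package))
```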

profile

profile is ignored by read_package() and is not set by create_package() (e.g. to "tabular-data-package").
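To declare a profile, you can set it yourself with base R. A minimal sketch:

```r
library(frictionless)

# create_package() does not set a profile ...
package <- create_package()
is.null(package$profile)

# ... so add one yourself if needed; `$<-` keeps the datapackage class
package$profile <- "tabular-data-package"
package$profile
```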

name

name is ignored by read_package() and not set by create_package().

id

id is ignored by read_package() and not set by create_package(). print.datapackage() adds an extra sentence when id is a URL (like a DOI):

package <- example_package()
package$id <- "https://doi.org/10.5281/zenodo.10053702/"
package
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> For more information, see <https://doi.org/10.5281/zenodo.10053702/>.
#> Use `unclass()` to print the Data Package as a list.

licenses

licenses is ignored by read_package() and not set by create_package().
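Since frictionless does not create licenses, you can add them yourself. In the Data Package specification, licenses is an array of objects, each with a name and/or path and an optional title. In R that is a list of named lists. The sketch below uses CC0 purely as an illustrative license:

```r
library(frictionless)

package <- create_package()

# A list of named lists maps to a JSON array of license objects
package$licenses <- list(
  list(
    name = "CC0-1.0",
    path = "https://creativecommons.org/publicdomain/zero/1.0/",
    title = "CC0 1.0"
  )
)
package$licenses[[1]]$name
```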

title

title is ignored by read_package() and not set by create_package().

description

description is ignored by read_package() and not set by create_package().

homepage

homepage is ignored by read_package() and not set by create_package().

image

image is ignored by read_package() and not set by create_package().

version

version is ignored by read_package() and not set by create_package().

created

created is ignored by read_package() and not set by create_package().

keywords

keywords is ignored by read_package() and not set by create_package().

contributors

contributors is ignored by read_package() and not set by create_package().

sources

sources is ignored by read_package() and not set by create_package().
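Because these descriptive properties are neither processed by read_package() nor set by create_package(), they are kept as ordinary list elements. You can set them with base R, and they should survive a write and read round trip. In the sketch below, all values and the output folder name are illustrative:

```r
library(frictionless)

package <- create_package() |>
  add_resource(resource_name = "iris", data = iris)

# Stored as-is: frictionless does not set or interpret these properties
package$name <- "iris-example"
package$title <- "Iris measurements"
package$description <- "Example package built from the iris data set."
package$version <- "1.0.0"
package$keywords <- c("iris", "example")

# Write and read back (hypothetical output folder)
out_dir <- file.path(tempdir(), "iris-metadata")
dir.create(out_dir, showWarnings = FALSE)
write_package(package, directory = out_dir)
reread <- read_package(file.path(out_dir, "datapackage.json"))

reread$title
# unlist() copes with keywords being read back as a list or a vector
unlist(reread$keywords)
```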
