In HDF5, attributes are small pieces of metadata attached to groups or datasets. They are best used to store descriptive information (units, timestamps, descriptions, or experimental parameters) separately from the main data array.
This vignette covers how to write, read, and manage these attributes
using h5lite, as well as important limitations regarding
their structure.
There are two ways to write attributes in h5lite:
explicitly (targeting an object) or implicitly (saving R
attributes).
You can write an attribute to any existing group or dataset using the
attr argument in h5_write(). This is useful
for adding metadata after the data has been saved.
# First, write a dataset
h5_write(1:10, file, "measurements/temperature")
# Now, attach attributes to it
h5_write(I("Celsius"), file, "measurements/temperature", attr = "units")
h5_write(I("2023-10-27"), file, "measurements/temperature", attr = "date")
h5_write(I(0.1), file, "measurements/temperature", attr = "precision")
Note: If the attribute already exists, it will be overwritten.
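The overwrite behavior can be sketched as follows. This is a minimal illustration, not output from the vignette: the `tempfile()` path is an assumption added so the snippet is self-contained, and the check relies on attributes being re-attached when the dataset is read back.

```r
library(h5lite)

# Assumption: a throwaway file path; the vignette's own `file` is defined elsewhere
file <- tempfile(fileext = ".h5")

h5_write(1:10, file, "measurements/temperature")

# The first write creates the attribute
h5_write(I("Celsius"), file, "measurements/temperature", attr = "units")

# A second write under the same attribute name replaces the stored value
h5_write(I("Kelvin"), file, "measurements/temperature", attr = "units")

# Read the dataset back and inspect the re-attached attribute
temps <- h5_read(file, "measurements/temperature")
unit_value <- attr(temps, "units")
```

After the second call, `unit_value` should hold the later value only; no history of the earlier value is kept.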
h5lite automatically preserves custom R attributes
attached to your objects. When you write an R object, any attributes
(except for standard internal ones like dim,
names, or class) are written as HDF5
attributes.
# Create a vector with custom R attributes
data <- rnorm(5)
attr(data, "description") <- I("Randomized control group")
attr(data, "valid") <- I(TRUE)
# Write the object
h5_write(data, file, "experiment/control")
# Check the file - the attributes are there
h5_attr_names(file, "experiment/control")
#> [1] "description" "valid"
h5_str(file)
#> /
#> ├── measurements/
#> │   └── temperature <uint8 × 10>
#> │       ├── @units <utf8[7] scalar>
#> │       ├── @date <utf8[10] scalar>
#> │       └── @precision <float64 scalar>
#> └── experiment/
#>     └── control <float64 × 5>
#>         ├── @description <utf8[24] scalar>
#>         └── @valid <uint8 scalar>
If you only need a specific piece of metadata without reading the
full dataset, you can use h5_read(..., attr = "name").
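A minimal sketch of reading one attribute on its own, using the same `attr` argument shown for `h5_write()`. The temporary file path is an assumption added for self-containment.

```r
library(h5lite)

# Assumption: a throwaway file path; the vignette's own `file` is defined elsewhere
file <- tempfile(fileext = ".h5")

h5_write(1:10, file, "measurements/temperature")
h5_write(I(0.1), file, "measurements/temperature", attr = "precision")

# Request only the attribute; the dataset values themselves are not returned
precision <- h5_read(file, "measurements/temperature", attr = "precision")
```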
When you read a dataset, h5lite automatically reads all
attached attributes and re-attaches them to the resulting R object.
# Read the full dataset
temps <- h5_read(file, "measurements/temperature")
# The attributes are available in R
attributes(temps)
#> $units
#> [1] "Celsius"
#>
#> $date
#> [1] "2023-10-27"
#>
#> $precision
#> [1] 0.1
str(temps)
#> int [1:10] 1 2 3 4 5 6 7 8 9 10
#> - attr(*, "units")= chr "Celsius"
#> - attr(*, "date")= chr "2023-10-27"
#> - attr(*, "precision")= num 0.1
Use h5_attr_names() to list the names of all attributes
attached to a specific object.
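One possible use of this listing is to check that an attribute exists before trying to read it. The sketch below assumes a temporary file, and `read_attr_or_na()` is a hypothetical helper written for this illustration, not part of h5lite.

```r
library(h5lite)

# Assumption: a throwaway file path; the vignette's own `file` is defined elsewhere
file <- tempfile(fileext = ".h5")

h5_write(1:10, file, "measurements/temperature")
h5_write(I("Celsius"), file, "measurements/temperature", attr = "units")

# Names of all attributes attached to the dataset
available <- h5_attr_names(file, "measurements/temperature")

# Hypothetical helper: read the attribute if present, otherwise return NA
read_attr_or_na <- function(attr_name) {
  if (attr_name %in% available) {
    h5_read(file, "measurements/temperature", attr = attr_name)
  } else {
    NA
  }
}

units_value   <- read_attr_or_na("units")
missing_value <- read_attr_or_na("operator")
```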
While attributes are powerful for storing metadata, they are
fundamentally simpler structures than HDF5 Datasets. HDF5 enforces
specific constraints that affect how h5lite can store
complex R objects as attributes.
HDF5 Dimension Scales (the mechanism
h5lite uses to store names,
dimnames, and row.names) can only be attached
to Datasets. They cannot be attached to attributes.
This means if you write a named vector, matrix, or array as an attribute, the names will be lost.
# A vector with names
named_vec <- c(a = 1, b = 2, c = 3)
# Write as a standard Dataset -> Names are preserved
h5_write(named_vec, file, "my_dataset")
h5_names(file, "my_dataset")
#> [1] "a" "b" "c"
# Write as an Attribute -> Names are LOST
h5_write(named_vec, file, "measurements/temperature", attr = "meta_vec")
h5_names(file, "measurements/temperature", attr = "meta_vec")
#> character(0)
Exception: Data Frames
There is one major exception: data.frame objects.
Because h5lite stores data frames as HDF5 Compound Types,
the column names are part of the type definition itself rather than
separate metadata. Column names are therefore
preserved even when a data frame is written as an attribute.
However, row.names (which rely on dimension scales) are
still lost.
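The data frame exception can be sketched as follows. The `calibration` table, its column names, and its row names are made up for illustration, and the temporary file path is an assumption.

```r
library(h5lite)

# Assumption: a throwaway file path; the vignette's own `file` is defined elsewhere
file <- tempfile(fileext = ".h5")

h5_write(1:10, file, "measurements/temperature")

# Hypothetical calibration table with explicit row names
calibration <- data.frame(
  offset = c(0.1, 0.2, 0.3),
  gain   = c(1.0, 1.1, 0.9),
  row.names = c("probe_a", "probe_b", "probe_c")
)

# Write the data frame as an attribute, then read it back
h5_write(calibration, file, "measurements/temperature", attr = "calibration")
restored <- h5_read(file, "measurements/temperature", attr = "calibration")

names(restored)     # column names are expected to survive
rownames(restored)  # custom row names are not expected to survive
```

If the row labels matter, one option is to store them as an ordinary column before writing.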
In HDF5, you cannot attach attributes to other attributes. This hierarchy is strictly one level deep: Groups/Datasets can have attributes, but attributes cannot.
Consequently, you cannot treat an attribute as a “Group” or folder to
store other items. If you need a hierarchical structure for your
metadata, you should create a Group (e.g., /metadata) and
store your metadata as Datasets inside it, rather than attaching them as
attributes to another object.
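A minimal sketch of the group-based layout described above. All paths and values under `metadata/` are hypothetical, and the temporary file path is an assumption. It relies on `h5_write()` creating intermediate groups from the path, as the `measurements/temperature` example does.

```r
library(h5lite)

# Assumption: a throwaway file path; the vignette's own `file` is defined elsewhere
file <- tempfile(fileext = ".h5")

h5_write(1:10, file, "measurements/temperature")

# Hierarchical metadata stored as datasets inside a metadata group
h5_write(I("TX-200"),     file, "metadata/instrument/model")
h5_write(I("2023-10-27"), file, "metadata/instrument/calibrated_on")
h5_write(I("Lab 3"),      file, "metadata/site/room")

# Unlike attributes, these datasets can carry attributes of their own
h5_write(I("ISO 8601"), file, "metadata/instrument/calibrated_on", attr = "format")

model <- h5_read(file, "metadata/instrument/model")

# Print the resulting tree to inspect the hierarchy
h5_str(file)
```

The trade-off is that the metadata is no longer returned automatically when `measurements/temperature` is read; it has to be read from its own path.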
Attributes in HDF5 are typed just like datasets. h5lite
allows you to control the storage type of attributes using the
as argument in h5_write() or
h5_read().
To target an attribute specifically, prefix the name with
@ in the as vector.
# Write the control data again, but use a fixed-length string for 'description'
h5_write(data, file, "experiment/control", as = c("@description" = "ascii[]"))
# Store an attribute as a `uint8` instead of the default `int32`
h5_write(I(42), file, "measurements/temperature", attr = "sensor_id", as = "uint8")
You might notice that standard R attributes like dim are
not visible in h5_attr_names().
This is because h5lite handles structural attributes
implicitly. The dimensions of the attribute data itself are stored in
the HDF5 Dataspace, not as a separate attribute. h5lite
automatically restores the dim attribute on the R object
when reading, ensuring matrices and arrays retain their shape.
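The round trip described above can be sketched with a small matrix. The dataset name `grid` and the temporary file path are assumptions made for illustration.

```r
library(h5lite)

# Assumption: a throwaway file path; the vignette's own `file` is defined elsewhere
file <- tempfile(fileext = ".h5")

# A 2 x 3 integer matrix; its shape lives in R's `dim` attribute
grid <- matrix(1:6, nrow = 2)
h5_write(grid, file, "grid")

# `dim` is not expected among the HDF5 attributes of the dataset
attr_names <- h5_attr_names(file, "grid")

# Reading restores the shape from the dataspace
restored <- h5_read(file, "grid")
restored_dim <- dim(restored)
```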