h5lite is designed to seamlessly map R’s diverse data
structures to HDF5’s portable format. This vignette explains the
supported R data types, how h5lite writes them to HDF5, and
how you can precisely control data types and compression when
needed.
h5lite supports reading and writing a wide range of R
data types. The table below lists the default mapping when writing to
HDF5.
| R Data Type | HDF5 Equivalent | Description |
|---|---|---|
| Numeric | variable | Selects optimal type: uint8, float32, etc. |
| Logical | H5T_STD_U8LE | Stored as 0 (FALSE) or 1 (TRUE) (uint8). |
| Character | H5T_STRING | Variable or fixed-length UTF-8 strings. |
| Complex | H5T_COMPLEX | Native HDF5 2.0+ complex numbers. |
| Raw | H5T_OPAQUE | Raw bytes / binary data. |
| Factor | H5T_ENUM | Integer indices with label mapping. |
| integer64 | H5T_STD_I64LE | 64-bit signed integers via bit64 package. |
| POSIXt | H5T_STRING | ISO 8601 string (YYYY-MM-DDTHH:MM:SSZ). |
| List | H5O_TYPE_GROUP | Recursive container structure. |
| Data Frame | H5T_COMPOUND | Table of mixed types. |
| NULL | H5S_NULL | Creates a placeholder. |
Atomic data types (Integer, integer64, Double, Logical, Character, Complex, Raw, and POSIXt) can be written to HDF5 as scalars, 1D vectors, or N-dimensional arrays.
* Scalars: wrap a length-1 value in I() to write it as a scalar (0 dimensions).
* Vectors: plain vectors are written as 1D datasets.
* Arrays: objects with dim attributes are written as N-dimensional datasets, preserving their shape.

```r
# 1. Scalar (0 dims)
h5_write(I(42), file, "structure/scalar")
# 2. Vector (1 dim)
h5_write(c(1, 2, 3), file, "structure/vector")
# 3. Matrix (2 dims)
h5_write(matrix(1:9, 3, 3), file, "structure/matrix")
```

For more complex dimensional structures, refer to
vignette('matrices').
R uses 32-bit integers and 64-bit doubles. When writing with
as = "auto", h5lite analyzes the range of your
data to select the most compact HDF5 type.
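The idea of range-based selection can be sketched in base R. The helper below, smallest_int_type(), is hypothetical and is not h5lite's implementation; it only illustrates how the minimum and maximum of whole-number data determine the narrowest integer type that can hold them.

```r
# Hypothetical helper: NOT h5lite's implementation, only an illustration
# of choosing the narrowest integer type that can hold a range of values.
smallest_int_type <- function(x) {
  stopifnot(is.numeric(x), !anyNA(x), all(x == round(x)))
  lo <- min(x)
  hi <- max(x)
  bits <- c(8, 16, 32, 64)
  if (lo >= 0) {
    # Unsigned candidates cover [0, 2^bits - 1]
    fits <- hi <= 2^bits - 1
    paste0("uint", bits[fits][1])
  } else {
    # Signed candidates cover [-2^(bits - 1), 2^(bits - 1) - 1]
    fits <- lo >= -2^(bits - 1) & hi <= 2^(bits - 1) - 1
    paste0("int", bits[fits][1])
  }
}

smallest_int_type(c(0, 200))      # "uint8"
smallest_int_type(c(-5, 100))     # "int8"
smallest_int_type(c(0, 70000))    # "uint32"
smallest_int_type(c(-40000, 10))  # "int32"
```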
For integer vectors:
* With NA values: stored as float64 (H5T_IEEE_F64LE).
* Options for as: int[8|16|32|64], uint[8|16|32|64], float[16|32|64], or bfloat16.

R does not natively support 64-bit integers, but h5lite
supports reading and writing them via the bit64
package (class integer64). By default they are stored as int64 (H5T_STD_I64LE).
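To override the automatic choice for numeric data, pass one of the listed type names to as. The following is a minimal sketch, assuming the bracket notation expands to names such as int32 and float32; the temporary file and dataset names are made up for illustration.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

# Default: let h5lite pick the storage type from the data
h5_write(1:100, file, "numeric/auto")

# Explicit: request a specific type (names assumed from the options list)
h5_write(1:100, file, "numeric/as_int32", as = "int32")
h5_write(c(0.5, 1.5, 2.5), file, "numeric/as_float32", as = "float32")
```

Requesting a wider type than necessary costs space, while a narrower one can lose range or precision, so explicit types are mostly useful when a downstream reader expects a particular layout.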
R’s default numeric type is double-precision. For double vectors:
* Default: float64 (H5T_IEEE_F64LE).
* Options for as: int[8|16|32|64], uint[8|16|32|64], float[16|32|64], or bfloat16.

For logical vectors:
* Default: uint8 (H5T_STD_U8LE).
* With NA values: float64 (H5T_IEEE_F64LE).
* Options for as: int[8|16|32|64], uint[8|16|32|64], float[16|32|64], or bfloat16.

HDF5 supports two methods for storing strings. By default
(as = "auto"), h5lite chooses the best
approach:
* Variable length: used if the data contains NA or if string lengths are highly inconsistent.
* Fixed length: used if the data contains no NA, to allow for compression.

Variable-length strings are explicitly requested with as = "utf8" or
as = "ascii".
* Supports NA: YES

Fixed-length strings use as = "ascii[10]" / as = "utf8[10]"
(explicit size of 10) or
as = "ascii[]" / as = "utf8[]" (auto-detect max
length).
* Supports NA: NO

```r
# UTF-8 auto-detected fixed length
h5_write(c("apple", "banana"), file, "strings/fixed_utf8", as = "utf8[]")
# ASCII fixed length (1 byte)
h5_write(c("A", "B", "C"), file, "strings/fixed_ascii", as = "ascii[1]")
```

Technical Note: h5lite uses H5T_C_S1 for all strings, and H5T_STR_NULLTERM for all fixed-length strings.
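When choosing an explicit width, note that HDF5 sizes fixed-length strings in bytes, not characters, so multi-byte UTF-8 characters need extra room. The base R sketch below shows the difference; fixed_width() is a hypothetical helper, and whether h5lite's auto-detection works exactly this way is an assumption.

```r
x <- c("apple", "banana", "caf\u00e9")

# Characters vs. bytes: "é" is one character but two bytes in UTF-8
nchar(x, type = "chars")            # 5 6 4
nchar(enc2utf8(x), type = "bytes")  # 5 6 5

# Hypothetical helper (not part of h5lite): smallest width, in bytes,
# that holds every string in the vector
fixed_width <- function(x) max(nchar(enc2utf8(x), type = "bytes"))

fixed_width(x)  # 6
```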
R date-time objects (POSIXct / POSIXlt) are
stored as strings in ISO 8601 format
(YYYY-MM-DDTHH:MM:SSZ). This ensures maximum portability
with other languages and HDF5 tools that do not share R’s
epoch-based numeric representation.
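The string layout can be reproduced in base R, which is handy when checking values written by another tool. This sketch only illustrates the format; it does not show h5lite's internal conversion, and this particular format string drops fractional seconds.

```r
# A timestamp in UTC
t <- as.POSIXct("2024-03-15 13:45:30", tz = "UTC")

# Format as an ISO 8601 string with the "Z" (UTC) suffix
iso <- format(t, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
iso  # "2024-03-15T13:45:30Z"

# Parse the string back into a POSIXct value
back <- as.POSIXct(iso, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
as.numeric(t) == as.numeric(back)  # TRUE
```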
R complex numbers are written using the new complex floating-point
type introduced in HDF5 2.0.0 (H5T_COMPLEX_IEEE_F64LE).
Compatibility Warning: This data type for complex numbers is a feature specific to HDF5 version 2.0+. Datasets written with this type generally cannot be read by HDF5 readers built against older versions of the library (e.g., HDF5 1.10 or 1.12). Ensure that any downstream tools or libraries used to read these files are updated to support HDF5 2.0 standards.
Raw vectors (bytes) are stored as HDF5 OPAQUE types.
This is ideal for storing binary blobs, images, or serialized objects
where you need to preserve the exact byte sequence without
interpretation.
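One common use is storing a serialized R object. The sketch below uses base R's serialize() to produce a raw vector and writes it with h5_write(); the read step assumes a reader of the form h5_read(file, name) that returns the raw vector unchanged, which is not shown in this vignette.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

# Any R object can be turned into a raw vector with serialize()
model <- list(coef = c(1.5, -0.2), label = "demo")
blob  <- serialize(model, connection = NULL)
is.raw(blob)

# Store the bytes as-is
h5_write(blob, file, "blobs/model")

# Assumed reader: h5_read(file, name) returns the stored raw vector
restored <- unserialize(h5_read(file, "blobs/model"))
```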
R Factors are stored as HDF5 ENUM types. The mapping from
integer codes to factor levels (labels) is stored once, as part of the
dataset's type definition, so the labels are preserved without duplicating string
data for every element.
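The two pieces an enum keeps, the integer codes and the label table, are the same two pieces that make up an R factor. This base R sketch separates them and rebuilds the factor; it illustrates the concept only and does not call h5lite.

```r
f <- factor(c("low", "high", "low", "mid"), levels = c("low", "mid", "high"))

codes  <- as.integer(f)  # 1 3 1 2
labels <- levels(f)      # "low" "mid" "high"

# Rebuild the factor from codes + labels
rebuilt <- factor(labels[codes], levels = labels)
identical(rebuilt, f)    # TRUE
```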
R lists are mapped to HDF5 Groups. Since lists are
recursive containers, h5lite walks the list and creates a
dataset (or subgroup) for every element found. You can use
as = c("element_name" = "skip") to exclude specific
items.
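A short sketch of writing a nested list while skipping one element. The list contents, file, and group name are made up for illustration; the skip syntax follows the form given above.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

experiment <- list(
  id      = "run-01",
  params  = list(alpha = 0.05, iterations = 100L),  # nested list -> subgroup
  results = c(0.91, 0.87, 0.93),
  scratch = runif(1000)                             # not worth saving
)

# Write everything except the "scratch" element
h5_write(experiment, file, "experiment", as = c("scratch" = "skip"))
```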
Data Frames are stored as HDF5 Compound types
(tables). This keeps the values of each row together in the file. You can
use the as argument to specify the type of individual
columns.
For a comprehensive guide, see
vignette('data-frames').
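As a brief sketch of per-column control, the example below writes the same data frame twice. It assumes that as accepts a named character vector whose names match column names, mirroring the named form used for lists; the column names and types are illustrative.

```r
library(h5lite)

file <- tempfile(fileext = ".h5")

df <- data.frame(
  id    = 1:3,
  score = c(0.25, 0.5, 0.75),
  group = c("a", "b", "a")
)

# Default mapping for every column
h5_write(df, file, "tables/default")

# Per-column types: names are assumed to match column names
h5_write(df, file, "tables/typed", as = c(id = "uint16", score = "float32"))
```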
The NULL object in R is mapped to a dataset with a
NULL Dataspace (H5S_NULL). This creates a
dataset that exists in the file structure but contains no data elements
and consumes no storage space.
HDF5 supports transparent data compression using the zlib (deflate)
algorithm. You can control the compression intensity using the
compress argument.
* TRUE: Enables standard compression (Level 5).
* FALSE / 0: Disables compression.
* 1 - 9: Specific compression level (1 = fastest, 9 = most compressed).

When compression is enabled (level > 0), h5lite
automatically applies the HDF5 Byte Shuffle Filter
before the data is compressed. The Shuffle Filter does not compress data
itself; rather, it rearranges the byte stream to make it more
compressible by zlib.
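The effect of the compress argument can be checked by writing the same data with and without compression and comparing file sizes. This is a minimal sketch, assuming compress is passed directly to h5_write(); the data and file names are illustrative.

```r
library(h5lite)

# Highly repetitive data compresses well
x <- rep(c(1.5, 2.5, 3.5, 4.5), times = 250000)

plain  <- tempfile(fileext = ".h5")
packed <- tempfile(fileext = ".h5")

h5_write(x, plain,  "data", compress = FALSE)  # no compression
h5_write(x, packed, "data", compress = 9)      # maximum zlib level

# Compare on-disk sizes in bytes
file.size(plain)
file.size(packed)
```

Real data is rarely this repetitive, so the ratio seen here is an upper bound rather than a typical result.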
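It works by separating the bytes of each value by their significance. For example, in a 4-byte integer array, the first bytes of all values are stored together, followed by all second bytes, then all third bytes, then all fourth bytes.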
Why this helps:
* Integers: Small integers often have many zero-padding bytes. The shuffle filter groups
these zeros into long runs, which zlib compresses extremely efficiently.
This allows int32 data to compress nearly as well as
int8 data if the values are small.
* Doubles: Floating-point numbers often share the same
exponent bytes if they are in a similar range. The shuffle filter groups
these identical exponent bytes, creating repetitive patterns that zlib
can compress.
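The byte rearrangement can be imitated in base R to see its effect. In this sketch, shuffle() and unshuffle() are hypothetical helpers, and memCompress() with gzip stands in for HDF5's deflate filter; HDF5 itself applies the shuffle per chunk inside the library, so the numbers are indicative only.

```r
# Hypothetical helpers that mimic the byte shuffle on a raw vector
shuffle <- function(bytes, size) {
  m <- matrix(bytes, nrow = size)  # one column per value
  as.vector(t(m))                  # all 1st bytes, then all 2nd bytes, ...
}
unshuffle <- function(bytes, size) {
  m <- matrix(bytes, ncol = size)  # one column per byte position
  as.vector(t(m))                  # back to value-by-value order
}

# Three small values as 4-byte little-endian integers
small <- writeBin(1:3, raw(), size = 4, endian = "little")
small              # 01 00 00 00 02 00 00 00 03 00 00 00
shuffle(small, 4)  # 01 02 03 00 00 00 00 00 00 00 00 00

# Larger test: 10,000 values that each fit in one byte
set.seed(1)
values <- sample(0:255, 10000, replace = TRUE)
bytes  <- writeBin(as.integer(values), raw(), size = 4, endian = "little")

# Compressed sizes without and with the shuffle step
length(memCompress(bytes, type = "gzip"))
length(memCompress(shuffle(bytes, 4), type = "gzip"))

# The shuffle is lossless
identical(unshuffle(shuffle(bytes, 4), 4), bytes)  # TRUE
```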