The goal of the {phutil} package is to provide utility functions for persistence data analysis. In particular, the package defines a data structure for hosting persistence data. The package is part of the TDAverse suite of packages, which aims to provide a consistent interface for topological data analysis in R that nicely integrates with the tidymodels ecosystem.
You can install the development version of {phutil} from GitHub with:
# install.packages("pak")
::pak("tdaverse/phutil") pak
persistence
classThe package currently provides a new data structure for hosting
persistence data. The data set noisy_circle_points
is a
simulated data set consisting of 100 points sampled from a circle with
additive Gaussian noise using a standard deviation of \(0.05\):
library(phutil)
head(noisy_circle_points)
#> x y
#> [1,] 0.6651721 0.6363172
#> [2,] -0.7481066 -0.6901262
#> [3,] -0.8288544 -0.5519689
#> [4,] -0.7650181 -0.7436499
#> [5,] 0.6337296 -0.7607464
#> [6,] -0.6077663 -0.7036492
The data set noisy_circle_ripserr
is a persistence
diagram computed using the ripserr
package with the ripserr::vietoris_rips()
and stored as an object of class PHom
, which is a light
wrapper around a data frame with 3 variables:
head(noisy_circle_ripserr)
#> dimension birth death
#> 1 0 0 0.008162723
#> 2 0 0 0.008548402
#> 3 0 0 0.009238432
#> 4 0 0 0.014004707
#> 5 0 0 0.020555701
#> 6 0 0 0.024621462
The data set noisy_circle_tda_rips
is a persistence
diagram computed using the TDA package with the TDA::ripsDiag()
and stored as a list of length 1 containing the object
diagram
of class diagram
, which is a matrix
with 3 columns:
head(noisy_circle_tda_rips$diagram)
#> dimension Birth Death
#> [1,] 0 0 1.6322000
#> [2,] 0 0 0.3677392
#> [3,] 0 0 0.3325817
#> [4,] 0 0 0.3226673
#> [5,] 0 0 0.2096916
#> [6,] 0 0 0.1953531
The data structure adopted by the TDAverse suite of packages and provided by this package is designed to be a common interface for persistence data. This allows for a seamless integration of persistence data across different packages and ensures that the data is always in a consistent format.
Specifically, the persistence
class is a list with the
following components:
pairs
: A list of 2-column matrices containing
birth-death pairs. The -th element of the list corresponds to
the -th homology dimension. If there is no pairs for a given
dimension but there are pairs in higher dimensions, the corresponding
element(s) is/are filled with a numeric matrix.
metadata
: A list of length 6 containing information
about how the data was computed:
orderered_pairs
: A boolean indicating whether the pairs
in the pairs
list are ordered (i.e. the first column is
strictly less than the second column).data
: The name of the object containing the original
data on which the persistence data was computed.engine
: The name of the package and the function of
this package that computed the persistence data in the form
"package_name::package_function"
.filtration
: The filtration used in the computation in a
human-readable format (i.e. full names, capitals where need, etc.).parameters
: A list of parameters used in the
computation.call
: The exact call that generated the persistence
data.The as_persistence()
function is used to convert
persistence data from different packages into the
persistence
class. For example, the persistence data
computed using the ripserr package can be converted
into the persistence
class as follows:
as_persistence(noisy_circle_ripserr)
#>
#> ── Persistence Data ────────────────────────────────────────────────────────────
#> ℹ There are 99 and 2 pairs in dimensions 0 and 1 respectively.
#> ℹ Computed from a Vietoris-Rips/Cubical filtration using `ripserr::<vietoris_rips/cubical>()`.
#> ! With unknown parameters.
Similarly, the persistence data computed using the
TDA package can be converted into the
persistence
class as follows:
as_persistence(noisy_circle_tda_rips)
#>
#> ── Persistence Data ────────────────────────────────────────────────────────────
#> ℹ There are 100 and 2 pairs in dimensions 0 and 1 respectively.
#> ℹ Computed from a Vietoris-Rips filtration using `TDA::ripsDiag()`.
#> ℹ With the following parameters: maxdimension = 1 and maxscale = 1.6322.
The package also provides a function to compute the bottleneck
distance between two persistence diagrams. The function
bottleneck_distance()
takes two persistence diagrams as
input and computes the bottleneck distance between them:
library(phutil)
<- persistence_sample[[1]]
diag1 <- persistence_sample[[2]]
diag2 bottleneck_distance(diag1, diag2)
#> [1] 0.06197453
One can also compute the Wasserstein distance between two persistence
diagrams using the wasserstein_distance()
function:
wasserstein_distance(diag1, diag2)
#> [1] 1.53403
It is also possible to compute these distances between all pairs of
persistence diagrams in a list using
e.g. wasserstein_pairwise_distances()
:
wasserstein_pairwise_distances(persistence_sample[1:5])
#> 1 2 3 4
#> 2 1.534030
#> 3 2.301812 1.852241
#> 4 2.354967 2.362616 1.922799
#> 5 2.374818 2.707366 1.985518 3.154120
It returns an object of class dist
. It is parallelized
using OpenMP and relies under the
hood on the Hera C++
library which is BSD-licensed. The code interfacing R and C++ is
generated by the header-only {cpp11} package which is
MIT-licensed.
Please note that the {phutil} package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
This project was funded by an ISC grant from the R Consortium and done in coordination with Jason Cory Brunson and with guidance from Bertrand Michel and Paul Rosen. It builds upon conversations with Mathieu Carrière and Vincent Rouvreau who are among the authors of the GUDHI library. Package development also benefited from the support of colleagues at the Department of Mathematics Jean Leray and the use of equipment at Nantes University.