phutil

R-CMD-check Codecov test coverage

The goal of the {phutil} package is to provide utility functions for persistence data analysis. In particular, the package defines a data structure for hosting persistence data. The package is part of the TDAverse suite of packages, which aims to provide a consistent interface for topological data analysis in R that nicely integrates with the tidymodels ecosystem.

Installation

You can install the development version of {phutil} from GitHub with:

# install.packages("pak")
pak::pak("tdaverse/phutil")

The persistence class

The package currently provides a new data structure for hosting persistence data. The data set noisy_circle_points is a simulated data set consisting of 100 points sampled from a circle with additive Gaussian noise using a standard deviation of \(0.05\):

library(phutil)
head(noisy_circle_points)
#>               x          y
#> [1,]  0.6651721  0.6363172
#> [2,] -0.7481066 -0.6901262
#> [3,] -0.8288544 -0.5519689
#> [4,] -0.7650181 -0.7436499
#> [5,]  0.6337296 -0.7607464
#> [6,] -0.6077663 -0.7036492

The data set noisy_circle_ripserr is a persistence diagram computed using the ripserr package with the ripserr::vietoris_rips() and stored as an object of class PHom, which is a light wrapper around a data frame with 3 variables:

head(noisy_circle_ripserr)
#>   dimension birth       death
#> 1         0     0 0.008162723
#> 2         0     0 0.008548402
#> 3         0     0 0.009238432
#> 4         0     0 0.014004707
#> 5         0     0 0.020555701
#> 6         0     0 0.024621462

The data set noisy_circle_tda_rips is a persistence diagram computed using the TDA package with the TDA::ripsDiag() and stored as a list of length 1 containing the object diagram of class diagram, which is a matrix with 3 columns:

head(noisy_circle_tda_rips$diagram)
#>      dimension Birth     Death
#> [1,]         0     0 1.6322000
#> [2,]         0     0 0.3677392
#> [3,]         0     0 0.3325817
#> [4,]         0     0 0.3226673
#> [5,]         0     0 0.2096916
#> [6,]         0     0 0.1953531

The data structure adopted by the TDAverse suite of packages and provided by this package is designed to be a common interface for persistence data. This allows for a seamless integration of persistence data across different packages and ensures that the data is always in a consistent format.

Specifically, the persistence class is a list with the following components:

The as_persistence() function is used to convert persistence data from different packages into the persistence class. For example, the persistence data computed using the ripserr package can be converted into the persistence class as follows:

as_persistence(noisy_circle_ripserr)
#> 
#> ── Persistence Data ────────────────────────────────────────────────────────────
#> ℹ There are 99 and 2 pairs in dimensions 0 and 1 respectively.
#> ℹ Computed from a Vietoris-Rips/Cubical filtration using `ripserr::<vietoris_rips/cubical>()`.
#> ! With unknown parameters.

Similarly, the persistence data computed using the TDA package can be converted into the persistence class as follows:

as_persistence(noisy_circle_tda_rips)
#> 
#> ── Persistence Data ────────────────────────────────────────────────────────────
#> ℹ There are 100 and 2 pairs in dimensions 0 and 1 respectively.
#> ℹ Computed from a Vietoris-Rips filtration using `TDA::ripsDiag()`.
#> ℹ With the following parameters: maxdimension = 1 and maxscale = 1.6322.

Distances between persistence diagrams

The package also provides a function to compute the bottleneck distance between two persistence diagrams. The function bottleneck_distance() takes two persistence diagrams as input and computes the bottleneck distance between them:

library(phutil)
diag1 <- persistence_sample[[1]]
diag2 <- persistence_sample[[2]]
bottleneck_distance(diag1, diag2)
#> [1] 0.06197453

One can also compute the Wasserstein distance between two persistence diagrams using the wasserstein_distance() function:

wasserstein_distance(diag1, diag2)
#> [1] 1.53403

It is also possible to compute these distances between all pairs of persistence diagrams in a list using e.g. wasserstein_pairwise_distances():

wasserstein_pairwise_distances(persistence_sample[1:5])
#>          1        2        3        4
#> 2 1.534030                           
#> 3 2.301812 1.852241                  
#> 4 2.354967 2.362616 1.922799         
#> 5 2.374818 2.707366 1.985518 3.154120

It returns an object of class dist. It is parallelized using OpenMP and relies under the hood on the Hera C++ library which is BSD-licensed. The code interfacing R and C++ is generated by the header-only {cpp11} package which is MIT-licensed.

Contributions

Code of Conduct

Please note that the {phutil} package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Acknowledgments

This project was funded by an ISC grant from the R Consortium and done in coordination with Jason Cory Brunson and with guidance from Bertrand Michel and Paul Rosen. It builds upon conversations with Mathieu Carrière and Vincent Rouvreau who are among the authors of the GUDHI library. Package development also benefited from the support of colleagues at the Department of Mathematics Jean Leray and the use of equipment at Nantes University.

mirror server hosted at Truenetwork, Russian Federation.