
TDA (version 1.9.1)

maxPersistence: Maximal Persistence Method

Description

Given a point cloud and a function built on top of the data, we are interested in studying the evolution of the sublevel sets (or superlevel sets) of the function, using persistent homology. The Maximal Persistence Method selects the optimal smoothing parameter of the function, by maximizing the number of significant topological features, or by maximizing the total significant persistence of the features. For each value of the smoothing parameter, the function maxPersistence computes a persistence diagram using gridDiag and returns the values of the two criteria, the dimension of detected features, their persistence, and a bootstrapped confidence band. The features that fall outside of the band are statistically significant. See References.
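As a rough illustration of the two criteria, the base R sketch below counts the significant features and sums their significant persistence for a single hypothetical diagram. The persistence values and the band width are made up. The rule that a feature is significant when its persistence exceeds twice the band width, and the reading of "significant persistence" as the excess over that threshold, are assumptions made for illustration rather than statements about the package internals.

```r
# Hypothetical persistence values, abs(death - birth), for one smoothing parameter
persistence <- c(0.02, 0.05, 0.40, 0.90)

# Hypothetical width of the bootstrap band for the same parameter
band_width <- 0.1

# Assumed significance threshold: twice the band width
threshold <- 2 * band_width

# Criterion 1: number of features above the threshold
sig_number <- sum(persistence > threshold)

# Criterion 2: total persistence in excess of the threshold
sig_persistence <- sum(pmax(0, persistence - threshold))

print(sig_number)
print(sig_persistence)
```

Repeating this computation for every candidate smoothing parameter gives one value of each criterion per parameter, which is what the method compares.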

Usage

maxPersistence(
    FUN, parameters, X, lim, by,
    maxdimension = length(lim) / 2 - 1, sublevel = TRUE,
    library = "GUDHI", B = 30, alpha = 0.05,
    bandFUN = "bootstrapBand", distance = "bottleneck",
    dimension = min(1, maxdimension), p = 1, parallel = FALSE,
    printProgress = FALSE, weight = NULL)

Value

The function maxPersistence returns an object of the class "maxPersistence", a list with the following components:

parameters

the same vector parameters given in input

sigNumber

a numeric vector storing the number of significant features in the persistence diagrams computed using each value in parameters

sigPersistence

a numeric vector storing the sum of significant persistence of the features in the persistence diagrams, computed using each value in parameters

bands

a numeric vector storing the bootstrap band's width, for each value in parameters

Persistence

a list of the same length as parameters. Each element of the list is a \(P_i\) by 2 matrix, where \(P_i\) is the number of features found using the \(i\)-th value in parameters: the first column stores the dimension of each feature and the second column stores its persistence, abs(death - birth).
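The sketch below shows one way the components listed above might be used to pick a smoothing parameter. It works on a hand-built mock list that only borrows the documented component names; every value in it, and the column names of the matrices, are invented for illustration and do not come from an actual call to maxPersistence.

```r
# Mock result with the documented component names; all values are made up
mock <- list(
  parameters     = c(0.1, 0.3, 0.5),
  sigNumber      = c(1, 3, 2),
  sigPersistence = c(0.2, 0.9, 1.1),
  bands          = c(0.30, 0.12, 0.08),
  Persistence    = list(
    cbind(dimension = c(0, 1),    Persistence = c(0.10, 0.50)),
    cbind(dimension = c(0, 0, 1), Persistence = c(0.30, 0.40, 0.60)),
    cbind(dimension = c(0, 1),    Persistence = c(0.70, 0.80))
  )
)

# Parameter that maximizes the number of significant features
best_by_number <- mock$parameters[which.max(mock$sigNumber)]

# Parameter that maximizes the total significant persistence
best_by_persistence <- mock$parameters[which.max(mock$sigPersistence)]

# Persistence of the 1-dimensional features (loops) at the first choice
P <- mock$Persistence[[which.max(mock$sigNumber)]]
loops <- unname(P[P[, 1] == 1, 2])

print(best_by_number)
print(best_by_persistence)
print(loops)
```

The two criteria need not agree, as in this mock example; which.max returns the first maximizer when there are ties.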

Arguments

FUN

the name of a function whose inputs are: 1) X, a \(n\) by \(d\) matrix of coordinates of the input point cloud, where \(d\) is the dimension of the space; 2) a matrix of coordinates of points forming a grid at which the function can be evaluated (note that this grid is not passed as an input, but is automatically computed by maxPersistence); 3) a real valued smoothing parameter. For example, see kde, dtm, kernelDist.
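A user-written function with the input order described above might look like the following base R sketch. The name gaussSmooth is hypothetical, and the Gaussian smoother it computes is only an illustration of the expected signature (point cloud, grid, smoothing parameter); it is not claimed to reproduce the values returned by kde.

```r
# Hypothetical smoothing function with the argument order described above:
# X (n by d point cloud), Grid (m by d evaluation points), h (smoothing parameter)
gaussSmooth <- function(X, Grid, h) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  d <- ncol(X)
  vapply(seq_len(nrow(Grid)), function(j) {
    # squared Euclidean distances from grid point j to every data point
    sq <- rowSums((X - matrix(Grid[j, ], nrow(X), d, byrow = TRUE))^2)
    # average of Gaussian kernels with bandwidth h
    mean(exp(-sq / (2 * h^2))) / ((2 * pi * h^2)^(d / 2))
  }, numeric(1))
}

# Small check on a toy point cloud and a 4 by 4 grid
X <- rbind(c(0, 0), c(1, 0), c(0, 1))
Grid <- as.matrix(expand.grid(seq(-1, 2, by = 1), seq(-1, 2, by = 1)))
values <- gaussSmooth(X, Grid, h = 0.5)
print(length(values))
```

The function returns one value per row of the grid, which is the shape a function evaluated over a grid of points needs to have.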

parameters

a numeric vector storing a sequence of values for the smoothing parameter of FUN, among which maxPersistence will select the optimal ones.

X

a \(n\) by \(d\) matrix of coordinates of the input point cloud, where \(d\) is the dimension of the space.

lim

a \(2\) by \(d\) matrix, where each column specifies the range of the corresponding dimension of the grid over which the function FUN is evaluated.

by

either a number or a vector of length \(d\) specifying the spacing between points of the grid in each dimension. If a number is given, the same spacing is used in each dimension.
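To make the roles of lim and by concrete, the base R sketch below builds a grid from a \(2\) by \(d\) matrix of ranges and a vector of spacings. It follows the descriptions of the two arguments; it is an assumption that the grid computed internally has this layout, and the variable names are illustrative.

```r
# lim: 2 by d matrix; column i holds the (min, max) range of dimension i
lim <- cbind(c(-2, 2), c(-1, 1))

# by: one spacing per dimension; a single number is recycled to all dimensions
by <- c(1, 0.5)
d <- ncol(lim)
by <- rep(by, length.out = d)

# one coordinate sequence per dimension, then all combinations of coordinates
axes <- lapply(seq_len(d), function(i) seq(lim[1, i], lim[2, i], by = by[i]))
Grid <- as.matrix(expand.grid(axes))

print(dim(Grid))
```

Halving by in every dimension roughly multiplies the number of grid points by \(2^d\), so the spacing has a direct effect on running time.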

maxdimension

a number that indicates the maximum dimension up to which persistent homology is computed. The default value is \(d - 1\), that is, the dimension of the embedding space minus 1.
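The default expression in Usage, length(lim) / 2 - 1, equals \(d - 1\) because lim has \(2d\) entries. A short check, using an illustrative lim for a three-dimensional space:

```r
# lim is 2 by d, so it has 2 * d entries; here d = 3
lim <- cbind(c(-2, 2), c(-2, 2), c(0, 1))
d <- ncol(lim)

# default expression from the Usage section
default_maxdimension <- length(lim) / 2 - 1

print(default_maxdimension == d - 1)
```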

sublevel

a logical variable indicating if the persistent homology should be computed for sublevel sets of FUN (TRUE) or superlevel sets (FALSE). The default value is TRUE.

library

a string specifying which library is used to compute the persistence diagram. The user can choose either the library "GUDHI", "Dionysus", or "PHAT". The default value is "GUDHI".

bandFUN

the function to be used in the computation of the confidence band. Either "bootstrapDiagram" or "bootstrapBand". The default value is "bootstrapBand".

B

the number of bootstrap iterations. The default value is 30.

alpha

for each value stored in parameters, maxPersistence computes a (1-alpha) confidence band. The default value is 0.05.

distance

optional (if bandFUN == "bootstrapDiagram"): a string specifying the distance to be used for persistence diagrams: either "bottleneck" or "wasserstein". The default value is "bottleneck".

dimension

optional (if bandFUN == "bootstrapDiagram"): an integer or a vector specifying the dimension of the features used to compute the distance between persistence diagrams. 0 for connected components, 1 for loops, 2 for voids. The default value is min(1, maxdimension).

p

optional (if bandFUN == "bootstrapDiagram" and distance == "wasserstein"): an integer specifying the power to be used in the computation of the Wasserstein distance. The default value is 1.
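The three optional arguments above only matter together with bandFUN = "bootstrapDiagram". The sketch below shows how such a call might look, using only arguments listed in Usage. The sample size, grid spacing, candidate parameters, and number of bootstrap iterations are arbitrary small values chosen to keep the run short, and the call is skipped when the TDA package is not installed.

```r
res <- NULL
if (requireNamespace("TDA", quietly = TRUE)) {
  set.seed(1)
  X <- TDA::circleUnif(100)           # points sampled on the unit circle
  lim <- cbind(c(-2, 2), c(-2, 2))    # ranges of the two grid dimensions

  # band computed from distances between bootstrapped diagrams,
  # using the Wasserstein distance with power p = 2 on 1-dimensional features
  res <- TDA::maxPersistence(
    TDA::kde, parameters = c(0.2, 0.4), X = X, lim = lim, by = 0.4,
    bandFUN = "bootstrapDiagram", B = 5, alpha = 0.05,
    distance = "wasserstein", dimension = 1, p = 2
  )
  print(res$bands)
}
```

With distance = "bottleneck" the argument p is not used, and with bandFUN = "bootstrapBand" none of distance, dimension, or p is used.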

parallel

logical: if TRUE, the bootstrap iterations are parallelized, using the package parallel. The default value is FALSE.

printProgress

if TRUE, a progress bar is printed. The default value is FALSE.

weight

either NULL, a number, or a vector of length \(n\). If it is NULL, no weights are used. If it is a number, the same weight is applied to each point of X. If it is a vector, its entries are the weights of the individual points of X. The default value is NULL.

Author

Jisu Kim and Fabrizio Lecci

Details

The function maxPersistence calls the gridDiag function, which computes the persistence diagram of sublevel (or superlevel) sets of a function, evaluated over a grid of points.

References

Chazal F, Cisewski J, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: distance-to-a-measure and kernel distance."

Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology." arXiv:1303.7117. Annals of Statistics.

See Also

gridDiag, kde, kernelDist, dtm, bootstrapBand

Examples

library(TDA)

## input data: circle with clutter noise
n <- 600
percNoise <- 0.1
XX1 <- circleUnif(n)
noise <- cbind(runif(percNoise * n, -2, 2), runif(percNoise * n, -2, 2))
X <- rbind(XX1, noise)

## limits of the grid at which the density estimator is evaluated
Xlim <- c(-2, 2)
Ylim <- c(-2, 2)
lim <- cbind(Xlim, Ylim)
by <- 0.2

B <- 80
alpha <- 0.05

## candidates
parametersKDE <- seq(0.1, 0.5, by = 0.2)

maxKDE <- maxPersistence(kde, parametersKDE, X, lim = lim, by = by,
                         bandFUN = "bootstrapBand", B = B, alpha = alpha,
                         parallel = FALSE, printProgress = TRUE)
print(summary(maxKDE))

par(mfrow = c(1,2))
plot(X, pch = 16, cex = 0.5, main = "Circle")
plot(maxKDE)
