TDA (version 1.9.1)

bootstrapDiagram: Bootstrapped Confidence Set for a Persistence Diagram, using the Bottleneck Distance (or the Wasserstein distance).

Description

The function bootstrapDiagram computes a (1-alpha) confidence set for the Persistence Diagram of a filtration of sublevel sets (or superlevel sets) of a function evaluated over a grid of points. The function returns the (1-alpha) quantile of B bottleneck distances (or Wasserstein distances), computed in B iterations of the bootstrap algorithm.

Usage

bootstrapDiagram(
    X, FUN, lim, by, maxdimension = length(lim) / 2 - 1,
    sublevel = TRUE, library = "GUDHI", B = 30, alpha = 0.05,
    distance = "bottleneck", dimension = min(1, maxdimension),
    p = 1, parallel = FALSE, printProgress = FALSE, weight = NULL,
    ...)

Value

The function bootstrapDiagram returns the (1-alpha) quantile of the values computed by the bootstrap algorithm.

Arguments

X

an \(n\) by \(d\) matrix of coordinates, used by the function FUN, where \(n\) is the number of points stored in X and \(d\) is the dimension of the space.

FUN

a function whose inputs are 1) an \(n\) by \(d\) matrix of coordinates X, 2) an \(m\) by \(d\) matrix of coordinates Grid, and 3) an optional smoothing parameter, and which returns a numeric vector of length \(m\). For examples, see distFct, kde, and dtm, which compute the distance function, the kernel density estimator, and the distance to measure, respectively, over a grid of points using the input X. Note that Grid is not an input of bootstrapDiagram; it is computed automatically by the function from lim and by.
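As a rough illustration of the interface described above, the following sketch defines a custom function with the argument order (X, Grid, ...) that FUN is expected to have. The name nearestDist is hypothetical and not part of TDA; the sketch uses only base R and returns, for each grid point, the Euclidean distance to the closest row of X.

```r
# Hypothetical FUN-compatible function (not part of TDA):
# X is n x d, Grid is m x d; the result is a numeric vector of length m.
nearestDist <- function(X, Grid, ...) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  apply(Grid, 1, function(g) {
    # Euclidean distance from grid point g to every row of X, then the minimum
    min(sqrt(rowSums(sweep(X, 2, g)^2)))
  })
}

X <- rbind(c(0, 0), c(1, 0))                # n = 2 points in the plane
Grid <- rbind(c(0, 0), c(0.5, 0), c(1, 1))  # m = 3 grid points
vals <- nearestDist(X, Grid)                # one value per grid point
```

A function of this shape takes any extra arguments through `...`, which is how a smoothing parameter such as a bandwidth would reach it.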

lim

a \(2\) by \(d\) matrix, where each column specifies the range (minimum, then maximum) of one dimension of the grid over which the function FUN is evaluated.

by

either a number or a vector of length \(d\) specifying the spacing between points of the grid in each dimension. If a single number is given, the same spacing is used in every dimension.
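To make the roles of lim and by concrete, here is a sketch of how a grid could be built from them in base R. This is an assumption about the construction, not TDA's internal code, and makeGrid is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper (not TDA's internal code): build an m x d grid
# from a 2 x d matrix of ranges and a spacing (a number or a length-d vector).
makeGrid <- function(lim, by) {
  d <- ncol(lim)
  by <- rep(by, length.out = d)  # a single number is reused in every dimension
  axes <- lapply(seq_len(d), function(i) seq(lim[1, i], lim[2, i], by = by[i]))
  as.matrix(expand.grid(axes))   # all combinations of the per-dimension axes
}

lim <- cbind(c(0, 1), c(0, 2))   # x in [0, 1], y in [0, 2]
Grid <- makeGrid(lim, by = 0.5)
dim(Grid)                        # 3 x-values times 5 y-values: 15 rows, 2 columns
```

The number of grid points grows quickly as by shrinks or \(d\) increases, so the spacing has a direct effect on how many times FUN is evaluated.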

maxdimension

the maximum dimension of the homological features to be computed. The default value is \(d - 1\), that is, the dimension of the embedding space minus 1.
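The default can be checked by hand from the expression in the Usage section: since lim is a \(2\) by \(d\) matrix, length(lim) equals \(2d\). The short sketch below evaluates that expression for a two-dimensional grid; the variable names defaultMaxdimension and defaultDimension are only illustrative.

```r
lim <- cbind(c(-1.8, 1.8), c(-1.6, 1.6))         # 2 x d matrix with d = 2
d <- ncol(lim)
defaultMaxdimension <- length(lim) / 2 - 1       # default expression from Usage
defaultDimension <- min(1, defaultMaxdimension)  # default of the dimension argument
```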

sublevel

a logical variable indicating if the Persistence Diagram should be computed for sublevel sets (TRUE) or superlevel sets (FALSE) of the function. The default value is TRUE.

library

a string specifying which library is used to compute the persistence diagram. The user can choose "GUDHI", "Dionysus", or "PHAT". The default value is "GUDHI".

B

the number of bootstrap iterations. The default value is 30.

alpha

a number that sets the confidence level: the function bootstrapDiagram returns the (1 - alpha) quantile, which gives a (1 - alpha) confidence set. The default value is 0.05.

distance

a string specifying the distance to be used for persistence diagrams: either "bottleneck" or "wasserstein". The default value is "bottleneck".

dimension

an integer or a vector specifying the dimension of the features used to compute the bottleneck (or Wasserstein) distance: 0 for connected components, 1 for loops, 2 for voids, and so on. The default value is 1 if \(maxdimension \ge 1\), and 0 otherwise.

p

if distance == "wasserstein", then p is an integer specifying the power to be used in the computation of the Wasserstein distance. The default value is 1.

parallel

logical: if TRUE the bootstrap iterations are parallelized, using the library parallel. The default value is FALSE.

printProgress

if TRUE a progress bar is printed. The default value is FALSE.

weight

either NULL, a number, or a vector of length \(n\). If it is NULL, no weight is used. If it is a number, the same weight is applied to each point of X. If it is a vector, it gives the weight of each point of X. The default value is NULL.

...

additional parameters for the function FUN.

Author

Jisu Kim and Fabrizio Lecci

Details

The function bootstrapDiagram uses gridDiag to compute the persistence diagram of the input function using the entire sample. Then, in each of B bootstrap iterations, it computes the bottleneck (or Wasserstein) distance between the original persistence diagram and the one computed using a subsample. Finally, the (1-alpha) quantile of these B values is returned. See Chazal, Fasy, Lecci, Michel, Rinaldo, and Wasserman (2014) for a discussion of the method.
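The steps above can be sketched in base R. To stay self-contained, this sketch does not compute persistence diagrams at all: the distance between diagrams is replaced by a simple stand-in, the largest absolute difference between function values on the grid. The names gaussKde and bootstrapQuantile are hypothetical, the data are one-dimensional for brevity, and drawing each bootstrap sample with replacement is an assumption of the sketch rather than a statement about TDA's internals.

```r
# Sketch only: the diagram distance is replaced by a stand-in
# (sup-norm difference of function values); all names are hypothetical.
gaussKde <- function(X, Grid, h) {
  # one-dimensional Gaussian kernel density estimate evaluated on Grid
  sapply(Grid, function(g) mean(dnorm((g - X) / h)) / h)
}

bootstrapQuantile <- function(X, FUN, Grid, B = 30, alpha = 0.05, ...) {
  full <- FUN(X, Grid, ...)                   # function values from the entire sample
  dists <- vapply(seq_len(B), function(b) {
    idx <- sample(length(X), replace = TRUE)  # assumed: resample with replacement
    max(abs(FUN(X[idx], Grid, ...) - full))   # stand-in for the diagram distance
  }, numeric(1))
  unname(quantile(dists, 1 - alpha))          # (1 - alpha) quantile of the B values
}

set.seed(1)
X <- rnorm(200)
Grid <- seq(-3, 3, by = 0.1)
cc <- bootstrapQuantile(X, gaussKde, Grid, B = 20, alpha = 0.05, h = 0.3)
```

The structure mirrors the description: one computation on the full sample, B recomputations on resampled data, and a single quantile returned at the end. In bootstrapDiagram itself the quantity compared in each iteration is a persistence diagram, not the raw function values.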

References

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

Wasserman L (2004). "All of statistics: a concise course in statistical inference." Springer.

Morozov D (2007). "Dionysus, a C++ library for computing persistent homology." https://www.mrzv.org/software/dionysus/

See Also

bottleneck, bootstrapBand, distFct, kde, kernelDist, dtm, summary.diagram, plot.diagram

Examples

## confidence set for the Kernel Density Diagram
library(TDA)

# input data
n <- 400
XX <- circleUnif(n)

## Ranges of the grid
Xlim <- c(-1.8, 1.8)
Ylim <- c(-1.6, 1.6)
lim <- cbind(Xlim, Ylim)
by <- 0.05

h <- 0.3  # bandwidth for the function kde

# Kernel Density Diagram of the superlevel sets
Diag <- gridDiag(XX, kde, lim = lim, by = by, sublevel = FALSE,
                 printProgress = TRUE, h = h)

# confidence set
B <- 10       ## the number of bootstrap iterations should be higher!
              ## this is just an example
alpha <- 0.05

cc <- bootstrapDiagram(XX, kde, lim = lim, by = by, sublevel = FALSE, B = B,
          alpha = alpha, dimension = 1, printProgress = TRUE, h = h)

plot(Diag[["diagram"]], band = 2 * cc)