Given a point cloud and a function built on top of the data, we are interested in studying the evolution of the sublevel sets (or superlevel sets) of the function using persistent homology. The Maximal Persistence Method selects the optimal smoothing parameter of the function by maximizing either the number of significant topological features or the total significant persistence of the features. For each value of the smoothing parameter, the function maxPersistence computes a persistence diagram using gridDiag and returns the values of the two criteria, the dimension of the detected features, their persistence, and a bootstrapped confidence band. Features that fall outside of the band are statistically significant. See References.
maxPersistence(
FUN, parameters, X, lim, by,
maxdimension = length(lim) / 2 - 1, sublevel = TRUE,
library = "GUDHI", B = 30, alpha = 0.05,
bandFUN = "bootstrapBand", distance = "bottleneck",
dimension = min(1, maxdimension), p = 1, parallel = FALSE,
printProgress = FALSE, weight = NULL)
The function maxPersistence returns an object of the class "maxPersistence", a list with the following components:
the same vector parameters given as input
a numeric vector storing the number of significant features in the persistence diagrams computed using each value in parameters
a numeric vector storing the sum of significant persistence of the features in the persistence diagrams, computed using each value in parameters
a numeric vector storing the bootstrap band's width, for each value in parameters
a list of the same length as parameters. Each element of the list is a \(P_i\) by 2 matrix, where \(P_i\) is the number of features found using the \(i\)-th value in parameters: the first column stores the dimension of each feature and the second column its persistence, \(|\text{death} - \text{birth}|\).
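The relationship between the persistence matrices and the two criteria can be illustrated on toy data. This is a minimal sketch, not the package's implementation. It assumes that a feature counts as significant when its persistence exceeds a threshold derived from the confidence band, and that the total significant persistence is the summed excess over that threshold. The helper name `criteria`, the toy matrices, and the threshold values are all hypothetical.

```r
# Toy persistence matrices, one per candidate parameter value:
# column 1 = feature dimension, column 2 = persistence |death - birth|.
pers <- list(
  cbind(dimension = c(0, 0, 1), persistence = c(0.90, 0.10, 0.40)),
  cbind(dimension = c(0, 1),    persistence = c(0.80, 0.05))
)

# Hypothetical significance thresholds, one per parameter value
# (assumed here to come from the bootstrap band; the exact rule is an assumption).
thresholds <- c(0.20, 0.10)

# Hypothetical helper: count significant features and sum their excess persistence.
criteria <- function(pers, thresholds) {
  count  <- mapply(function(P, t) sum(P[, 2] > t), pers, thresholds)
  excess <- mapply(function(P, t) sum(pmax(P[, 2] - t, 0)), pers, thresholds)
  data.frame(count = count, excess = excess)
}

res <- criteria(pers, thresholds)
print(res)

# Index of the parameter value that maximizes each criterion
bestByCount  <- which.max(res$count)
bestByExcess <- which.max(res$excess)
```

In this toy case both criteria select the first parameter value. With real data the two criteria may disagree, which is why both are reported.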
the name of a function whose inputs are: 1) X, an \(n\) by \(d\) matrix of coordinates of the input point cloud, where \(d\) is the dimension of the space; 2) a matrix of coordinates of points forming a grid at which the function can be evaluated (note that this grid is not passed as an input, but is automatically computed by maxPersistence); 3) a real-valued smoothing parameter. For example, see kde, dtm, kernelDist.
a numerical vector storing a sequence of values for the smoothing parameter of FUN, among which maxPersistence will select the optimal ones.
an \(n\) by \(d\) matrix of coordinates of the input point cloud, where \(d\) is the dimension of the space.
a \(2\) by \(d\) matrix, where each column specifies the range of one dimension of the grid over which the function FUN is evaluated.
either a number or a vector of length \(d\) specifying the spacing between points of the grid in each dimension. If a number is given, the same spacing is used in every dimension.
a number that indicates the maximum dimension up to which persistent homology is computed. The default value is \(d - 1\), that is, the dimension of the embedding space minus 1.
a logical variable indicating whether the persistent homology should be computed for sublevel sets of FUN (TRUE) or superlevel sets (FALSE). The default value is TRUE.
a string specifying which library to use to compute the persistence diagram. The user can choose either "GUDHI", "Dionysus", or "PHAT". The default value is "GUDHI".
the function to be used in the computation of the confidence band. Either "bootstrapDiagram" or "bootstrapBand".
the number of bootstrap iterations.
for each value stored in parameters, maxPersistence computes a (1-alpha) confidence band.
optional (if bandFUN == "bootstrapDiagram"): a string specifying the distance to be used for persistence diagrams: either "bottleneck" or "wasserstein". The default value is "bottleneck".
optional (if bandFUN == "bootstrapDiagram"): an integer or a vector specifying the dimension of the features used to compute the bottleneck distance. 0 for connected components, 1 for loops, 2 for voids. The default value is 1.
optional (if bandFUN == "bootstrapDiagram" AND distance == "wasserstein"): an integer specifying the power to be used in the computation of the Wasserstein distance. The default value is 1.
logical: if TRUE, the bootstrap iterations are parallelized, using the library parallel.
if TRUE, a progress bar is printed. The default value is FALSE.
either NULL, a number, or a vector of length \(n\). If it is NULL, no weights are used. If it is a number, the same weight is applied to each point of X. If it is a vector, weight gives the weight of each point of X.
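To make the roles of lim, by, and FUN more concrete, the sketch below builds a grid by hand from a 2 by \(d\) range matrix and a spacing, then evaluates a user-written function with the three inputs described above (point cloud, grid, smoothing parameter). It uses base R only. The grid layout is an assumption made for illustration (maxPersistence computes its own grid internally), and `gaussKDE` is a hypothetical function, not part of the package; whether it can be passed directly as FUN has not been checked here.

```r
# Range of each grid dimension: a 2 by d matrix, one column per dimension
lim <- cbind(Xlim = c(-1, 1), Ylim = c(-1, 1))
d <- ncol(lim)

# A single number is recycled so that every dimension uses the same spacing
by <- rep(0.5, length.out = d)

# Build the grid by hand (assumed layout, for illustration only)
axes <- lapply(seq_len(d), function(j) seq(lim[1, j], lim[2, j], by = by[j]))
Grid <- as.matrix(expand.grid(axes))

# Hypothetical FUN: Gaussian kernel density estimate with bandwidth h.
# Inputs follow the documented order: point cloud, grid, smoothing parameter.
gaussKDE <- function(X, Grid, h) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  dimX <- ncol(X)
  apply(Grid, 1, function(g) {
    sqDist <- rowSums(sweep(X, 2, g)^2)  # squared distances from g to each point
    mean(exp(-sqDist / (2 * h^2))) / ((2 * pi * h^2)^(dimX / 2))
  })
}

set.seed(1)
X <- cbind(runif(50, -1, 1), runif(50, -1, 1))

# One function value per grid point
values <- gaussKDE(X, Grid, h = 0.3)
```

With these toy settings the grid has 5 points per axis, so `values` has one entry for each of the 25 grid points. A smaller by gives a finer grid at a higher computational cost.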
Jisu Kim and Fabrizio Lecci
The function maxPersistence calls the gridDiag function, which computes the persistence diagram of sublevel (or superlevel) sets of a function, evaluated over a grid of points.
Chazal F, Cisewski J, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: distance-to-a-measure and kernel distance."
Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology", (arXiv:1303.7117). Annals of Statistics.
gridDiag, kde, kernelDist, dtm, bootstrapBand
## input data: circle with clutter noise
n <- 600
percNoise <- 0.1
XX1 <- circleUnif(n)
noise <- cbind(runif(percNoise * n, -2, 2), runif(percNoise * n, -2, 2))
X <- rbind(XX1, noise)
## limits of the grid at which the density estimator is evaluated
Xlim <- c(-2, 2)
Ylim <- c(-2, 2)
lim <- cbind(Xlim, Ylim)
by <- 0.2
B <- 80
alpha <- 0.05
## candidates
parametersKDE <- seq(0.1, 0.5, by = 0.2)
maxKDE <- maxPersistence(kde, parametersKDE, X, lim = lim, by = by,
                         bandFUN = "bootstrapBand", B = B, alpha = alpha,
                         parallel = FALSE, printProgress = TRUE)
print(summary(maxKDE))
par(mfrow = c(1,2))
plot(X, pch = 16, cex = 0.5, main = "Circle")
plot(maxKDE)