TDA (version 1.9.1)

bootstrapBand: Bootstrap Confidence Band

Description

The function bootstrapBand computes a uniform symmetric confidence band around a function of the data X, evaluated on a Grid, using the bootstrap algorithm. See Details and References.

Usage

bootstrapBand(
    X, FUN, Grid, B = 30, alpha = 0.05, parallel = FALSE,
    printProgress = FALSE, weight = NULL, ...)

Value

The function bootstrapBand returns a list with the following elements:

width

a number: the (1-alpha) quantile of the values computed by the bootstrap algorithm. It corresponds to half of the width of the uniform confidence band; that is, width is the distance of the upper and lower limits of the band from the function evaluated using the original dataset X.

fun

a numeric vector of length \(m\), storing the values of the input function FUN, evaluated on the Grid using the original data X.

band

an \(m\) by 2 matrix that stores the values of the lower limit of the confidence band (first column) and upper limit of the confidence band (second column), evaluated over the Grid.

Arguments

X

an \(n\) by \(d\) matrix of coordinates of points used by the function FUN, where \(n\) is the number of points and \(d\) is the dimension.

FUN

a function that takes an \(n\) by \(d\) matrix of coordinates X and an \(m\) by \(d\) matrix of coordinates Grid as inputs, and returns a numeric vector of length \(m\). For example, see distFct, kde, and dtm, which compute the distance function, the kernel density estimator, and the distance to measure over a grid of points, using the input X.

Grid

an \(m\) by \(d\) matrix of coordinates, where \(m\) is the number of points in the grid, at which FUN is evaluated.

B

the number of bootstrap iterations.

alpha

bootstrapBand returns a (1-alpha) confidence band. The default value is 0.05.

parallel

logical: if TRUE the bootstrap iterations are parallelized, using the library parallel. The default value is FALSE.

printProgress

if TRUE, a progress bar is printed. The default value is FALSE.

weight

either NULL, a number, or a vector of length \(n\). If it is NULL, weight is not used. If it is a number, then the same weight is applied to each point of X. If it is a vector, weight gives the weight of each point of X. The default value is NULL.

...

additional parameters for the function FUN.
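
The contract for FUN described above (an \(n\) by \(d\) matrix X and an \(m\) by \(d\) matrix Grid in, a numeric vector of length \(m\) out) can be illustrated with a small user-defined function. The following is a minimal sketch in base R; gaussKde is a hypothetical name, not a function of the TDA package, and its normalization is not claimed to match the package's kde. The bandwidth h stands for an extra parameter of the kind that would be passed through `...`.

```r
# Hypothetical FUN: Gaussian kernel density estimate evaluated on a grid.
# X: n by d matrix, Grid: m by d matrix, h: bandwidth (an extra parameter).
gaussKde <- function(X, Grid, h = 0.3) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  d <- ncol(X)
  # For each grid point, average the Gaussian kernel over all data points.
  apply(Grid, 1, function(g) {
    sqDist <- rowSums(sweep(X, 2, g)^2)   # squared distances from g to rows of X
    mean(exp(-sqDist / (2 * h^2))) / ((2 * pi * h^2)^(d / 2))
  })
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)   # n = 100 points in dimension d = 2
Grid <- as.matrix(expand.grid(seq(-2, 2, by = 0.5), seq(-2, 2, by = 0.5)))
values <- gaussKde(X, Grid, h = 0.5)
length(values) == nrow(Grid)        # one value per grid point
```

A function of this shape could in principle be supplied as FUN, with h given as an additional named argument.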

Author

Jisu Kim and Fabrizio Lecci

Details

First, the input function FUN is evaluated on the Grid using the original data X. Then, B times, the bootstrap algorithm draws a sample of n points from X with replacement, evaluates the function FUN on the Grid using that sample, and computes the \(\ell_\infty\) distance between the original function and the bootstrapped one. The result is a sequence of B values. The (1-alpha) quantile of these values is width, and the (1-alpha) confidence band consists of the original function minus width (lower limit) and plus width (upper limit).
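
The procedure described above can be sketched in a few lines of base R. This is an illustration written from the description on this page, not the package source: bootstrapBandSketch and kdeSketch are hypothetical names, and the sketch ignores the parallel, printProgress and weight arguments.

```r
# Sketch of the bootstrap band procedure (assumption: mirrors the Details text).
bootstrapBandSketch <- function(X, FUN, Grid, B = 30, alpha = 0.05, ...) {
  X <- as.matrix(X)
  Grid <- as.matrix(Grid)
  n <- nrow(X)
  fun <- FUN(X, Grid, ...)                     # estimate on the original data
  supDist <- vapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)    # n rows drawn with replacement
    bootFun <- FUN(X[idx, , drop = FALSE], Grid, ...)
    max(abs(bootFun - fun))                    # l_infinity distance on the grid
  }, numeric(1))
  width <- unname(stats::quantile(supDist, 1 - alpha))
  list(width = width, fun = fun, band = cbind(fun - width, fun + width))
}

# Hypothetical one-dimensional Gaussian KDE, used only to exercise the sketch.
kdeSketch <- function(X, Grid, h) {
  vapply(Grid[, 1],
         function(g) mean(stats::dnorm(g, mean = X[, 1], sd = h)),
         numeric(1))
}

set.seed(1)
X <- c(rnorm(100), rnorm(100, mean = 3, sd = 1.2))
Grid <- seq(-3, 6, by = 0.1)
res <- bootstrapBandSketch(X, kdeSketch, Grid, B = 20, alpha = 0.05, h = 0.3)
str(res)
```

The returned list has the same three elements as described under Value: the band is symmetric around fun, and its two columns differ by twice width at every grid point.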

References

Wasserman L (2004). "All of statistics: a concise course in statistical inference." Springer.

Fasy BT, Lecci F, Rinaldo A, Wasserman L, Balakrishnan S, Singh A (2013). "Statistical Inference For Persistent Homology: Confidence Sets for Persistence Diagrams." (arXiv:1303.7117). Annals of Statistics.

Chazal F, Fasy BT, Lecci F, Michel B, Rinaldo A, Wasserman L (2014). "Robust Topological Inference: Distance-To-a-Measure and Kernel Distance." Technical Report.

See Also

kde, dtm

Examples

library(TDA)

# Generate data from a mixture of 2 normals.
n <- 2000
X <- c(rnorm(n / 2), rnorm(n / 2, mean = 3, sd = 1.2))

# Construct a grid of points over which we evaluate the function
by <- 0.02
Grid <- seq(-3, 6, by = by)

## bandwidth for kernel density estimator
h <- 0.3
## Bootstrap confidence band
band <- bootstrapBand(X, kde, Grid, B = 80, parallel = FALSE, alpha = 0.05,
                      h = h)

plot(Grid, band[["fun"]], type = "l", lwd = 2,
     ylim = c(0, max(band[["band"]])), main = "kde with 0.95 confidence band")
lines(Grid, pmax(band[["band"]][, 1], 0), col = 2, lwd = 2)
lines(Grid, band[["band"]][, 2], col = 2, lwd = 2)
