feature (version 1.2.15)

featureSignif: Feature significance for kernel density estimation

Description

Identify significant features of kernel density estimates of 1- to 4-dimensional data.

Usage

featureSignif(x, bw, gridsize, scaleData=FALSE, addSignifGrad=TRUE,
   addSignifCurv=TRUE, signifLevel=0.05)

Arguments

x

data matrix (or vector for 1-dimensional data)

bw

vector of bandwidth(s)

gridsize

vector of estimation grid sizes

scaleData

flag for scaling the data, i.e. transforming each dimension to have unit variance

addSignifGrad

flag for computing significant gradient regions

addSignifCurv

flag for computing significant curvature regions

signifLevel

significance level
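
As a sketch only, the call below spells out every argument for a toy 2-dimensional data set; the bandwidth and grid-size values are illustrative, not recommendations.

library(feature)
x <- cbind(rnorm(200), rnorm(200))      # toy 2-d data matrix
fs <- featureSignif(x, bw=c(0.3, 0.3),  # one bandwidth per dimension
   gridsize=c(51, 51),                  # estimation grid size per dimension
   scaleData=FALSE, addSignifGrad=TRUE, addSignifCurv=TRUE,
   signifLevel=0.05)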

Value

Returns an object of class fs, which is a list with the following fields:

x

data matrix

names

name labels used for plotting

bw

vector of bandwidths

fhat

kernel density estimate on a grid

grad

logical grid for significant gradient

curv

logical grid for significant curvature

gradData

logical vector for significant gradient data points

gradDataPoints

significant gradient data points

curvData

logical vector for significant curvature data points

curvDataPoints

significant curvature data points
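
As a sketch of how these fields can be inspected (assuming the bivariate geyser fit from the Examples section below, and that the fields documented above are present):

library(feature)
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
names(fs)                 # field names documented above
sum(fs$gradData)          # count of significant-gradient data points
head(fs$curvDataPoints)   # coordinates of significant-curvature points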

Details

Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This approach was developed for 1-d data by Chaudhuri & Marron (1999), for 2-d data by Godtliebsen, Marron & Chaudhuri (2002), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2008).

The test statistic for gradient testing at a point \(\mathbf{x}\) is $$W(\mathbf{x}) = \Vert \widehat{\nabla f} (\mathbf{x}; \mathbf{H}) \Vert^2$$ where \(\widehat{\nabla f} (\mathbf{x};\mathbf{H})\) is the kernel estimate of the gradient of \(f(\mathbf{x})\) with bandwidth \(\mathbf{H}\), and \(\Vert\cdot\Vert\) is the Euclidean norm. \(W(\mathbf{x})\) is approximately chi-squared distributed with \(d\) degrees of freedom, where \(d\) is the dimension of the data.

The analogous test statistic for the curvature is $$W^{(2)}(\mathbf{x}) = \Vert \mathrm{vech} \widehat{\nabla^{(2)}f} (\mathbf{x}; \mathbf{H})\Vert ^2$$ where \(\widehat{\nabla^{(2)} f} (\mathbf{x};\mathbf{H})\) is the kernel estimate of the curvature of \(f(\mathbf{x})\), and vech is the vector-half operator. \(W^{(2)}(\mathbf{x})\) is approximately chi-squared distributed with \(d(d+1)/2\) degrees of freedom.
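
To make the degrees of freedom concrete, this pointwise sketch computes the approximate chi-squared critical values of both statistics for d = 2, before any multiple-testing adjustment (see below):

d <- 2
qchisq(0.95, df=d)           # gradient statistic W(x): d df
qchisq(0.95, df=d*(d+1)/2)   # curvature statistic W^(2)(x): d(d+1)/2 df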

Since many dependent hypothesis tests are carried out simultaneously (one at each point of the estimation grid), the Hochberg multiple comparison procedure is used to control the overall significance level. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2008).
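
Base R's p.adjust implements this adjustment; the sketch below shows the step-up logic on a toy vector of pointwise p-values (this is not the package's internal code).

p <- c(0.001, 0.013, 0.04, 0.2, 0.5)      # toy pointwise p-values
p.hoch <- p.adjust(p, method="hochberg")  # Hochberg step-up adjustment
which(p.hoch < 0.05)                      # tests significant at overall level 0.05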

References

Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.

Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.

Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.

Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.

Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.

See Also

featureSignifGUI, plot.fs

Examples

## Univariate example
data(earthquake)
eq3 <- -log10(-earthquake[,3])
fs <- featureSignif(eq3, bw=0.1)
plot(fs, addSignifGradRegion=TRUE)

## Bivariate example
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
plot(fs, addKDE=FALSE, addData=TRUE)  ## data only
plot(fs, addKDE=TRUE)                 ## KDE plot only
plot(fs, addSignifGradRegion=TRUE)    
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)
plot(fs, addSignifCurvData=TRUE, curvCol="cyan")