Hpi, Hpi.diag, hpi: Plug-in bandwidth selector

Description

Plug-in bandwidth selectors for 1- to 6-dimensional data.

Usage

Hpi(x, nstage=2, pilot="samse", pre="sphere", Hstart,
    binned=FALSE, bgridsize, amise=FALSE, kfold=1)
Hpi.diag(x, nstage=2, pilot="amse", pre="scale", Hstart,
    binned=FALSE, bgridsize, kfold=1)
hpi(x, nstage=2, binned=TRUE, bgridsize)

Arguments

x

vector or matrix of data values

nstage

number of stages in the plug-in bandwidth selector (1 or 2)

pilot

"amse" = AMSE pilot bandwidths, "samse" = single SAMSE pilot bandwidth, "unconstr" = unconstrained pilot bandwidth matrix

pre

"scale" = pre-scaling, "sphere" = pre-sphering

Hstart

initial bandwidth matrix, used in numerical optimisation

binned

flag for binned kernel estimation

bgridsize

vector of binning grid sizes; required only if binned=TRUE

amise

flag for returning estimated AMISE

kfold

number of sub-samples k for k-fold bandwidth selection (kfold=1 uses the full sample without partitioning). See Details.

Value

Plug-in bandwidth: a scalar for hpi, a bandwidth matrix for Hpi and Hpi.diag. If amise=TRUE, a list containing the plug-in bandwidth and the estimated AMISE is returned instead.

Details

hpi is the univariate plug-in selector of Wand & Jones (1994). Hpi is a multivariate generalisation of this.
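To illustrate how a univariate plug-in selector works, here is a minimal base-R sketch of a two-stage direct plug-in bandwidth with a normal kernel. It estimates the density functionals psi_6 and then psi_4 using AMSE-optimal pilot bandwidths, following the approach described by Wand & Jones, and then applies the AMISE formula for h. This is an unbinned O(n^2) sketch, not the code behind hpi or dpik (which use a binned approximation), so its values will differ slightly. The function names dnorm_deriv, psi_hat and plugin_1d are hypothetical, and the robust scale estimate min(sd, IQR/1.349) is an assumption.

```r
# Hypothetical sketch, not the ks or KernSmooth implementation.

# r-th derivative (r = 4 or 6) of the standard normal density: He_r(u) * phi(u)
dnorm_deriv <- function(u, r) {
  hermite <- switch(as.character(r),
    "4" = u^4 - 6 * u^2 + 3,
    "6" = u^6 - 15 * u^4 + 45 * u^2 - 15,
    stop("only r = 4 or r = 6 is supported"))
  hermite * dnorm(u)
}

# Unbinned kernel estimate of psi_r = E[f^(r)(X)] with pilot bandwidth g
# (all n^2 pairs, including i = j)
psi_hat <- function(x, r, g) {
  n <- length(x)
  u <- outer(x, x, "-") / g
  sum(dnorm_deriv(u, r)) / (n^2 * g^(r + 1))
}

plugin_1d <- function(x, nstage = 2) {
  n <- length(x)
  s <- min(sd(x), IQR(x) / 1.349)   # assumed robust scale estimate
  sx <- (x - mean(x)) / s            # work on the standardised scale
  K4_0 <- dnorm_deriv(0, 4)          #   3 / sqrt(2 * pi)
  K6_0 <- dnorm_deriv(0, 6)          # -15 / sqrt(2 * pi)
  if (nstage == 2) {
    psi8_ns <- 105 / (32 * sqrt(pi))          # normal-scale psi_8 (sigma = 1)
    g6 <- (-2 * K6_0 / (psi8_ns * n))^(1/9)   # AMSE pilot for psi_6
    psi6 <- psi_hat(sx, 6, g6)
  } else {
    psi6 <- -15 / (16 * sqrt(pi))             # normal-scale psi_6 (sigma = 1)
  }
  g4 <- (-2 * K4_0 / (psi6 * n))^(1/7)        # AMSE pilot for psi_4
  psi4 <- psi_hat(sx, 4, g4)
  # AMISE-optimal h for the normal kernel: R(K) = 1/(2 sqrt(pi)), mu_2(K) = 1
  h <- (1 / (2 * sqrt(pi) * psi4 * n))^(1/5)
  s * h                                       # back to the original scale
}

set.seed(1)
plugin_1d(rnorm(500))
```

With nstage=1 the sketch replaces the estimated psi_6 by its normal-scale value, which is the usual meaning of a one-stage plug-in selector.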

Use Hpi for full bandwidth matrices and Hpi.diag for diagonal bandwidth matrices. For AMSE pilot bandwidths, see Wand & Jones (1994). For SAMSE pilot bandwidths, see Duong & Hazelton (2003); the SAMSE pilot is a modification of the AMSE pilot that avoids possible problems with non-positive definiteness. Unconstrained pilot bandwidths are available for d = 1, ..., 5, but are extremely computationally intensive for the higher dimensions; see Chacón & Duong (2008).

For d = 1, hpi is identical to dpik from the KernSmooth package and is always computed as a binned estimator. For d = 2, 3, 4 with binned=TRUE, the estimates are computed over a binning grid defined by bgridsize; otherwise they are computed exactly. For details on the pre-transformations in pre, see pre.sphere and pre.scale.
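The two pre-transformations can be sketched in base R as follows, assuming that pre-scaling divides each coordinate by its sample standard deviation and that pre-sphering multiplies the data by the inverse symmetric square root S^(-1/2) of the sample variance matrix S. The helper names are hypothetical stand-ins, not ks's pre.scale and pre.sphere. A bandwidth matrix H.star selected on sphered data maps back to the original scale as S^(1/2) H.star S^(1/2); for pre-scaling the same idea applies with the diagonal of S.

```r
# Hypothetical sketch of the pre-transformations (not ks's own functions).

# Pre-scaling: divide each column by its sample standard deviation.
pre_scale_sketch <- function(x) {
  x <- as.matrix(x)
  sweep(x, 2, apply(x, 2, sd), "/")
}

# Symmetric power S^p of a positive definite matrix via its eigen-decomposition.
mat_power <- function(S, p) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(e$values^p, nrow(S)) %*% t(e$vectors)
}

# Pre-sphering: multiply by S^(-1/2), so the transformed data have identity variance.
pre_sphere_sketch <- function(x) {
  x <- as.matrix(x)
  x %*% mat_power(var(x), -1/2)
}

# Map a bandwidth matrix chosen on sphered data back to the original scale.
back_transform_H <- function(H.star, x) {
  S.half <- mat_power(var(as.matrix(x)), 1/2)
  S.half %*% H.star %*% S.half
}

set.seed(2)
x <- cbind(rnorm(200), rnorm(200))
x[, 2] <- x[, 2] + 2 * x[, 1]        # introduce correlation
round(var(pre_sphere_sketch(x)), 10) # identity matrix up to rounding
```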

If Hstart is not given, it defaults to k*var(x), where $k = \left[\frac{4}{n(d+2)}\right]^{2/(d+4)}$, n is the sample size and d is the dimension of the data (this scalar k is unrelated to kfold).

For large samples, k-fold bandwidth selection can substantially reduce computation time. The full data sample is partitioned into k sub-samples, and a bandwidth matrix is computed for each sub-sample. These bandwidth matrices are then averaged and re-weighted to serve as a proxy for the full-sample selector.
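The default starting matrix is the normal-scale bandwidth matrix and can be computed directly. The sketch below also illustrates the k-fold idea, using that normal-scale matrix as a stand-in selector so that it runs without ks. All names are hypothetical, and the re-weighting step is an assumption: since the weights are not specified here, the sketch rescales the averaged sub-sample matrix by kfold^(-2/(d+4)), reflecting the n^(-2/(d+4)) rate at which optimal bandwidth matrices shrink. ks may re-weight differently.

```r
# Hypothetical sketch; not the ks implementation.

# Default starting matrix: k * var(x) with k = [4 / (n (d + 2))]^(2 / (d + 4))
hstart_default <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x); d <- ncol(x)
  k <- (4 / (n * (d + 2)))^(2 / (d + 4))
  k * var(x)
}

# k-fold proxy: select on each sub-sample, average, then rescale (assumed weights).
kfold_selector <- function(x, kfold, selector = hstart_default) {
  x <- as.matrix(x)
  n <- nrow(x); d <- ncol(x)
  # randomly assign each row to one of kfold (roughly equal) sub-samples
  fold <- sample(rep_len(seq_len(kfold), n))
  H_list <- lapply(split(seq_len(n), fold),
                   function(idx) selector(x[idx, , drop = FALSE]))
  H_bar <- Reduce(`+`, H_list) / kfold
  # assumed re-weighting: bandwidth matrices scale like n^(-2/(d+4))
  kfold^(-2 / (d + 4)) * H_bar
}

set.seed(3)
x <- matrix(rnorm(4000), ncol = 2)
hstart_default(x)
kfold_selector(x, kfold = 5)   # close to the full-sample matrix
```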

References

Chacón, J.E. & Duong, T. (2008) Multivariate plug-in bandwidth selection with unconstrained pilot matrices. Test. Accepted.

Duong, T. & Hazelton, M.L. (2003) Plug-in bandwidth matrices for bivariate kernel density estimation. Journal of Nonparametric Statistics, 15, 17-30.

Sheather, S.J. & Jones, M.C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683-690.

Wand, M.P. & Jones, M.C. (1994) Multivariate plug-in bandwidth selection. Computational Statistics, 9, 97-116.

Examples

library(ks)
data(unicef)
Hpi(unicef)
Hpi(unicef, pilot="unconstr")
Hpi.diag(unicef, binned=TRUE)
hpi(unicef[,1])