ks (version 1.7.0)

Hpi: Plug-in bandwidth selector

Description

Plug-in bandwidth selector for 1- to 6-dimensional data.

Usage

Hpi(x, nstage=2, pilot="samse", pre="sphere", Hstart,
    binned=FALSE, bgridsize, amise=FALSE, kfold=1)
Hpi.diag(x, nstage=2, pilot="samse", pre="scale", Hstart,
    binned=FALSE, bgridsize, amise=FALSE, kfold=1)
hpi(x, nstage=2, binned=TRUE, bgridsize)

Arguments

x
vector or matrix of data values
nstage
number of stages in the plug-in bandwidth selector (1 or 2)
pilot
"amse" = AMSE pilot bandwidths, "samse" = single SAMSE pilot bandwidth, "unconstr" = unconstrained pilot bandwidth
pre
"scale" = pre-scaling, "sphere" = pre-sphering
Hstart
initial bandwidth matrix, used in numerical optimisation
binned
flag for binned kernel estimation. Default is FALSE for Hpi and Hpi.diag, and TRUE for hpi.
bgridsize
vector of binning grid sizes
amise
flag to return the minimal scaled PI value
kfold
value for k-fold bandwidth selection. See details below.

Value

Plug-in bandwidth. If amise=TRUE, the minimal scaled PI value is also returned.

Details

hpi is the univariate plug-in selector of Wand & Jones (1994), i.e. it is exactly the same as KernSmooth's dpik. Hpi is a multivariate generalisation of this. Use Hpi for full bandwidth matrices and Hpi.diag for diagonal bandwidth matrices.

For AMSE pilot bandwidths, see Wand & Jones (1994). For SAMSE pilot bandwidths, see Duong & Hazelton (2003). The latter is a modification of the former that removes any possible problems with non-positive definiteness. Unconstrained pilot bandwidths are available for d = 1, ..., 5 (but are extremely computationally intensive in the higher dimensions). See Chacon & Duong (2010).

For d = 1, 2, 3, 4 and binned=TRUE, estimates are computed over a binning grid defined by bgridsize. Otherwise they are computed exactly. For details on the pre-transformations in pre, see pre.sphere and pre.scale.
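As a rough illustration of the two pre-transformations named by pre, the following base R sketch scales and spheres a small simulated sample. It is not the code of pre.scale or pre.sphere; it assumes the usual definitions (scaling divides each variable by its standard deviation, sphering multiplies the data by the inverse square root of the sample covariance matrix), and the variable names are made up for the example.

```r
# Simulated bivariate sample with correlated columns (illustrative data only).
set.seed(1)
x <- cbind(rnorm(200), rnorm(200))
x[, 2] <- x[, 1] + 0.5 * x[, 2]

S <- var(x)  # sample covariance matrix

# Pre-scaling (assumed definition): divide each column by its standard deviation,
# so every variable has unit variance but correlations are kept.
x_scaled <- sweep(x, 2, sqrt(diag(S)), "/")

# Pre-sphering (assumed definition): multiply by S^(-1/2), computed here from the
# eigendecomposition of S, so the transformed data have identity covariance.
e <- eigen(S, symmetric = TRUE)
S_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
x_sphered <- x %*% S_inv_sqrt

var(x_scaled)   # unit variances on the diagonal
var(x_sphered)  # approximately the identity matrix
```

Scaling leaves the correlation structure in place, whereas sphering removes it, which is one reason a full bandwidth matrix (Hpi) is paired with pre="sphere" by default and a diagonal one (Hpi.diag) with pre="scale".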

If Hstart is not given then it defaults to k*var(x) where $k=\left[\frac{4}{n(d+2)}\right]^{2/(d+4)}$, n = sample size, d = dimension of data.

For large samples, k-fold bandwidth selection can significantly reduce computation time. The full data sample is partitioned into k sub-samples and a bandwidth matrix is computed for each of these sub-samples. The bandwidths are then averaged and re-weighted to serve as a proxy for the full sample selector. (This feature is temporarily disabled.)
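The default starting matrix k*var(x) described above can be reproduced in base R. The helper name default_hstart is hypothetical and not part of ks; it only evaluates the stated formula and is meant as a sketch for checking or overriding Hstart by hand.

```r
# Hypothetical helper (not a ks function): default Hstart = k * var(x),
# with k = [4 / (n (d + 2))]^(2 / (d + 4)).
default_hstart <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)  # sample size
  d <- ncol(x)  # dimension of the data
  k <- (4 / (n * (d + 2)))^(2 / (d + 4))
  k * var(x)
}

# Illustrative simulated data: 100 bivariate observations.
set.seed(1)
x <- cbind(rnorm(100), rnorm(100, sd = 2))
H0 <- default_hstart(x)
H0
```

A matrix computed this way could be passed explicitly, e.g. Hpi(x, Hstart=H0), when a reproducible or customised starting point for the numerical optimisation is wanted.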

References

Chacon, J.E. & Duong, T. (2010) Multivariate plug-in bandwidth selection with unconstrained pilot matrices. Test, 19, 375-398.

Duong, T. & Hazelton, M.L. (2003) Plug-in bandwidth matrices for bivariate kernel density estimation. Journal of Nonparametric Statistics, 15, 17-30.

Sheather, S.J. & Jones, M.C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B, 53, 683-690.

Wand, M.P. & Jones, M.C. (1994) Multivariate plugin bandwidth selection. Computational Statistics, 9, 97-116.

Examples

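library(ks)
data(unicef)
Hpi(unicef)                     # full bandwidth matrix, default SAMSE pilot
Hpi(unicef, pilot="unconstr")   # full bandwidth matrix, unconstrained pilot
Hpi.diag(unicef, binned=TRUE)   # diagonal bandwidth matrix, binned estimation
hpi(unicef[,1])                 # univariate selector on the first variable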