kdens_bandwidth: Optimal scale matrix for kernel density estimation

Description

Given a sample of size n from a multivariate distribution on the half-space defined by \(\{\boldsymbol{x} \in \mathbb{R}^d: \boldsymbol{\beta}^\top\boldsymbol{x}>0\}\), the function computes the bandwidth (type="isotropic") or scale matrix that minimizes the asymptotic mean integrated squared error away from the boundary. The latter depends on the true unknown density, which is replaced by the kernel density estimate or by a MIG distribution evaluated at the maximum likelihood estimator. The integral or the integrated squared error is obtained by Monte Carlo integration with N simulations.
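
As a rough sketch of the quantities mentioned above, in generic textbook notation that is an assumption here and not taken from the package: write \(\widehat{f}_{\mathbf{H}}\) for the kernel estimator with scale matrix \(\mathbf{H}\), \(f\) for the true density and \(\mathcal{H}\) for the half-space. The mean integrated squared error is

\[
\mathrm{MISE}(\mathbf{H}) = \mathsf{E}\left[\int_{\mathcal{H}} \left\{\widehat{f}_{\mathbf{H}}(\boldsymbol{x}) - f(\boldsymbol{x})\right\}^2 \mathrm{d}\boldsymbol{x}\right],
\]

and the asymptotic version keeps only its leading-order terms as \(n \to \infty\). Monte Carlo integration replaces an integral against a density \(g\) by an average over \(N\) independent draws \(\boldsymbol{X}_1, \ldots, \boldsymbol{X}_N\) from \(g\),

\[
\int h(\boldsymbol{x})\, g(\boldsymbol{x})\, \mathrm{d}\boldsymbol{x} \approx \frac{1}{N}\sum_{j=1}^{N} h(\boldsymbol{X}_j).
\]

The exact integrand \(h\) and sampling density \(g\) used internally depend on family and approx and are not specified here.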

Usage

kdens_bandwidth(
  x,
  beta,
  shift,
  family = c("mig", "hsgauss", "tnorm"),
  method = c("amise", "lcv", "lscv", "rlcv"),
  type = c("isotropic", "diag", "full"),
  approx = c("kernel", "mig", "tnorm"),
  transformation = c("none", "scaling", "spherical"),
  N = 10000L,
  buffer = 0,
  maxiter = 2000L,
  ...
)

Value

a d by d scale matrix

Arguments

x: an n by d matrix of observations
beta: d vector defining the half-space
shift: location vector for translating the half-space. If missing, defaults to zero
family: distribution for smoothing; one of mig for multivariate inverse Gaussian, hsgauss for Gaussian smoothing after a suitable transformation, or tnorm for truncated normal on the half-space
method: estimation criterion; one of amise for the expression that minimizes the asymptotic mean integrated squared error, lcv for likelihood (leave-one-out) cross-validation, lscv for least squares cross-validation, or rlcv for the robust likelihood cross-validation of Wu (2019)
type: string; either isotropic for an isotropic model with a single bandwidth, or diag or full to estimate the optimal scale matrix (diagonal or unrestricted) via optimization
approx: string; distribution used to approximate the true density function \(f(x)\); one of kernel for the kernel estimator evaluated at the sample points (not supported for method="amise"), mig for the multivariate inverse Gaussian fitted by the method of moments, or tnorm for the multivariate truncated Gaussian fitted by maximum likelihood
transformation: string for optional scaling of the data before computing the bandwidth; one of none (default), scaling for standardization to unit variance, or spherical for transformation to unit variance and zero correlation
N: integer number of simulations for Monte Carlo integration
buffer: double; size of the buffer from the boundary of the half-space
maxiter: integer; maximum number of iterations in the call to optim
...: additional parameters, currently ignored
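
For orientation, the cross-validation criteria named under method can be sketched in their usual textbook form. This is an assumption about the general shape of the criteria, not a statement of the package's exact implementation, which may differ in scaling or in how buffer is handled. Here \(\widehat{f}_{-i,\mathbf{H}}\) denotes the kernel estimator with scale matrix \(\mathbf{H}\) computed without the \(i\)th observation \(\boldsymbol{x}_i\), and \(\mathcal{H}\) the half-space; this notation is introduced for illustration only. Likelihood cross-validation (lcv) maximizes

\[
\mathrm{LCV}(\mathbf{H}) = \frac{1}{n}\sum_{i=1}^{n} \log \widehat{f}_{-i,\mathbf{H}}(\boldsymbol{x}_i),
\]

while least squares cross-validation (lscv; Rudemo, 1982; Bowman, 1984) minimizes

\[
\mathrm{LSCV}(\mathbf{H}) = \int_{\mathcal{H}} \widehat{f}_{\mathbf{H}}(\boldsymbol{x})^2\, \mathrm{d}\boldsymbol{x} - \frac{2}{n}\sum_{i=1}^{n} \widehat{f}_{-i,\mathbf{H}}(\boldsymbol{x}_i).
\]

The robust variant (rlcv) of Wu (2019) modifies the logarithm in the first criterion to limit the influence of observations with very small leave-one-out density; its exact form is not reproduced here.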

References

Wu, X. (2019). Robust likelihood cross-validation for kernel density estimation. Journal of Business & Economic Statistics, 37(4), 761–770. doi:10.1080/07350015.2018.1424633

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2), 353–360. doi:10.1093/biomet/71.2.353

Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9(2), 65–78. http://www.jstor.org/stable/4615859