mig (version 2.0)

kdens_bandwidth: Optimal scale matrix for kernel density estimation

Description

Given a sample of size n from a multivariate distribution on the half-space defined by \(\{\boldsymbol{x} \in \mathbb{R}^d: \boldsymbol{\beta}^\top\boldsymbol{x}>0\}\), the function computes the bandwidth (type="isotropic") or the scale matrix that minimizes the asymptotic mean integrated squared error (AMISE) away from the boundary. The AMISE depends on the true unknown density, which is replaced by the kernel density estimate or by a MIG distribution evaluated at the maximum likelihood estimator. The required integral, or the integrated squared error, is approximated by Monte Carlo integration with N simulations. Cross-validation criteria are also available through the method argument.
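
The Monte Carlo step rests on a general principle: if X_1, ..., X_N are drawn from f, then the integral of g(x) f(x) dx is approximated by the average of g(X_i). The base R sketch below illustrates only that principle and is not the package's code. The functional used (the integrated squared density of a one-dimensional Exp(1) on the half-line x > 0, whose exact value is 1/2) is an assumption chosen because it has a closed form; the criteria in the package will generally involve other functionals of the density.

```r
# Monte Carlo principle: for X ~ f, integral of g(x) f(x) dx ~ mean(g(X_i)).
# Here g = f, so the target is the integrated squared density of Exp(1),
# which equals 1/2 exactly on the half-line x > 0.
set.seed(2024)
N <- 10000L                            # number of simulations, as in the N argument
draws <- rexp(N, rate = 1)             # sample from the density f
g_vals <- dexp(draws, rate = 1)        # evaluate g(x) = f(x) at the draws
mc_est <- mean(g_vals)                 # Monte Carlo estimate of the integral
mc_se <- sd(g_vals) / sqrt(N)          # its Monte Carlo standard error
```

The standard error shrinks at rate 1/sqrt(N), which is why a larger N gives a more stable criterion at a higher computational cost.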

Usage

kdens_bandwidth(
  x,
  beta,
  shift,
  family = c("mig", "hsgauss", "tnorm"),
  method = c("amise", "lcv", "lscv", "rlcv"),
  type = c("isotropic", "diag", "full"),
  approx = c("kernel", "mig", "tnorm"),
  transformation = c("none", "scaling", "spherical"),
  N = 10000L,
  buffer = 0,
  maxiter = 2000L,
  ...
)

Value

a d by d scale matrix

Arguments

x

an n by d matrix of observations

beta

vector of length d defining the half-space

shift

location vector for translating the half-space. If missing, defaults to the zero vector.
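
Membership in the half-space can be checked with a single matrix product. This minimal sketch assumes that translating by shift gives the set {x : beta'(x - shift) > 0}; that reading is an interpretation, not something stated above, and in_halfspace is a hypothetical helper, not a function exported by mig.

```r
# Hypothetical helper: TRUE for rows of x lying in {x : beta'(x - shift) > 0}.
# Assumes shift translates the half-space; defaults to the origin.
in_halfspace <- function(x, beta, shift = rep(0, length(beta))) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == length(beta), length(shift) == length(beta))
  # subtract shift from every row, then project onto beta
  drop(sweep(x, 2, shift) %*% beta) > 0
}

x <- rbind(c(1, 2), c(-3, 1), c(0.5, -0.2))
in_halfspace(x, beta = c(1, 1))
# [1]  TRUE FALSE  TRUE
```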

family

distribution for smoothing: either mig for the multivariate inverse Gaussian, tnorm for the truncated normal on the half-space, or hsgauss for Gaussian smoothing after a suitable transformation.

method

estimation criterion: either amise for the bandwidth that minimizes the asymptotic mean integrated squared error, lcv for likelihood (leave-one-out) cross-validation, lscv for least squares cross-validation, or rlcv for the robust likelihood cross-validation of Wu (2019).
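
The two classical cross-validation criteria can be sketched in base R for a plain isotropic Gaussian kernel with bandwidth h. This is a simplification: it ignores the half-space boundary and uses neither the MIG nor the truncated normal kernel, so it only illustrates what lcv (maximize the leave-one-out log-likelihood) and lscv (minimize an estimate of the integrated squared error, up to a term free of h) compute. The function names below are hypothetical.

```r
# Simplified sketch: isotropic Gaussian kernel, boundary ignored (not the MIG kernel).
# Log of the d-variate Gaussian kernel with covariance h^2 I, given squared distances D2.
gauss_logK <- function(D2, h, d) -D2 / (2 * h^2) - (d / 2) * log(2 * pi * h^2)

# Leave-one-out log-likelihood (lcv), to be maximized over h
lcv <- function(h, x) {
  n <- nrow(x); d <- ncol(x)
  logK <- gauss_logK(as.matrix(dist(x))^2, h, d)
  diag(logK) <- -Inf                       # leave observation i out of its own estimate
  m <- apply(logK, 1, max)                 # log-sum-exp trick avoids underflow for small h
  sum(m + log(rowSums(exp(logK - m))) - log(n - 1))
}

# Least squares cross-validation (lscv), to be minimized over h:
# integral of fhat^2 minus twice the mean leave-one-out density at the data
lscv <- function(h, x) {
  n <- nrow(x); d <- ncol(x)
  D2 <- as.matrix(dist(x))^2
  int_f2 <- sum(exp(gauss_logK(D2, sqrt(2) * h, d))) / n^2  # kernel variance 2 h^2
  K <- exp(gauss_logK(D2, h, d))
  diag(K) <- 0                             # drop self-contributions
  int_f2 - 2 * sum(K) / (n * (n - 1))
}

set.seed(1)
x <- matrix(rexp(200), ncol = 2)           # 100 points in the positive quadrant
h_lcv <- optimize(lcv, c(0.01, 2), x = x, maximum = TRUE)$maximum
h_lscv <- optimize(lscv, c(0.01, 2), x = x)$minimum
```

The integral of the squared estimate has a closed form here only because the convolution of two Gaussian kernels with variance h^2 is a Gaussian with variance 2h^2; for other kernels that term would typically need numerical or Monte Carlo integration.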

type

string indicating whether to compute an isotropic bandwidth (isotropic) or to estimate the optimal scale matrix via optimization, either diagonal (diag) or unrestricted (full).
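
When a scale matrix is estimated by numerical optimization, one common device is to map an unconstrained parameter vector to a positive-definite matrix. The helper make_scale below is hypothetical and not part of mig, and the log-scale and log-Cholesky parametrizations are an assumed design choice, not necessarily what the package does.

```r
# Hypothetical helper: unconstrained parameters -> positive-definite d x d scale matrix.
make_scale <- function(par, d, type = c("isotropic", "diag", "full")) {
  type <- match.arg(type)
  switch(type,
    # one parameter: H = h^2 I with h = exp(par[1])
    isotropic = exp(2 * par[1]) * diag(d),
    # d parameters: H = diag(h_1^2, ..., h_d^2)
    diag = diag(exp(2 * par[seq_len(d)]), nrow = d),
    # d(d+1)/2 parameters: H = L L' with L lower triangular, positive diagonal
    full = {
      L <- matrix(0, d, d)
      L[lower.tri(L, diag = TRUE)] <- par[seq_len(d * (d + 1) / 2)]
      diag(L) <- exp(diag(L))
      tcrossprod(L)
    })
}

H_full <- make_scale(c(0.1, 0.2, -0.3), d = 2, type = "full")
```

For d = 3 the isotropic, diagonal and full parametrizations have 1, 3 and 6 free parameters respectively, so the full type is the most flexible but also the hardest to optimize.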

approx

string; distribution used to approximate the true density function \(f(x)\): either kernel for the kernel estimator evaluated at the sample points (not supported with method="amise"), mig for the multivariate inverse Gaussian fitted by the method of moments, or tnorm for the multivariate truncated Gaussian fitted by maximum likelihood.

transformation

string for optional transformation of the data before computing the bandwidth: either standardization to unit variance (scaling), spherical transformation to unit variance and zero correlation (spherical), or no transformation (none, the default).
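
A linear transformation y = A x also changes the half-space: since beta'x = (A^{-T} beta)'y, the transformed data lie in the half-space defined by A^{-T} beta, and a scale matrix H_y selected on the transformed scale corresponds to A^{-1} H_y A^{-T} on the original scale. The base R sketch below is an illustration under assumptions: transform_halfspace is hypothetical, and the data are deliberately not centered so that the boundary hyperplane still passes through the origin; the package may implement the transformation differently.

```r
# Hypothetical sketch: transform data, map beta, and back-transform a scale matrix.
transform_halfspace <- function(x, beta, transformation = c("scaling", "spherical")) {
  transformation <- match.arg(transformation)
  x <- as.matrix(x)
  Ainv <- switch(transformation,
    scaling = diag(apply(x, 2, sd), nrow = ncol(x)),  # A^{-1} = diag of standard deviations
    spherical = t(chol(cov(x))))                      # A^{-1} = R', where cov(x) = R'R
  A <- solve(Ainv)
  list(
    y = x %*% t(A),                                   # rows y_i = A x_i, no centering
    beta = drop(t(Ainv) %*% beta),                    # A^{-T} beta, so beta_y'y = beta'x
    back = function(Hy) Ainv %*% Hy %*% t(Ainv))      # H_x = A^{-1} H_y A^{-T}
}

set.seed(3)
x <- cbind(rexp(300), rexp(300) + rexp(300))
tr <- transform_halfspace(x, beta = c(1, 1), transformation = "spherical")
```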

N

integer number of simulations for Monte Carlo integration

buffer

double indicating the buffer from the half-space

maxiter

integer; maximum number of iterations in the call to optim.

...

additional parameters, currently ignored

References

Wu, X. (2019). Robust likelihood cross-validation for kernel density estimation. Journal of Business & Economic Statistics, 37(4), 761–770. doi:10.1080/07350015.2018.1424633

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2), 353–360. doi:10.1093/biomet/71.2.353

Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9(2), 65–78. http://www.jstor.org/stable/4615859