avm.ci: Bootstrap Confidence Intervals for Linear Regression Error Variances

Description

Uses bootstrap methods to compute approximate confidence intervals for error variances in a heteroskedastic linear regression model based on an auxiliary linear variance model (ALVM) or auxiliary nonlinear variance model (ANLVM).

Usage

avm.ci(
  object,
  bootobject = NULL,
  bootavmobject = NULL,
  jackobject = NULL,
  bootCImethod = c("pct", "bca", "stdnorm"),
  bootsampmethod = c("pairs", "wild"),
  Bextra = 500L,
  Brequired = 1000L,
  conf.level = 0.95,
  expand = TRUE,
  retune = FALSE,
  resfunc = c("identity", "hccme"),
  qtype = 6,
  rm_on_constraint = TRUE,
  rm_nonconverged = TRUE,
  jackknife_point = FALSE,
  ...
)

Value

An object of class "avm.ci", containing the following:

climits, an $n\times 2$ matrix with lower confidence limits in the first column and upper confidence limits in the second
var.est, a vector of length $n$ of point estimates $\hat{\omega}$ of the error variances. This is the same vector passed within object, unless jackknife_point is TRUE.
conf.level, corresponding to the eponymous argument
bootCImethod, corresponding to the eponymous argument
bootsampmethod, corresponding to the eponymous argument or otherwise extracted from bootobject

Arguments

object: An object of class "alvm.fit" or of class "anlvm.fit", containing information on a fitted ALVM or ANLVM
bootobject: An object of class "bootlm", containing information on a set of $B$ bootstrapped versions of a linear regression model, obtained by a nonparametric bootstrap method suitable for heteroskedastic linear models. If set to NULL (the default), it is generated by calling bootlm.
bootavmobject: An object of class "bootavm", containing information on an ALVM or ANLVM fit to $B$ bootstrapped linear regression models. If set to NULL (the default), it is generated by calling the non-exported function bootavm.
jackobject: An object of class "jackavm", containing information on ALVMs or ANLVMs fit to jackknife versions of a linear regression model. If set to NULL (the default), it is generated by calling the non-exported function jackavm.
bootCImethod: A character specifying the method to use when computing the approximate bootstrap confidence interval. The default, "pct", corresponds to the percentile interval. "bca" corresponds to the Bias-Corrected and accelerated (BCa) modification of the percentile interval. "stdnorm" corresponds to a naive standard normal interval with bootstrap standard error estimates.
bootsampmethod: A character specifying the method to use for generating nonparametric bootstrap linear regression models. Corresponds to the sampmethod argument of bootlm and defaults to "pairs". Warning: in simulations, bootstrap intervals computed using the wild bootstrap have shown very poor coverage probabilities. Ignored unless bootobject is NULL.
Bextra: An integer indicating the maximum number of additional bootstrap models that should be fitted in an attempt to obtain Brequired appropriate sets of bootstrap variance estimates, as explained under Brequired. Defaults to 500L. Ignored if rm_on_constraint is set to FALSE (for an ALVM) or if rm_nonconverged is set to FALSE (for an ANLVM).
Brequired: An integer indicating the number of bootstrap regression models that should be used to compute the bootstrap confidence intervals. The default behaviour is to base the interval estimates only on bootstrap ALVM variance estimates that are not on the constraint boundary or on bootstrap ANLVMs where the estimation algorithm converged. Consequently, if this is not the case for all of the first Brequired bootstrap models, additional bootstrap models are used (up to a maximum of Bextra). Defaults to 1000L.
conf.level: A double representing the confidence level $1-\alpha$; must be between 0 and 1. Defaults to 0.95.
expand: A logical specifying whether to implement the expansion technique described in Hesterberg (2015). Defaults to TRUE.
retune: A logical specifying whether to re-tune hyperparameters and re-select features each time an ALVM or (in the case of feature selection) ANLVM is fit to a bootstrapped regression model. If FALSE (the default), the hyperparameter value and selected features from the ALVM fit to the original model are reused in every bootstrap model. Setting to TRUE is more theoretically sound but increases computation time substantially.
resfunc: Either a character naming a function to call to apply a transformation to the Ordinary Least Squares residuals, or a function to apply for the same purpose. This argument is ignored if bootsampmethod is "pairs". The only two character values accepted are "identity", in which case no transformation is applied to the residuals, and "hccme", in which case the transformation corresponds to a heteroskedasticity-consistent covariance matrix estimator calculated from hccme. If resfunc is a function, it is assumed that its first argument is the numeric vector of residuals.
qtype: A numeric corresponding to the type argument of quantile. Defaults to 6.
rm_on_constraint: A logical specifying whether to exclude bootstrapped ALVMs from the interval estimation method where the ALVM parameter estimate falls on the constraint boundary. Defaults to TRUE.
rm_nonconverged: A logical specifying whether to exclude bootstrapped ANLVMs from the interval estimation method where the optimisation algorithm used in quasi-likelihood estimation of the ANLVM parameter did not converge. Defaults to TRUE.
jackknife_point: A logical specifying whether to replace the point estimates of the error variances $\omega$ with jackknife estimates based only on the leave-one-out auxiliary models where the parameter estimates do not lie on the constraint boundary (in the ALVM case) or where the quasi-likelihood estimation algorithm converged (in the ANLVM case). Defaults to FALSE.
...: Other arguments to pass to non-exported helper functions

Details

$B$ resampled versions of the original linear regression model (which can be accessed using object$ols) are generated using a nonparametric bootstrap method that is suitable for heteroskedastic linear regression models, namely either the pairs bootstrap or the wild bootstrap (bootstrapping residuals is not suitable). Depending on the class of object, either an ALVM or an ANLVM is fit to each of the bootstrapped regression models. The distribution of the $B$ bootstrap estimates of each error variance $\omega_i$, $i=1,2,\ldots,n$, is used to construct an approximate confidence interval for $\omega_i$. This is done using one of three methods. The first is the percentile interval, which simply takes the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of the $B$ bootstrap estimates of $\omega_i$. The second is the Bias-Corrected and accelerated (BCa) method as described in Efron and Tibshirani (1993), which is intended to improve on the percentile interval (although simulations have not found it to yield better coverage probabilities). The third is the naive standard normal interval, which takes $\hat{\omega}_i \pm z_{1-\alpha/2} \widehat{\mathrm{SE}}_i$, where $\widehat{\mathrm{SE}}_i$ is the standard deviation of the $B$ bootstrap estimates of $\omega_i$. By default, the expansion technique described in Hesterberg (2015) is also applied; evidence from simulations suggests that this does improve coverage probabilities.
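
The percentile and naive standard normal intervals described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the matrix boot_est (one row per bootstrap replicate, one column per observation) and the vector omega_hat are simulated stand-ins, and neither the BCa adjustment nor the expansion technique is applied.

```r
set.seed(1)
n <- 5L        # number of observations (hypothetical)
B <- 1000L     # number of bootstrap replicates (hypothetical)
alpha <- 0.05  # 1 - conf.level

# Simulated stand-ins (assumptions, not output of avm.ci):
# omega_hat plays the role of the point estimates of the error variances,
# boot_est the B x n matrix of bootstrap variance estimates.
omega_hat <- c(0.5, 1, 1.5, 2, 2.5)
boot_est <- sapply(omega_hat, function(w) w * rchisq(B, df = 10) / 10)

# Percentile interval: empirical alpha/2 and 1 - alpha/2 quantiles,
# computed column by column (type = 6 mirrors the qtype default)
pct_ci <- t(apply(boot_est, 2, quantile,
                  probs = c(alpha / 2, 1 - alpha / 2), type = 6))

# Naive standard normal interval: point estimate +/- z * bootstrap SE
se_hat <- apply(boot_est, 2, sd)
z <- qnorm(1 - alpha / 2)
norm_ci <- cbind(lower = omega_hat - z * se_hat,
                 upper = omega_hat + z * se_hat)
```

Both pct_ci and norm_ci are $n\times 2$ matrices laid out like the climits component described under Value, with lower limits in the first column and upper limits in the second.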

Technically, the hyperparameters of the ALVM, such as $\lambda$ (for a penalised polynomial or thin-plate spline model) or $n_c$ (for a clustering model) should be re-tuned every time the ALVM is fitted to another bootstrapped regression model. However, due to the computational cost, this is not done by avm.ci unless retune is set to TRUE.

When obtained from ALVMs, bootstrap estimates of $\omega_i$ that fall on the constraint boundary (i.e., are estimated to be near 0) are ignored by default; there is an attempt to obtain Brequired bootstrap estimates of each $\omega_i$ that do not fall on the constraint boundary. This fine-tuning can be turned off by setting the rm_on_constraint argument to FALSE; the amount of effort put into obtaining non-boundary estimates is controlled using the Bextra argument. When ANLVMs are used, the default behaviour is to try to obtain Brequired bootstrap estimates of $\omega$ where the Gauss-Newton algorithm applied for quasi-likelihood estimation has converged, and to ignore estimates obtained from non-convergent cases. This behaviour can be toggled using the rm_nonconverged argument.
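
The interplay of Brequired and Bextra described above can be sketched as a simple loop. This is an illustration of the idea only, not the package's code: fit_one_boot() is a hypothetical stand-in that returns simulated variance estimates together with a flag saying whether the fit is usable (off the constraint boundary for an ALVM, or converged for an ANLVM).

```r
set.seed(42)
Brequired <- 20L  # usable bootstrap fits wanted
Bextra <- 10L     # maximum number of additional fits allowed
n <- 4L           # number of observations (hypothetical)

# Hypothetical stand-in for fitting an auxiliary variance model to one
# bootstrapped regression model; roughly 20% of fits are flagged unusable.
fit_one_boot <- function() {
  list(var.est = rchisq(n, df = 5) / 5,
       usable = runif(1) > 0.2)
}

kept <- list()
n_fitted <- 0L
# Keep fitting until Brequired usable fits are collected or the
# budget of Brequired + Bextra fits is exhausted.
while (length(kept) < Brequired && n_fitted < Brequired + Bextra) {
  fit <- fit_one_boot()
  n_fitted <- n_fitted + 1L
  if (fit$usable) kept[[length(kept) + 1L]] <- fit$var.est
}

# One row per retained bootstrap fit, one column per observation
boot_est <- do.call(rbind, kept)
```

In this sketch, a smaller Bextra limits the extra computation spent on replacing discarded fits, at the cost of possibly ending with fewer than Brequired usable rows in boot_est.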

References

Examples

library(skedastic)  # provides alvm.fit() and avm.ci()
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
myalvm <- alvm.fit(mtcars_lm, model = "cluster")
# Brequired would of course not be so small in practice
ci.alvm <- avm.ci(myalvm, Brequired = 5)
