horn: Horn's Test for Heteroskedasticity in a Linear Regression Model

Description

This function implements the nonparametric test of Horn81;textualskedastic for testing for heteroskedasticity in a linear regression model.

Usage

horn(
  mainlm,
  deflator = NA,
  restype = c("ols", "blus"),
  alternative = c("two.sided", "greater", "less"),
  exact = (nres

Arguments

mainlm

Either an object of class "lm" (e.g., generated by lm), or a list of two objects: a response vector and a design matrix. The objects are assumed to be in that order, unless they are given the names "X" and "y" to distinguish them. The design matrix passed in a list must begin with a column of ones if an intercept is to be included in the linear model. The design matrix passed in a list should not contain factors, as all columns are treated 'as is'. For tests that use ordinary least squares residuals, one can also pass a vector of residuals in the list, which should either be the third object or be named "e".

deflator

Either a character specifying a column name from the design matrix of mainlm or an integer giving the index of a column of the design matrix. This variable is suspected to be related to the error variance under the alternative hypothesis. deflator may not correspond to a column of 1's (intercept). Default NA means the data will be left in its current order (e.g. in case the existing index is believed to be associated with error variance).

restype

A character specifying which residuals to use: "ols" for OLS residuals (the default) or the "blus" for BLUS residuals. The advantage of using BLUS residuals is that, under the null hypothesis, the assumption that the random series is independent and identically distributed is met (whereas with OLS residuals it is not). The disadvantage of using BLUS residuals is that only \(n-p\) residuals are used rather than the full \(n\).

alternative

A character specifying the form of alternative hypothesis; one of "two.sided" (default), "greater", or "less". "two.sided" corresponds to any trend in the absolute residuals when ordered by deflator. "greater" corresponds to a negative trend in the absolute residuals when ordered by deflator. "less" corresponds to a positive trend in the absolute residuals when ordered by deflator. (Notice that \(D\) tends to be small when there is a positive trend.)

exact

A logical. Should exact \(p\)-values be computed? If FALSE, a normal approximation is used. Defaults to TRUE only if the number of absolute residuals being ranked is \(\le 10\).

statonly

A logical. If TRUE, only the test statistic value is returned, instead of an object of class "htest". Defaults to FALSE.

...

Optional further arguments to pass to blus.

Value

An object of class "htest". If object is not assigned, its attributes are displayed in the console as a tibble using tidy.

Details

The test entails specifying a 'deflator', an explanatory variable suspected of being related to the error variance. Residuals are ordered by the deflator and the nonparametric trend statistic \(D=\sum (R_i - i)^2\) proposed by Lehmann75;textualskedastic is then computed on the absolute residuals and used to test for an increasing or decreasing trend, either of which would correspond to heteroskedasticity. Exact probabilities for the null distribution of \(D\) can be obtained from functions dDtrend and pDtrend, but since computation time increases rapidly with \(n\), use of a normal approximation is recommended for \(n>10\). Lehmann75;textualskedastic proves that \(D\) is asymptotically normally distributed and the approximation of the statistic \(Z=(D-E(D))/\sqrt{V(D)}\) to the standard normal distribution is already quite good for \(n=11\).

The expectation and variance of \(D\) (when ties are absent) are respectively \(E(D)=\frac{n^3-n}{6}\) and \(V(D)=\frac{n^2(n+1)^2(n-1)}{36}\); see Lehmann75;textualskedastic for \(E(D)\) and \(V(D)\) when ties are present. When ties are absent, a continuity correction is used to improve the normal approximation. When exact distribution is used, two-sided \(p\)-value is computed by doubling the one-sided \(p\)-value, since the distribution of \(D\) is symmetric. The function does not support the exact distribution of \(D\) in the presence of ties, so in this case the normal approximation is used regardless of \(n\).

References

Examples

Run this code

# NOT RUN {
mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
horn(mtcars_lm, deflator = "qsec")
horn(mtcars_lm, deflator = "qsec", restype = "blus")

# }

Run the code above in your browser using DataLab