Fits an Auxiliary Linear Variance Model (ALVM) to estimate the error variances of a heteroskedastic linear regression model.
Usage

alvm.fit(
mainlm,
M = NULL,
model = c("cluster", "spline", "linear", "polynomial", "basic", "homoskedastic"),
varselect = c("none", "hettest", "cv.linear", "cv.cluster", "qgcv.linear",
"qgcv.cluster"),
lambda = c("foldcv", "qgcv"),
nclust = c("elbow.swd", "elbow.mwd", "elbow.both", "foldcv"),
clustering = NULL,
polypen = c("L2", "L1"),
d = 2L,
solver = c("auto", "quadprog", "quadprogXT", "roi", "osqp"),
tsk = NULL,
tsm = NULL,
constol = 1e-10,
cvoption = c("testsetols", "partitionres"),
nfolds = 5L,
reduce2homosked = TRUE,
...
)
Value

An object of class "alvm.fit", containing the following:

coef.est: a vector of parameter estimates, \(\hat{\gamma}\)

var.est: a vector of estimates \(\hat{\omega}\) of the error variances for all observations

method: a character corresponding to the model argument

ols: the lm object corresponding to the original linear regression model

fitinfo: a list containing four named objects: Msq (the elementwise square of the annihilator matrix \(M\)), L (the linear predictor matrix \(L\)), clustering (a list object with results of the clustering procedure), and gam.object, an object of class "gam" (see gamObject). The last two are set to NA unless the clustering ALVM or the thin-plate spline ALVM is used, respectively

hyperpar: a named list of hyperparameter values, lambda, nclust, tsk, and d, and tuning methods, lambdamethod and nclustmethod. Values corresponding to unused hyperparameters are set to NA

selectinfo: a list containing two named objects, varselect (the value of the eponymous argument) and selectedcols (a numeric vector with column indices of \(X\) that were selected, with 1 denoting the intercept column)

pentype: a character corresponding to the polypen argument

solver: a character corresponding to the solver argument (or specifying the QP solver actually used, if solver was set to "auto")

constol: a double corresponding to the constol argument
Arguments

mainlm: Either an object of class "lm" (e.g., generated by lm), or a list of two objects: a response vector and a design matrix. The objects are assumed to be in that order unless they are given the names "X" and "y" to distinguish them. A design matrix passed in a list must begin with a column of ones if an intercept is to be included in the linear model, and should not contain factors, as all columns are treated 'as is'. For tests that use ordinary least squares residuals, one can also pass a vector of residuals in the list, which should either be the third object or be named "e". (A sketch of the list interface follows.)
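For instance, the list interface might be used as follows. This is a minimal sketch; it assumes the skedastic package (to which alvm.fit belongs) is attached, as do the later sketches on this page.

library(skedastic)
# Pass the response and design matrix directly instead of an "lm" object;
# the names "y" and "X" make the ordering unambiguous
X <- cbind(1, mtcars$wt, mtcars$qsec, mtcars$am)  # intercept column first
y <- mtcars$mpg
fit <- alvm.fit(mainlm = list("y" = y, "X" = X), model = "linear")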
M: An \(n\times n\) annihilator matrix. If NULL (the default), this will be calculated from the mainlm object.
model: A character corresponding to the type of ALVM to be fitted: "cluster" for the clustering ALVM, "spline" for the thin-plate spline ALVM, "linear" for the linear ALVM, "polynomial" for the penalised polynomial ALVM, "basic" for the basic or naive ALVM, and "homoskedastic" for the homoskedastic error variance estimator, \(e'e/(n-p)\).
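The "homoskedastic" option is simply the usual OLS residual variance estimate; a quick sketch verifying the formula:

mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
e <- residuals(mtcars_lm)
n <- nrow(model.matrix(mtcars_lm))
p <- ncol(model.matrix(mtcars_lm))
sum(e^2) / (n - p)            # e'e/(n - p)
summary(mtcars_lm)$sigma^2    # the same quantity, via summary.lm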
varselect: Either a character indicating how variable selection should be conducted, or an integer vector giving indices of columns of the predictor matrix (model.matrix of mainlm) to select. The vector must include 1L for the intercept to be selected. If a character, it must be one of the following (a usage sketch follows this list):

"none": No variable selection is conducted

"hettest": Variable selection is conducted by applying a heteroskedasticity test with each feature in turn serving as the 'deflator' variable

"cv.linear": Variable selection is conducted by best subset selection on the auxiliary linear variance model (linear specification), using squared-error loss computed under \(K\)-fold cross-validation

"cv.cluster": Variable selection is conducted by best subset selection on the auxiliary linear variance model (clustering specification), using squared-error loss computed under \(K\)-fold cross-validation

"qgcv.linear": Variable selection is conducted by best subset selection on the auxiliary linear variance model (linear specification), using squared-error loss computed under quasi-generalised cross-validation

"qgcv.cluster": Variable selection is conducted by best subset selection on the auxiliary linear variance model (clustering specification), using squared-error loss computed under quasi-generalised cross-validation
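As illustrations (assuming mtcars_lm from the sketch above), variable selection can be requested by method name or by explicit column indices:

# Best subset selection under quasi-generalised cross-validation
fit1 <- alvm.fit(mtcars_lm, model = "linear", varselect = "qgcv.linear")
# Explicit selection of the intercept and the first predictor column
fit2 <- alvm.fit(mtcars_lm, model = "linear", varselect = c(1L, 2L))
fit2$selectinfo$selectedcols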
lambda: Either a double of length 1 indicating the value of the penalty hyperparameter \(\lambda\), or a character specifying the tuning method for choosing \(\lambda\): "foldcv" for \(K\)-fold cross-validation (the default) or "qgcv" for quasi-generalised cross-validation. This argument is ignored if model is neither "polynomial" nor "spline".
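For example (a sketch; the fixed value 0.1 is arbitrary):

# Penalised polynomial ALVM with lambda fixed vs. tuned
fitA <- alvm.fit(mtcars_lm, model = "polynomial", lambda = 0.1)
fitB <- alvm.fit(mtcars_lm, model = "polynomial", lambda = "qgcv")
fitB$hyperpar$lambda  # value selected by quasi-generalised cross-validation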
nclust: Either an integer of length 1 indicating the number of clusters \(n_c\), or a character specifying the tuning method for choosing \(n_c\): "elbow.swd" for the elbow method using a sum of within-cluster distances criterion, "elbow.mwd" for the elbow method using a maximum within-cluster distances criterion, "elbow.both" for the rounded average of the results of the previous two, and "foldcv" for \(K\)-fold cross-validation. This argument is ignored if model is not "cluster".
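For example (a sketch):

# Clustering ALVM with n_c fixed at 3 vs. tuned by the elbow method
fitC <- alvm.fit(mtcars_lm, model = "cluster", nclust = 3L)
fitD <- alvm.fit(mtcars_lm, model = "cluster", nclust = "elbow.both")
fitD$hyperpar$nclust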
clustering: A list object of class "doclust". If set to NULL (the default), such an object is generated; ignored if model is not "cluster".
A character, either "L2"
or "L1"
, indicating
whether an \(L_2\) norm penalty (ridge regression) or
\(L_1\) norm penalty (LASSO) should be used with the polynomial model.
This argument is ignored if model
is not "polynomial"
.
d: An integer specifying the degree of polynomial to use in the penalised polynomial ALVM; defaults to 2L. Ignored if model is other than "polynomial". Setting d to 1L is not identical to setting model to "linear", because the linear ALVM does not have a penalty term in the objective function.
solver: A character indicating which Quadratic Programming solver function to use to estimate \(\gamma\). The options are "quadprog", corresponding to solve.QP.compact from the quadprog package; "quadprogXT", corresponding to buildQP from the quadprogXT package; "roi", corresponding to the qpoases solver implemented in ROI_solve from the ROI package with the ROI.plugin.qpoases add-on; and "osqp", corresponding to solve_osqp from the osqp package. Alternatively, the user can specify "auto" (the default), in which case the function will select the solver that seems to work best for the chosen model.
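For example, to override the automatic choice (a sketch):

fitE <- alvm.fit(mtcars_lm, model = "linear", solver = "quadprog")
fitE$solver  # records the QP solver actually used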
tsk: An integer corresponding to the basis dimension k to be passed to the s function of the mgcv package for fitting of a thin-plate spline ALVM; see choose.k in mgcv for more details about choosing the parameter and its default. Ignored if model is not "spline".
tsm: An integer corresponding to the order m of the penalty to be passed to the s function of the mgcv package for fitting of a thin-plate spline ALVM. If left as the default (NULL), it is set to 2, corresponding to second-derivative penalties for a cubic spline.
constol: A double corresponding to the boundary value for the constraint on error variances. The error variances must of course be non-negative, but setting the constraint boundary to 0 can produce zero estimates, which in turn yield infinite weights in Feasible Weighted Least Squares. The boundary value should thus be positive, but small enough not to bias estimation of very small variances. Defaults to 1e-10.
A character, either "testsetols"
or
"partitionres"
, indicating how to obtain the observed response
values for each test fold when performing \(K\)-fold cross-validation
on an ALVM. The default technique, "testsetols"
, entails fitting
a linear regression model to the test fold of observations from the
original response vector \(y\) and predictor matrix \(X\). The
squared residuals from this regression are the observed
responses that are predicted from the trained model to compute the
cross-validated squared error loss function. Under the other technique,
"partitionres"
, the squared residuals from the full
linear regression model are partitioned into training and test folds and
the squared residuals in the test fold are the observed responses that
are predicted for computation of the cross-validated loss.
nfolds: An integer specifying the number of folds \(K\) to use for cross-validation, if the \(\lambda\) and/or \(n_c\) hyperparameters are to be tuned using cross-validation. Defaults to 5L. One must ensure that each test fold contains at least \(p+1\) observations if the "testsetols" technique is used with cross-validation, so that there are enough degrees of freedom to fit a linear model to the test fold.
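For instance, combining these two options (a sketch):

# Tune n_c by 10-fold CV, using partitioned residuals for the test folds
fitF <- alvm.fit(mtcars_lm, model = "cluster", nclust = "foldcv",
                 nfolds = 10L, cvoption = "partitionres")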
reduce2homosked: A logical indicating whether the homoskedastic error variance estimator \(e'e/(n-p)\) should be used if the variable selection procedure does not select any variables. Defaults to TRUE.
...: Other arguments that can be passed to (non-exported) helper functions, namely (a sketch of passing these through ... follows this list):

greedy, a logical passed to the functions implementing best subset selection, indicating whether or not to use a greedy search rather than an exhaustive search for the best subset. Defaults to FALSE, but is coerced to TRUE unconditionally if \(p>9\).

distmetric, a character specifying the distance metric to use in computing distances for the clustering algorithm. Corresponds to the method argument of dist and defaults to "euclidean".

linkage, a character specifying the linkage rule to use in agglomerative hierarchical clustering. Corresponds to the method argument of hclust and defaults to "complete".

nclust2search, an integer vector specifying the values of \(n_c\) to try when tuning \(n_c\) by cross-validation. Defaults to 1L:20L.

alpha, a double specifying the significance level threshold to use when applying a heteroskedasticity test for the purpose of feature selection in an ALVM; defaults to 0.1.

testname, a character corresponding to the name of a function that performs a heteroskedasticity test. The function must either be one that takes a deflator argument or breusch_pagan. Defaults to evans_king.
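For example, the test and significance level used by varselect = "hettest" can be passed through ... (a sketch; breusch_pagan is one of the documented options):

fitG <- alvm.fit(mtcars_lm, model = "linear", varselect = "hettest",
                 testname = "breusch_pagan", alpha = 0.05)
fitG$selectinfo$selectedcols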
Details

The ALVM model equation is $$e \circ e = (M \circ M) L \gamma + u,$$ where \(e\) is the Ordinary Least Squares residual vector, \(M\) is the annihilator matrix \(M = I - X(X'X)^{-1}X'\), \(L\) is a linear predictor matrix, \(u\) is a random error vector, \(\gamma\) is a \(p\)-vector of unknown parameters, and \(\circ\) denotes the Hadamard (elementwise) product. The construction of \(L\) depends on the method used to model or estimate the assumed heteroskedastic function \(g(\cdot)\), a continuous, differentiable function that is linear in \(\gamma\) and by which the error variances \(\omega_i\) of the main linear model are related to the predictors \(X_{i\cdot}\). This method has been developed as part of the author's doctoral research project.
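To make the notation concrete, the building blocks of the model equation can be computed directly from an lm fit. This is an illustrative sketch only: the identity matrix used for \(L\) is an assumption made here for simplicity, not necessarily the construction any particular ALVM uses.

mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
X <- model.matrix(mtcars_lm)
n <- nrow(X)
M <- diag(n) - X %*% solve(crossprod(X)) %*% t(X)  # annihilator matrix
esq <- residuals(mtcars_lm)^2   # response e o e: squared OLS residuals
Lmat <- diag(n)                 # assumed L, for illustration only
D <- (M * M) %*% Lmat           # ALVM design matrix (M o M) L
# gamma is then estimated by quadratic programming, subject to the
# constraint that the fitted variances stay above constol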
Depending on the model used, the estimation method could be Inequality-Constrained Least Squares or Inequality-Constrained Ridge Regression. However, these are both special cases of Quadratic Programming. Therefore, all of the models are fitted using Quadratic Programming.
Several techniques are available for feature selection within the model. The LASSO-type model handles feature selection via a shrinkage penalty; for this reason, if the user calls the polynomial model with the \(L_1\)-norm penalty, it is not necessary to specify a variable selection method, since this is handled automatically. Another feature selection technique is to use a heteroskedasticity test that tests for heteroskedasticity linked to a particular predictor variable (the 'deflator'). This test can be conducted with each feature in turn serving as the deflator; those features for which the null hypothesis of homoskedasticity is rejected at a specified significance level alpha are selected. A third feature selection technique is best subset selection, where the model is fitted with all possible subsets of features. The models are scored in terms of some metric, and the best-performing subset of features is selected. The metric could be squared-error loss computed under \(K\)-fold cross-validation or under quasi-generalised cross-validation. (The 'quasi-' prefix refers to the fact that generalised cross-validation is, properly speaking, only applicable to a linear fitting method, as defined by Hastie et al. (2009); ALVMs are not linear fitting methods due to the inequality constraint.) Since best subset selection requires fitting \(2^{p-1}\) models (where \(p-1\) is the number of candidate features), it is infeasible for large \(p\). A greedy search technique can therefore be used as an alternative, where one begins with a null model and adds the feature that leads to the best improvement in the metric, stopping when no new feature leads to an improvement.
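A greedy best subset search can be requested through the greedy argument described above (a sketch):

fitH <- alvm.fit(mtcars_lm, model = "linear", varselect = "cv.linear",
                 greedy = TRUE)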
The polynomial and thin-plate spline ALVMs have a penalty hyperparameter \(\lambda\) that must either be specified or tuned. \(K\)-fold cross-validation or quasi-generalised cross-validation can be used for tuning. The clustering ALVM has a hyperparameter \(n_c\), the number of clusters into which to group the observations (where error variances are assumed to be equal within each cluster). \(n_c\) can be specified or tuned. The available tuning methods are an elbow method (using a sum of within-cluster distances criterion, a maximum within-cluster distance criterion, or a combination of the two) and \(K\)-fold cross-validation.
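For instance (a sketch; nclust2search is the pass-through argument documented above):

# Tune lambda for the spline ALVM by quasi-generalised CV, and tune n_c
# for the clustering ALVM by K-fold CV over a restricted search range
fitS <- alvm.fit(mtcars_lm, model = "spline", lambda = "qgcv")
fitT <- alvm.fit(mtcars_lm, model = "cluster", nclust = "foldcv",
                 nclust2search = 1L:10L)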
See Also

anlvm.fit, avm.ci
Examples

mtcars_lm <- lm(mpg ~ wt + qsec + am, data = mtcars)
myalvm <- alvm.fit(mtcars_lm, model = "cluster")
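The components documented under Value can then be inspected, e.g.:

myalvm$var.est   # estimated error variances, one per observation
myalvm$coef.est  # estimated gamma
myalvm$hyperpar$nclust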