Cross-validations
The function cv.lmvar carries out a k-fold cross-validation for an 'lmvar' model. For each fold, an 'lmvar'
model is fit to all observations that are not in the fold (the 'training set') and prediction errors are calculated
for the observations in the fold (the 'test set'). The prediction errors are the absolute error \(|y - \mu|\)
and the squared error \((y - \mu)^2\). Both are averaged over the observations in the fold, and the square root
of the average squared error is taken. Optionally, a user-specified function fun can be evaluated for the test set
and the 'lmvar' model fitted to the training set. Optionally, the Kolmogorov-Smirnov (KS) distance and its p-value
can also be calculated for the test set.
The results for the k folds are averaged over the folds and standard deviations are calculated from the k results.
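
The procedure described above can be sketched in a few lines. The following is a minimal illustration in Python
(standard library only) and not the code of cv.lmvar. As a stand-in for the 'lmvar' fit, it predicts every test
observation with the mean of the training responses; the function names k_fold_indices and cross_validate are
hypothetical and chosen for this sketch only.

```python
import math
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(y, k, seed=0):
    """Return (mean, sd) over the k folds for MAE, MSE and sqrt(MSE).

    Stand-in model (an assumption of this sketch): the prediction mu for every
    test observation is the mean of the training responses.
    """
    per_fold = {"MAE": [], "MSE": [], "MSE_sqrt": []}
    for fold in k_fold_indices(len(y), k, seed):
        test = set(fold)
        train_y = [v for i, v in enumerate(y) if i not in test]
        mu = statistics.fmean(train_y)        # "fit" on the training set
        errors = [y[i] - mu for i in fold]    # prediction errors on the test set
        mse = statistics.fmean([e * e for e in errors])
        per_fold["MAE"].append(statistics.fmean([abs(e) for e in errors]))
        per_fold["MSE"].append(mse)
        per_fold["MSE_sqrt"].append(math.sqrt(mse))
    # average over the folds; the standard deviation is taken over the k results
    return {
        name: (statistics.fmean(values), statistics.stdev(values))
        for name, values in per_fold.items()
    }
```

Each entry of the result holds the mean over the folds and the standard deviation of the k per-fold values, so k
must be at least 2.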
User-defined function
The argument fun allows a user to specify a function for which cross-validation results
must be obtained. This function must meet the following requirements.
In a k-fold cross-validation, the function is called k times, with object_t equal to the fit
to the training set, y equal to the response vector of the test set, and X_mu and X_sigma equal to
the design matrices of the test set. If the evaluation of fun gives an error, cv.lmvar
gives a warning and excludes that evaluation from the mean and the standard deviation of fun
over the k folds. If the evaluation of fun gives a warning, the warning is ignored.
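
The handling of errors and warnings described above can be illustrated as follows. This is a Python sketch of the
logic only, not the code of cv.lmvar: the function summarize_fun and the representation of each fold as a dict with
the keys object_t, y, X_mu and X_sigma are assumptions made for this example.

```python
import statistics
import warnings


def summarize_fun(fun, folds):
    """Call fun once per fold; return (mean, sd) over the folds that succeeded.

    Each element of `folds` is a hypothetical dict with the keys object_t, y,
    X_mu and X_sigma, mirroring the arguments with which fun is called.
    """
    values = []
    for number, fold in enumerate(folds, start=1):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")  # warnings raised by fun are ignored
                values.append(
                    fun(fold["object_t"], fold["y"], fold["X_mu"], fold["X_sigma"])
                )
        except Exception as err:
            # an error in fun becomes a warning and the fold is left out
            warnings.warn(f"fun failed in fold {number}: {err}")
    mean = statistics.fmean(values) if values else None
    sd = statistics.stdev(values) if len(values) > 1 else None
    return mean, sd
```

A fold in which fun fails therefore reduces the number of values that enter the mean and the standard deviation,
rather than aborting the whole cross-validation.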
In the cross-validations, object_t contains the design matrices of the training set as
object_t$X_mu and object_t$X_sigma. The matrix object_t$X_mu was formed by taking
object$X_mu and removing the rows that belong to the fold. In addition, columns may have been removed to make
the matrix full-rank. Therefore, object_t$X_mu may have fewer columns than object$X_mu. The same is true
for object_t$X_sigma compared to object$X_sigma.
Kolmogorov-Smirnov test
When ks_test = TRUE, a Kolmogorov-Smirnov (KS) test is carried out for each fold. The test checks whether the
standardized residuals \((y - \mu) / \sigma\) in a fold follow a standard normal distribution. The
KS-distance and the corresponding p-value are calculated for each fold. The test uses the
function ks.test. The expectation values \(\mu\) and standard deviations \(\sigma\) are
calculated from the model matrices for the test set (the fold) and the 'lmvar' fit to the training set.
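
To make the KS-distance concrete, the sketch below computes it for the standardized residuals of one fold. It is
written in Python with the standard library only and is not the implementation of ks.test; the p-value is omitted,
and the function names are hypothetical. The distance is the largest gap between the empirical distribution
function of the standardized residuals and the standard normal distribution function.

```python
import math


def standard_normal_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_distance(y, mu, sigma):
    """KS distance between the standardized residuals and the standard normal."""
    z = sorted((yi - mi) / si for yi, mi, si in zip(y, mu, sigma))
    n = len(z)
    distance = 0.0
    for i, zi in enumerate(z, start=1):
        cdf = standard_normal_cdf(zi)
        # the empirical distribution function jumps from (i - 1) / n to i / n at zi
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return distance
```

If the fitted standard deviations are too small for the test set, the standardized residuals spread out too far
and the distance grows.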
Excluding observations
The observations specified in the argument exclude are not used to calculate the error statistics MAE
(mean absolute error), MSE (mean squared error) and the square root of MSE. They are also not used to calculate
the statistics for the user-defined function fun. This is useful when a few observations have
residuals so large that they dominate the error estimates. Note that the excluded observations are not
excluded from the training sets. They are excluded only from the calculation of the statistics of the test sets.
They are not excluded from the KS-test: observations with large residuals should have large
standard deviations as well, so that their standardized residuals still take values consistent with a standard
normal distribution.
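
The effect of excluding observations from the error statistics can be sketched as follows. The example is in Python
and is an illustration only; the function fold_errors and its arguments are hypothetical. It assumes that the
predictions mu were obtained from a fit to the full training set, excluded observations included.

```python
import statistics


def fold_errors(y, mu, fold, exclude=()):
    """MAE and MSE over a fold, skipping the excluded observations.

    y, mu:   responses and predictions, indexed by observation number
    fold:    indices of the test-set observations
    exclude: indices left out of the error statistics only
    """
    excluded = set(exclude)
    kept = [i for i in fold if i not in excluded]
    abs_err = [abs(y[i] - mu[i]) for i in kept]
    return {
        "MAE": statistics.fmean(abs_err),
        "MSE": statistics.fmean([e * e for e in abs_err]),
    }
```

With y = [1.0, 2.0, 100.0] and a prediction of 1.5 for every observation, excluding the third observation lowers
the MAE of the fold from about 33.2 to 0.5.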
Minimum sigma
The argument sigma_min gives the option to enforce a minimum standard deviation. This is
useful when, in a cross-validation, a fit fails because the likelihood attains its maximum where the standard
deviation of one or more observations becomes zero.
When a minimum standard deviation is specified, all fits are carried out under the
boundary condition that the standard deviation is larger than the minimum. If sigma_min = NULL,
the same value is used as was used to create object.
Other
The fits are carried out with the options slvr_options stored in the 'lmvar' object object.
However, these options can be overwritten with an explicit argument slvr_options in the call of
cv.lmvar. Some of the options are affected by a sigma_min larger than zero, see lmvar for
details.
The argument slvr_options is a list, members of which can be lists themselves.
If members of a sublist are overwritten, the other members of the sublist remain unchanged. E.g., the
argument slvr_options = list(control = list(iterlim = 600)) will set control$iterlim to 600 while
leaving other members of the list control unchanged.
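
The way a sublist is overwritten member by member can be illustrated with nested dictionaries. The following Python
sketch shows the merging rule only and is not taken from the package; the function overwrite_options is hypothetical,
and so are the stored options method and tol used in the usage example.

```python
def overwrite_options(stored, override):
    """Return a copy of `stored` updated with `override`, merging sublists."""
    merged = dict(stored)
    for name, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(name), dict):
            # both sides are sublists: overwrite member by member
            merged[name] = overwrite_options(merged[name], value)
        else:
            merged[name] = value
    return merged


# hypothetical stored options, for illustration only
stored = {"method": "NR", "control": {"iterlim": 100, "tol": 1e-8}}
merged = overwrite_options(stored, {"control": {"iterlim": 600}})
```

After the call, merged["control"] has iterlim equal to 600 and still contains tol, while stored is left untouched.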
The number of available CPU cores is detected with detectCores.