Computes grmsd (generalized root mean square distance) as variables are added to (method="addVars") or removed from (method="delVars") a k-NN imputation model. When adding variables, the function keeps the variables that strengthen the imputation the most; when deleting variables, it removes those whose removal weakens the imputation the least. The measure of model strength is the grmsd between imputed and observed Y-variables among the reference observations.
varSelection(x,y,method="addVars",yaiMethod="msn",imputeMethod="closest",
wts=NULL,nboot=20,trace=FALSE,
useParallel=if (.Platform$OS.type == "windows") FALSE else TRUE,...)
A list of class varSel with these tags:
call: the call.
grmsd: a 2-column matrix of the mean and standard deviation of the mean Mahalanobis distances associated with adding or removing the variables stored as the rownames. When nboot<2, the standard deviations are NA.
allgrmsd: a list of the grmsd values that correspond to each bootstrap replication. The data in grmsd are based on these vectors of information.
method: the value of argument method.
x: a set of X-Variables as used in yai.
y: a set of Y-Variables as used in yai.
method: if "addVars", the X-Variables are added, and if "delVars", they are deleted (see Details).
yaiMethod: passed as method to yai.
imputeMethod: passed as method to impute.yai.
wts: passed as argument wts to grmsd, which is used to score the alternative variable sets.
nboot: the number of bootstrap samples used at each variable selection step (see Details). When nboot is zero, no bootstrapping is done.
trace: if TRUE, information at each step is output.
useParallel: function mclapply from package parallel will be used for running the bootstraps if it is available. If it is not available, lapply is used (which is the only option on Windows).
...: passed to yai.
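The choice between mclapply and lapply described for useParallel can be sketched as follows. This is only an illustration of the pattern, not the package's internal code; the helper name choose_apply and the toy bootstrap statistic are assumptions made for the example.

```r
# Hypothetical helper: return mclapply when parallel forking is usable,
# otherwise fall back to lapply (always the case on Windows).
choose_apply <- function(useParallel = .Platform$OS.type != "windows") {
  if (useParallel && .Platform$OS.type != "windows" &&
      requireNamespace("parallel", quietly = TRUE)) {
    parallel::mclapply
  } else {
    lapply
  }
}

myapply <- choose_apply()

# Toy stand-in for "one bootstrap replication": the mean of a resampled column.
set.seed(12345)
boot_means <- myapply(1:4, function(i) {
  mean(sample(iris$Petal.Length, replace = TRUE))
})
length(boot_means)
```

Both functions share the same calling convention for this use, so the rest of the code does not need to know which one was chosen.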
Nicholas L. Crookston ncrookston.fs@gmail.com
This function tracks the effect on generalized root mean square distance (see grmsd) when variables are added or deleted one at a time. When adding variables, the function starts with none and keeps the single variable that provides the smallest grmsd. When deleting variables, the function starts with all X-Variables and deletes them one at a time such that those that remain provide the smallest grmsd. The function uses the following steps:
1. Function yai is run for all the Y-variables and the candidate X-variable(s). The result is passed to impute.yai to get imputed values of the Y-variables, and that result is passed to grmsd to compute a mean Mahalanobis distance for the case where the candidate variable is included (or deleted, depending on method). These steps are done once for each bootstrap replication, and the resulting values are averaged to provide an average mean Mahalanobis distance over the bootstraps.
2. Step one is done for each candidate X-variable, forming a vector of grmsd values, one for the case where each candidate is added or deleted.
3. When variables are being added (method="addVars"), the variable that is related to the smallest grmsd is kept. When variables are being deleted (method="delVars"), the variable whose deletion yields the smallest grmsd is deleted, so that the variables that remain provide the smallest grmsd.
4. Once a variable has been added or deleted, the function proceeds to select another variable for addition or deletion by considering all remaining variables.
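The steps above can be sketched in base R for the addVars case. This is a simplified illustration, not the package's implementation: the hypothetical score_vars function replaces yai, impute.yai and grmsd with a leave-one-out nearest-neighbor imputation scored by a plain root mean square difference on scaled Y-variables, and each bootstrap replication is reduced to its distinct rows so that an observation is never matched to a copy of itself. The name forward_select is likewise an assumption.

```r
# Hypothetical scoring function: impute Y for each row from its nearest other
# row in scaled X-space, then return the root mean square difference between
# imputed and observed (scaled) Y. A stand-in for yai + impute.yai + grmsd.
score_vars <- function(x, y, vars, rows = seq_len(nrow(x))) {
  xs <- scale(as.matrix(x[rows, vars, drop = FALSE]))
  ys <- scale(as.matrix(y[rows, , drop = FALSE]))
  d <- as.matrix(dist(xs))
  diag(d) <- Inf                      # a row may not be its own neighbor
  nn <- apply(d, 1, which.min)        # index of the nearest other row
  sqrt(mean((ys[nn, , drop = FALSE] - ys)^2))
}

# Hypothetical greedy forward selection following the four steps.
forward_select <- function(x, y, nboot = 5) {
  remaining <- colnames(x)
  selected <- character(0)
  path <- numeric(0)
  while (length(remaining) > 0) {
    # Steps 1 and 2: bootstrap-averaged score for each candidate variable
    scores <- sapply(remaining, function(v) {
      mean(sapply(seq_len(nboot), function(i) {
        rows <- unique(sample(nrow(x), replace = TRUE))
        score_vars(x, y, c(selected, v), rows)
      }))
    })
    # Step 3: keep the candidate with the smallest average score
    best <- names(which.min(scores))
    selected <- c(selected, best)
    path <- c(path, scores[[best]])
    # Step 4: continue with the variables that remain
    remaining <- setdiff(remaining, best)
  }
  data.frame(added = selected, score = path)
}

set.seed(12345)
res <- forward_select(iris[, 1:2], iris[, 3:4], nboot = 5)
print(res)
```

The delVars case would follow the same loop, starting from all columns and scoring each candidate with that column left out.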
yai, impute.yai, bestVars, and grmsd
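library(yaImpute)
data(iris)
set.seed(12345)
x <- iris[,1:2] # Sepal.Length Sepal.Width
y <- iris[,3:4] # Petal.Length Petal.Width
# add variables one at a time, using 5 bootstrap replications per step
vsel <- varSelection(x=x,y=y,nboot=5,useParallel=FALSE)
vsel            # print the selection results
bestVars(vsel)  # names of the best variables
plot(vsel)      # plot grmsd as variables are added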
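A deletion run can be requested with the same data by setting method="delVars". The sketch below uses only the arguments shown in the usage above; the object name vdel is an assumption for illustration, and with just two X-variables the run consists of a single deletion step.

```r
library(yaImpute)
data(iris)
set.seed(12345)
x <- iris[, 1:2]  # Sepal.Length Sepal.Width
y <- iris[, 3:4]  # Petal.Length Petal.Width

# start with all X-variables and delete them one at a time
vdel <- varSelection(x = x, y = y, method = "delVars",
                     nboot = 5, useParallel = FALSE)
vdel$grmsd  # mean and std dev of grmsd at each step
```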