distance metric to use, one of "euclidean" or "correlation"
rm.na
should NA values be imputed?
rm.nan
should NaN values be imputed?
rm.inf
should Inf values be imputed?
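The three flags above address NA, NaN and Inf separately. In base R these are distinct kinds of value, and is.na() is TRUE for NaN as well as NA. The short sketch below shows one way to tell them apart. The mask names (na.only, nan.only, inf.only) are illustrative assumptions, and this page does not say how the function itself detects each case.

```r
x <- c(1.5, NA, NaN, Inf, -Inf)

is.na(x)        # TRUE for NA and for NaN
is.nan(x)       # TRUE for NaN only
is.infinite(x)  # TRUE for Inf and -Inf

# Hypothetical masks that separate the three cases
na.only  <- is.na(x) & !is.nan(x)
nan.only <- is.nan(x)
inf.only <- is.infinite(x)
```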
Value
A data matrix with missing values imputed.
Details
Uses the K-nearest neighbor algorithm, as described in Troyanskaya et
al. (2001), to impute missing values in a data matrix. Elements are
imputed row-wise: the neighbors of a row with missing values are the
rows closest to it in distance. Two distance metrics are available,
Euclidean (the default) and a correlation 'metric'. If the latter is
selected, the matrix is first row-normalized to mean zero and standard
deviation one before neighbors are selected, and the inverse
transformation is applied to 'un'-normalize the values before the
imputed data matrix is returned.
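The row-wise procedure described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation. The helper names (knn_impute_sketch, row_normalize, row_unnormalize) are hypothetical. Distances are computed only over columns observed in both rows, and missing entries are filled with the plain unweighted mean of the neighbors' values. The package may handle these details differently.

```r
# Hypothetical helper: row-wise KNN imputation with a Euclidean-type distance.
knn_impute_sketch <- function(x, k = 2) {
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    miss <- is.na(x[i, ])
    # Root-mean-square difference to every row, over columns observed in both
    d <- apply(x, 1, function(r) {
      shared <- !miss & !is.na(r)
      if (!any(shared)) return(Inf)
      sqrt(mean((x[i, shared] - r[shared])^2))
    })
    d[i] <- Inf  # a row is not its own neighbor
    nn <- order(d)[seq_len(min(k, nrow(x) - 1))]
    # Assumption: unweighted mean of the neighbors' values in the missing columns
    out[i, miss] <- colMeans(x[nn, miss, drop = FALSE], na.rm = TRUE)
  }
  out
}

# Hypothetical helpers for the correlation 'metric': normalize each row to
# mean zero and standard deviation one, then invert the transformation.
# Rows with zero standard deviation would need special handling.
row_normalize <- function(x) {
  mu <- rowMeans(x, na.rm = TRUE)
  s  <- apply(x, 1, sd, na.rm = TRUE)
  list(z = (x - mu) / s, mu = mu, s = s)
}
row_unnormalize <- function(z, mu, s) z * s + mu

x <- rbind(c(1, 2, 3, 4),
           c(1, 2, NA, 4),
           c(2, 3, 4, 5),
           c(10, 11, 12, 13))

# Euclidean: rows 1 and 3 are nearest to row 2, so the gap becomes mean(3, 4)
imputed <- knn_impute_sketch(x, k = 2)
imputed[2, 3]  # 3.5

# Correlation-style: normalize, impute, then 'un'-normalize
nrm  <- row_normalize(x)
back <- row_unnormalize(knn_impute_sketch(nrm$z, k = 2), nrm$mu, nrm$s)
```

In this sketch the observed entries of back equal those of x up to floating-point error, and only the formerly missing cell is new.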
References
O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie,
R. Tibshirani, D. Botstein, and R. B. Altman.
Missing value estimation methods for DNA microarrays.
Bioinformatics, 17(6):520-525, 2001.
G.N. Brock, J.R. Shaffer, R.E. Blakesley, M.J. Lotz, and G.C. Tseng.
Which missing value imputation method to use in expression profiles: a
comparative study and two selection schemes.
BMC Bioinformatics, 9:12, 2008.
See Also
See the package vignette for an illustration of usage.