This function offers several k-nearest neighbor methods for the imputation of missing values in compositional data.
impKNNa(
x,
method = "knn",
k = 3,
metric = "Aitchison",
agg = "median",
primitive = FALSE,
normknn = TRUE,
das = FALSE,
adj = "median"
)
xOrig: Original data frame or matrix
xImp: Imputed data
w: Amount of imputed values
wind: Index of the missing values in the data
metric: Metric used
x: data frame or matrix
method: method (at the moment, only “knn” can be used)
k: number of nearest neighbors chosen for imputation
metric: “Aitchison” or “Euclidean”
agg: “median” or “mean”, for the aggregation of the nearest neighbors
primitive: if TRUE, a more enhanced search for the \(k\)-nearest neighbors is obtained (see details)
normknn: if TRUE, an adjustment of the imputed values is performed
das: deprecated. If TRUE, the definition of the Aitchison distance based on simple logratios of the compositional parts (Aitchison et al., 2000) is used to calculate distances between observations. If FALSE, a version using the clr transformation is used.
adj: either “median” (default) or “sum” can be chosen for the adjustment of the nearest neighbors, see Hron et al., 2010.
Matthias Templ
The Aitchison metric should be chosen when dealing with compositional data, the Euclidean metric otherwise.
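To illustrate why the metric matters, the following base R sketch computes the Aitchison distance in two equivalent ways: as the Euclidean distance between clr-transformed compositions, and from all pairwise logratios. The helper names (clr, aitchison_dist, aitchison_dist_pairwise) are illustrative assumptions and are not functions of the package.

```r
# Centred log-ratio (clr) transform of a strictly positive composition.
clr <- function(x) log(x) - mean(log(x))

# Aitchison distance as the Euclidean distance between clr coordinates.
aitchison_dist <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

# The same distance written with all pairwise logratios.
aitchison_dist_pairwise <- function(x, y) {
  z <- log(x) - log(y)
  D <- length(x)
  sqrt(sum(outer(z, z, "-")^2) / (2 * D))
}

a <- c(10, 20, 70)
b <- c(1, 2, 7)     # same ratios as a, different total
d <- c(30, 30, 40)

aitchison_dist(a, b)         # essentially 0: only the ratios matter
sqrt(sum((a - b)^2))         # the Euclidean distance is far from 0
aitchison_dist(a, d)
aitchison_dist_pairwise(a, d)  # agrees with the clr version
```

Because the Aitchison distance ignores the total of each observation, two rows with the same ratios are treated as identical neighbors, which is the desired behavior for compositional data.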
If primitive is FALSE, a sequential search for the \(k\)-nearest neighbors is applied for every missing value, using all information in the non-missing cells together with the information in the variable to be imputed and some additional information. If primitive is TRUE, the \(k\)-nearest neighbors are searched among observations in which, in addition to the variable to be imputed, any further cells are non-missing.
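The general idea of a \(k\)-nearest neighbor search for a single missing cell can be sketched in base R as follows. This is a deliberately simplified illustration and not the package's implementation: it assumes that only complete rows serve as donors, measures the Aitchison distance on the parts observed in the incomplete row, and aggregates the donors' values without any adjustment. The names clr and knn_impute_cell are hypothetical.

```r
# Centred log-ratio transform (helper for the Aitchison distance).
clr <- function(x) log(x) - mean(log(x))

# Hypothetical helper: impute cell (i, j) from the k nearest complete rows.
knn_impute_cell <- function(x, i, j, k = 3, agg = median) {
  x <- as.matrix(x)
  obs <- which(!is.na(x[i, ]))                 # parts observed in row i
  donors <- which(stats::complete.cases(x))    # assumption: complete rows only
  # Aitchison distance on the subcomposition observed in row i.
  d <- vapply(donors, function(r) {
    sqrt(sum((clr(x[i, obs]) - clr(x[r, obs]))^2))
  }, numeric(1))
  nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
  agg(x[nearest, j])                           # aggregate the donors' values
}

set.seed(1)
x <- matrix(exp(rnorm(40)), ncol = 4)          # 10 positive observations
x[1, 3] <- NA
knn_impute_cell(x, i = 1, j = 3)
```

With k = 3 and the median as aggregation, the result is the middle one of the three donor values in column 3.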
If normknn is TRUE (the preferred option), the imputed cells from a nearest neighbor method are adjusted with special adjustment factors (for more details, see the references).
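Why an adjustment is needed can be shown with a small base R sketch. Since the Aitchison distance ignores the total of an observation, a close neighbor may be recorded on a very different scale, so its value has to be rescaled before it is used. The factor below, the ratio of the sum (or median) of the parts observed in the recipient to the same quantity in the donor, is an assumption made for illustration; the exact factors used by the function are described in Hron et al. (2010). The name adjust_donor is hypothetical.

```r
# Hypothetical helper: rescale a donor's value to the recipient's scale.
adjust_donor <- function(recipient, donor, j, adj = sum) {
  obs <- which(!is.na(recipient))              # parts observed in the recipient
  f <- adj(recipient[obs]) / adj(donor[obs])   # assumed adjustment factor
  f * donor[j]
}

recipient <- c(10, 20, NA, 30)
donor     <- c(1, 2, 4, 3)     # same ratios, one tenth of the total

adjust_donor(recipient, donor, j = 3)                 # 40
adjust_donor(recipient, donor, j = 3, adj = median)   # 40
```

Without the factor, the donor value 4 would be imputed directly, although the recipient's observed parts are ten times larger.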
Aitchison, J., Barcelo-Vidal, C., Martin-Fernandez, J.A., Pawlowsky-Glahn, V. (2000) Logratio analysis and compositional distance, Mathematical Geology, 32(3), 271-275.
Hron, K., Templ, M., Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods, Computational Statistics and Data Analysis, 54(12), 3095-3107.
impCoda
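library(robCompositions)  # provides impKNNa() and the expenditures data
data(expenditures)
x <- expenditures
x[1,3]                    # original value
x[1,3] <- NA              # set the cell to missing
xi <- impKNNa(x)$xImp     # impute and extract the completed data
xi[1,3]                   # imputed value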