The impute method performs data imputation on an MSnSet instance using a variety of methods (see below). The imputation and its parameters are logged into the processingData(object) slot.
Users should proceed with care when imputing data and take precautions to ensure that the imputation produces valid results, in particular with naive imputations such as replacing missing values with 0.
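signature(object = "MSnSet", method, ...)
This method performs data imputation on the object MSnSet instance using the method algorithm. The ... argument is used to pass parameters to the imputation function. See the respective methods for details and additional parameters.

There are two types of mechanisms resulting in missing values in LC/MSMS experiments.
Missing values resulting from the absence of detection of a feature, despite ions being present at detectable concentrations, for example as a result of ion suppression or of the stochastic, data-dependent nature of MS acquisition. These are expected to be randomly distributed in the data and are defined as missing at random (MAR) or missing completely at random (MCAR).
Biologically relevant missing values, resulting from the absence or low abundance of ions (below the limit of detection of the instrument). These are not expected to be randomly distributed in the data and are defined as missing not at random (MNAR).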
MNAR features should ideally be imputed with a left-censored method, such as QRILC below. Conversely, it is recommended to use hot deck methods such as nearest neighbours, Bayesian missing value imputation or maximum likelihood methods when values are missing at random.
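To make the distinction between the two mechanisms concrete, the following hypothetical base-R simulation removes values either at random (MAR) or below a detection limit (MNAR, left-censoring). The intensity distribution, the 10% missingness rate and the detection limit are arbitrary assumptions chosen for illustration only. The values lost under MNAR all come from the low-intensity tail, which is why left-censored methods replace them with small values.

```r
## Hypothetical simulation of the two missingness mechanisms.
## Assumed: 1000 log-intensities drawn from N(20, 2^2), 10% missing.
set.seed(42)
x <- rnorm(1000, mean = 20, sd = 2)

## MAR: 100 values removed at random, independently of their intensity
mar_idx <- sample(length(x), 100)

## MNAR (left-censoring): values below an assumed detection limit,
## here the 10th percentile, are not observed
lod <- quantile(x, probs = 0.1, names = FALSE)
mnar_idx <- which(x < lod)

## Intensities of the values that went missing under each mechanism
summary(x[mar_idx])
summary(x[mnar_idx])

## Observed data under each mechanism
x_mar <- replace(x, mar_idx, NA)
x_mnar <- replace(x, mnar_idx, NA)
```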
Currently, the following imputation methods are available:
MLE: maximum likelihood-based imputation, implemented in the norm::imp.norm function. See imp.norm for details and additional parameters. Note that here, ... are passed to the em.norm function rather than to the actual imputation function imp.norm.
bpca: Bayesian missing value imputation, as implemented in the pcaMethods::pca function. See pca for details and additional parameters.
knn: nearest neighbour averaging, as implemented in the impute::impute.knn function. See impute.knn for details and additional parameters.
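QRILC: imputation of left-censored missing data using random draws from a truncated distribution whose parameters are estimated by quantile regression. Implemented in the imputeLCMD::impute.QRILC function. See impute.QRILC for details and additional parameters.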
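MinDet: imputation of left-censored missing data using a deterministic minimal value approach. For each sample, the missing entries are replaced by a minimal value, estimated as the q-th quantile (default q = 0.01) of the observed values in that sample. Implemented in the imputeLCMD::impute.MinDet function. See impute.MinDet for details and additional parameters.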
MinProb: imputation of left-censored missing data by random draws from a Gaussian distribution centred on a minimal value. For each sample, the mean of the Gaussian distribution is set to the q-th quantile (default q = 0.01) of the observed values in that sample. The standard deviation is estimated as the median of the feature standard deviations; only the peptides/proteins with more than 50% recorded values are considered for this estimate. Implemented in the imputeLCMD::impute.MinProb function. See impute.MinProb for details and additional parameters.
mixed: a mixed imputation applying two methods (defined by the user as mar for values missing at random and mnar for values missing not at random, see example) to two M[C]AR/MNAR subsets of the data, as defined by the user with a randna logical vector of length equal to nrow(object).
nbavg: neighbour averaging. Missing values at the beginning and the end of the quantitation vectors are set to the lowest observed value in the data, or to a user-defined value passed as argument k. Then, when a missing value is flanked by two non-missing neighbouring values, it is imputed by the mean of its direct neighbours. A stretch of 2 or more missing values will not be imputed. See the example below.
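The left-censored strategies MinDet and MinProb, and the mixed strategy, can be illustrated with a minimal base-R sketch on a plain numeric matrix (features in rows, samples in columns). The functions min_det_sketch, min_prob_sketch, mixed_sketch and row_mean are hypothetical and are not part of MSnbase or imputeLCMD; they simplify the reference implementations (for example, quantile() is used with its default type). Imputing each randna subset independently in mixed_sketch is an assumption, and row_mean is only a stand-in for a real MAR method such as knn.

```r
## Hypothetical base-R sketches (not the MSnbase/imputeLCMD code).
## Rows are features, columns are samples; NA marks a missing value.

## MinDet-like: replace NAs in each sample by the q-th quantile
## of the observed values in that sample
min_det_sketch <- function(m, q = 0.01) {
  for (j in seq_len(ncol(m))) {
    qj <- quantile(m[, j], probs = q, na.rm = TRUE, names = FALSE)
    m[is.na(m[, j]), j] <- qj
  }
  m
}

## MinProb-like: draw NAs from a Gaussian centred on the q-th quantile
## of each sample; sd = median of the feature sds, computed only on
## features with more than 50% recorded values
min_prob_sketch <- function(m, q = 0.01) {
  keep <- rowMeans(!is.na(m)) > 0.5
  sds <- apply(m[keep, , drop = FALSE], 1, sd, na.rm = TRUE)
  sigma <- median(sds, na.rm = TRUE)
  for (j in seq_len(ncol(m))) {
    mu <- quantile(m[, j], probs = q, na.rm = TRUE, names = FALSE)
    miss <- is.na(m[, j])
    m[miss, j] <- rnorm(sum(miss), mean = mu, sd = sigma)
  }
  m
}

## mixed-like: apply mar_fun to rows flagged TRUE in randna (MAR)
## and mnar_fun to the remaining rows (MNAR), each subset separately
mixed_sketch <- function(m, randna, mar_fun, mnar_fun) {
  stopifnot(is.logical(randna), length(randna) == nrow(m))
  m[randna, ] <- mar_fun(m[randna, , drop = FALSE])
  m[!randna, ] <- mnar_fun(m[!randna, , drop = FALSE])
  m
}

## Hypothetical stand-in MAR method: replace NAs by the row mean
row_mean <- function(x) {
  for (i in seq_len(nrow(x))) x[i, is.na(x[i, ])] <- mean(x[i, ], na.rm = TRUE)
  x
}

## Small example: 4 features x 3 samples
m <- matrix(c(5, 6, NA, 8,
              4, NA, 7, 9,
              6, 7, 8, NA), nrow = 4)
min_det_sketch(m, q = 0)  # q = 0 uses each sample's observed minimum
set.seed(1)
min_prob_sketch(m)
mixed_sketch(m, randna = c(TRUE, TRUE, FALSE, FALSE),
             mar_fun = row_mean, mnar_fun = min_det_sketch)
```

Using q = 0 makes MinDet fall back to the per-sample minimum, which shows how the q-th quantile generalises the naive "smallest observed value" replacement.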
The naset MSnSet is a real quantitative dataset in which some quantitative values have been replaced by NAs. See script/naset.R for details.
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics (2001) 17 (6): 520-525.
Oba et al. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics (2003) 19 (16): 2088-2096.
Cosmin Lazar. imputeLCMD: A collection of methods for left-censored missing data imputation. R package version 2.0 (2015). http://CRAN.R-project.org/package=imputeLCMD.
library("MSnbase")
data(naset)
## table of missing values along the rows
table(fData(naset)$nNA)
## table of missing values along the columns
pData(naset)$nNA
## non-random missing values
notna <- which(!fData(naset)$randna)
length(notna)
notna
impute(naset, method = "min")
if (require("imputeLCMD")) {
impute(naset, method = "QRILC")
impute(naset, method = "MinDet")
}
if (require("norm"))
impute(naset, method = "MLE")
if (require("impute") && require("imputeLCMD"))
    impute(naset, "mixed",
           randna = fData(naset)$randna,
           mar = "knn", mnar = "QRILC")
## neighbour averaging
x <- naset[1:4, 1:6]
exprs(x)[1, 1] <- NA ## min value
exprs(x)[2, 3] <- NA ## average
exprs(x)[3, 1:2] <- NA ## min value and average
## 4th row: no imputation
exprs(x)
exprs(impute(x, "nbavg"))
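The neighbour-averaging rule used by nbavg can be mirrored on a plain numeric matrix. The sketch below is hypothetical (nbavg_sketch is not an MSnbase function) and follows one reading of the description, guided by the comments in the example above: only a missing first or last value is set to k, and interior values are then scanned from left to right, so a value right after an imputed first position can be averaged, while an interior stretch of two or more NAs stays missing.

```r
## Hypothetical sketch of neighbour averaging ("nbavg").
## Assumption: columns are fractions ordered along a gradient, and only
## the first and last positions of each row are set to k when missing.
nbavg_sketch <- function(m, k = min(m, na.rm = TRUE)) {
  force(k)  # evaluate the default on the unmodified data
  n <- ncol(m)
  for (i in seq_len(nrow(m))) {
    ## missing first/last fractions are set to k
    if (is.na(m[i, 1])) m[i, 1] <- k
    if (is.na(m[i, n])) m[i, n] <- k
    ## interior positions, scanned left to right: a missing value is
    ## imputed only if both direct neighbours are (now) non-missing
    for (j in seq_len(n)[-c(1, n)]) {
      if (is.na(m[i, j]) && !is.na(m[i, j - 1]) && !is.na(m[i, j + 1]))
        m[i, j] <- (m[i, j - 1] + m[i, j + 1]) / 2
    }
  }
  m
}

m <- rbind(c(NA, 2, 3, 4, 5, 6),   # first value missing: set to k
           c(1, 2, NA, 4, 5, 6),   # flanked by 2 and 4: averaged
           c(NA, NA, 3, 4, 5, 6),  # first set to k, second averaged
           c(1, NA, NA, 4, 5, 6))  # stretch of two: left as NA
nbavg_sketch(m)
```

For an MSnSet such as x above, comparing nbavg_sketch(exprs(x)) with exprs(impute(x, "nbavg")) would show whether this reading matches the package's behaviour.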