These are simulation mechanisms to check that missing techniques perform in sensible ways. They just generate additional missings of the various types in a given dataset, according to a specific process.
simulateMissings(x, dl=NULL, knownlimit=FALSE,
MARprob=0.0, MNARprob=0.0, mnarity=0.5, SZprob=0.0)
observeWithAdditiveError(x, sigma=dl/dlf, dl=sigma*dlf, dlf=3,
keepObs=FALSE, digits=NA, obsScale=1,
class="acomp")
A dataset like x
but with some additional missings.
a dataset that should get the missings
the detection limit described in
clo
, to impose an artificial detection limit
a boolean indicating wether the actual detection limit is still known in the dataset.
the probability of occurence of 'Missings At Random' values
the probability of occurrence of 'Missings Not At Random'. The tendency is that small values have a higher probability to be missed.
a number between 0 and 1 giving the strength of the influence of the actual value in becoming a MNAR. 0 means a MAR like behavior and 1 means that it is just the smallest values that is lost
the probability to obtain a structural zero. This is done at random like a MAR.
the standard deviation of the normal distributed extra additive error
the distance from 0 at which a datum will be considered BDL
should the (closed) data without additive error be returned as an attribute?
rounding to be applied to the data with additive error (see Details)
rounding to be applied to the data with additive error (see Details). Should be a power of 10.
class of the output object
K.Gerald van den Boogaart
Without any additional parameters no missings are generated. The procedure to generate MNAR affects all variables.
Function "simulateMissings" is a multipurpose simulator, where each class of missing value is treated separately, and where detection limits are specified as thresholds.
Function "observeWithAdditiveError" simulates data within a very specific
framework, where an additive error of sd=sigma
is added to the input data
x
, and BDLs are generated if a datum is less than dfl
times
sigma
. Afterwards, the resulting data are rounded as
round(data/obsScale,digits)*obsScale
, i.e. a certain observation scale
obsScale
is chosen, and at that scale, only some digits
are kept.
This framework is typical of chemical analyses, and it generates both BDLs and
pollution/rounding of (apparently) "right" data.
van den Boogaart, K., R. Tolosana-Delgado, and M. Bren (2011). The Compositional Meaning of a Detection Limit. In Proceedings of the 4th International Workshop on Compositional Data Analysis (2011).
van den Boogaart, K.G., R. Tolosana-Delgado and M. Templ (2014) Regression with compositional response having unobserved components or below detection limit values. Statistical Modelling (in press).
See compositions.missings for more details.
compositions.missings
data(SimulatedAmounts)
x <- acomp(sa.lognormals)
xnew <- simulateMissings(x,dl=0.05,MAR=0.05,MNAR=0.05,SZ=0.05)
acomp(xnew)
plot(missingSummary(xnew))
Run the code above in your browser using DataLab