simulateMissings: Artificial simulation of various kinds of missings/polluted data

Description

These simulation mechanisms are meant to check that techniques for handling missing values behave sensibly. They generate additional missings of the various types in a given dataset, each according to a specific process.

Usage

simulateMissings(x, dl=NULL, knownlimit=FALSE,
     MARprob=0.0, MNARprob=0.0, mnarity=0.5, SZprob=0.0)
observeWithAdditiveError(x, sigma=dl/dlf, dl=sigma*dlf, dlf=3,
     keepObs=FALSE, digits=NA, obsScale=1,
     class="acomp")

Value

A dataset like x but with some additional missings.

Arguments

x: a dataset that should get the missings
dl: the detection limit, as described in clo, used to impose an artificial detection limit
knownlimit: a boolean indicating whether the actual detection limit is still known in the dataset.
MARprob: the probability of occurrence of 'Missings At Random' values
MNARprob: the probability of occurrence of 'Missings Not At Random' values. Small values tend to have a higher probability of being missed.
mnarity: a number between 0 and 1 giving the strength of the influence of the actual value on whether it becomes an MNAR. 0 gives MAR-like behavior, and 1 means that only the smallest values are lost.
SZprob: the probability of obtaining a structural zero. Structural zeros are placed at random, as with MAR.
sigma: the standard deviation of the normally distributed additive error
dlf: the distance from 0, in multiples of sigma, below which a datum is considered BDL (hence dl=sigma*dlf)
keepObs: should the (closed) data without additive error be returned as an attribute?
digits: the number of decimal digits kept when rounding the data with additive error (see Details)
obsScale: the observation scale at which the data with additive error are rounded (see Details). Should be a power of 10.
class: class of the output object

Author

K. Gerald van den Boogaart

Details

Without any additional parameters, no missings are generated. The procedure that generates MNAR values affects all variables.
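The exact rule used for MNAR is not spelled out on this page. As a purely hypothetical illustration of what mnarity controls, the base R sketch below (the function name sketch_mnar is invented) blends each value's rank with a random key and drops the entries with the smallest keys in every column. With mnarity = 0 the lost entries are random (MAR-like); with mnarity = 1 exactly the smallest values are lost. Removing round(MNARprob * n) entries per column and coding them as NA are assumptions, not the compositions algorithm.

```r
# Hypothetical illustration of the mnarity idea; NOT the compositions algorithm.
# Assumptions: round(MNARprob * n) entries are lost per column, coded as NA.
sketch_mnar <- function(x, MNARprob, mnarity = 0.5) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- round(MNARprob * n)               # number of entries to lose per column
  if (k == 0) return(x)
  for (j in seq_len(ncol(x))) {          # every variable is affected
    # blend a value-based key (rank) with a purely random key on the same scale
    key <- mnarity * rank(x[, j]) + (1 - mnarity) * runif(n, 0, n)
    lost <- order(key)[seq_len(k)]       # entries with the smallest keys
    x[lost, j] <- NA
  }
  x
}
```

Blending the two keys linearly is just one simple way to interpolate between the two extremes described for mnarity.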

Function "simulateMissings" is a multipurpose simulator, where each class of missing value is treated separately, and where detection limits are specified as thresholds.
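A minimal base R sketch of threshold-based detection limits combined with MAR losses, under stated assumptions: the function name sketch_missings is hypothetical, x is a plain numeric matrix, dl is a single value or one value per column, BDL entries are coded as 0 (or as -dl when knownlimit = TRUE) and MAR entries as NaN. This is not the compositions implementation; see compositions.missings for the package's actual coding.

```r
# Hypothetical sketch, NOT the compositions implementation.
# Assumed coding: BDL -> 0 (or -dl when knownlimit = TRUE), MAR -> NaN.
sketch_missings <- function(x, dl = NULL, knownlimit = FALSE, MARprob = 0) {
  x <- as.matrix(x)
  if (!is.null(dl)) {
    # one threshold per column (a single dl is recycled to all columns)
    dlm <- matrix(dl, nrow(x), ncol(x), byrow = TRUE)
    bdl <- x < dlm
    x[bdl] <- if (knownlimit) -dlm[bdl] else 0
  }
  # MAR: each still-observed entry is lost independently with probability MARprob
  mar <- matrix(runif(length(x)) < MARprob, nrow(x), ncol(x)) & x > 0
  x[mar] <- NaN
  x
}
```

Treating each mechanism in its own step mirrors the idea that each class of missing value is handled separately.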

Function "observeWithAdditiveError" simulates data within a very specific framework, where an additive error of sd=sigma is added to the input data x, and BDLs are generated if a datum is less than dlf times sigma. Afterwards, the resulting data are rounded as round(data/obsScale,digits)*obsScale, i.e. a certain observation scale obsScale is chosen, and at that scale only some digits are kept. This framework is typical of chemical analyses, and it generates both BDLs and pollution/rounding of (apparently) "right" data.
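The framework just described can be sketched in base R. This is a hedged illustration, not the package code: the function name sketch_additive_error is hypothetical, sigma is assumed to be a single number, the BDL test is assumed to apply to the noisy value, digits = NA is assumed to mean "no rounding", and BDLs are returned as a logical mask rather than in the package's own coding.

```r
# Hypothetical sketch of the additive-error framework; not the package code.
sketch_additive_error <- function(x, sigma, dlf = 3, digits = NA, obsScale = 1) {
  x <- as.matrix(x)
  # add independent N(0, sigma^2) measurement error to every entry
  obs <- x + matrix(rnorm(length(x), mean = 0, sd = sigma), nrow(x), ncol(x))
  # assumption: the BDL test is applied to the noisy value, with dl = dlf * sigma
  bdl <- obs < dlf * sigma
  # round at the observation scale: round(data / obsScale, digits) * obsScale
  if (!is.na(digits)) obs <- round(obs / obsScale, digits) * obsScale
  list(value = obs, bdl = bdl, dl = dlf * sigma)
}
```

For example, with obsScale = 0.01 and digits = 1, a noisy value of 0.12345 becomes round(12.345, 1) * 0.01 = 12.3 * 0.01, i.e. about 0.123.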

References

van den Boogaart, K.G., R. Tolosana-Delgado, and M. Bren (2011). The Compositional Meaning of a Detection Limit. In Proceedings of the 4th International Workshop on Compositional Data Analysis.

van den Boogaart, K.G., R. Tolosana-Delgado, and M. Templ (2014). Regression with compositional response having unobserved components or below detection limit values. Statistical Modelling (in press).

See compositions.missings for more details.

Examples

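library(compositions)
data(SimulatedAmounts)
x <- acomp(sa.lognormals)
# impose a detection limit of 0.05 and add MAR, MNAR and structural zeros,
# each with probability 0.05 (full argument names instead of partial matching)
xnew <- simulateMissings(x, dl=0.05, MARprob=0.05, MNARprob=0.05, SZprob=0.05)
acomp(xnew)
plot(missingSummary(xnew))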
