
compositions (version 2.0-0)

simulatemissings: Artificial simulation of various kinds of missings/polluted data

Description

These are simulation mechanisms for checking that missing-value techniques behave sensibly. They generate additional missings of the various types in a given dataset, according to a specific process.

Usage

simulateMissings(x, dl=NULL, knownlimit=FALSE,
     MARprob=0.0, MNARprob=0.0, mnarity=0.5, SZprob=0.0)
observeWithAdditiveError(x, sigma=dl/dlf, dl=sigma*dlf, dlf=3,
     keepObs=FALSE, digits=NA, obsScale=1,
     class="acomp")

Arguments

x

a dataset that should get the missings

dl

the detection limit described in clo, to impose an artificial detection limit

knownlimit

a boolean indicating whether the actual detection limit is still known in the dataset.

MARprob

the probability of occurrence of 'Missings At Random' values

MNARprob

the probability of occurrence of 'Missings Not At Random'. The tendency is that small values have a higher probability to be missed.

mnarity

a number between 0 and 1 giving the strength of the influence of the actual value on becoming a MNAR. 0 means MAR-like behavior and 1 means that only the smallest values are lost

SZprob

the probability to obtain a structural zero. This is done at random like a MAR.

sigma

the standard deviation of the normally distributed extra additive error

dlf

the distance from 0, in multiples of sigma, below which a datum will be considered below detection limit (BDL)

keepObs

should the (closed) data without additive error be returned as an attribute?

digits

number of digits kept when rounding the data with additive error at the scale obsScale (see Details)

obsScale

observation scale at which the data with additive error are rounded (see Details). Should be a power of 10.

class

class of the output object

Value

A dataset like x but with some additional missings.

Details

Without any additional parameters no missings are generated. The procedure to generate MNAR affects all variables.

Function "simulateMissings" is a multipurpose simulator, where each class of missing value is treated separately, and where detection limits are specified as thresholds.
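To make the idea of separate mechanisms concrete, here is a minimal base R sketch of two of them: a detection limit applied as a threshold, and MAR values drawn independently of the data. It is not the code of simulateMissings. The toy data, the logical masks is_bdl and is_mar, and the assumption that each cell is drawn independently are illustrative choices only.

```r
# Illustrative sketch only; NOT the compositions implementation.
set.seed(1)

# Hypothetical toy data: 200 rows of 3-part compositions (rows sum to 1).
raw <- matrix(rlnorm(600), ncol = 3)
x <- raw / rowSums(raw)

dl <- 0.05        # detection limit, used as a threshold
MARprob <- 0.05   # assumed per-cell probability of a MAR value

# Threshold step: cells below the detection limit are flagged as BDL.
is_bdl <- x < dl

# MAR step: each cell is flagged independently of its own value.
is_mar <- matrix(runif(length(x)) < MARprob, nrow = nrow(x))

# Count the two mechanisms separately (a cell may be hit by both).
counts <- c(BDL = sum(is_bdl), MAR = sum(is_mar))
print(counts)
```

The sketch keeps the flags in logical masks rather than writing missing-value codes into the data, because how the package encodes each type is documented in compositions.missings and not reproduced here.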

Function "observeWithAdditiveError" simulates data within a very specific framework: an additive error with standard deviation sigma is added to the input data x, and a BDL is generated whenever a datum is less than dlf times sigma. Afterwards, the resulting data are rounded as round(data/obsScale,digits)*obsScale, i.e. a certain observation scale obsScale is chosen, and at that scale only some digits are kept. This framework is typical of chemical analyses, and it generates both BDLs and pollution/rounding of (apparently) "right" data.
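The three steps of this framework (additive error, detection limit at dlf times sigma, rounding at the observation scale) can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of observeWithAdditiveError: the input values are hypothetical, the BDL flag is kept in a separate logical vector, and closure and class handling are skipped.

```r
# Illustrative sketch only; NOT the compositions implementation.
set.seed(42)

x <- c(0.002, 0.010, 0.050, 0.200, 0.738)  # hypothetical "true" values
sigma <- 0.002      # standard deviation of the additive error
dlf <- 3            # detection limit factor
dl <- sigma * dlf   # detection limit, matching the default dl = sigma*dlf
obsScale <- 0.01    # observation scale (a power of 10)
digits <- 1         # digits kept at that scale

# Step 1: add normally distributed error with sd = sigma.
y <- x + rnorm(length(x), mean = 0, sd = sigma)

# Step 2: flag values that fall below dlf times sigma as BDL.
is_bdl <- y < dl

# Step 3: round at the observation scale, keeping `digits` digits.
y_rounded <- round(y / obsScale, digits) * obsScale

print(data.frame(x = x, y = y, y_rounded = y_rounded, is_bdl = is_bdl))
```

With obsScale = 0.01 and digits = 1, values are reported to the nearest 0.001, so the rounding step alone moves each value by at most 0.0005.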

References

van den Boogaart, K., R. Tolosana-Delgado, and M. Bren (2011). The Compositional Meaning of a Detection Limit. In Proceedings of the 4th International Workshop on Compositional Data Analysis (2011).

van den Boogaart, K.G., R. Tolosana-Delgado and M. Templ (2014) Regression with compositional response having unobserved components or below detection limit values. Statistical Modelling (in press).

See compositions.missings for more details.

See Also

compositions.missings

Examples

# NOT RUN {
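library(compositions)
data(SimulatedAmounts)
x <- acomp(sa.lognormals)
# Impose a detection limit of 0.05 and add MAR, MNAR and SZ values with probability 0.05 each
xnew <- simulateMissings(x, dl=0.05, MARprob=0.05, MNARprob=0.05, SZprob=0.05)
acomp(xnew)
# Plot a summary of the missing types in the new dataset
plot(missingSummary(xnew))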
# }
