
simFrame (version 0.5.4)

contaminate: Contaminate data

Description

Generic function for contaminating data.

Usage

contaminate(x, control, ...)

# S4 method for signature 'data.frame,ContControl'
contaminate(x, control, i)

Arguments

x

the data to be contaminated.

control

a control object of a class inheriting from the virtual class "VirtualContControl" or a character string specifying such a control class (the default being "DCARContControl").

i

an integer giving the index of the element of the slot epsilon of control to be used as the contamination level.

...

If control is a character string or missing, the slots of the control object may be supplied as additional arguments. See "DCARContControl" and "DARContControl" for details on the slots.

Value

A data.frame containing the contaminated data. In addition, the column ".contaminated", which consists of logicals indicating the contaminated observations, is added to the data.frame.
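The ".contaminated" column can be used directly in later steps of a simulation. The following minimal base-R sketch works on a made-up data.frame that only mimics the shape of the value described above (the column values are invented for illustration). It counts the contaminated observations, extracts the clean part of the sample, and lists the rows that would remain eligible for missing values.

```r
# Toy stand-in for the value returned by contaminate(): a data.frame with a
# logical ".contaminated" column (the numbers are made up for illustration)
res <- data.frame(id = 1:6,
                  eqIncome = c(12000, 15000, 510000, 18000, 497000, 21000),
                  .contaminated = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE))

# how many observations were contaminated, and what share of the sample
n_cont <- sum(res$.contaminated)       # 2
share  <- mean(res$.contaminated)      # 1/3

# the uncontaminated part of the sample, e.g. for comparing estimators
clean <- res[!res$.contaminated, ]
mean(clean$eqIncome)                   # 16500

# rows that remain eligible if missing values should not hit outliers
eligible <- which(!res$.contaminated)  # 1 2 4 6
```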

Methods

x = "data.frame", control = "character"

contaminate data using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.

x = "data.frame", control = "ContControl"

contaminate data as defined by the control object control.

x = "data.frame", control = "missing"

contaminate data using a control object of class "ContControl". Its slots may be supplied as additional arguments.
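The three methods amount to a small dispatch scheme: a character string is turned into a control object, a missing control falls back to a default class, and the method for the control object does the actual work. The sketch below mimics that scheme with plain S4 code from the methods package. The classes "SketchContControl" and "SketchDCARControl", their slots, and the generic contaminateSketch() are hypothetical stand-ins, not simFrame's definitions, and the contamination step is deliberately simplified.

```r
library(methods)

# Hypothetical stand-ins, NOT simFrame's classes: a virtual control class
# and one concrete subclass that serves as the default.
setClass("SketchContControl",
         representation("VIRTUAL", target = "character", epsilon = "numeric"))
setClass("SketchDCARControl", contains = "SketchContControl",
         slots = c(mean = "numeric", sd = "numeric"),
         prototype = prototype(epsilon = 0.05, mean = 0, sd = 1))

# Hypothetical stand-in for the generic contaminate(x, control, ...)
setGeneric("contaminateSketch",
           function(x, control, ...) standardGeneric("contaminateSketch"))

# control is a control object: this method does the actual work
setMethod("contaminateSketch",
          signature(x = "data.frame", control = "SketchDCARControl"),
          function(x, control, i = 1) {
            n <- nrow(x)
            idx <- sample.int(n, round(control@epsilon[i] * n))
            x[idx, control@target] <- rnorm(length(idx), control@mean, control@sd)
            x$.contaminated <- seq_len(n) %in% idx
            x
          })

# control is a class name: build the object from the remaining arguments
setMethod("contaminateSketch",
          signature(x = "data.frame", control = "character"),
          function(x, control, ...) contaminateSketch(x, new(control, ...)))

# control is missing: fall back to the default class
setMethod("contaminateSketch",
          signature(x = "data.frame", control = "missing"),
          function(x, control, ...)
            contaminateSketch(x, new("SketchDCARControl", ...)))

set.seed(1)
toy <- data.frame(y = rnorm(20))
a  <- contaminateSketch(toy, new("SketchDCARControl", target = "y",
                                 epsilon = 0.1, mean = 1e5, sd = 1))
b  <- contaminateSketch(toy, "SketchDCARControl", target = "y",
                        epsilon = 0.1, mean = 1e5, sd = 1)
c2 <- contaminateSketch(toy, target = "y", epsilon = 0.1, mean = 1e5, sd = 1)
```

All three calls end up in the same method; only the way the control object is obtained differs.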

Details

With the control classes implemented in simFrame, contamination is modeled as a two-step process. The first step is to select observations to be contaminated, the second is to model the distribution of the outliers.
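The two steps can be illustrated without simFrame. The following is a minimal base-R sketch under stated assumptions, not simFrame's implementation: the function contaminate_sketch() is hypothetical, it handles a single target column, observations are selected by simple random sampling, and the number of outliers is taken as round(epsilon[i] * n). For "DCAR", outliers are drawn from a normal distribution with the parameters in dots, independently of the original values; for "DAR", they are computed from the original values with fun, in the spirit of the examples below.

```r
# Minimal base-R sketch of the two-step contamination model; NOT simFrame's
# implementation. The function name, the rounding rule for the number of
# outliers, and the restriction to a single target column are assumptions.
contaminate_sketch <- function(x, target, epsilon, i = 1,
                               type = c("DCAR", "DAR"),
                               dots = list(mean = 0, sd = 1),
                               fun = function(v) v) {
  type <- match.arg(type)
  n <- nrow(x)
  # number of outliers from the i-th contamination level in 'epsilon'
  n_cont <- round(epsilon[i] * n)
  # step 1: select the observations to be contaminated (simple random sampling)
  idx <- sample.int(n, n_cont)
  # step 2: model the distribution of the outliers
  if (type == "DCAR") {
    # DCAR: outliers are drawn from rnorm(), independently of the original values
    x[idx, target] <- do.call(rnorm, c(list(n = n_cont), dots))
  } else {
    # DAR: outliers are a function of the original values
    x[idx, target] <- fun(x[idx, target])
  }
  # logical indicator of the contaminated observations
  x$.contaminated <- seq_len(n) %in% idx
  x
}

set.seed(123)
df <- data.frame(id = 1:20, y = rnorm(20))
res <- contaminate_sketch(df, target = "y", epsilon = c(0.05, 0.2), i = 2,
                          dots = list(mean = 5e5, sd = 1e4))
sum(res$.contaminated)  # 4, i.e. round(0.2 * 20)
```

Passing a vector epsilon together with i mirrors how the argument i selects one contamination level from the slot epsilon.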

To extend the framework with a user-defined control class "MyContControl" (which must extend "VirtualContControl"), a method contaminate(x, control, i) with signature 'data.frame, MyContControl' needs to be implemented. If the contaminated observations need to be identified at a later stage of the simulation, e.g., to avoid conflicts when inserting missing values, a logical indicator variable ".contaminated" should be added to the returned data set.
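A user-defined extension can follow the pattern described above. To keep the sketch self-contained it does not load simFrame: "VirtualContControlSketch" and the generic contaminateExt() are hypothetical stand-ins for "VirtualContControl" and contaminate(), and the slot scale of "MyContControl" is invented for illustration. With simFrame loaded, one would instead extend "VirtualContControl" and register the method on contaminate with the same signature.

```r
library(methods)

# Hypothetical stand-in for simFrame's virtual class "VirtualContControl"
setClass("VirtualContControlSketch",
         representation("VIRTUAL", target = "character", epsilon = "numeric"))

# Hypothetical stand-in for the generic contaminate(x, control, ...)
setGeneric("contaminateExt",
           function(x, control, ...) standardGeneric("contaminateExt"))

# User-defined control class with one extra (hypothetical) slot: a scale factor
setClass("MyContControl", contains = "VirtualContControlSketch",
         slots = c(scale = "numeric"),
         prototype = prototype(epsilon = 0.05, scale = 100))

# Method with signature 'data.frame, MyContControl'
setMethod("contaminateExt",
          signature(x = "data.frame", control = "MyContControl"),
          function(x, control, i = 1) {
            n <- nrow(x)
            # step 1: pick observations at random (rounding rule is an assumption)
            idx <- sample.int(n, round(control@epsilon[i] * n))
            # step 2: outliers are the original values multiplied by 'scale'
            x[idx, control@target] <- x[idx, control@target] * control@scale
            # indicator so later steps (e.g. inserting NAs) can avoid these rows
            x$.contaminated <- seq_len(n) %in% idx
            x
          })

set.seed(42)
toy <- data.frame(V1 = rnorm(10))
ctrl <- new("MyContControl", target = "V1", epsilon = c(0.1, 0.3), scale = 100)
res <- contaminateExt(toy, ctrl, i = 2)
sum(res$.contaminated)  # 3, i.e. round(0.3 * 10)
```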

References

Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1-36. doi:10.18637/jss.v037.i03.

Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178-181. Minsk. ISBN 978-985-476-848-9.

Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91-103.

Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.

See Also

"DCARContControl", "DARContControl", "ContControl", "VirtualContControl"

Examples

library(simFrame)
set.seed(12345)  # for reproducibility

## distributed completely at random (DCAR)
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)

# using a control object
dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
contaminate(sam, dcarc)

# supply slots of control object as arguments
# (control is missing, so the default control class is used)
contaminate(sam, target = "eqIncome", epsilon = 0.05,
    dots = list(mean = 5e+05, sd = 10000))


## distributed at random (DAR)
foo <- generate(size = 10, distribution = rnorm,
    dots = list(mean = 0, sd = 2))

# using a control object
darc <- DARContControl(target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)
contaminate(foo, darc)

# supply slots of control object as arguments
contaminate(foo, "DARContControl", target = "V1",
    epsilon = 0.2, fun = function(x) x * 100)
