GSE (version 4.2-1)

simulation-tools: Data generator for simulation study on cell- and case-wise contamination

Description

Includes the data generator for the simulation study on cell- and case-wise contamination that appears in Agostinelli et al. (2014).

Usage

generate.randcorr(cond, p, tol=1e-5, maxits=100) 

generate.cellcontam(n, p, cond, contam.size, contam.prop, A=NULL)

generate.casecontam(n, p, cond, contam.size, contam.prop, A=NULL)

Value

generate.randcorr returns a random correlation matrix of dimension p with condition number cond.

generate.cellcontam and generate.casecontam return a multivariate normal sample that is either cell-wise or case-wise contaminated, as described in Agostinelli et al. (2014). The contaminated sample is returned as a list with the following components:

x

multivariate normal sample with cell- or case-wise contamination.

u

n by p matrix of 0's and 1's, where a 1 marks an outlying cell. A row of 1's corresponds to a case-wise outlier.

A

random correlation matrix with a specified condition number.
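
The following is a minimal usage sketch based only on the signatures and return components listed above. It assumes the GSE package is installed. The seed and argument values are arbitrary illustrations. The eigenvalue ratio is one common definition of the condition number and is assumed here, not confirmed, to be the one the package targets.

```r
library(GSE)  # assumes the GSE package is installed

set.seed(123)  # arbitrary seed, for reproducibility only

# Random 5 x 5 correlation matrix with a target condition number of 100
A <- generate.randcorr(cond = 100, p = 5)

# Ratio of the largest to the smallest eigenvalue (assumed definition
# of the condition number); compare this value with cond
ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
cn <- max(ev) / min(ev)

# Cell-wise contaminated sample that reuses the matrix generated above
dat <- generate.cellcontam(n = 100, p = 5, cond = 100,
                           contam.size = 5, contam.prop = 0.1, A = A)

dim(dat$x)    # dimensions of the contaminated data matrix
mean(dat$u)   # observed fraction of cells flagged as outliers
```

Passing A explicitly keeps the correlation structure fixed across repeated samples. Leaving A as NULL draws a new random correlation matrix instead.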

Arguments

cond

desired condition number of the random correlation matrix. The correlation matrix will be used to generate multivariate normal samples in generate.cellcontam and generate.casecontam.

tol

tolerance level for the condition number of the random correlation matrix. Default is 1e-5.

maxits

integer indicating the maximum number of iterations until the condition number of the random correlation matrix is within a tolerance level. Default is 100.

n

integer indicating the number of observations to be generated.

p

integer indicating the number of variables to be generated.

contam.size

size of cell- or case-wise contamination. For cell-wise outliers, random cells in a data matrix are replaced by contam.size. For case-wise outliers, random cases in a data matrix are replaced by contam.size times \(v\) where \(v\)

contam.prop

proportion of cell- or case-wise contamination.

A

correlation matrix used for generating data. If A is NULL, a random correlation matrix is generated. Default is NULL.

Author

Andy Leung andy.leung@stat.ubc.ca, Claudio Agostinelli, Ruben H. Zamar, Victor J. Yohai

Details

Details about how the correlation matrix is randomly generated and how the contaminated data is generated can be found in Agostinelli et al. (2014).
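
To make the cell-wise mechanism concrete, here is a base R sketch of the idea described under contam.size: random cells of a normal sample are overwritten with the contamination size, and an indicator matrix records which cells were changed. It is an illustration under stated assumptions, not the package's implementation. The equicorrelation matrix, the uniform choice of cells over the whole matrix, and the names x.cell and u.cell are hypothetical. The package may select cells differently, for example column by column.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

n <- 200
p <- 4
contam.size <- 5
contam.prop <- 0.05

# Illustrative equicorrelation matrix (hypothetical, not produced by GSE)
A <- matrix(0.5, p, p)
diag(A) <- 1

# Clean sample from N(0, A): with R = chol(A) we have t(R) %*% R = A,
# so the rows of Z %*% R have covariance A
x <- matrix(rnorm(n * p), n, p) %*% chol(A)

# Pick a fixed number of cells uniformly at random (assumed scheme)
n.out <- round(contam.prop * n * p)
idx <- sample(n * p, size = n.out)

# Indicator matrix: 1 marks a contaminated cell, 0 a clean one
u.cell <- matrix(0, n, p)
u.cell[idx] <- 1

# Overwrite the selected cells with the contamination size
x.cell <- x
x.cell[idx] <- contam.size

mean(u.cell)  # realized proportion of contaminated cells
```

Case-wise contamination differs in that whole rows are replaced, so the corresponding rows of the indicator matrix consist entirely of 1's.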

References

Agostinelli, C., Leung, A., Yohai, V.J., and Zamar, R.H. (2014) Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination. arXiv:1406.6031 [math.ST]

See Also

TSGS