scenarioCor: Simulated quantitative data according to the SRUW model

Description

The dataset consists of 2000 data points in \(R^{14}\). On the subset of relevant clustering variables \(S = \{1, 2\}\), the data are drawn from a mixture of four equiprobable spherical Gaussian distributions with means \((0,0), (4,0), (0,2)\) and \((4,2)\). The redundant variables \(U = \{3, \ldots, 11\}\) are explained by the predictor variables \(R = \{1, 2\}\). The last three variables, \(W = \{12, 13, 14\}\), are independent.
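The mixture on \(S\) described above can be sketched in a few lines of R. This is only an illustration, not the code used to build the dataset: the common standard deviation `sigma` is an assumption (the description says "spherical" but does not give the variance), and the names `means`, `z` and `xS` are chosen here for the example.

```r
set.seed(1)
n <- 2000

# Component means on the two relevant variables, one row per component
means <- rbind(c(0, 0), c(4, 0), c(0, 2), c(4, 2))

# Equiprobable component labels
z <- sample(1:4, n, replace = TRUE)

# ASSUMPTION: unit standard deviation; the help page does not state it
sigma <- 1

# Spherical Gaussian noise around the mean of each point's component
xS <- means[z, ] + matrix(rnorm(n * 2, sd = sigma), nrow = n, ncol = 2)
```

Here `xS` plays the role of columns 1 and 2, and `z` the role of the label column.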

Format

A data matrix with 2000 observations on 14 variables, followed by a last column containing the labels.

scenarioCor[,1:14]: a numeric matrix containing the observations
scenarioCor[,15]: an integer vector containing the labels
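Given this layout, separating the observations from the labels is a matter of column indexing. The helper below is a hypothetical convenience function (`split_scenario` is not part of any package), demonstrated on a small stand-in matrix with the same 15-column layout so that it runs without the dataset loaded.

```r
# Hypothetical helper: split a 15-column object into observations and labels
split_scenario <- function(m) {
  m <- as.matrix(m)
  stopifnot(ncol(m) == 15)
  list(
    x      = m[, 1:14, drop = FALSE],  # numeric observations
    labels = as.integer(m[, 15])       # cluster labels
  )
}

# Stand-in data with the same layout (NOT the real dataset)
toy <- cbind(matrix(rnorm(5 * 14), nrow = 5, ncol = 14), c(1, 2, 3, 4, 1))
parts <- split_scenario(toy)
```

Once the dataset is loaded with `data(scenarioCor)`, the same call would be `split_scenario(scenarioCor)`.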

Details

The subset \(U\) of redundant variables is simulated as follows:

\(x^{U} = (0,0, 0.4, 0.8, ..., 2) + x^{S} b + \varepsilon\), with \(\varepsilon \sim N(0_9, \Omega)\)

The subset \(W\) of independent variables is simulated as follows:

\(x^{W} \sim N((3.2, 3.6, 4), I_3)\)

For more details on the regression coefficients \(b\) and the covariance matrix \(\Omega\), see Maugis et al. (2009).
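The two simulation steps above can be sketched as a single R function. This is a rough illustration under stated assumptions, not the original generating code: `simulate_UW` is a hypothetical name, and the intercept `a`, the coefficients `b` and the covariance `Omega` used in the call are placeholders, since their actual values are given in Maugis et al. (2009) and not on this page.

```r
# Hypothetical sketch: build x^U and x^W from x^S
simulate_UW <- function(xS, a, b, Omega, muW = c(3.2, 3.6, 4)) {
  n <- nrow(xS)
  q <- ncol(b)
  # eps ~ N(0, Omega): rows of Z %*% chol(Omega) have covariance Omega
  eps <- matrix(rnorm(n * q), nrow = n, ncol = q) %*% chol(Omega)
  # Linear regression of the redundant variables on the relevant ones
  xU <- matrix(a, nrow = n, ncol = q, byrow = TRUE) + xS %*% b + eps
  # Independent variables: Gaussian with mean muW and identity covariance
  k <- length(muW)
  xW <- matrix(muW, nrow = n, ncol = k, byrow = TRUE) +
    matrix(rnorm(n * k), nrow = n, ncol = k)
  cbind(xS, xU, xW)
}

set.seed(1)
xS <- matrix(rnorm(10 * 2), nrow = 10, ncol = 2)  # stand-in for x^S
x <- simulate_UW(
  xS,
  a     = rep(0, 9),          # PLACEHOLDER intercept
  b     = matrix(0.5, 2, 9),  # PLACEHOLDER regression coefficients
  Omega = diag(9)             # PLACEHOLDER covariance matrix
)
```

The result has the 14-column layout of the dataset: two relevant, nine redundant and three independent variables.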

References

Maugis, C., Celeux, G., and Martin-Magniette, M. L., 2009. "Variable selection in model-based clustering: A general variable role modeling". Computational Statistics and Data Analysis, vol. 53/11, pp. 3872-3882.

Examples

# NOT RUN {
data(scenarioCor)
# }