
capushe (version 1.1.2)

datacapushe: datacapushe

Description

An example data frame for the capushe package, based on a simulated Gaussian mixture dataset in \(\R^3\).

Usage

data(datacapushe)

Format

A data frame with 50 rows (models) and the following 4 variables:

model: a character vector of model names.
pen: a numeric vector of model penalty shape values.
complexity: a numeric vector of model complexity values.
contrast: a numeric vector of model contrast values.

Details

The simulated dataset is composed of \(n=1000\) observations in \(\R^3\). It consists of an equiprobable mixture of three large "bubble" groups centered at \(\nu_1=(0,0,0)\), \(\nu_2=(6,0,0)\) and \(\nu_3=(0,6,0)\) respectively. Each bubble group \(j\) is simulated from a mixture of seven components with the following density:

\(x\in\R^3\mapsto 0.4\,\Phi(x|\mu_1+\nu_j,I_3)+\sum_{k=2}^{7}0.1\,\Phi(x|\mu_k+\nu_j,0.1I_3)\)

with \(\mu_1=(0,0,0)\), \(\mu_2=(0,0,1.5)\), \(\mu_3=(0,1.5,0)\), \(\mu_4=(1.5,0,0)\), \(\mu_5=(0,0,-1.5)\), \(\mu_6=(0,-1.5,0)\) and \(\mu_7=(-1.5,0,0)\). Since each of the three bubble groups has seven components, the distribution of the dataset is a \(21\)-component Gaussian mixture.
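The sampling scheme described above can be sketched in base R. This is only an illustration, not the code used to build the package data: the function name `simulate_bubbles` is hypothetical, and the sketch assumes that the second argument of \(\Phi\) is a covariance matrix, so that \(0.1I_3\) corresponds to a standard deviation of \(\sqrt{0.1}\) per coordinate.

```r
# Hypothetical helper (not part of capushe): draw n points from the
# 21-component "bubble" mixture described in the Details section.
simulate_bubbles <- function(n = 1000) {
  # Bubble centers nu_1, nu_2, nu_3 (one per row)
  nu <- rbind(c(0, 0, 0), c(6, 0, 0), c(0, 6, 0))
  # Within-bubble offsets mu_1, ..., mu_7 (one per row)
  mu <- rbind(c(0, 0, 0),
              c(0, 0, 1.5), c(0, 1.5, 0), c(1.5, 0, 0),
              c(0, 0, -1.5), c(0, -1.5, 0), c(-1.5, 0, 0))
  # Component weights inside a bubble: 0.4 for k = 1, 0.1 for k = 2..7
  w <- c(0.4, rep(0.1, 6))
  # Assumption: covariances I_3 and 0.1 * I_3, i.e. sd 1 and sqrt(0.1)
  sds <- c(1, rep(sqrt(0.1), 6))

  j <- sample(1:3, n, replace = TRUE)            # equiprobable bubble label
  k <- sample(1:7, n, replace = TRUE, prob = w)  # component within the bubble
  # Row i of the noise matrix is scaled by sds[k[i]] (column-major recycling)
  noise <- matrix(rnorm(3 * n), nrow = n, ncol = 3) * sds[k]
  x <- nu[j, , drop = FALSE] + mu[k, , drop = FALSE] + noise
  list(x = x, bubble = j, component = k)
}

set.seed(1)
sim <- simulate_bubbles(1000)
dim(sim$x)
```

Calling `plot(sim$x[, 1:2], col = sim$bubble)` gives a quick view of the three bubble groups in the first two coordinates.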

A model collection of spherical Gaussian mixtures is considered, and the data frame datacapushe contains the maximum likelihood estimation results for each of these models. The number of free parameters of each model gives its complexity value, and \(\mathrm{pen}_{\mathrm{shape}}\) is defined as this complexity divided by \(n\).
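As a rough illustration of how complexity and penalty shape relate, the sketch below counts free parameters for a spherical Gaussian mixture and divides by \(n\). The exact parameterization behind datacapushe is not stated here, so the count is an assumption: each of the \(K\) components is taken to have its own mean and its own scalar variance, and the helper name `n_free_params` is hypothetical.

```r
# Hypothetical helper: free parameters of a K-component spherical Gaussian
# mixture in dimension d, ASSUMING one scalar variance per component:
#   (K - 1) mixing proportions + K * d mean coordinates + K variances
n_free_params <- function(K, d = 3) (K - 1) + K * d + K

n <- 1000                          # sample size from the Details section
K <- 1:5                           # a few illustrative model sizes
complexity <- n_free_params(K)     # 4, 9, 14, 19, 24 under this assumption
pen_shape <- complexity / n        # penalty shape = complexity / n

data.frame(K = K, complexity = complexity, pen = pen_shape)
```

If the collection instead shared a single variance across components, the count would differ, so the complexity column shipped with the package remains the reference.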

datapartialcapushe and datavalidcapushe can be used to run the validation function. datapartialcapushe contains only the models with fewer than \(21\) components. datavalidcapushe contains three models with \(30\), \(40\) and \(50\) components respectively.

References

Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1

Examples

data(datacapushe)
capushe(datacapushe, n = 1000)
## BIC, DDSE and Djump all select the true model
plot(capushe(datacapushe))
## Validation:
data(datapartialcapushe)
capushepartial <- capushe(datapartialcapushe)
data(datavalidcapushe)
## The slope heuristics should not be applied for datapartialcapushe.
validation(capushepartial, datavalidcapushe)
