The simulated dataset is composed of \(n=1000\) observations in \(\R^3\). It
consists of an equiprobable mixture of three large "bubble" groups centered at
\(\nu_1=(0,0,0)\), \(\nu_2=(6,0,0)\) and \(\nu_3=(0,6,0)\) respectively. Each
bubble group \(j\) is simulated from a mixture of seven components with
the following density:
\(x\in\R^3\mapsto 0.4\,\Phi(x|\mu_1+\nu_j,I_3)+\sum_{k=2}^{7}0.1\,\Phi(x|\mu_k+\nu_j,0.1I_3)\)
with \(\mu_1=(0,0,0)\), \(\mu_2=(0,0,1.5)\), \(\mu_3=(0,1.5,0)\), \(\mu_4=(1.5,0,0)\),
\(\mu_5=(0,0,-1.5)\), \(\mu_6=(0,-1.5,0)\) and \(\mu_7=(-1.5,0,0)\). Thus the
distribution of the dataset is actually a \(21\)-component Gaussian mixture.
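The simulation design described above can be sketched as follows. This is only an illustrative sketch in Python, not the code used to generate the packaged data: it assumes that the second argument of \(\Phi\) is the covariance matrix (so the small components have variance \(0.1\) per coordinate), and the names MU, NU, WEIGHTS, VAR and simulate are hypothetical.

```python
import random

# Offsets of the seven components inside one bubble (mu_1 ... mu_7).
MU = [(0, 0, 0), (0, 0, 1.5), (0, 1.5, 0), (1.5, 0, 0),
      (0, 0, -1.5), (0, -1.5, 0), (-1.5, 0, 0)]
# Bubble centers (nu_1, nu_2, nu_3).
NU = [(0, 0, 0), (6, 0, 0), (0, 6, 0)]
# Mixing weights within a bubble and per-coordinate variances
# (assumption: covariance = variance * I_3).
WEIGHTS = [0.4] + [0.1] * 6
VAR = [1.0] + [0.1] * 6


def simulate(n=1000, seed=0):
    """Draw n points in R^3 from the 21-component mixture."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        nu = rng.choice(NU)                             # equiprobable bubble
        k = rng.choices(range(7), weights=WEIGHTS)[0]   # component in bubble
        sd = VAR[k] ** 0.5
        points.append(tuple(rng.gauss(m + c, sd) for m, c in zip(MU[k], nu)))
    return points


data = simulate()
```

Each of the three bubbles contributes seven shifted components, which gives the \(3\times 7=21\) Gaussian components mentioned above.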
A model collection of spherical Gaussian mixtures is considered, and the data frame
datacapushe contains the maximum likelihood estimates for each of these models.
The number of free parameters of each model is used as its complexity value, and \(pen_{shape}\)
is defined as this complexity divided by \(n\).
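As a rough illustration of how such complexity values could be computed, the sketch below counts the free parameters of a \(K\)-component spherical Gaussian mixture in \(\R^3\). The exact parameterization behind the packaged data is not stated here, so the count assumes one free variance per component (with an optional shared-variance variant); the function names are hypothetical.

```python
def spherical_mixture_complexity(k, d=3, shared_variance=False):
    """Free parameters of a k-component spherical Gaussian mixture in R^d.

    Assumption: each component has its own scalar variance unless
    shared_variance is True, in which case a single variance is used.
    """
    proportions = k - 1                      # mixing weights sum to one
    means = k * d                            # one mean vector per component
    variances = 1 if shared_variance else k  # scalar variance(s)
    return proportions + means + variances


def pen_shape(k, n=1000, **kwargs):
    """Complexity divided by the sample size n."""
    return spherical_mixture_complexity(k, **kwargs) / n
```

Under this assumption, a model with \(21\) components has \(20+63+21=104\) free parameters, so its penalty shape value would be \(104/1000=0.104\).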
datapartialcapushe and datavalidcapushe can be used to run the validation
function. datapartialcapushe contains only the models with fewer than \(21\)
components, while datavalidcapushe contains three models with \(30\), \(40\)
and \(50\) components respectively.