The general purpose of the package is to discover, or explain, group structures in multivariate data sets with unknown
class (cluster analysis or clustering) or known class (discriminant analysis or classification). It is an exploratory data
analysis tool for solving clustering and classification problems, but it can also be regarded as a semi-parametric tool
to estimate densities with Gaussian mixture distributions and multinomial distributions.
Mathematically, a mixture probability density function (pdf) \(f\) is a weighted sum of \(K\) component densities:
$$
f({\bf x}_i|\theta) = \sum_{k=1}^{K}p_kh({\bf x}_i|\lambda_k)
$$
where \(h(\cdot|\lambda_k)\) denotes a \(d\)-dimensional distribution parametrized by \(\lambda_k\).
The parameters of the mixture are the mixing proportions \(p_k\) and the component distribution parameters \(\lambda_k\).
In the Gaussian case, \(h\) is the density of a Gaussian distribution with mean \(\mu_k\) and variance
matrix \(\Sigma_k\), and thus \(\lambda_k = (\mu_k,\Sigma_k)\).
In the qualitative case, \(h\) is a multinomial distribution and \(\lambda_k=(a_k,\epsilon_k)\) is the parameter
of the distribution.
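As a concrete illustration of this definition (a standalone sketch, not part of the package), the following Python code evaluates a two-component Gaussian mixture density at a point; the proportions, means and variance matrices are arbitrary example values.

```python
# Illustrative sketch only: evaluates f(x) = sum_k p_k h(x | lambda_k)
# for an arbitrary two-component Gaussian mixture in dimension d = 2.
import numpy as np
from scipy.stats import multivariate_normal

p = np.array([0.4, 0.6])                                  # mixing proportions p_k
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]         # means mu_k
Sigma = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]   # variance matrices Sigma_k

def mixture_pdf(x):
    """Weighted sum of the K component densities h(x | mu_k, Sigma_k)."""
    return sum(p_k * multivariate_normal(mean=m, cov=S).pdf(x)
               for p_k, m, S in zip(p, mu, Sigma))

print(mixture_pdf(np.array([1.0, 1.0])))
```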
Estimation of the mixture parameters is performed either through maximum likelihood via the EM
(Expectation Maximization, Dempster et al. 1977) or SEM (Stochastic EM, Celeux and Diebolt 1985) algorithms,
or through classification maximum likelihood via the CEM algorithm (Clustering EM, Celeux and Govaert 1992).
These three algorithms can be chained to obtain original fitting strategies (e.g. CEM followed by EM initialized with the CEM results)
that combine the advantages of each of them in the estimation process. As mixture problems usually have multiple relative maxima,
the program will produce different results depending on the initial estimates supplied by the user. If the user does
not supply initial estimates, several initialization procedures are proposed (random centers, for instance).
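To make the estimation step concrete, here is a minimal sketch of the plain EM algorithm for a Gaussian mixture, written in generic Python/NumPy rather than taken from the package; it assumes unconstrained variance matrices and uses random centers for initialization.

```python
# Minimal EM sketch for a Gaussian mixture (illustration only, not the
# package's implementation). Initialization uses random centers.
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(X, K, n_iter=100, seed=None):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: random centers, common covariance, equal proportions.
    mu = X[rng.choice(n, size=K, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    p = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: posterior probabilities t_ik that x_i belongs to component k.
        dens = np.column_stack([
            p[k] * multivariate_normal(mu[k], Sigma[k]).pdf(X) for k in range(K)
        ])
        t = dens / dens.sum(axis=1, keepdims=True)
        # M step: update proportions, means and variance matrices.
        nk = t.sum(axis=0)
        p = nk / n
        mu = (t.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (t[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(d)
    return p, mu, Sigma, t
```

The CEM variant replaces the E step by a hard assignment of each observation to its most probable component, SEM draws the assignments at random from the posterior probabilities, and chaining algorithms amounts to using the output of one run as the initial estimates of the next.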
It is possible to constrain some input parameters; for example, the dispersions can be required to be equal across classes.
In the Gaussian case, fourteen models are implemented. They are based on the eigenvalue decomposition of the variance
matrices and are the most commonly used models. They differ in the constraints imposed on the variance matrices (for example,
the same variance matrix for all clusters, or spherical variance matrices) and are suitable for data sets in any dimension.
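For reference, the decomposition in question is the usual spectral parametrization of each variance matrix, stated here in its standard form (the section above does not restate it):
$$
\Sigma_k = \lambda_k D_k A_k D_k^{T}
$$
where \(\lambda_k\) here denotes a scalar volume parameter (not the component parameter vector used above), \(D_k\) is the orthogonal matrix of eigenvectors controlling the orientation of cluster \(k\), and \(A_k\) is a diagonal matrix with determinant 1 controlling its shape. The models are obtained by constraining the volumes, orientations and shapes to be equal or free across clusters, together with diagonal (axis-aligned) and spherical special cases.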
In the qualitative case, five multinomial models are available. They are based on a reparametrization of the multinomial
probabilities.
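As a hedged reminder of the kind of reparametrization typically used here (the exact form is not restated in this section), the probability that variable \(j\) takes level \(s\) in cluster \(k\) can be written with a modal (center) level \(a_k^{j}\) and a dispersion \(\epsilon_k^{j}\):
$$
\Pr(x^{j}=s \mid \text{cluster } k)=
\begin{cases}
1-\epsilon_k^{j} & \text{if } s=a_k^{j},\\
\epsilon_k^{j}/(m_j-1) & \text{otherwise,}
\end{cases}
$$
where \(m_j\) is the number of levels of variable \(j\). The five models then correspond to different constraints on the dispersion parameters (common or not across variables and clusters, or level-dependent).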
In both cases, the models and the number of clusters can be chosen using several criteria:
BIC (Bayesian Information Criterion), ICL (Integrated Completed Likelihood, a classification version of BIC),
NEC (Normalized Entropy Criterion), or Cross-Validation (CV).
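As an illustration of how such a criterion is used (a generic sketch under the usual convention that BIC is minimized, not the package's code), the BIC of a fitted \(K\)-component Gaussian mixture penalizes the maximized log-likelihood by the number of free parameters:

```python
# Generic BIC sketch for choosing K (illustration only). It reuses the
# em_gaussian_mixture() sketch above; an unconstrained d-dimensional Gaussian
# mixture with K components has (K - 1) + K*d + K*d*(d + 1)/2 free parameters.
import numpy as np
from scipy.stats import multivariate_normal

def bic(X, p, mu, Sigma):
    n, d = X.shape
    K = len(p)
    dens = np.column_stack([
        p[k] * multivariate_normal(mu[k], Sigma[k]).pdf(X) for k in range(K)
    ])
    loglik = np.log(dens.sum(axis=1)).sum()
    n_params = (K - 1) + K * d + K * d * (d + 1) // 2
    return -2 * loglik + n_params * np.log(n)

# Choose K minimizing BIC over a small range of candidates, e.g.:
# best_K = min(range(1, 6), key=lambda K: bic(X, *em_gaussian_mixture(X, K)[:3]))
```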