Mclust: Model-Based Clustering

Description

Clustering via EM initialized by hierarchical clustering for parameterized Gaussian mixture models. The number of clusters and the clustering model are chosen to maximize the BIC.

Usage

Mclust(data, minG, maxG)

Arguments

data

A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

minG

An integer vector specifying the minimum number of mixture components (clusters) to be considered. The default is 1 component.

maxG

An integer vector specifying the maximum number of mixture components (clusters) to be considered. The default is 9 components.

Value

A list representing the best model (according to BIC) for the given range of numbers of clusters. The following components are included:

BIC

A matrix giving the BIC value for each model (rows) and number of clusters (columns).

bic

A scalar giving the optimal BIC value.

modelName

The MCLUST name for the best model according to BIC.

classification

The classification corresponding to the optimal BIC value.

uncertainty

The uncertainty in the classification corresponding to the optimal BIC value.

mu

For multidimensional models, a matrix whose columns are the means of each group in the best model. For one-dimensional models, a vector whose entries are the means for each group in the best model.

sigma

For multidimensional models, a three-dimensional array in which sigma[,,k] gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro

The mixing probabilities for each component in the best model.

z

A matrix whose [i,k]th entry is the probability that observation i belongs to the kth component in the model. The optimal classification is derived from this by choosing, for each observation, the class with the maximum probability.

loglik

The log likelihood for the data under the best model.
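The relationship between z and classification can be illustrated with a small sketch. The matrix below is made-up toy data, not output from Mclust. The uncertainty line assumes the usual MCLUST convention of one minus the largest membership probability, which is not stated on this page.

```r
# Toy membership-probability matrix: 4 observations (rows), 3 components (columns).
# The values are invented for illustration; each row sums to 1.
z <- matrix(c(0.90, 0.05, 0.05,
              0.20, 0.70, 0.10,
              0.10, 0.30, 0.60,
              0.25, 0.50, 0.25),
            ncol = 3, byrow = TRUE)

# Classification: for each observation, the component with the largest probability.
classification <- apply(z, 1, which.max)

# Uncertainty (assumed convention): one minus the largest probability in each row.
uncertainty <- 1 - apply(z, 1, max)
```

Observations whose largest probability is close to 1 are classified with little uncertainty, while rows with several similar probabilities are the ambiguous ones.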

Details

The following models are compared in Mclust:

"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation

Mclust is intended to combine EMclust and its summary in a simplified one-step model-based clustering function. EMclust and its summary provide more flexibility, including the choice of models.
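As a rough sketch of how the best model is read off a BIC table with models in rows and numbers of clusters in columns, the snippet below picks the largest entry. The BIC values are hypothetical and only three model names are used. This is an illustration of the selection rule, not the code Mclust runs internally.

```r
# Hypothetical BIC table: rows are models, columns are numbers of clusters.
BIC <- matrix(c(-1800, -1500, -1450, -1470,
                -1700, -1300, -1250, -1320,
                -1650, -1280, -1290, -1350),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("EII", "VVI", "VVV"), as.character(1:4)))

# Optimal BIC value; na.rm guards against models that could not be fitted.
bic <- max(BIC, na.rm = TRUE)

# Row and column position of the optimal entry.
best <- which(BIC == bic, arr.ind = TRUE)

# Model name and number of clusters at that position.
modelName <- rownames(BIC)[best[1, "row"]]
G <- as.integer(colnames(BIC)[best[1, "col"]])
```

With these toy values the largest BIC lies in the "VVI" row and the column for 3 clusters.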

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

Examples

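library(mclust)

data(iris)
irisMatrix <- as.matrix(iris[,1:4])  # the four numeric measurements
irisClass <- iris[,5]                # known species labels, kept for comparison only
irisMclust <- Mclust(irisMatrix)     # model and number of clusters chosen by BIC

plot(irisMclust, irisMatrix)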
