emclust: BIC from hierarchical clustering followed by EM for several parameterized Gaussian mixture models.

Description

Computes the Bayesian Information Criterion (BIC) for various models and numbers of clusters, using hierarchical clustering followed by EM for several parameterizations of Gaussian mixture models, optionally with Poisson noise.
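
As orientation, here is the model class written in the eigenvalue-decomposition notation of Fraley and Raftery. This is assumed here and not taken from this page. With G Gaussian components and an optional Poisson noise component spread uniformly over a data region of hypervolume V, the mixture density is

$$
f(x) \;=\; \frac{\pi_0}{V} \;+\; \sum_{k=1}^{G} \pi_k\,\phi(x \mid \mu_k, \Sigma_k),
\qquad \Sigma_k = \lambda_k D_k A_k D_k^{\top},
\qquad \sum_{k=0}^{G} \pi_k = 1,
$$

where φ is the multivariate normal density and π_0 = 0 when no noise is modeled. The model identifiers constrain the volume λ_k, orientation D_k and shape A_k of the covariance matrices. For example, "EI" takes Σ_k = λI and "VI" takes Σ_k = λ_k I. Setting equal=T corresponds to π_1 = ... = π_G, and Vinv plays the role of 1/V.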

Usage

emclust(data, nclus, modelid, k, equal=F, noise, Vinv)

Arguments

data

A matrix of observations, one row per observation.

nclus

An integer vector specifying the numbers of clusters for which the BIC is to be calculated. Default: 1:9 without noise; 0:9 with noise.

modelid

A vector of character strings indicating the models to be fitted. The allowed values of modelid and their interpretation are as follows: "EI" : uniform spherical, "VI" : spherical, "EEE" : uniform varian

k

If k is specified, the initial hierarchical clustering phase uses a sample of size k from the data. The default is to use the entire data set.

equal

Logical variable indicating whether or not the mixing proportions are equal in the model. The default is to assume they are unequal.

noise

A logical vector with one element per observation, giving an initial estimate of which observations are noise (indicated by T). By default, emclust fits Gaussian mixture models in which it is assum

Vinv

An estimate of the inverse hypervolume of the data region (needed only if noise is specified). Default: determined by the function hypvol.
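
A minimal sketch of one simple way to obtain such an estimate, assuming the data region is approximated by the axis-aligned bounding box of the data. The helper bbox_vinv is hypothetical and is not part of the package; hypvol may use a different or more refined approximation.

```r
# Hypothetical helper: inverse hypervolume of the axis-aligned bounding box
# of the data (an assumed approximation, not necessarily what hypvol does).
bbox_vinv <- function(data) {
  x <- as.matrix(data)
  # side length of the box along each coordinate
  ranges <- apply(x, 2, function(col) diff(range(col)))
  1 / prod(ranges)
}

bbox_vinv(iris[, 1:4])
```

A value computed this way could be passed as Vinv when a noise estimate is supplied. The bounding box overestimates the volume of elongated or rotated data clouds, so the noise density 1/V it implies is on the low side.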

Value

The Bayesian Information Criterion for each of the mixture models and numbers of clusters specified by modelid and nclus. Auxiliary information is returned as attributes.
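
To make the returned quantity concrete, here is a stand-alone sketch for the simplest case: one cluster with an unconstrained covariance. It assumes the convention of Fraley and Raftery that BIC = 2 × (maximized loglikelihood) − (number of free parameters) × log(n), so larger values favor a model. It is not emclust's code, and bic_one_gaussian is a hypothetical helper. With G > 1 components the count would also include the parameters of every component and, unless equal=T, G − 1 mixing proportions.

```r
# Sketch (assumed convention): BIC = 2 * loglik - npar * log(n), larger is better.
# bic_one_gaussian is a hypothetical helper for one cluster, unconstrained covariance.
bic_one_gaussian <- function(data) {
  x <- as.matrix(data)
  n <- nrow(x)
  d <- ncol(x)
  mu <- colMeans(x)
  # maximum likelihood covariance divides by n, not n - 1
  sigma <- cov(x) * (n - 1) / n
  # squared Mahalanobis distance of each observation from the mean
  mahal <- mahalanobis(x, center = mu, cov = sigma)
  logdet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  loglik <- sum(-0.5 * (d * log(2 * pi) + logdet + mahal))
  # free parameters: d means plus d * (d + 1) / 2 covariance entries
  npar <- d + d * (d + 1) / 2
  2 * loglik - npar * log(n)
}

bic_one_gaussian(iris[, 1:4])
```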

NOTE

The hierarchical clustering phase uses the unconstrained model. The reciprocal condition estimate returned as an attribute lies between 0 and 1. The closer this estimate is to zero, the more likely it is that the corresponding EM result (and BIC) is contaminated by roundoff error.
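
To illustrate the scale of a reciprocal condition estimate, the sketch below applies base R's rcond() to covariance matrices. This is an illustration only. How emclust computes its estimate internally is not documented here and may differ, for example in the matrix norm or factorization used.

```r
# Sketch: reciprocal condition estimates of covariance matrices via base R's rcond()
x <- as.matrix(iris[, 1:4])
rc_iris <- rcond(cov(x))   # well conditioned: far from machine precision
print(rc_iris)

# Nearly collinear data: append column 1 plus a tiny jitter
y <- cbind(x, x[, 1] + 1e-8 * seq_len(nrow(x)))
rc_bad <- rcond(cov(y))    # close to 0: results built on this matrix are suspect
print(rc_bad)
```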

References

C. Fraley and A. E. Raftery. How many clusters? Which clustering method? Answers via model-based cluster analysis. Technical Report No. 329, Department of Statistics, University of Washington, February 1998.

R. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association 90:773-795, 1995.

Examples


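data(iris)
# BIC for three covariance parameterizations and 1 to 3 clusters
bicvals <- emclust(iris[,1:4], nclus=1:3, modelid=c("VVV","EEV","VEV"))

data(chevron)
# initial noise estimate: flag observations whose second coordinate exceeds 60
noisevec <- rep(FALSE, nrow(chevron))
noisevec[chevron[,2] > 60] <- TRUE
bicvals <- emclust(chevron, noise=noisevec)
# summarize the result and plot the classification derived from z
sumry <- summary(bicvals, chevron)
plot(chevron, col=ztoc(sumry$z), pch=ztoc(sumry$z))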
