emE: EM algorithm starting with E-step for a parameterized MVN mixture model.

Description

Implements the EM algorithm for a parameterized MVN mixture model, starting with the expectation step.
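
To make the expectation step concrete, here is a minimal base R sketch for the simplest case, a one-dimensional mixture with unequal variances ("V"). The helper eStepV is hypothetical and written only for illustration; it is not part of mclust and does not reproduce the package's internal implementation.

```r
# One E-step for a univariate Gaussian mixture with unequal variances ("V").
# eStepV is a hypothetical helper for illustration, not an mclust function.
eStepV <- function(x, mu, sigmasq, pro) {
  # n x G matrix of weighted densities pro[k] * N(x[i]; mu[k], sigmasq[k])
  dens <- sapply(seq_along(mu), function(k) {
    pro[k] * dnorm(x, mean = mu[k], sd = sqrt(sigmasq[k]))
  })
  dens <- matrix(dens, nrow = length(x))  # keep matrix shape when length(x) == 1
  mix <- rowSums(dens)                    # mixture density at each observation
  # z[i, k]: conditional probability that observation i belongs to component k
  list(z = dens / mix, loglik = sum(log(mix)))
}

x <- c(-2.1, -1.9, 0.1, 2.0, 2.2)
est <- eStepV(x, mu = c(-2, 2), sigmasq = c(0.5, 0.5), pro = c(0.5, 0.5))
round(est$z, 3)
est$loglik
```

Each row of z sums to one, and loglik is the quantity whose change between iterations is compared against tol.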

Usage

emE(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
    Vinv, ...)
emV(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
    Vinv, ...)
emEII(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVII(data, mu, sigmasq, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVEI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEVI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVVI(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEE(data, mu, Sigma, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emEEV(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVEV(data, mu, decomp, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)
emVVV(data, mu, sigma, pro, eps, tol, itmax, equalPro, warnSingular,
      Vinv, ...)

Arguments

data

A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

mu

The mean for each component. If there is more than one component, mu is a matrix whose columns are the means of the components.
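
As an illustration of the expected layout, the following base R sketch builds a matrix of component means from labelled data, with one column per component. It uses the iris data from the examples; the variable names are illustrative.

```r
# Build a d x G matrix of means: rows are variables, columns are components.
X <- as.matrix(iris[, 1:4])
cls <- iris[, 5]
mu <- sapply(levels(cls), function(g) colMeans(X[cls == g, , drop = FALSE]))
dim(mu)  # 4 variables (rows) by 3 components (columns)
```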

sigmasq

For the one-dimensional models ("E", "V") and spherical models ("EII", "VII"). This is either a vector whose kth component is the variance for the kth component in the mixture model ("V" and "VII"), or a scalar giving the common variance for all components in the mixture model ("E" and "EII").

decomp

For the diagonal models ("EEI", "VEI", "EVI", "VVI") and some ellipsoidal models ("EEV", "VEV"). This is a list described in more detail in cdens.

Sigma

For the equal variance model "EEE". A d by d matrix giving the common covariance for all components of the mixture model.

sigma

For the unconstrained variance model "VVV". A d by d by G array whose [,,k]th entry is the covariance matrix for the kth component of the mixture model.
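
The d by d by G layout can be sketched in base R as follows. This is only an illustration of the array shape: cov() uses the n - 1 denominator, so the values are sample covariances rather than the maximum likelihood estimates an M-step would return.

```r
# Assemble a d x d x G array holding one covariance matrix per component.
X <- as.matrix(iris[, 1:4])
cls <- iris[, 5]
G <- nlevels(cls)
d <- ncol(X)
sigma <- array(NA_real_, dim = c(d, d, G))
for (k in seq_len(G)) {
  # sample covariance of the observations labelled as component k
  sigma[, , k] <- cov(X[cls == levels(cls)[k], , drop = FALSE])
}
dim(sigma)  # d, d, G
```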

...

An argument giving the variance that takes one of the forms described above (sigmasq, decomp, Sigma, or sigma). The form of the variance specification is the same as for the output for the em,

pro

Mixing proportions for the components of the mixture. There should be one more mixing proportion than the number of MVN components if the mixture model includes a Poisson noise term.

eps

A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.

tol

A scalar tolerance for relative convergence of the loglikelihood values. The default is .Mclust$tol.
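
A relative convergence test on successive loglikelihood values might look like the sketch below. The helper hasConverged, its scaling by 1 + |loglik|, and the default of 1e-5 are assumptions made for illustration; the exact criterion used inside mclust may differ.

```r
# hasConverged is a hypothetical helper; mclust's internal test may differ.
hasConverged <- function(loglikNew, loglikOld, tol = 1e-5) {
  # change in loglikelihood, measured relative to its magnitude
  abs(loglikNew - loglikOld) <= tol * (1 + abs(loglikNew))
}

smallStep <- hasConverged(-180.1851, -180.1853)  # change of about 2e-4
largeStep <- hasConverged(-180.2, -195.7)        # change of about 15.5
```

In an EM loop, iteration would stop once this test succeeds or itmax iterations have been performed, whichever comes first.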

itmax

An integer limit on the number of EM iterations. The default is .Mclust$itmax.

equalPro

A logical value indicating whether or not the components in the model are present in equal proportions. The default is .Mclust$equalPro.

warnSingular

A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is .Mclust$warnSingular.

Vinv

An estimate of the reciprocal hypervolume of the data region. The default is determined by applying function hypvol to the data. Used only when pro includes an additional mixing proportion for a noise component.
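
As a rough illustration of what a reciprocal hypervolume is, the base R sketch below uses the volume of the axis-aligned bounding box of the data. This is an assumption chosen for simplicity and is not necessarily the estimate that hypvol computes. The mixing proportions shown are made up to illustrate the extra entry for the noise component.

```r
# Reciprocal volume of the axis-aligned bounding box (a simple stand-in estimate).
X <- as.matrix(iris[, 1:4])
sides <- apply(X, 2, function(col) diff(range(col)))  # box side lengths
Vinv <- 1 / prod(sides)

# With G = 3 Gaussian components plus noise, pro has G + 1 entries;
# the last one is the proportion assigned to the noise term.
pro <- c(0.3, 0.3, 0.3, 0.1)
noiseDensity <- pro[4] * Vinv  # constant noise contribution to the mixture density
```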

Value

A list including the following components:

z

A matrix whose [i,k]th entry is the conditional probability of the ith observation belonging to the kth component of the mixture.

loglik

The loglikelihood for the data in the mixture model.

mu

A matrix whose kth column is the mean of the kth component of the mixture model.

sigma

For multidimensional models, a three-dimensional array in which the [,,k]th entry gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.

pro

A vector whose kth component is the mixing proportion for the kth component of the mixture model.

modelName

Character string identifying the model.

Attributes:
- "info": Information on the iteration.
- "warn": An appropriate warning if problems are encountered in the computations.
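
One common use of the z component is to derive a hard classification and a per-observation uncertainty. The sketch below uses a small hand-written z matrix rather than real output, so the numbers are purely illustrative.

```r
# A made-up 3 x 3 matrix of conditional probabilities (rows sum to 1).
z <- matrix(c(0.90, 0.10, 0.00,
              0.20, 0.70, 0.10,
              0.05, 0.05, 0.90), nrow = 3, byrow = TRUE)

classification <- apply(z, 1, which.max)  # most probable component per observation
uncertainty <- 1 - apply(z, 1, max)       # probability mass outside that component
```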

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

Details

This function can be used with an indirect or list call using do.call, allowing the output of e.g. mstep to be passed without the need to specify individual parameters as arguments.
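
The list-call pattern itself can be tried in base R without mclust. In the sketch below, fitStub is a hypothetical stand-in for an em function that only reports what it received, and pars is a made-up list shaped like M-step output; neither reflects real mclust results.

```r
# fitStub is a hypothetical stand-in for an em* function; it only echoes its inputs.
fitStub <- function(data, mu, pro, ...) {
  list(n = nrow(data), G = length(pro), extra = names(list(...)))
}

# A made-up parameter list, shaped like the output of an M-step.
pars <- list(mu = matrix(0, 2, 3), pro = rep(1/3, 3), cholSigma = diag(2))
X <- matrix(seq_len(20), ncol = 2)

# Direct call: every parameter is named individually.
direct <- fitStub(data = X, mu = pars$mu, pro = pars$pro,
                  cholSigma = pars$cholSigma)
# List call: the whole parameter list is spliced into the argument list.
viaList <- do.call(fitStub, c(list(data = X), pars))
identical(direct, viaList)
```

The list call stays valid if the parameter list gains or loses elements, which is why it is convenient for chaining an M-step into an EM run.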

Examples

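library(mclust)

data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]

## M-step from the known class labels gives starting parameters
msEst <- mstepEEE(data = irisMatrix, z = unmap(irisClass))
names(msEst)

## direct call, passing the parameters individually
emEEE(data = irisMatrix, mu = msEst$mu, pro = msEst$pro,
      cholSigma = msEst$cholSigma)
## alternative call, passing the M-step output as a list
do.call("emEEE", c(list(data = irisMatrix), msEst))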
