EMmixlcd: Estimate the mixture proportions and component densities using the EM algorithm

Description

Uses the EM algorithm to estimate the mixture proportions and the component densities. The output is an object of class "lcdmix", which contains the mixture proportions at each observation and all the information about the estimated component densities.

Usage

EMmixlcd( x, k = 2, y, props, epsratio=10^-6, max.iter=50,
            epstheta=10^-8, verbose=-1 )

Value

An object of class "lcdmix", with the following components:

x: Data copied from input (may be reordered)
logf: An \(n \times k\) matrix of the log of the maximum likelihood estimate, evaluated at the observation points for each component.
props: Vector containing the estimated proportions of components
niter: Number of iterations of the EM algorithm
lcdloglik: The log-likelihood after the final iteration
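
As a rough illustration of how the logf and props components could be combined, the sketch below computes posterior component probabilities for each observation and a hard cluster assignment. The matrix and vector here are made-up stand-ins for the logf and props components of a fitted object (assumed to be accessible as fit$logf and fit$props); they are not output from EMmixlcd.

```r
# Hypothetical stand-ins for fit$logf (n x k) and fit$props (length k)
logf <- log(matrix(c(0.30, 0.02, 0.10,
                     0.01, 0.25, 0.10), ncol = 2))
props <- c(0.6, 0.4)

# Multiply each column j of the densities by props[j]
weighted <- sweep(exp(logf), 2, props, `*`)

# Normalise each row so the posterior probabilities sum to 1
posterior <- weighted / rowSums(weighted)

# Assign each observation to its most probable component
cluster <- max.col(posterior, ties.method = "first")
```

Each row of posterior sums to 1, and cluster holds the index of the component with the largest posterior probability for that observation.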

Arguments

x

Data in \(R^d\), in the form of an \(n \times d\) numeric matrix

k

The number of components. Defaults to 2.

y

An \(n \times k\) numeric matrix giving the starting values for the EM algorithm. If none is given, a hierarchical Gaussian clustering model is used. To reduce the computational burden while allowing sufficient flexibility for the EM algorithm, it is recommended to leave this argument unspecified.

props

Vector of length \(k\) containing the starting values of the proportions. If none is given, a hierarchical Gaussian clustering model is used. To reduce the computational burden while allowing sufficient flexibility for the EM algorithm, it is recommended to leave this argument unspecified.

epsratio

The EM algorithm terminates when the proportional increase in the likelihood between successive iterations is less than this ratio. The default value is \(10^{-6}\).

max.iter

The maximum number of iterations for the EM algorithm. The default value is 50.

epstheta

\(epstheta/n\) is the threshold for the weights: a data point whose weight for a cluster falls below it is discarded from that cluster. This quantity is introduced to increase computational efficiency and stability.

verbose

-1: (default) prints nothing
0: prints warning messages
\(n > 0\): prints summary information every \(n\) iterations
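
The epstheta rule described above can be sketched in base R. The weight matrix theta below is a made-up example (a hypothetical name, not an object exposed by EMmixlcd), and the sketch only shows which points would fall below the threshold \(epstheta/n\); how the package treats the remaining weights internally is not shown.

```r
# Illustrative only: theta is a made-up n x k matrix of weights,
# one row per data point and one column per cluster
n <- 5
epstheta <- 1e-8
theta <- matrix(c(0.9, 0.5, 1e-12,     0.3, 1,
                  0.1, 0.5, 1 - 1e-12, 0.7, 0), ncol = 2)

# Weights below epstheta / n lead to the point being dropped from that cluster
threshold <- epstheta / n
keep <- theta >= threshold

# Number of points retained in each cluster
retained <- colSums(keep)
```

Here the third point is dropped from the first cluster and the fifth point from the second, so each cluster retains four of the five points.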

Author

Yining Chen

Madeleine Cule

Robert B. Gramacy

Richard Samworth

Details

An introduction to the EM algorithm can be found in McLachlan and Krishnan (1997). Briefly, given the current estimates of the mixture proportions and component densities, we first update the estimates of the mixture proportions. We then update the estimates of the component densities using mlelcd. Incorporating the weights into the maximization process in mlelcd presents no additional complication.
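
As a sketch of the update just described, write \(\pi_j\) for the current mixture proportions and \(f_j\) for the current component densities, \(j = 1, \ldots, k\). This notation is introduced here for illustration only; see Cule, Samworth and Stewart (2010) for the authors' exact formulation. One standard way to write a single EM iteration is:

\[ \theta_{ij} = \frac{\pi_j f_j(x_i)}{\sum_{l=1}^{k} \pi_l f_l(x_i)}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, k \]

\[ \pi_j^{\mathrm{new}} = \frac{1}{n} \sum_{i=1}^{n} \theta_{ij} \]

\[ f_j^{\mathrm{new}} = \arg\max_{f \ \text{log-concave}} \ \sum_{i=1}^{n} \theta_{ij} \log f(x_i) \]

The last step is a weighted log-concave maximum likelihood problem for each component, with weights proportional to \(\theta_{ij}\); this is the role played by the weights passed to mlelcd.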

In our case, because of the computational intensity of the method, we first cluster the points according to a hierarchical Gaussian clustering model and then iterate the EM algorithm until the proportional increase in the likelihood at each step is less than a pre-specified quantity.

More technical details can be found in Cule, Samworth and Stewart (2010).

References

Cule, M. L., Samworth, R. J., and Stewart, M. I. (2010) Maximum likelihood estimation of a log-concave density, Journal of the Royal Statistical Society, Series B, 72(5) p.545-607.

McLachlan, G. J. and Krishnan, T. (1997) The EM Algorithm and Extensions, New York: Wiley.

Examples

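## Simple bivariate normal data
  library( LogConcDEAD )  # assumed to be the package providing EMmixlcd
  set.seed( 1 )
  n <- 15                 # sample size
  d <- 2                  # dimension
  props <- c( 0.6, 0.4 )  # true mixture proportions
  shift <- 2              # shift applied to the first coordinate
  x <- matrix( rnorm( n*d ), ncol = d )
  ## Shift each point with probability props[ 1 ]
  shiftvec <- ifelse( runif( n ) > props[ 1 ], 0, shift )
  x[,1] <- x[,1] + shiftvec
  ## A small max.iter keeps the run short
  EMmixlcd( x, k = 2, max.iter = 2 )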