Learn R Programming

norMmix (version 0.2-0)

norMmixMLE: Maximum Likelihood Estimation for Multivariate Normal Mixtures

Description

Direct Maximum Likelihood Estimation (MLE) for multivariate normal mixture models "norMmix". Starting from a clara (package cluster) clustering plus one M-step by default, or alternatively from the default start of (package) mclust, perform direct likelihood maximization via optim().

Usage

norMmixMLE(x, k,
           model = c("EII", "VII", "EEI", "VEI", "EVI",
                     "VVI", "EEE", "VEE", "EVV", "VVV"),
           initFUN = claraInit,
           ll = c("nmm", "mvt"),
           keep.optr = TRUE, keep.data = keep.optr,
           method = "BFGS", maxit = 100, trace = 2,
           optREPORT = 10, reltol = sqrt(.Machine$double.eps),
 	   ...)

claraInit(x, k, samples = 128, sampsize = ssClara2kL, trace) mclVVVinit(x, k, ...)

ssClara2kL(n, k, p)

Value

norMmixMLE returns an object of class

"norMmixMLE" which is a list with components

norMmix

the "norMmix" object corresponding to the specified model and the fitted (MLE) parameter vector.

optr

(if keep.optr is true:) the [r]eturn value of optimization, i.e., currently, optim().

npar

the number of free parameters, a function of \((p, k, model)\).

n

the sample size, i.e., the number of observations or rows of x.

cond

the result of (the hidden function) parcond(..), that is the ratio of sample size over parameter count.

x

(if keep.optr is true:) the \(n \times p\) data matrix.

Arguments

x

numeric [n x p] matrix

k

positive number of components

model

a character string, specifying the model (for the k covariance matrices) to be assumed.

initFUN

a function, that takes arguments x and k and returns a clustering index; a vector of length \(p = \)ncol(x), with entries in 1:k.

ll

a string specifying the method to be used for the likelihood computation; the default, "nmm" uses llnorMmix(), whereas "mvt" uses llmvtnorm() which is based on the MV normal density from package mvtnorm.

keep.optr, keep.data

logical, each indicating of the optimization result (from optim(), currently), or the data x respectively, should be saved as part of the result (function ‘value’, see also below).

method, maxit, optREPORT, reltol

arguments for tuning the optimizer optim(*, method=method, control = list(...)).

trace
in norMmixMLE():

passed to optim(*, control=..), see above.

in claraInit():

a non-negative integer indicating how much clara() calls should be traced.

...
in norMmixMLE():

passed to optim(*, control=..), see above.

in mclVVVinit():

further arguments passed to (package mclust) function hcVVV().

samples

the number of subsamples to take in clara(), package cluster, see its help.

sampsize

the sample size to take in clara(), package cluster. Here, can be a positive integer or, as by default, a function with arguments (n,k,p).

n,p

matrix dimensions nrow(x) and ncol(x).

Details

By default, initFUN=claraInit, uses clara() and one M-step from EM-algorithm to initialize parameters after that uses general optimizer optim() to calculate the MLE.

To silence the output of norMmixMLE, set optREPORT very high and trace to 0. For details on output behavior, see the "details" section of optim.

Examples

Run this code
MW214
set.seed(105)
x <- rnorMmix(1000, MW214)

## Fitting, assuming to know the true model (k=6, "VII")
fm1  <- norMmixMLE(x, k = 6, model = "VII", initFUN=claraInit)
fm1 # {using print.norMmixMLE() method}
fm1M <- norMmixMLE(x, k = 6, model = "VII", initFUN=mclVVVinit)

## Fitting "wrong" overparametrized model: typically need more iterations:
fmW <- norMmixMLE(x, k = 7, model = "VVV", maxit = 200, initFUN=claraInit)
## default maxit=100 is often too small    ^^^^^^^^^^^


x <- rnorMmix(2^12, MW51)
fM5 <- norMmixMLE(x, k = 4) # k = 3 is sufficient
fM5
c(logLik = logLik(fM5), AIC = AIC(fM5), BIC = BIC(fM5))
plot(fM5, show.x=FALSE)
plot(fM5, lwd=3, pch.data=".")

# this takes several seconds
 fM5big <- norMmixMLE(x, model = "VVV", k = 4, maxit = 300) # k = 3 is sufficient
 summary(warnings())
 fM5big ; c(logLik = logLik(fM5big), AIC = AIC(fM5big), BIC = BIC(fM5big))
 plot(fM5big, show.x=FALSE)

Run the code above in your browser using DataLab