cwm: Fit for the Generalized Linear Mixed CWM

Description

Maximum likelihood fitting of the generalized linear mixed cluster-weighted model by the EM algorithm.

Usage

cwm(formulaY, familyY=gaussian, data, link, Xnorm=NULL, modelXnorm=NULL, Xbin=NULL,
  Xbtrials=NULL, Xpois=NULL, Xmult=NULL, k=1:3, initialization=c("random.soft", 
  "random.hard", "kmeans", "mclust", "manual"), start.z=NULL, seed=NULL, maxR=1,
  iter.max=1000, threshold=1.0e-04, parallel=FALSE)

ICget(object, criteria)
bestmodel(object, criteria, k=NULL, modelXnorm=NULL)
modelget(object, criteria="BIC", k=NULL, modelXnorm=NULL)
## S3 method for class 'cwm':
summary(object, criteria="BIC", k=NULL, modelXnorm=NULL, concomitant=FALSE, 
  digits = getOption("digits")-2, ...)

Arguments

formulaY

an optional object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

familyY

the distribution used for $Y|x$ in each mixture component; it can be:

"gaussian" with default "link=identity"
"poisson" with default "link=log"
"binomial" with defau

data

an optional data.frame, list, or environment containing the variables needed to use formulaY.

link

a specification for the model link function to be used. See link argument in family.

Xnorm, Xbin, Xpois, Xmult

optional matrices containing the variables to be used for marginalization, assumed to have normal, binomial, Poisson, and multinomial distributions, respectively.

modelXnorm

an optional vector of character strings indicating the parsimonious models to be fitted. The default is c("E", "V") for a single continuous covariate, and c("EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "EEV", "VVE", "

Xbtrials

an optional vector containing the number of trials for each column in Xbin. If omitted, the maximum of each column in Xbin is chosen.
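
The default described above can be reproduced by hand, which may help when checking which trial counts will be assumed. The following is a minimal base-R sketch; the matrix Xbin_demo and its column names are made up for illustration and are not part of the package.

```r
# Hypothetical binomial covariates: one column per variable, one row per unit.
Xbin_demo <- cbind(correct  = c(3, 5, 2, 4),
                   attempts = c(7, 1, 9, 6))

# Column-wise maxima: the values the documentation says are used
# when Xbtrials is omitted.
Xbtrials_default <- apply(Xbin_demo, 2, max)
Xbtrials_default
```

If the largest observed count understates the true number of trials for a column, it is probably safer to pass Xbtrials explicitly.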

k

an optional vector containing the numbers of mixture components to be tried. Default value is 1:3.

initialization

an optional character string. It sets the initialization strategy for the EM algorithm. It can be:

"random.soft"
"random.hard"
"kmeans"
"mclust"
"manual"

start.z

matrix of soft or hard classification: it is used only if initialization="manual".
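
As an illustration of what a hard or soft classification matrix might look like, here is a base-R sketch. It assumes the usual layout of one row per observation and one column per mixture component; this page does not state the expected dimensions, so treat that layout, and the simulated data, as assumptions.

```r
# Assumption: one row per observation, one column per mixture component.
set.seed(1)
x <- c(rnorm(20, mean = 160, sd = 4), rnorm(20, mean = 180, sd = 4))
k <- 2

# Hard classification: a 0/1 indicator matrix built from k-means labels.
labels <- kmeans(x, centers = k)$cluster
z_hard <- matrix(0, nrow = length(x), ncol = k)
z_hard[cbind(seq_along(x), labels)] <- 1

# Soft classification: positive random weights rescaled so each row sums to 1.
z_soft <- matrix(runif(length(x) * k), nrow = length(x), ncol = k)
z_soft <- z_soft / rowSums(z_soft)
```

Either matrix could then be supplied as start.z together with initialization="manual", presumably with k equal to the number of columns.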

seed

an optional scalar. It sets the seed for the random number generator, when random initializations are used; if NULL, current seed is not changed. Default value is NULL.

maxR

number of initializations to be tried. Default value is 1.

iter.max

an optional scalar. It sets the maximum number of iterations in the EM algorithm. Default value is 1000.

threshold

an optional scalar. It sets the threshold for the Aitken acceleration procedure. Default value is 1.0e-04.
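
For readers unfamiliar with Aitken-based stopping rules, the sketch below shows one common variant from the mixture-model literature: the last three log-likelihood values are used to extrapolate an asymptotic log-likelihood, and the run stops once that estimate is within threshold of the current value. This is a generic illustration, not the package's code; the function name aitken_converged is hypothetical, and the exact rule used by cwm may differ.

```r
# Generic Aitken-based stopping rule for an EM run (illustrative only).
# loglik: vector of log-likelihood values from successive iterations.
aitken_converged <- function(loglik, threshold = 1.0e-04) {
  n <- length(loglik)
  if (n < 3) return(FALSE)             # three values are needed for the rate
  l_prev <- loglik[n - 2]
  l_curr <- loglik[n - 1]
  l_next <- loglik[n]
  if (l_curr == l_prev) return(TRUE)   # no change at all: treat as converged
  a <- (l_next - l_curr) / (l_curr - l_prev)     # Aitken acceleration factor
  if (a >= 1) return(FALSE)            # increments are not shrinking yet
  l_inf <- l_curr + (l_next - l_curr) / (1 - a)  # asymptotic estimate
  abs(l_inf - l_next) < threshold
}

# Toy sequence converging geometrically to 0.
aitken_converged(-100 * 0.5^(1:10))
aitken_converged(-100 * 0.5^(1:25))
```

Compared with stopping on the raw increase in log-likelihood, a rule of this kind is less likely to stop early when the algorithm is converging slowly.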

parallel

When TRUE, the package parallel is used for parallel computation. When several models are estimated, computational time is reduced. The number of cores to use may be s

object

an object of class cwm.

concomitant

When TRUE, the parameters of the concomitant variables are displayed. Default is FALSE.

digits

integer used for number formatting.

criteria

an optional character string. It sets the information criteria to consider; supported values are: "AIC", "AICc", "AICu", "AIC3", "AWE", "BIC", "CAIC", "ICL". Default value is "BIC".

...

additional arguments affecting the summary produced.

Value

This function returns a class cwm object, which is a list of values related to the model selected. It contains:

call

an object of class call.

formulaY

an object of class formula containing a symbolic description of the model fitted.

familyY

the distribution used for $Y|x$ in each mixture component.

data

a data.frame with the variables needed to use formulaY.

concomitant

a list containing Xnorm, Xbin, Xpois, Xmult.

Xbtrials

number of trials used for Xbin.

models

a list; each element is related to one of the models fitted. Each element is a list and contains:

- posterior: posterior probabilities.
- iter: number of iterations performed in the EM algorithm.
- k: number of (fitted) mixture components.
- size: estimated size of the groups.
- cluster: classification vector.
- loglik: final log-likelihood value.
- df: overall number of estimated parameters.
- prior: weights for the mixture components.
- IC: list containing the values of the information criteria.
- converged: logical; TRUE if the EM algorithm converged.
- GLModels: a list; each element is related to a mixture component. Each element is a list and contains:
  - model: a "glm" class object.
  - sigma: estimated local scale parameters of $Y|x$, when familyY is gaussian or t.
  - t_df: estimated degrees of freedom of the t distribution, when familyY is t.
  - nuY: estimated shape parameter, when familyY is Gamma. The gamma distribution is parameterized according to McCullagh, P. and Nelder, J. 1989, p. 30.

- concomitant: a list with estimated concomitant variables parameters for each mixture component:
  - normal.mu
  - normal.Sigma
  - normal.model
  - multinomial.model
  - multinomial.probs
  - poisson.lambda
  - binomial.p
  - normal.dnorm, multinomial.dmulti, poisson.dpois, binomial.dbin

Details

When several models have been estimated, the bestmodel, summary and print methods consider the models with the best value of each information criterion in criteria, among those with k groups and parsimonious model modelXnorm. If criteria is missing, the model with the best BIC is returned. The modelget method returns a cwm object containing the best model according to a single criterion in criteria.

References

Ingrassia, S., Minotti, S. C., and Vittadini, G. (2012). Local Statistical Modeling via the Cluster-Weighted Approach with Elliptical Distributions. Journal of Classification, 29(3), 363-401.

Ingrassia, S., Minotti, S. C., and Punzo, A. (2014). Model-based clustering via linear cluster-weighted models. Computational Statistics and Data Analysis, 71, 159-182.

Ingrassia, S., Punzo, A., and Vittadini, G. (2015). The Generalized Linear Mixed Cluster-Weighted Model. Journal of Classification, 32(forthcoming).

McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, 2nd edition.

Punzo, A. (2014). Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model. Statistical Modelling, 14(3), 257-291.

Examples


library(flexCWM)  # assumed: the package that provides cwm() and the students data
data("students")
str(students)
attach(students)

# mixture of Gaussian distributions
res <- cwm(Xnorm=HEIGHT, k=1:3, initialization="kmeans")
summary(res)
plot(res)

# mixture of Gaussian regressions
res2 <- cwm(HEIGHT ~ HEIGHT.F, k=1:3, initialization="mclust")
summary(res2)
plot(res2)
