tclust (version 2.0-5)

tclustIC: Performs cluster analysis by calling tclust for different numbers of groups k and restriction factors c

Description

Computes the values of BIC (MIXMIX), ICL (MIXCLA) or CLA (CLACLA), for different values of k (number of groups) and different values of c (restriction factor), for a prespecified level of trimming (the last two letters in the name stand for 'Information Criterion').

Usage

tclustIC(
  x,
  kk = 1:5,
  cc = c(1, 2, 4, 8, 16, 32, 64, 128),
  alpha = 0.05,
  whichIC = c("ALL", "MIXMIX", "MIXCLA", "CLACLA"),
  parallel = FALSE,
  n.cores = -1,
  trace = FALSE,
  ...
)

Value

The function returns an S3 object of class tclustIC. The functions print() and summary() can be used to display the object and a summary of the results. The object contains the following components:

  • call the matched call

  • kk a vector containing the values of k (number of components) which have been considered. This vector is identical to the optional argument kk (default is kk=1:5).

  • cc a vector containing the values of c (values of the restriction factor) which have been considered. This vector is identical to the optional argument cc (default is cc=c(1, 2, 4, 8, 16, 32, 64, 128)).

  • alpha trimming level

  • whichIC Information criteria used

  • CLACLA a matrix of size length(kk)-times-length(cc) containing the value of the penalized classification likelihood. This output is present only if whichIC="CLACLA" or whichIC="ALL".

  • IDXCLA a matrix of lists of size length(kk)-times-length(cc) containing the assignment of each unit using the classification model. This output is present only if whichIC="CLACLA" or whichIC="ALL".

  • MIXMIX a matrix of size length(kk)-times-length(cc) containing the value of the penalized mixture likelihood. This output is present only if whichIC="MIXMIX" or whichIC="ALL".

  • IDXMIX a matrix of lists of size length(kk)-times-length(cc) containing the assignment of each unit using the mixture model. This output is present only if whichIC="MIXMIX" or whichIC="ALL".

  • MIXCLA a matrix of size length(kk)-times-length(cc) containing the value of the ICL criterion. This output is present only if whichIC="MIXCLA" or whichIC="ALL".
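
Because each criterion matrix has one row per entry of kk and one column per entry of cc, the position of its smallest value maps directly to a (k, c) pair. The following is a minimal sketch of that lookup on a made-up matrix rather than on real tclustIC output; the helper name ic_argmin and all numbers are hypothetical and not part of tclust.

```r
# Hypothetical helper (not part of tclust): locate the smallest value of an
# information-criterion matrix and report the matching k and c.
# Assumes rows correspond to kk and columns to cc, as described above.
ic_argmin <- function(ic, kk, cc) {
  stopifnot(is.matrix(ic), nrow(ic) == length(kk), ncol(ic) == length(cc))
  pos <- which(ic == min(ic), arr.ind = TRUE)
  i <- as.integer(pos[1, 1])  # row index; first cell is used if there are ties
  j <- as.integer(pos[1, 2])  # column index
  list(k = kk[i], c = cc[j], value = ic[i, j])
}

# Made-up criterion values for kk = 1:3 and cc = c(1, 4, 16)
kk <- 1:3
cc <- c(1, 4, 16)
mock_ic <- matrix(c(910, 905, 900,
                    870, 850, 860,
                    880, 875, 890),
                  nrow = length(kk), byrow = TRUE)

best <- ic_argmin(mock_ic, kk, cc)
cat("k =", best$k, " c =", best$c, " IC =", best$value, "\n")
```

With a real result, the same helper could be called as ic_argmin(out$MIXMIX, out$kk, out$cc), provided the requested criterion is present in the output.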

Arguments

x

A matrix or data frame of dimension n x p, containing the observations (row-wise).

kk

An integer vector specifying the numbers of mixture components (clusters) for which the information criteria are to be calculated. By default kk=1:5.

cc

A vector specifying the values of the restriction factor to be considered. By default cc=c(1, 2, 4, 8, 16, 32, 64, 128).

alpha

The proportion of observations to be trimmed.

whichIC

A character value which specifies which information criteria must be computed for each k (number of groups) and each value of the restriction factor c. Possible values for whichIC are:

  • "MIXMIX": a mixture model is fitted and the mixture likelihood is used to compute the information criterion. This option corresponds to the use of the Bayesian Information Criterion (BIC). In the output just the matrix MIXMIX is given.

  • "MIXCLA": a mixture model is fitted but to compute the information criterion the classification likelihood is used. This option corresponds to the use of the Integrated Complete Likelihood (ICL). In the output just the matrix MIXCLA is given.

  • "CLACLA": everything is based on the classification likelihood. This information criterion will be called CLA. In the output just the matrix CLACLA is given.

  • "ALL": both classification and mixture likelihood are used. In this case all three information criteria CLA, ICL and BIC are computed. In the output all three matrices MIXMIX, MIXCLA and CLACLA are given.
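
Since the criterion matrices present in the result depend on whichIC, code that post-processes the output may want to check which ones are available before indexing. A minimal sketch, relying only on the component names listed in the Value section; the helper available_ic and the mock object are hypothetical and stand in for a real tclustIC result.

```r
# Hypothetical helper (not part of tclust): return the names of the
# criterion matrices that are actually present in a result object.
available_ic <- function(out) {
  candidates <- c("MIXMIX", "MIXCLA", "CLACLA")
  is_missing <- vapply(candidates, function(nm) is.null(out[[nm]]), logical(1))
  candidates[!is_missing]
}

# Mock object mimicking a result obtained with whichIC = "MIXMIX";
# the values are made up for illustration.
mock_out <- list(kk = 1:2, cc = c(1, 8), alpha = 0.1, whichIC = "MIXMIX",
                 MIXMIX = matrix(c(400, 380, 390, 395), nrow = 2))

for (ic in available_ic(mock_out)) {
  cat(ic, ": smallest value", min(mock_out[[ic]]), "\n")
}
```

Checking with is.null() rather than names() keeps the helper usable whether an unrequested criterion is absent from the list or stored as NULL; which of the two applies to tclustIC is not stated here.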

parallel

A logical value, specifying whether the calls to tclust should be done in parallel.

n.cores

The number of cores to use when parallelizing; only taken into account if parallel=TRUE.

trace

Whether to print intermediate results. Default is trace=FALSE.

...

Further arguments (e.g. restr), passed to tclust.

References

Cerioli, A., Garcia-Escudero, L.A., Mayo-Iscar, A. and Riani, M. (2017). Finding the Number of Groups in Model-Based Clustering via Constrained Likelihoods, Journal of Computational and Graphical Statistics, pp. 404-416, https://doi.org/10.1080/10618600.2017.1390469.

See Also

tclust

Examples

 #--- EXAMPLE 1 ------------------------------------------
 library(tclust)
 data(geyser2)
 (out <- tclustIC(geyser2, whichIC="MIXMIX", alpha=0.1))
 summary(out)
 ## Find the smallest value inside the table and write the corresponding
 ## values of k (number of groups) and c (restriction factor)
 inds <- which(out$MIXMIX == min(out$MIXMIX), arr.ind=TRUE)
 ## inds has one row per minimizing cell; use the first one in case of ties
 vals <- out$MIXMIX[inds[1, 1], inds[1, 2]]
 cat("\nThe smallest value of the IC is ", vals,
     " and takes place for k=", out$kk[inds[1, 1]], " and c=",
     out$cc[inds[1, 2]], "\n")

 #--- EXAMPLE 2 ------------------------------------------
 data(flea)
 Y <- as.matrix(flea[, 1:(ncol(flea)-1)])    # select only the numeric variables
 rownames(Y) <- 1:nrow(Y)
 head(Y)

 (out <- tclustIC(Y, whichIC="CLACLA", alpha=0.1))
 summary(out)
 ## Find the smallest value inside the table and write the corresponding
 ## values of k (number of groups) and c (restriction factor)
 inds <- which(out$CLACLA == min(out$CLACLA), arr.ind=TRUE)
 ## inds has one row per minimizing cell; use the first one in case of ties
 vals <- out$CLACLA[inds[1, 1], inds[1, 2]]
 cat("\nThe smallest value of the IC is ", vals,
     " and takes place for k=", out$kk[inds[1, 1]], " and c=",
     out$cc[inds[1, 2]], "\n")

 #--- EXAMPLE 3 ------------------------------------------
 data(swissbank)
 (out <- tclustIC(swissbank, whichIC="ALL"))

 plot(out)  ##  --> selecting k=3, c=128

 ##  the selected model
 plot(tclust(swissbank, k = 3, alpha = 0.1, restr.fact = 128))