nomprox: Hierarchical Cluster Analysis for Nominal Data Based on a Proximity Matrix

Description

The nomprox() function performs hierarchical cluster analysis when the proximity (dissimilarity) matrix has been calculated externally, for instance in a different R package, in a user-written function, or in other software. It offers three linkage methods that can be used for categorical data. The obtained clusters can be evaluated by seven evaluation indices; see Sulc et al. (2018).
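As an illustration of an externally calculated dissimilarity matrix, the following base R sketch computes a simple matching dissimilarity for a small nominal dataset. The toy data and the helper sm_diss() are hypothetical and not part of any package; any other measure producing a symmetric matrix with a zero diagonal could be substituted.

```r
# Toy nominal data (hypothetical, for illustration only)
toy <- data.frame(
  colour = c("red", "red", "blue", "blue", "green", "green"),
  shape  = c("round", "round", "square", "square", "round", "square"),
  size   = c("small", "small", "large", "large", "small", "large"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: simple matching dissimilarity, i.e. the share of
# variables on which two cases take different categories
sm_diss <- function(x) {
  x <- as.matrix(x)            # character matrix, one row per case
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(x[i, ] != x[j, ])
    }
  }
  d
}

diss.matrix <- sm_diss(toy)

# A matrix like this could then be supplied as the diss argument,
# e.g. nomprox(diss = diss.matrix, data = toy)
```

Cases 1 and 2 are identical and get a dissimilarity of 0, while cases 1 and 3 differ on every variable and get 1.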

Usage

nomprox(diss, data = NULL, method = "average", clu.high = 6, eval = TRUE)

Arguments

diss

A proximity matrix or a dist object calculated from the dataset given in the data argument.

data

A data.frame or a matrix with cases in rows and variables in columns.

method

A character string defining the clustering method. The following methods can be used: "average", "complete", "single".
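To give a feel for how the three linkage methods differ, the sketch below uses base R hclust() as a stand-in, since it accepts the same three method names. It is not a description of how nomprox() is implemented, and the 4 x 4 dissimilarity matrix is hypothetical.

```r
# Hypothetical 4 x 4 dissimilarity matrix (symmetric, zero diagonal)
d <- matrix(c(0.0, 0.2, 0.8, 1.0,
              0.2, 0.0, 0.6, 0.9,
              0.8, 0.6, 0.0, 0.3,
              1.0, 0.9, 0.3, 0.0), nrow = 4, byrow = TRUE)

# Build one dendrogram per linkage method
methods <- c("average", "complete", "single")
trees <- lapply(methods, function(m) hclust(as.dist(d), method = m))
names(trees) <- methods

# Height at which the last two clusters, {1, 2} and {3, 4}, are merged:
# single uses the smallest pairwise dissimilarity (0.6), complete the
# largest (1.0), and average the mean of all four pairs (0.825)
final.height <- sapply(trees, function(tr) max(tr$height))
```

All three methods produce the same merge order on this matrix; only the merge heights differ, which is what changes the resulting partitions on less clear-cut data.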

clu.high

A numeric value giving the maximum number of clusters for which cluster membership variables are produced.
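The idea of producing one membership variable per number of clusters can be sketched in base R with hclust() and cutree(). This is an illustration only: starting at two clusters and the clu_ naming scheme are assumptions, and the dissimilarity matrix is hypothetical.

```r
# Hypothetical 4 x 4 dissimilarity matrix (symmetric, zero diagonal)
d <- matrix(c(0.0, 0.2, 0.8, 1.0,
              0.2, 0.0, 0.6, 0.9,
              0.8, 0.6, 0.0, 0.3,
              1.0, 0.9, 0.3, 0.0), nrow = 4, byrow = TRUE)

tree <- hclust(as.dist(d), method = "average")

# One membership vector for every number of clusters from 2 to clu.high
# (the lower bound of 2 and the element names are assumptions)
clu.high <- 3
mem <- lapply(2:clu.high, function(k) cutree(tree, k = k))
names(mem) <- paste0("clu_", 2:clu.high)

mem$clu_2   # cases 1 and 2 in one cluster, cases 3 and 4 in the other
```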

eval

A logical value; if TRUE, the clustering results are evaluated.

Value

The function returns a list with up to three components:

The mem component contains cluster membership partitions for the selected numbers of clusters in the form of a list.

The eval component contains seven evaluation criteria as vectors in a list: the within-cluster mutability coefficient (WCM), the within-cluster entropy coefficient (WCE), pseudo F indices based on the mutability (PSFM) and the entropy (PSFE), the Bayesian (BIC) and Akaike (AIC) information criteria for categorical data, and the BK index. To see them all at once, converting the list to a data.frame is more convenient.

The opt component is present in the output together with the eval component. It displays the optimal number of clusters for the evaluation criteria from the eval component, except for WCM and WCE, where the optimal number of clusters is based on the elbow method.
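Converting a list of equal-length vectors to a data.frame can be sketched as follows. The list below only mimics the described shape of the eval component; its values are arbitrary placeholders, not output of nomprox(), and the assumption that the first element corresponds to one cluster is mine.

```r
# Placeholder list: one numeric vector per criterion, one element per
# number of clusters. Values are arbitrary and NOT produced by nomprox().
eval.list <- list(
  WCM = c(1.00, 0.70, 0.45, 0.40),
  WCE = c(1.00, 0.75, 0.50, 0.46)
)

# Each vector becomes a column, so all criteria can be read side by side
eval.df <- as.data.frame(eval.list)

# Assumed row meaning: row i holds the values for i clusters
eval.df$clusters <- seq_len(nrow(eval.df))

eval.df
```

With the hca.object created in the Examples section, as.data.frame(hca.object$eval) should work the same way, provided the vectors in the list have equal length.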

References

Sulc Z., Cibulkova J., Prochazka J., Rezankova H. (2018). Internal Evaluation Criteria for Categorical Data in Hierarchical Clustering: Optimal Number of Clusters Determination, Metodoloski Zveski, 15(2), p. 1-20.

Examples

# load the nomclust package, which provides nomprox(), iof() and data20
library(nomclust)

# sample data
data(data20)

# computation of a dissimilarity matrix using the iof similarity measure
diss.matrix <- iof(data20)

# creating an object with results of hierarchical clustering
hca.object <- nomprox(diss = diss.matrix, data = data20, method = "complete",
                      clu.high = 5, eval = TRUE)
