randIndex: Calculates Rand-type indices to compare two partitions

Description

Calculates Rand-type indices (adjusted Rand, Rand, Mirkin and Hubert) to compare two partitions of the same set of units.

Usage

randIndex(c1, c2 = NULL, noisecluster = NULL)

Value

A list with the following Rand-type indices:

AR Adjusted Rand index. A number between -1 and 1. The adjusted Rand index is the corrected-for-chance version of the Rand index: it equals 1 for identical partitions and has expected value 0 when units are assigned to clusters at random with the cluster sizes held fixed.
RI Rand index (unadjusted). A number between 0 and 1. The Rand index is the fraction of pairs of objects on which both classification methods agree, that is, pairs placed in the same cluster by both or in different clusters by both. RI ranges from 0 (no pair classified in the same way under both clusterings) to 1 (identical clusterings).
MI Mirkin's index. A number between 0 and 1. Mirkin's index is the fraction of pairs of objects on which the two classification methods disagree: MI = 1 - RI.
HI Hubert index. A number between -1 and 1. The Hubert index is the fraction of pairs of objects on which both classification methods agree minus the fraction on which they disagree: HI = RI - MI = 2RI - 1.
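To make these definitions concrete, here is a minimal base-R sketch that computes the four indices from a contingency table with the standard pair-counting formulas (the adjusted index uses the Hubert-Arabie correction). The function name rand_indices_sketch is hypothetical; it is not part of tclust and is not claimed to reproduce randIndex in every edge case. For example, it returns NaN for AR when the chance adjustment is undefined, such as when both partitions consist of a single cluster.

```r
## Hypothetical sketch (not tclust's implementation) of the Rand-type indices.
## 'tab' is a contingency table: rows = clusters of partition 1,
## columns = clusters of partition 2, cells = number of shared units.
rand_indices_sketch <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  n_pairs <- choose(n, 2)                  # all unordered pairs of units
  s_cells <- sum(choose(tab, 2))           # pairs together in both partitions
  s_rows  <- sum(choose(rowSums(tab), 2))  # pairs together in partition 1
  s_cols  <- sum(choose(colSums(tab), 2))  # pairs together in partition 2

  ## Agreements = pairs together in both + pairs apart in both
  agree <- n_pairs + 2 * s_cells - s_rows - s_cols
  RI <- agree / n_pairs
  MI <- 1 - RI      # fraction of pairs on which the partitions disagree
  HI <- RI - MI     # equivalently 2 * RI - 1

  ## Hubert-Arabie adjustment: (index - expected) / (max index - expected)
  expected  <- s_rows * s_cols / n_pairs
  max_index <- (s_rows + s_cols) / 2
  AR <- (s_cells - expected) / (max_index - expected)

  list(AR = AR, RI = RI, MI = MI, HI = HI)
}

## Contingency table used in example 1
tab <- matrix(c(1, 1, 0, 1, 2, 1, 0, 0, 4), nrow = 3)
res <- rand_indices_sketch(tab)
## res$AR = 266/851 (about 0.3126), res$RI = 32/45, res$MI = 13/45, res$HI = 19/45
```

Working from the table rather than from the raw labels keeps the cost proportional to the number of clusters instead of the number of unit pairs.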

Arguments

c1: labels of the first partition or a contingency table. A numeric vector or factor containing the class labels of the first partition, or a 2-dimensional numeric matrix containing the cross-tabulation of cluster assignments.
c2: labels of the second partition. A numeric vector or factor containing the class labels of the second partition. The length of c2 must be equal to the length of c1. This argument is required only if c1 is not a 2-dimensional numeric matrix.
noisecluster: label or number associated with the 'noise class' or 'noise level'. A number or character label denoting the points that do not belong to any cluster. These points are not taken into account when computing the Rand-type indices. The default (NULL) is to consider all points.
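Since c1 may be given either as a label vector or as a contingency table, the indices depend only on the cross-tabulation of the two partitions. The base-R sketch below shows how such a table can be built from two label vectors and how noise points could be removed first. crosstab_labels is a hypothetical helper, and the rule that a unit is dropped when its label equals noisecluster in either partition is an assumption, not tclust's documented behaviour.

```r
## Hypothetical helper (not part of tclust): cross-tabulate two label vectors.
## ASSUMPTION: a unit is dropped when its label equals 'noisecluster' in
## either partition; randIndex may apply its own rule.
crosstab_labels <- function(c1, c2, noisecluster = NULL) {
  stopifnot(length(c1) == length(c2))  # both partitions must label the same units
  if (!is.null(noisecluster)) {
    keep <- !(c1 %in% noisecluster | c2 %in% noisecluster)
    c1 <- c1[keep]
    c2 <- c2[keep]
  }
  ## Rows: clusters of c1; columns: clusters of c2; cells: number of shared units
  unclass(table(c1, c2))
}

## Label vectors of example 2; their cross-tabulation is the table of example 1
c1 <- c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
c2 <- c(1, 2, 1, 2, 2, 3, 3, 3, 3, 3)
crosstab_labels(c1, c2)

## Same units, with unit 2 marked as noise (label 0) in the second partition
c2_noisy <- replace(c2, 2, 0)
crosstab_labels(c1, c2_noisy, noisecluster = 0)
```

Because the first argument also accepts a 2-dimensional numeric matrix, a table built this way can be passed as c1 on its own.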

Examples


## 1. randIndex with a contingency table as input.
tab <- matrix(c(1, 1, 0, 1, 2, 1, 0, 0, 4), nrow = 3)  # avoid naming it 'T', which masks TRUE
(res <- randIndex(tab))

## 2. randIndex with two label vectors as input.
## Each row of 'lab' holds the labels of one unit in the two partitions;
## cross-tabulating them gives the same table as in example 1.
lab <- matrix(c(1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3), ncol = 2, byrow = TRUE)
## c1 = numeric vector containing the labels of the first partition
c1 <- lab[, 1]
## c2 = numeric vector containing the labels of the second partition
c2 <- lab[, 2]

(res <- randIndex(c1, c2))

##  3. Compare ARI for iris data (true classification against tclust classification)
library(tclust)
c1 <- iris$Species  # first partition c1 is the true partition
set.seed(123)  # tclust uses random initialisations; fix the seed to reproduce results
out <- tclust(iris[, 1:4], k = 3, alpha = 0, restr.fact = 100)
c2 <- out$cluster   # second partition c2 is the output of tclust clustering procedure

randIndex(c1,c2)

##  4. Compare ARI for iris data (exclude unassigned units from tclust).

c1 <- iris$Species      # first partition c1 is the true partition
set.seed(123)           # tclust uses random initialisations; fix the seed to reproduce results
out <- tclust(iris[, 1:4], k = 3, alpha = 0.1, restr.fact = 100)
c2 <- out$cluster       # second partition c2 is the output of tclust clustering procedure

## Units with label 0 in c2 are the observations trimmed by tclust
noisecluster <- 0
randIndex(c1, c2, noisecluster = noisecluster)
