purity and entropy respectively compute the purity and the entropy of a clustering given a priori known classes.

The purity and entropy measure the ability of a clustering method to recover known classes (e.g. when the true class label of each sample is known), and they are applicable even when the number of clusters differs from the number of known classes. Kim et al. (2007) used these measures to evaluate the performance of their alternating least-squares NMF algorithm.
purity(x, y, ...)

entropy(x, y, ...)
## S4 method for signature 'NMFfitXn,ANY':
purity(x, y, method = "best", ...)

## S4 method for signature 'NMFfitXn,ANY':
entropy(x, y, method = "best", ...)
Arguments:

x: an object from which the cluster membership of each sample can be obtained, e.g. via a suitable predict method, which gives the cluster membership for each sample; x may also be given directly as a contingency table.

y: a factor (or an object coercible to a factor) giving the known class of each sample; it may be missing if x is a contingency table.

method: either 'best' or 'mean', to compute the best or mean purity (or entropy) respectively.

...: extra arguments to allow extension.

Value:

a single numeric value: the purity or the entropy of the clustering.
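For illustration, a minimal sketch of how the method argument could be used on a multiple-run fit is shown below. It assumes the NMF package is attached, that V and cl are built as in the Examples section, and that the .options = 'k' flag (an assumption) keeps every individual run so that an NMFfitXn object is returned:

# multiple-run fit keeping all individual runs (assumed to yield an NMFfitXn object)
res <- nmf(V, 3, nrun = 5, .options = 'k')
purity(res, cl, method = 'best')   # purity of the best fit among the runs
purity(res, cl, method = 'mean')   # average purity across all runs
entropy(res, cl, method = 'best')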
The purity of the clustering with respect to the known categories is given by:

$$Purity = \frac{1}{n} \sum_{q=1}^{k} \max_{1 \le j \le l} n_q^j$$

where:

- $n$ is the total number of samples;
- $l$ is the number of true classes;
- $k$ is the number of clusters;
- $n_q^j$ is the number of samples in cluster $q$ that belong to original class $j$ ($1 \le j \le l$).

The purity is therefore a real number in $[0,1]$. The larger the purity, the better the clustering performance.
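As an illustration only (not the package's implementation), the purity of a cluster/class contingency table can be computed by hand in base R; the vectors cluster and class below are hypothetical:

# hypothetical cluster memberships and known classes for 8 samples
cluster <- c(1, 1, 2, 2, 2, 3, 3, 3)
class   <- c("a", "a", "a", "b", "b", "b", "b", "a")
# contingency table: rows = clusters (q), columns = known classes (j)
tab <- table(cluster, class)
# purity: proportion of samples falling in the majority class of their cluster
sum(apply(tab, 1, max)) / sum(tab)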
The entropy of the clustering with respect to the known categories is given by:

$$Entropy = - \frac{1}{n \log_2 l} \sum_{q=1}^{k} \sum_{j=1}^{l} n_q^j \log_2 \frac{n_q^j}{n_q}$$

where:

- $n$ is the total number of samples;
- $l$ is the number of true classes;
- $k$ is the number of clusters;
- $n_q$ is the total number of samples in cluster $q$;
- $n_q^j$ is the number of samples in cluster $q$ that belong to original class $j$ ($1 \le j \le l$).

The smaller the entropy, the better the clustering performance.
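Correspondingly, the entropy can be computed by hand from the same hypothetical table tab; this is a sketch that follows the formula above, not the package code:

n <- sum(tab)              # total number of samples
l <- ncol(tab)             # number of known classes
p <- tab / rowSums(tab)    # within-cluster class proportions n_q^j / n_q
-sum(tab * log2(p), na.rm = TRUE) / (n * log2(l))   # empty cells (0 * log2 0) are dropped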
See also: sparseness
library(NMF)

# generate a synthetic dataset with known classes: 50 features, 18 samples (5+5+8)
n <- 50; counts <- c(5, 5, 8)
V <- syntheticNMF(n, counts)
cl <- unlist(mapply(rep, 1:3, counts))
# perform default NMF with rank=2
x2 <- nmf(V, 2)
purity(x2, cl)
entropy(x2, cl)
# perform default NMF with rank=3
x3 <- nmf(V, 3)
purity(x3, cl)
entropy(x3, cl)
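# As a possible follow-up (a sketch relying on the behaviour described above,
# namely that predict gives the cluster membership and that a contingency
# table may be passed directly as x):
tab3 <- table(predict(x3), cl)   # cross-tabulate predicted clusters vs known classes
tab3
purity(tab3)    # x given as a contingency table, so y is omitted
entropy(tab3)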