
dendextend (version 1.3.0)

FM_index_permutation: Calculating Fowlkes-Mallows Index under H0

Description

Calculates the Fowlkes-Mallows index under the null hypothesis of no relation between the clusterings (the item labels are randomly permuted).

Usage

FM_index_permutation(A1_clusters, A2_clusters, warn = dendextend_options("warn"), ...)

Arguments

A1_clusters
a numeric vector giving the cluster assignment of each item in group A1, with a names attribute holding the item names. These are often obtained by cutting a dendrogram at some k.
A2_clusters
a numeric vector giving the cluster assignment of each item in group A2, with a names attribute holding the item names. These are often obtained by cutting a dendrogram at some k.
warn
logical (the default, taken from dendextend_options("warn"), is FALSE). Whether warnings should be issued. Keeping this at TRUE is safer, but the default is FALSE to keep the noise down.
...
Ignored (passed to FM_index_R/FM_index_profdpm).
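
The sketch below shows one way to build inputs of the shape described above, using only base R. The choice of the USArrests data set and of k = 3 is an assumption made for illustration; any data with row names would do.

```r
# Build two named cluster-assignment vectors by cutting two hierarchical
# clusterings of the same items. USArrests has real row names, so the
# vectors returned by cutree() carry the item names.
hc1 <- hclust(dist(USArrests), method = "complete")
hc2 <- hclust(dist(USArrests), method = "single")

A1_clusters <- cutree(hc1, k = 3)
A2_clusters <- cutree(hc2, k = 3)

head(A1_clusters)         # named integer vector: item name -> cluster id
head(names(A2_clusters))  # item names shared by both vectors
```

Vectors like these could then be passed as A1_clusters and A2_clusters.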

Value

The Fowlkes-Mallows index between two vectors of cluster assignments under H0, returned as a double without attributes.
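
To make the idea of a single draw under H0 concrete, here is a minimal base-R sketch. It is not dendextend's implementation: fm_index_sketch and fm_permutation_sketch are hypothetical helpers written for this illustration, and the index is computed from the contingency table of the two label vectors following the Fowlkes and Mallows (1983) definition. The sketch assumes both vectors list the items in the same order.

```r
# Hypothetical helper (not part of dendextend): Fowlkes-Mallows index
# from the contingency table of two cluster-label vectors.
fm_index_sketch <- function(a, b) {
  m  <- table(a, b)               # m[i, j] = items in cluster i of a and j of b
  n  <- length(a)
  tk <- sum(m^2) - n              # pairs co-clustered in both clusterings
  pk <- sum(rowSums(m)^2) - n     # pairs co-clustered in a
  qk <- sum(colSums(m)^2) - n     # pairs co-clustered in b
  tk / sqrt(pk * qk)
}

# Hypothetical helper: one draw under H0, obtained by randomly permuting
# the labels of the second clustering before computing the index.
fm_permutation_sketch <- function(a, b) {
  fm_index_sketch(a, sample(b))
}

set.seed(1)
a <- cutree(hclust(dist(USArrests), "complete"), k = 3)
b <- cutree(hclust(dist(USArrests), "single"), k = 3)

observed   <- fm_index_sketch(a, b)
null_draws <- replicate(1000, fm_permutation_sketch(a, b))

# Share of permuted values at least as large as the observed index
mean(null_draws >= observed)
```

Repeating the draw many times, as with replicate() here, gives an empirical null distribution against which an observed index can be compared.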

References

Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.

http://en.wikipedia.org/wiki/Fowlkes-Mallows_index

See Also

cor_bakers_gamma, FM_index_profdpm, FM_index_R, FM_index

Examples

## Not run: 
# 
# set.seed(23235)
# ss <- TRUE # sample(1:150, 10 )
# hc1 <- hclust(dist(iris[ss,-5]), "com")
# hc2 <- hclust(dist(iris[ss,-5]), "single")
# # dend1 <- as.dendrogram(hc1)
# # dend2 <- as.dendrogram(hc2)
# #    cutree(dend1)   
# 
# # small k
# A1_clusters <- cutree(hc1, k=3) # will give a right tailed distribution
# # large k
# A1_clusters <- cutree(hc1, k=50) # will give a discrete distribution
# # "medium" k
# A1_clusters <- cutree(hc1, k=25) # gives almost the normal distribution!
# A2_clusters <- A1_clusters
# 
# R <- 10000
# set.seed(414130)
# FM_index_H0 <- replicate(R, FM_index_permutation(A1_clusters, A2_clusters)) # can take 10 sec
# plot(density(FM_index_H0), main = "FM Index distribution under H0\n (10000 permutations)")
# abline(v = mean(FM_index_H0), col = 1, lty = 2)
# # The permutation distribution has a heavy right tail:
# library(psych)
# skew(FM_index_H0) # 1.254
# kurtosi(FM_index_H0) # 2.5427
# 
# mean(FM_index_H0); var(FM_index_H0)
# the_FM_index <- FM_index(A1_clusters, A2_clusters)
# the_FM_index
# our_dnorm <- function(x) {
#    dnorm(x, mean = attr(the_FM_index, "E_FM"), 
#          sd = sqrt(attr(the_FM_index, "V_FM")))
# }
# # our_dnorm(0.35)
# curve(our_dnorm,
#       col = 4,
#       from = -1,to=1,n=R,add = TRUE)
# abline(v = attr(the_FM_index, "E_FM"), col = 4, lty = 2)
# 
# legend("topright", legend = c("asymptotic", "permutation"), fill = c(4,1))
# ## End(Not run)
