An open pan-genome means there will always be new gene clusters observed as long as new genomes
are being sequenced. This may sound controversial, but in a pragmatic view, an open pan-genome indicates
that the number of new gene clusters to be observed in future genomes is ‘large’ (but not literally
infinite). Conversely, a closed pan-genome indicates that few new gene clusters remain to be discovered.
This function is based on a Heaps' law approach suggested by Tettelin et al. (2008). The Heaps' law model
is fitted to the number of new gene clusters observed when the genomes are considered in random order. The model
has two parameters, an intercept and a decay parameter called alpha. If alpha>1.0 the
pan-genome is considered closed; if alpha<1.0 it is considered open.
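
The fitting idea described above can be sketched as follows. This is only a minimal illustration, not the function's actual implementation. It assumes the model has the power-law form n = K * N^(-alpha), where n is the number of new gene clusters seen when the N-th genome is added, and that the pan-matrix has one row per genome and one column per gene cluster. The names new_cluster_counts, fit_power_law and heaps_sketch are hypothetical, and the log-log linear regression on permutation averages is a simplification chosen to keep the example dependency-free.

```python
import math
import random


def new_cluster_counts(pan_matrix, order):
    """Count gene clusters first seen at each genome when visited in `order`.

    The first genome is dropped, since all of its clusters are trivially new.
    """
    seen = set()
    counts = []
    for g in order:
        present = {j for j, copies in enumerate(pan_matrix[g]) if copies > 0}
        counts.append(len(present - seen))
        seen |= present
    return counts[1:]


def fit_power_law(n_genomes_seen, mean_new):
    """Least-squares fit of log(n) = log(K) - alpha * log(N); returns (K, alpha).

    Assumption: positions with zero new clusters are skipped, because the
    logarithm is undefined there.
    """
    pts = [(math.log(N), math.log(m))
           for N, m in zip(n_genomes_seen, mean_new) if m > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positions with new gene clusters")
    x_bar = sum(x for x, _ in pts) / len(pts)
    y_bar = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    slope = sxy / sxx
    return math.exp(y_bar - slope * x_bar), -slope


def heaps_sketch(pan_matrix, n_perm=100, seed=0):
    """Average new-cluster counts over random genome orderings, then fit."""
    rng = random.Random(seed)
    n = len(pan_matrix)
    totals = [0] * (n - 1)
    order = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(order)  # one random ordering of the genomes
        for k, c in enumerate(new_cluster_counts(pan_matrix, order)):
            totals[k] += c
    means = [t / n_perm for t in totals]
    # Position k in `means` corresponds to the (k + 2)-th genome added.
    return fit_power_law(range(2, n + 1), means)
```

Calling heaps_sketch(pan_matrix) returns the pair (intercept, alpha). Under the rule stated above, an alpha below 1.0 would be read as an open pan-genome and an alpha above 1.0 as a closed one.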
The number of permutations, n.perm, should be as large as computation time allows.
The default value of 100 should be regarded as a minimum.
A word of caution: Heaps' law assumes independent sampling. If some of the genomes in the data set
form distinct sub-groups in the population, the results of this analysis may be severely affected.