The size of a pan-genome is the number of gene clusters in it, both those observed and those
not yet observed.
The input pan.matrix is a Panmat object, i.e. a matrix with one row for each genome and one
column for each observed gene cluster in the pan-genome. See panMatrix for how to construct
such objects.
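As a minimal illustration of this layout, the sketch below assumes the pan-matrix is held as a plain Python list of rows (one per genome) whose entries are per-cluster counts, rather than an actual Panmat object. It treats any entry greater than 0 as "present" and tabulates how many clusters occur in exactly k genomes. The helper cluster_frequencies and the toy matrix are hypothetical and not part of any package.

```python
from collections import Counter

def cluster_frequencies(pan_matrix):
    """Map k -> number of gene clusters present in exactly k genomes.

    Hypothetical helper. pan_matrix is a list of rows (genomes); each row
    holds one count per observed gene cluster (columns).
    """
    n_clusters = len(pan_matrix[0])  # observed clusters = number of columns
    # A cluster is counted as present in a genome if its entry is > 0.
    occupancy = [sum(1 for row in pan_matrix if row[j] > 0) for j in range(n_clusters)]
    return Counter(occupancy)

# Toy pan-matrix: 3 genomes (rows) x 4 observed gene clusters (columns).
pan = [
    [1, 0, 2, 1],  # genome A
    [1, 1, 0, 1],  # genome B
    [1, 0, 0, 0],  # genome C
]

print(len(pan[0]))                                    # 4
print(dict(sorted(cluster_frequencies(pan).items())))  # {1: 2, 2: 1, 3: 1}
```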
The number of observed gene clusters is simply the number of columns in pan.matrix. The
number of gene clusters not yet observed is estimated by the Chao lower bound estimator (Chao, 1987).
The estimate is based solely on the numbers of clusters observed in exactly one and exactly two
genomes. It is a very simple and conservative estimator, i.e. it is more likely to be too small
than too large.