The size of a pan-genome is the number of gene clusters in it, both those observed and those
not yet observed.
The input pan.matrix is a matrix with one row for each
genome and one column for each observed gene cluster in the pan-genome. See panMatrix
for how to construct this.
The number of observed gene clusters is simply the number of columns in pan.matrix. The
number of gene clusters not yet observed is estimated by the Chao lower bound estimator (Chao, 1987).
This is based solely on the number of clusters observed in exactly 1 genome and in exactly 2 genomes.
It is a very simple and conservative estimator, i.e. it is more likely to be too small than too large.
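
The calculation described above can be illustrated with a minimal sketch. It assumes the standard form of the Chao lower bound, observed + f1^2 / (2 * f2), where f1 and f2 are the numbers of clusters seen in exactly 1 and exactly 2 genomes. The function name chao_sketch and the toy matrix are hypothetical and for illustration only; this is not necessarily how the package implements the estimator, and the package may round or handle edge cases differently.

```r
# Illustrative sketch only: chao_sketch is a hypothetical helper,
# not the package's own function.
chao_sketch <- function(pan.matrix) {
  # Reduce copy numbers to presence/absence
  present <- pan.matrix > 0
  # Number of genomes in which each gene cluster is observed
  genomes.per.cluster <- colSums(present)
  # Observed clusters; equals ncol(pan.matrix) when every column is non-empty
  n.observed <- sum(genomes.per.cluster > 0)
  f1 <- sum(genomes.per.cluster == 1)  # clusters seen in exactly 1 genome
  f2 <- sum(genomes.per.cluster == 2)  # clusters seen in exactly 2 genomes
  if (f2 == 0) {
    stop("No gene clusters observed in exactly 2 genomes; estimate undefined")
  }
  # Observed clusters plus the estimated number not yet observed
  n.observed + f1^2 / (2 * f2)
}

# Toy pan-matrix (assumed data): 3 genomes (rows), 5 gene clusters (columns)
toy <- matrix(c(1, 1, 0, 1, 0,
                1, 0, 1, 1, 0,
                1, 0, 0, 0, 2),
              nrow = 3, byrow = TRUE)
est <- chao_sketch(toy)
```

In this toy matrix, 5 clusters are observed, 3 of them in exactly 1 genome and 1 in exactly 2 genomes, so the sketch gives 5 + 3^2 / (2 * 1) = 9.5. The estimate is undefined when no cluster is seen in exactly 2 genomes, which is why the sketch stops in that case.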