The pan-matrix is a central data structure for pan-genomic analysis. It is a matrix with
one row for each genome in the study, and one column for each gene cluster. Cell [i,j]
contains an integer indicating how many members genome i has in cluster j.
The input clustering must be a named integer vector with one element for each sequence in the study,
typically produced by either bClust or dClust. The name of each element is a text identifying the
sequence. The value of each element indicates the cluster, i.e. sequences with identical values are
in the same cluster. IMPORTANT: The name of each sequence must contain the genome_id of its genome,
i.e. names must be of the form GID111_seq1, GID111_seq2,... where the GIDxxx part indicates which
genome the sequence belongs to. See panPrep for details.
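
As an illustration of the expected input, here is a minimal sketch in base R of a hand-made clustering vector. The sequence names, the cluster values and the pattern "GID[0-9]+" used to pick out the genome_id are assumptions made up for this example; a real vector would normally come from bClust or dClust.

```r
# Hypothetical clustering: five sequences from two genomes (illustrative values only).
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 2L, GID1_seq3 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 3L)

# Assumption: the genome_id is the text "GID" followed by digits.
gid <- regmatches(names(clustering), regexpr("GID[0-9]+", names(clustering)))

# Sequences with identical values belong to the same cluster.
members <- split(names(clustering), clustering)
```

Here gid holds one genome_id per sequence, and members lists the sequence names in each cluster.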
The rows of the pan-matrix are named by the genome_id of each genome. The columns are named
Cluster_x where x is the integer cluster value copied from clustering.
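
The structure described here can be sketched in base R by cross-tabulating genome against cluster. This is only an illustration under assumed inputs (the toy clustering vector and the pattern "GID[0-9]+" are hypothetical), not the package's own implementation.

```r
# Hypothetical clustering: five sequences from two genomes (illustrative values only).
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 2L, GID1_seq3 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 3L)

# Assumption: the genome_id is the text "GID" followed by digits.
gid <- regmatches(names(clustering), regexpr("GID[0-9]+", names(clustering)))

# Count how many sequences each genome has in each cluster.
tab <- table(gid, clustering)

# Store the counts as an integer matrix with the naming scheme described above.
pan.matrix <- matrix(as.integer(tab), nrow = nrow(tab),
                     dimnames = list(rownames(tab),
                                     paste0("Cluster_", colnames(tab))))
```

In this toy case pan.matrix has rows GID1 and GID2 and columns Cluster_1, Cluster_2 and Cluster_3, and the cell for GID1 and Cluster_2 counts the two GID1 sequences in cluster 2.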