A Panpca object contains the results of a principal component analysis on a pan-matrix, see panpca
for details. The plotScores function gives a visual overview of how the genomes are positioned
relative to each other in the pan-genome space. The score-matrix of a Panpca object has one row for
each genome.
The original pan-matrix also has one row for each genome. Two genomes can be compared by their
corresponding rows in the pan-matrix, or by their rows in the score-matrix. The score-matrix has
(much) fewer columns, designed to capture as much as possible of the variation in the original data.
A plot of the scores therefore gives an approximate overview of how the genomes are located
relative to each other.
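
The relation between the pan-matrix and the score-matrix can be illustrated with a minimal sketch
in base R. It uses prcomp on a made-up toy pan-matrix rather than panpca itself, so the data, the
object names and the choice of prcomp are all assumptions made for illustration only.

```r
# Toy pan-matrix (hypothetical data): 5 genomes (rows) by 8 gene clusters (columns),
# where 1/0 means the cluster is present/absent in the genome.
pan.matrix <- matrix(
  c(1, 1, 1, 0, 0, 0, 1, 1,
    1, 1, 1, 0, 0, 1, 1, 0,
    1, 1, 0, 0, 0, 0, 1, 1,
    1, 0, 0, 1, 1, 1, 0, 0,
    1, 0, 0, 1, 1, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("GID", 1:5), paste0("Cluster", 1:8))
)

# Drop clusters with no variation across genomes (here the core cluster, Cluster1).
keep <- apply(pan.matrix, 2, var) > 0

# Base R PCA as a stand-in for panpca (assumption, not the micropan implementation).
pca <- prcomp(pan.matrix[, keep], center = TRUE, scale. = FALSE)

# The scores have one row per genome, and we keep only the first two columns.
scores <- pca$x[, 1:2]

plot(scores, pch = 16, xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores), pos = 3)
```

Each genome is now described by two numbers instead of one value per gene cluster, which is what
makes a two-dimensional overview possible.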
The plotLoadings function gives a visual overview of how the gene clusters affect the principal
components. The loadings form a matrix with one row for each of the original non-core gene clusters
(core gene clusters have no variation across genomes). Clusters located close to the origin have
little impact. Clusters far from the origin have high impact, indicating they separate groups of genomes.
These two plots together can reveal information about the pan-genome: The score-plot shows if genomes
are grouped/separated, and the loading-plot can then tell you which gene clusters have high impact on
this grouping/separation.
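
As a rough illustration of reading a loading-plot, the sketch below computes loadings with base R
prcomp on a made-up pan-matrix and measures how far each gene cluster lies from the origin in the
plane of the first two components. The data, the object names and the use of Euclidean distance as
an "impact" measure are assumptions for illustration, not part of the package.

```r
# Toy pan-matrix (hypothetical data): two groups of genomes that differ in clusters 1-4.
pan.matrix <- rbind(
  GID1 = c(1, 1, 0, 0, 1),
  GID2 = c(1, 1, 0, 0, 0),
  GID3 = c(0, 0, 1, 1, 1),
  GID4 = c(0, 0, 1, 1, 0)
)
colnames(pan.matrix) <- paste0("Cluster", 1:5)

# Base R PCA as a stand-in for panpca (assumption).
pca <- prcomp(pan.matrix)

# Loadings: one row per gene cluster, here restricted to the first two components.
loadings <- pca$rotation[, 1:2]

# Distance from the origin in the PC1/PC2 plane, used here as a simple impact measure.
impact <- sqrt(rowSums(loadings^2))
print(sort(impact, decreasing = TRUE))

# A basic loading-plot with the origin marked.
plot(loadings, type = "n", xlab = "PC1", ylab = "PC2")
text(loadings, labels = rownames(loadings))
abline(h = 0, v = 0, lty = 3)
```

Gene clusters with identical presence/absence patterns, such as Cluster1 and Cluster2 in this toy
data, end up at the same position in the loading-plot.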
The arguments x and y can be used to plot components other than components 1 and 2
(which are always the most informative). In some cases more components are needed to establish a
good picture, i.e. when the explained variance is low for components 1 and 2 (see plot.Panpca
for more on explained variance). It is quite common to plot component 1 versus 2, then 1 versus 3
and finally 2 versus 3.
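
The sequence of component pairs mentioned above can be sketched in base R as follows. The example
uses prcomp on simulated presence/absence data, so the data and all object names are hypothetical,
and the explained variance is computed directly from the component standard deviations rather than
through plot.Panpca.

```r
# Simulated pan-matrix (hypothetical data): 10 genomes by 30 gene clusters.
set.seed(1)
pan.matrix <- matrix(rbinom(10 * 30, size = 1, prob = 0.5), nrow = 10,
                     dimnames = list(paste0("GID", 1:10), paste0("Cluster", 1:30)))

# Keep only clusters that vary across genomes.
pan.matrix <- pan.matrix[, apply(pan.matrix, 2, var) > 0, drop = FALSE]

# Base R PCA as a stand-in for panpca (assumption).
pca <- prcomp(pan.matrix)

# Fraction of the total variance explained by each component.
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(explained[1:3], 2))

# Component pairs in the usual order: 1 vs 2, 1 vs 3, 2 vs 3.
pairs.to.plot <- combn(3, 2)

op <- par(mfrow = c(1, 3))
for (k in seq_len(ncol(pairs.to.plot))) {
  i <- pairs.to.plot[1, k]
  j <- pairs.to.plot[2, k]
  plot(pca$x[, i], pca$x[, j], pch = 16,
       xlab = sprintf("PC%d (%.0f%%)", i, 100 * explained[i]),
       ylab = sprintf("PC%d (%.0f%%)", j, 100 * explained[j]))
}
par(op)
```

If the first two entries of explained sum to a small fraction, the 1 versus 3 and 2 versus 3
panels are likely to show structure that the first panel misses.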
The argument show.labels can be used to turn off the display of labels, so that only markers (dots)
appear.
In plotScores you can specify alternative labels with the labels argument. By default, the
GID-tag is used for each genome. You can supply a vector of alternative labels. The labels may be
in any order, but the vector must be named by the GID-tags, i.e. each element in labels must
have a name which is a valid GID-tag for some genome. This is necessary to ensure the alternative
labels are placed correctly in the score-space.
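
The reason the names matter can be seen in a small base R sketch. It shows one way a named label
vector, given in arbitrary order, can be lined up with the genomes. The GID-tags, the strain names
and the matching code are hypothetical and are not taken from the plotScores source.

```r
# GID-tags of the genomes, in the row order of the score-matrix (hypothetical values).
gid <- paste0("GID", 1:4)

# Alternative labels in arbitrary order, each element named by its GID-tag.
labels <- c(GID3 = "E. coli K-12", GID1 = "Strain A", GID4 = "Strain B", GID2 = "Strain C")

# Every name must be a valid GID-tag for some genome.
stopifnot(all(names(labels) %in% gid))

# Indexing by name puts the labels in the same order as the rows of the score-matrix.
ordered <- labels[gid]
print(ordered)
```

Without names there would be no way to tell which label belongs to which genome once the vector
order differs from the row order.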
There is no alternative labelling of loading-plots, since the gene clusters lack a GID-tag-like system.
You can, however, change the gene cluster names by editing the column names of the pan-matrix directly
before you run panpca.
You may color each label/marker individually. In plotScores you can again supply a vector
of colors, and name every element with a GID-tag to make certain they are used correctly. In
plotLoadings you can supply a vector of colors, but you must arrange them in the proper
order yourself.
Additional arguments are passed on to text if show.labels=TRUE and to points if show.labels=FALSE.
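
The behaviour described for colors and additional arguments can be mimicked with a small base R
function. This is only a sketch under stated assumptions: plot_scores_sketch is a hypothetical
helper written for this example, not the plotScores implementation, and the scores and colors are
made up.

```r
# Hypothetical helper: draws labels with text() or markers with points(),
# forwarding any additional arguments (...) to whichever function is used.
plot_scores_sketch <- function(scores, show.labels = TRUE, col = "black", ...) {
  # Named colors are reordered to match the rows; unnamed colors are used as given.
  if (!is.null(names(col))) col <- col[rownames(scores)]
  plot(scores[, 1], scores[, 2], type = "n", xlab = "PC1", ylab = "PC2")
  if (show.labels) {
    text(scores[, 1], scores[, 2], labels = rownames(scores), col = col, ...)
  } else {
    points(scores[, 1], scores[, 2], col = col, ...)
  }
  invisible(col)
}

# Made-up scores for three genomes.
scores <- matrix(c(-1, 0, 1, 0.5, -0.5, 0), ncol = 2,
                 dimnames = list(c("GID1", "GID2", "GID3"), c("PC1", "PC2")))

# Colors named by GID-tag, supplied in arbitrary order.
cols <- c(GID3 = "red", GID1 = "blue", GID2 = "darkgreen")

used <- plot_scores_sketch(scores, show.labels = TRUE, col = cols, cex = 0.8)   # cex goes to text()
used2 <- plot_scores_sketch(scores, show.labels = FALSE, col = cols, pch = 17)  # pch goes to points()
```

Arguments such as cex are meaningful for both text and points, whereas pch only affects markers,
so it is worth checking which function will receive them before passing them on.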