plotScores: Plotting scores and loadings in a `Panpca` object

Description

Creates informative plots for a principal component analysis of a pan-matrix.

Usage

plotScores(pan.pca, x = 1, y = 2, show.labels = TRUE, labels = NULL, col = "black", pch = 16, ...)
plotLoadings(pan.pca, x = 1, y = 2, show.labels = TRUE, col = "black", pch = 16, ...)

Arguments

pan.pca

A Panpca object, see panpca for details.

x

The component to display along the horizontal axis.

y

The component to display along the vertical axis.

show.labels

Logical indicating if labels should be displayed.

labels

Alternative labels to use in the score-plot, see below.

col

Colors for the points/labels, see below.

pch

Marker type, see points.

...

Additional arguments passed on to text (if show.labels = TRUE) or to points (if show.labels = FALSE).

Details

A Panpca object contains the results of a principal component analysis on a pan-matrix, see panpca for details.

plotScores gives a visual overview of how the genomes are positioned relative to each other in the pan-genome space. The score matrix of a Panpca object has one row for each genome, just like the original pan-matrix. Two genomes can be compared through their rows in the pan-matrix, but also through their rows in the score matrix, which has (much) fewer columns, chosen to capture as much of the variation in the original data as possible. A plot of the scores therefore gives an approximate overview of how the genomes are located relative to each other.
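The idea behind the score-plot can be sketched with base R alone. This is not micropan's implementation: it runs prcomp() on a small, made-up pan-matrix (the GID-tags and cluster names are hypothetical), drops the core clusters, and shows that the score matrix keeps one row per genome while having fewer columns than the pan-matrix.

```r
# Hypothetical toy pan-matrix: rows are genomes (GID-tags), columns are
# gene clusters, 1 = cluster present in that genome.
pan.mat <- matrix(c(1, 1, 1, 0, 0, 1, 0, 1,
                    1, 1, 1, 0, 1, 0, 0, 1,
                    1, 0, 0, 1, 1, 0, 1, 1,
                    1, 0, 0, 1, 0, 1, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("GID", 1:4), paste0("Cluster", 1:8)))

# Core clusters (present in all genomes) have zero variance; drop them
variable <- apply(pan.mat, 2, var) > 0
pca <- prcomp(pan.mat[, variable], center = TRUE)

# Score matrix: one row per genome, fewer columns than the pan-matrix
scores <- pca$x
print(dim(pan.mat))
print(dim(scores))

# A bare-bones score-plot of component 1 versus component 2
plot(scores[, 1:2], type = "n")
text(scores[, 1:2], labels = rownames(scores))
```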

plotLoadings gives a visual overview of how the gene clusters affect the principal components. The loading matrix has one row for each of the original non-core gene clusters (core gene clusters are present in every genome and therefore have no variation across genomes). Clusters located close to the origin have little impact. Clusters far from the origin have high impact, indicating that they separate groups of genomes.
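Continuing the base R sketch (again not micropan's own code, and using the same hypothetical toy matrix), the loadings correspond to the rotation matrix returned by prcomp(). It has one row per non-core cluster, and the distance of each row from the origin in the two plotted components can serve as a rough measure of that cluster's impact.

```r
# Same hypothetical toy pan-matrix as before (genomes x gene clusters)
pan.mat <- matrix(c(1, 1, 1, 0, 0, 1, 0, 1,
                    1, 1, 1, 0, 1, 0, 0, 1,
                    1, 0, 0, 1, 1, 0, 1, 1,
                    1, 0, 0, 1, 0, 1, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("GID", 1:4), paste0("Cluster", 1:8)))
variable <- apply(pan.mat, 2, var) > 0
pca <- prcomp(pan.mat[, variable], center = TRUE)

# Loadings: one row per non-core cluster (Cluster1 is core, so it is absent)
loadings <- pca$rotation
print(rownames(loadings))

# Distance from the origin in the PC1/PC2 plane; larger means higher impact
impact <- sqrt(rowSums(loadings[, 1:2]^2))
print(sort(impact, decreasing = TRUE))

# A bare-bones loading-plot of component 1 versus component 2
plot(loadings[, 1:2], type = "n")
text(loadings[, 1:2], labels = rownames(loadings))
```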

These two plots together can reveal information about the pan-genome: The score-plot shows if genomes are grouped/separated, and the loading-plot can then tell you which gene clusters have high impact on this grouping/separation.

The arguments x and y can be used to plot components other than 1 and 2 (which always explain the most variation). In some cases more components are needed to get a good picture, e.g. when the explained variance of components 1 and 2 is low (see plot.Panpca for more on explained variance). It is quite common to plot component 1 versus 2, then 1 versus 3, and finally 2 versus 3.
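Looking at several component pairs can be sketched in base R as below. score_plot() is a hypothetical helper, not part of micropan: it only mimics the x and y arguments of plotScores, puts the explained variance of each component in the axis labels, and is called for the pairs 1 versus 2, 1 versus 3 and 2 versus 3 on the same made-up toy pan-matrix.

```r
# Hypothetical toy pan-matrix (genomes x gene clusters)
pan.mat <- matrix(c(1, 1, 1, 0, 0, 1, 0, 1,
                    1, 1, 1, 0, 1, 0, 0, 1,
                    1, 0, 0, 1, 1, 0, 1, 1,
                    1, 0, 0, 1, 0, 1, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("GID", 1:4), paste0("Cluster", 1:8)))
pca <- prcomp(pan.mat[, apply(pan.mat, 2, var) > 0])

# Hypothetical helper (not micropan): score-plot of components x and y
score_plot <- function(pca, x = 1, y = 2) {
  evar <- 100 * pca$sdev^2 / sum(pca$sdev^2)   # explained variance in %
  xy <- pca$x[, c(x, y)]
  plot(xy, type = "n",
       xlab = sprintf("PC%d (%.1f%%)", x, evar[x]),
       ylab = sprintf("PC%d (%.1f%%)", y, evar[y]))
  text(xy, labels = rownames(xy))
  invisible(xy)
}

comp.pairs <- combn(3, 2)   # columns: (1,2), (1,3), (2,3)
op <- par(mfrow = c(1, ncol(comp.pairs)))
for (k in seq_len(ncol(comp.pairs))) {
  score_plot(pca, x = comp.pairs[1, k], y = comp.pairs[2, k])
}
par(op)
```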

Setting show.labels = FALSE turns off the labels, so that only markers (dots) appear.

In plotScores you can specify alternative labels in labels. By default, the GID-tag is used for each genome. You can supply a vector of alternative labels. The labels may be in any order, but the vector must be named by the GID-tags, i.e. each element in labels must have a name which is a valid GID-tag for some genome. This is necessary to ensure the alternative labels are placed correctly in the score-space.
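Why the names matter can be shown in base R: indexing the label vector by the row names of the score matrix puts every label on the right genome, whatever order the labels were supplied in. The scores, GID-tags and strain names below are made up, and the snippet illustrates the principle rather than micropan's internal code.

```r
# Made-up scores; row names play the role of GID-tags
scores <- matrix(c(-1.2, 0.4, 0.9, -0.1,
                    0.3, -0.8, 0.2, 0.5),
                 ncol = 2,
                 dimnames = list(c("GID1", "GID2", "GID3", "GID4"), c("PC1", "PC2")))

# Alternative labels, deliberately supplied in a different order
labels <- c(GID3 = "Strain C", GID1 = "Strain A", GID4 = "Strain D", GID2 = "Strain B")

# Matching by name (not by position) places each label on its own genome
aligned <- labels[rownames(scores)]
stopifnot(!anyNA(aligned))   # every GID-tag must have a label
plot(scores, type = "n")
text(scores, labels = aligned)
```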

There is no alternative labelling of loading-plots, since the gene clusters lack a GID-tag-like system. You can, however, change the gene cluster names by editing the column names of the pan-matrix before running panpca.
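Renaming gene clusters amounts to replacing the column names of the pan-matrix before it is passed to panpca. The sketch below uses a small made-up matrix and a hypothetical lookup vector new.names that maps the current cluster names to the names you want to see in the loading-plot.

```r
# Hypothetical 3-genome x 3-cluster pan-matrix with default cluster names
pan.mat <- matrix(c(1, 0, 1,
                    1, 1, 0,
                    1, 1, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("GID", 1:3), c("Cluster1", "Cluster2", "Cluster3")))

# Hypothetical lookup from current names to preferred names
new.names <- c(Cluster3 = "geneC", Cluster1 = "geneA", Cluster2 = "geneB")

# Match by name, so the order of new.names does not matter
colnames(pan.mat) <- unname(new.names[colnames(pan.mat)])
print(pan.mat)
# The renamed matrix would then be passed to panpca(pan.mat) as usual.
```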

You may color each label/marker individually. In plotScores you can again supply a vector of colors, and name every element with a GID-tag to make certain they are used correctly. In plotLoadings you can supply a vector of colors, but you must arrange them in proper order yourself.
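Because the loading-plot has one point per non-core cluster, a color vector for plotLoadings has to follow that order. The base R sketch below builds such a vector from the pan-matrix column names, assuming (this page does not confirm it) that plotLoadings draws the non-core clusters in their pan-matrix column order, as prcomp() does here. The toy matrix and the highlighted clusters are hypothetical.

```r
# Hypothetical toy pan-matrix (genomes x gene clusters)
pan.mat <- matrix(c(1, 1, 1, 0, 0, 1, 0, 1,
                    1, 1, 1, 0, 1, 0, 0, 1,
                    1, 0, 0, 1, 1, 0, 1, 1,
                    1, 0, 0, 1, 0, 1, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("GID", 1:4), paste0("Cluster", 1:8)))

# Non-core clusters, kept in pan-matrix column order
noncore <- colnames(pan.mat)[apply(pan.mat, 2, var) > 0]

# Hypothetical clusters of interest drawn in red, all others in black
highlight <- c("Cluster3", "Cluster7")
load.cols <- ifelse(noncore %in% highlight, "red", "black")

# prcomp() keeps the column order, so the colors line up with the loadings
pca <- prcomp(pan.mat[, noncore])
stopifnot(identical(rownames(pca$rotation), noncore))
plot(pca$rotation[, 1:2], type = "n")
text(pca$rotation[, 1:2], labels = noncore, col = load.cols)
# Assumed micropan equivalent: plotLoadings(ppca, col = load.cols)
```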

Additional arguments are passed on to text if show.labels=TRUE and to points if show.labels=FALSE.

Examples


library(micropan)
# Loading Panmat objects from the micropan package
data(list=c("Mpneumoniae.blast.panmat","Mpneumoniae.domain.panmat"),package="micropan")
ppca.blast <- panpca(Mpneumoniae.blast.panmat)

# Plotting scores and loadings
plotScores(ppca.blast) # A score-plot
plotLoadings(ppca.blast) # A loading plot

# Plotting score with alternative labels and colors
data(list="Mpneumoniae.table",package="micropan")
labels <- Mpneumoniae.table$Strain
names(labels) <- Mpneumoniae.table$GID.tag
cols <- Mpneumoniae.table$Color
names(cols) <- Mpneumoniae.table$GID.tag
plotScores(ppca.blast,labels=labels,col=cols)
