bga(dataset, classvec, type = "coa", ...)
## S3 method for class 'bga'
plot(x, axis1 = 1, axis2 = 2, arraycol = NULL, genecol = "gray25", nlab = 10, genelabels = NULL, ...)
dataset: A matrix, data.frame, ExpressionSet or marrayRaw-class object. If the input is gene expression data in a matrix or data.frame, the rows and columns are expected to contain the variables (genes) and cases (array samples) respectively.
classvec: A factor or vector which describes the classes in the training dataset.
x: An object of class bga, the output from bga or bga.suppl. It contains the projection coordinates from bga, the $ls, $co or $li coordinates to be plotted.
arraycol, genecol: Colours of the points on the plots. If arraycol is NULL, a set of contrasting colours is obtained using getcol, one for each class of cases (microarray samples) on the array (case) plot. genecol is the colour of the points for each variable (gene) on the gene plot.
genelabels: A vector of variable (gene) labels. If genelabels=NULL, the row.names of the input matrix dataset will be used.

bga performs a between group analysis on the input dataset. This function calls bca. The input format of the dataset is verified using array2ade4. Between group analysis is a supervised method for sample discrimination and class prediction.
BGA is carried out by ordinating groups (sets of grouped microarray samples); that is, groups of samples are projected into a reduced dimensional space. This is most easily done using PCA or COA of the group means. The choice between PCA and COA is defined by the parameter type.
The user must define microarray sample groupings in advance. These groupings are defined using the input classvec, which is a factor or vector.
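The idea of ordinating group means can be sketched in base R. This is only an illustration on simulated data, not made4's or ade4's implementation: the toy matrix, the class labels and the use of prcomp are all assumptions made for the example.

```r
# Illustrative base-R sketch of the BGA idea; this is NOT made4's implementation.
set.seed(1)

# Toy expression matrix: 50 genes (rows) x 12 samples (columns).
dataset <- matrix(rnorm(50 * 12), nrow = 50,
                  dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))
# Three hypothetical classes with four samples each.
classvec <- factor(rep(c("A", "B", "C"), each = 4))

# Group means: one row per class, one column per gene.
group_means <- rowsum(t(dataset), classvec) / as.vector(table(classvec))

# Ordinate the group means with PCA.
pca <- prcomp(group_means, center = TRUE, scale. = FALSE)

# Only (number of classes - 1) axes carry information.
n_axes <- nlevels(classvec) - 1

# Project the individual samples onto the axes defined by the group means.
sample_scores <- predict(pca, newdata = t(dataset))[, seq_len(n_axes), drop = FALSE]
```

Here sample_scores plays a role loosely analogous to the $ls coordinates described above, but the values will not match those returned by bga.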
Cross-validation and testing of bga results:
bga results should be validated using leave-one-out jack-knife cross-validation with bga.jackknife and by projecting a blind test dataset onto the bga axes using suppl. bga and suppl are combined in bga.suppl, which requires input of both a training and a test dataset. It is important to ensure that the selection of cases for the training and test sets is not biased, and generally many cross-validations should be performed. The function randomiser can be used to randomise the selection of training and test samples.
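An unbiased selection of training and test cases can be sketched in base R as a stratified random split. The helper split_samples below is hypothetical and is not the made4 randomiser function; the class sizes and the one-third test fraction are assumptions chosen for illustration.

```r
# Hypothetical helper (not part of made4): stratified random training/test split.
# Each class contributes roughly test_fraction of its samples to the test set,
# and at least one sample, so very small classes need separate handling.
split_samples <- function(classvec, test_fraction = 1/3) {
  classvec <- factor(classvec)
  idx_by_class <- split(seq_along(classvec), classvec)
  test_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_test <- max(1, round(length(idx) * test_fraction))
    idx[sample.int(length(idx), n_test)]
  }), use.names = FALSE)
  list(train = setdiff(seq_along(classvec), test_idx),
       test  = sort(test_idx))
}

set.seed(42)
# Toy class labels: 6, 9 and 12 samples in three classes.
classvec <- factor(rep(c("A", "B", "C"), times = c(6, 9, 12)))
parts <- split_samples(classvec)

# Number of test samples drawn from each class.
table(classvec[parts$test])
```

Repeating the split with different seeds gives the many cross-validation rounds recommended above.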
Plotting and visualising bga results:
1D plots, show one axis only:
1D graphs can be plotted using between.graph and graph1D. between.graph is used for plotting the cases, and requires both the coordinates of the cases ($ls) and their centroids ($li). It accepts an object of class bga. graph1D can be used to plot either cases (microarrays) or variables (genes) and only requires a vector of coordinates.
2D plots:
Use plot.bga to plot results from bga. plot.bga calls the function plotarrays to draw an xy plot of the cases ($ls), and plotgenes to draw an xy plot of the variables (genes).
3D plots:
3D graphs can be generated using do3D and html3D. html3D produces a web page in which a 3D plot can be interactively rotated and zoomed, and in which classes or groups of cases can be easily highlighted.
Analysis of the distribution of variance among axes:
It is important to know which cases (microarray samples) are discriminated by the axes. The number of axes or principal components from a bga will equal the number of classes - 1, that is length(levels(classvec))-1. The distribution of variance among axes is described in the eigenvalues ($eig) of the bga analysis. These can be visualised using a scree plot, using scatterutil.eigen as is done in plot.bga. It is also useful to visualise the principal components from a bga, a principal components analysis (dudi.pca) or a correspondence analysis (dudi.coa) using a heatmap. In MADE4 the function heatplot will plot a heatmap with nicer default colours.
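The relationship between the number of classes, the number of axes and the eigenvalues can be illustrated with a base-R sketch. It uses a PCA of simulated group means as a stand-in for a bga result, so the data, the four classes and the use of prcomp and barplot (rather than scatterutil.eigen) are assumptions for illustration only.

```r
# Illustrative sketch on simulated data; not the output of bga.
set.seed(1)

# Toy data: 40 genes (rows) x 12 samples (columns), four classes of three samples.
dataset  <- matrix(rnorm(40 * 12), nrow = 40)
classvec <- factor(rep(c("A", "B", "C", "D"), each = 3))

# PCA of the group means (one row per class).
group_means <- rowsum(t(dataset), classvec) / as.vector(table(classvec))
pca <- prcomp(group_means)

# Number of informative axes: number of classes - 1.
n_axes <- length(levels(classvec)) - 1

# Eigenvalues of the informative axes and their share of the variance.
eig      <- pca$sdev[seq_len(n_axes)]^2
prop_var <- eig / sum(eig)

# Simple scree plot.
barplot(prop_var, names.arg = paste0("Axis", seq_len(n_axes)),
        ylab = "Proportion of variance among group means")
```

With four classes, three axes have non-zero eigenvalues; the fourth component returned by prcomp is numerically zero.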
Extracting list of top variables (genes):
Use topgenes to get a list of variables or cases at the ends of axes. It will return a list of the top n variables (by default n=5) at the positive, negative or both ends of an axis. sumstats can be used to return the angle (slope) and distance from the origin of a list of coordinates.
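What "the ends of an axis" and "angle and distance from the origin" mean can be shown with a short base-R sketch on simulated coordinates. The helper axis_ends is hypothetical and is not made4's topgenes, and the angle is expressed here in degrees via atan2, which is an assumption rather than the convention used by sumstats.

```r
# Illustrative sketch on simulated gene coordinates; not made4's topgenes or sumstats.
set.seed(7)
coords <- matrix(rnorm(30 * 2), ncol = 2,
                 dimnames = list(paste0("gene", 1:30), c("Axis1", "Axis2")))

# Hypothetical helper: names of the n variables at each end of one axis.
axis_ends <- function(coords, axis = 1, n = 5) {
  ord <- order(coords[, axis])
  list(negative = rownames(coords)[head(ord, n)],       # most negative first
       positive = rownames(coords)[rev(tail(ord, n))])  # most positive first
}
ends <- axis_ends(coords, axis = 1, n = 5)

# Distance from the origin and angle (in degrees) of each point in the Axis1/Axis2 plane.
dist_origin <- sqrt(rowSums(coords^2))
angle_deg   <- atan2(coords[, "Axis2"], coords[, "Axis1"]) * 180 / pi
```

Variables far from the origin along an axis are the ones that contribute most to separating the groups on that axis.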
For more details see Culhane et al., 2002 and http://bioinf.ucd.ie/research/BGA.
bga, suppl, suppl.bga, bca, bga.jackknife
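library(made4)  # provides bga, heatplot and the khan dataset
data(khan)
if (require(ade4, quietly = TRUE)) {
  # Between group analysis of the training samples, grouped by class
  khan.bga <- bga(khan$train, classvec = khan$train.classes)
}
khan.bga
# Plot the results, labelling genes with their symbols
plot(khan.bga, genelabels = khan$annotation$Symbol)
# Provide a view of the principal components (axes) of the bga
heatplot(khan.bga$bet$ls, dend = "none")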