cia: Coinertia analysis: Explore the covariance between two datasets

Description

Performs CIA on two datasets as described by Culhane et al., 2003. Used for meta-analysis of two or more datasets.

Usage

cia(df1, df2, cia.nf=2, cia.scan=FALSE, nsc=TRUE,...)
"plot"(x, nlab = 10, axis1 = 1, axis2 = 2, genecol = "gray25",  genelabels1 = rownames(ciares$co), genelabels2 = rownames(ciares$li), ...)

Arguments

df1

The first dataset. A matrix, data.frame, ExpressionSet or marrayRaw-class. If the input is gene expression data in a matrix or data.frame. The rows and columns are expected to contain the variables (genes) and cases (array samples) respectively.

df2

The second dataset. A matrix, data.frame, ExpressionSet or marrayRaw-class. If the input is gene expression data in a matrix or data.frame. The rows and columns are expected to contain the variables (genes) and cases (array samples) respectively.

cia.nf

Integer indicating the number of coinertia analysis axes to be saved. Default value is 2.

cia.scan

Logical indicating whether the coinertia analysis eigenvalue (scree) plot should be shown so that the number of axes, cia.nf can be selected interactively. Default value is FALSE.

nsc

A logical indicating whether coinertia analysis should be performed using two non-symmetric correspondence analyses dudi.nsc. The default=TRUE is highly recommended. If FALSE, COA dudi.coa will be performed on df1, and row weighted COA dudi.rwcoa will be performed on df2 using the row weights from df1.

An object of class cia, containing the CIA projected coordinates to be plotted.

nlab

Numeric. An integer indicating the number of variables (genes) to be labelled on plots.

axis1

Integer, the column number for the x-axis. The default is 1.

axis2

Integer, the column number for the y-axis. The default is 2.

genecol

Character, the colour of genes (variables). The default is "gray25".

genelabels1, genelabels2

A vector of variables labels, by default the row.names of each input matrix df1, and df2 are used.

...

further arguments passed to or from other methods.

Value

call: list of input arguments, df1 and df2
coinertia: A object of class "coinertia", sub-class dudi. See coinertia
coa1: Returns an object of class "coa" or "nsc", with sub-class dudi. See dudi.coa or dudi.nsc
coa2: Returns an object of class "coa" or "nsc", with sub-class dudi. See dudi.coa or dudi.nsc

Details

CIA has been successfully applied to the cross-platform comparison (meta-analysis) of microarray gene expression datasets (Culhane et al., 2003). Please refer to this paper and the vignette for help in interpretation of the output from CIA.

Co-inertia analysis (CIA) is a multivariate method that identifies trends or co-relationships in multiple datasets which contain the same samples. That is the rows or columns of the matrix have to be weighted similarly and thus must be "matchable". In cia, it is assumed that the analysis is being performed on the microarray cases, and thus the columns will be matched between the 2 datasets. Thus please ensure that the order of cases (the columns) in df1 and df2 are equivalent before performing CIA.

CIA simultaneously finds ordinations (dimension reduction diagrams) from the datasets that are most similar. It does this by finding successive axes from the two datasets with maximum covariance. CIA can be applied to datasets where the number of variables (genes) far exceeds the number of samples (arrays) such is the case with microarray analyses.

cia calls coinertia in the ADE4 package. For more information on coinertia analysis please refer to coinertia and several recent reviews (see below).

In the paper by Culhane et al., 2003, the datasets df1 and df2 are transformed using COA and Row weighted COA respectively, before coinertia analysis. It is now recommended to perform non symmetric correspondence analysis (NSC) rather than correspondence analysis (COA) on both datasets.

The RV coefficient

In the results, in the object cia returned by the analysis, \$coinertia\$RV gives the RV coefficient. This is a measure of global similarity between the datasets, and is a number between 0 and 1. The closer it is to 1 the greater the global similarity between the two datasets.

Plotting and visualising cia results

plot.cia draws 3 plots.

The first plot uses S.match.col to plots the projection (normalised scores \$mY and \$mX) of the samples from each dataset onto the one space. Cases (microarray samples) from one dataset are represented by circles, and cases from the second dataset are represented by arrow tips. Each circle and arrow is joined by a line, where the length of the line is proportional to the divergence between the gene expression profiles of that sample in the two datasets. A short line shows good agreement between the two datasets.

The second two plots call plot.genes are show the projection of the variables (genes, \$li and \$co) from each dataset in the new space. It is important to note both the direction of project of Variables (genes) and cases (microarray samples). Variables and cases that are projected in the same direction from the origin have a positive correlation (ie those genes are upregulated in those microarray samples)

Please refer to the help on bga for further discussion on graphing and visualisation functions in MADE4.

References

Culhane AC, et al., 2003 Cross platform comparison and visualisation of gene expression data using co-inertia analysis. BMC Bioinformatics. 4:59

Examples

Run this code

data(NCI60)
print("This will take a few minutes, please wait...")

if (require(ade4, quiet = TRUE)) {
	# Example data are "G1_Ross_1375.txt" and "G5_Affy_1517.txt"
	coin <- cia(NCI60$Ross, NCI60$Affy)
}
attach(coin)
summary(coin)
summary(coin$coinertia)
# $coinertia$RV will give the RV-coefficient, the greater (scale 0-1) the better   
cat(paste("The RV coefficient is a measure of global similarity between the datasets.\n",
"The two datasets analysed are very similar. ",
"The RV coefficient of this coinertia analysis is: ", coin$coinertia$RV,"\n", sep= ""))
plot(coin)
plot(coin, classvec=NCI60$classes[,2], clab=0, cpoint=3)

Run the code above in your browser using DataLab