CCorA: Canonical Correlation Analysis

Description

Canonical correlation analysis, following Brian McArdle's unpublished graduate course notes, plus improvements to allow the calculations in the case of very sparse and collinear matrices.

Usage

CCorA(Y, X, stand.Y=FALSE, stand.X=FALSE, nperm = 0, ...)
## S3 method for class 'CCorA':
biplot(x, xlabs, which = 1:2, ...)

Arguments

Y

left matrix.

X

right matrix.

stand.Y

logical; should Y be standardized?

stand.X

logical; should X be standardized?

nperm

numeric; number of permutations used to evaluate the significance of Pillai's trace.

x

CCorA result object.

xlabs

Row labels. The default is to use row names; NULL uses row numbers instead, and NA suppresses plotting row names completely.

which

1 plots Y results, and 2 plots X1 results.

...

Other arguments passed to functions. biplot.CCorA passes graphical arguments to biplot and biplot.default, CCorA curr
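
The idea behind nperm can be sketched in base R as follows. This is only an illustration, not the code used by CCorA: the helper pillai_trace is hypothetical, it relies on cancor from the stats package, and the permutation scheme (shuffling the rows of Y) is an assumption.

```r
# Illustrative permutation test for Pillai's trace (not CCorA's own code).
set.seed(123)
n <- 30
Y <- matrix(rnorm(n * 2), n, 2)
X <- matrix(rnorm(n * 3), n, 3)

# Hypothetical helper: Pillai's trace as the sum of squared canonical correlations
pillai_trace <- function(Y, X) sum(cancor(X, Y)$cor^2)

nperm <- 199
obs <- pillai_trace(Y, X)

# Assumed scheme: permute the rows of Y, keep X fixed, recompute the statistic
perm <- replicate(nperm, pillai_trace(Y[sample(n), , drop = FALSE], X))

# Permutation p-value, counting the observed value among the permutations
p_value <- (sum(perm >= obs) + 1) / (nperm + 1)
p_value
```

With nperm = 199 the smallest attainable p-value under this scheme is 1/200 = 0.005.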

Value

Function CCorA returns a list containing the following components:
Pillai

Pillai's trace statistic = sum of canonical eigenvalues.

EigenValues

Canonical eigenvalues. They are the squares of the canonical correlations.

CanCorr

Canonical correlations.

Mat.ranks

Ranks of matrices Y and X1 (possibly after controlling for X2).

RDA.Rsquares

Bimultivariate redundancy coefficients (R-squares) of RDAs of Y|X1 and X1|Y.

RDA.adj.Rsq

RDA.Rsquares adjusted for n and number of explanatory variables.

AA

Scores of Y variables in Y biplot.

BB

Scores of X1 variables in X1 biplot.

Cy

Object scores in Y biplot.

Cx

Object scores in X1 biplot.
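
How these quantities relate to each other can be sketched in base R without calling CCorA. The variable names below are illustrative, cancor comes from the stats package, and the adjustment formula is an assumption (the usual adjusted R-square); CCorA may compute it differently.

```r
# Illustrative relationships among the returned statistics (base R only).
set.seed(7)
n <- 30
Y <- matrix(rnorm(n * 2), n, 2)
X <- matrix(rnorm(n * 3), n, 3)
Y[, 1] <- Y[, 1] + X[, 1]          # make Y depend partly on X

Yc <- scale(Y, center = TRUE, scale = FALSE)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Canonical correlations, their squares (eigenvalues), and Pillai's trace
can_corr     <- cancor(X, Y)$cor
eigen_values <- can_corr^2
pillai       <- sum(eigen_values)

# Redundancy R-square of Y|X: share of the total variance of Y
# reproduced by the fitted values of a regression of Y on X
qx     <- qr(Xc)
Yfit   <- qr.fitted(qx, Yc)
rsq_YX <- sum(Yfit^2) / sum(Yc^2)

# Assumed adjustment for n and the number (rank) of explanatory variables
m          <- qx$rank
rsq_adj_YX <- 1 - (1 - rsq_YX) * (n - 1) / (n - m - 1)

c(Pillai = pillai, Rsq = rsq_YX, Rsq.adj = rsq_adj_YX)
```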

Details

Canonical correlation analysis (Hotelling 1936) seeks linear combinations of the variables of Y that are maximally correlated to linear combinations of the variables of X. The analysis estimates the relationships and displays them in graphs.
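
A minimal sketch of this idea, using cancor from the stats package rather than CCorA, with simulated data: the first pair of linear combinations has a correlation equal to the first canonical correlation.

```r
# Illustration with base R's cancor(), not CCorA itself.
set.seed(1)
n <- 50
Y <- matrix(rnorm(n * 3), n, 3)
X <- matrix(rnorm(n * 4), n, 4)
X[, 1] <- X[, 1] + Y[, 1]          # induce a relationship between the two sets

cc <- cancor(X, Y)                 # centres both matrices by default

# First pair of canonical variates: linear combinations of the centred variables
Xc <- scale(X, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)
u1 <- Xc %*% cc$xcoef[, 1]
v1 <- Yc %*% cc$ycoef[, 1]

# Their correlation reproduces the first canonical correlation
r1 <- abs(cor(u1, v1)[1, 1])
all.equal(r1, cc$cor[1])
```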

Algorithmic notes:

All data matrices are replaced by their PCA object scores, computed by SVD.
The blunt approach would be to read the data matrices, compute the covariance matrices, then the matrix S12 %*% inv(S22) %*% t(S12) %*% inv(S11). Its trace is Pillai's trace statistic.
This approach may fail, however, when there is heavy multicollinearity in very sparse data matrices, as is the case, for example, in 4th-corner inflated data matrices. The safe approach is to replace all data matrices by their PCA object scores.
Inversion by solve is avoided. Computation of inverses is done by SVD (svd) in most cases.
Regression by OLS is also avoided. Regression residuals are computed by QR decomposition (qr).
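
The contrast between the blunt and the safe route can be sketched in base R. This is an illustration under stated assumptions, not the CCorA source: the helper pca_scores and its tolerance are hypothetical, and the scores are scaled to unit length, which does not change the canonical correlations.

```r
# Illustrative comparison of the two routes to Pillai's trace.
set.seed(42)
n <- 40
Y <- matrix(rnorm(n * 3), n, 3)
X <- matrix(rnorm(n * 4), n, 4)

# Blunt approach: covariance matrices, then the product matrix from the notes
S11 <- cov(Y)
S22 <- cov(X)
S12 <- cov(Y, X)
M <- S12 %*% solve(S22) %*% t(S12) %*% solve(S11)
pillai_blunt <- sum(diag(M))

# Safe approach: replace each matrix by PCA object scores computed by SVD,
# dropping axes with negligible singular values (tolerance is an assumption)
pca_scores <- function(A, tol = 1e-10) {
  Ac <- scale(A, center = TRUE, scale = FALSE)
  s <- svd(Ac)
  keep <- s$d > tol * s$d[1]
  s$u[, keep, drop = FALSE]        # orthonormal, column-centred scores
}
Uy <- pca_scores(Y)
Ux <- pca_scores(X)

# With orthonormal scores, the canonical correlations are the
# singular values of t(Uy) %*% Ux; no matrix inversion is needed
rho <- svd(crossprod(Uy, Ux))$d
pillai_safe <- sum(rho^2)
all.equal(pillai_blunt, pillai_safe)

# A perfectly collinear column makes cov(Xcol) singular, so solve() would
# typically stop with an error; the score-based route is unaffected
Xcol <- cbind(X, X[, 1] + X[, 2])
rho_col <- svd(crossprod(Uy, pca_scores(Xcol)))$d
all.equal(sum(rho_col^2), pillai_safe)
```

Dropping the near-zero axes is what lets the score-based route handle collinear columns: the redundant column adds no new dimension, so the statistic is unchanged.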

The biplot function can produce two biplots, one for the left matrix solution and one for the right matrix solution. The function passes all arguments to biplot.default; consult its help page for details on configuring biplots.

References

Hotelling, H. 1936. Relations between two sets of variates. Biometrika 28: 321-377.

Examples

# Example using random numbers
mat1 <- matrix(rnorm(60),20,3)
mat2 <- matrix(rnorm(100),20,5)
CCorA(mat1, mat2)

# Example using intercountry life-cycle savings data, 50 countries
data(LifeCycleSavings)
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]
out <- CCorA(pop, oec)
out
biplot(out, xlabs = NA)