rcc: Regularized Canonical Correlation Analysis

Description

The function performs the regularized extension of the Canonical Correlation Analysis to seek correlations between two data matrices.

Usage

rcc(X, Y, ncomp = 2, lambda1 = 0, lambda2 = 0)

Arguments

X

numeric matrix or data frame $(n \times p)$, the observations on the $X$ variables. NAs are allowed.

Y

numeric matrix or data frame $(n \times q)$, the observations on the $Y$ variables. NAs are allowed.

ncomp

the number of components to include in the model. Defaults to 2.

lambda1, lambda2

non-negative real numbers. The regularization parameters for the X and Y data, respectively. Default to lambda1 = lambda2 = 0.

Value

rcc returns an object of class "rcc", a list that contains the following components:
X           the original $X$ data.
Y           the original $Y$ data.
lambda      a vector containing the regularization parameters.
cor         a vector containing the canonical correlations.
loadings    list containing the estimated loadings for the $X$ and $Y$ canonical variates.
variates    list containing the canonical variates.
names       list containing the names to be used for individuals and variables.

Details

The main purpose of Canonical Correlation Analysis (CCA) is to explore the sample correlations between two sets of variables $X$ and $Y$ observed on the same individuals (experimental units), where the two sets play strictly symmetric roles in the analysis. The cancor function performs the core computations, but additional tools are required to deal with highly correlated (nearly collinear) data sets, for example data sets with more variables than units.

The rcc function, the regularized version of CCA, is one way to deal with this problem: it includes a regularization step in the CCA computations. Regularization in this context was first proposed by Vinod (1976), then developed by Leurgans et al. (1993). It consists of regularizing the empirical covariance matrices of $X$ and $Y$ by adding a multiple of the identity matrix, that is, Cov$(X) + \lambda_1 I$ and Cov$(Y) + \lambda_2 I$. When lambda1 = 0 and lambda2 = 0, rcc performs classic CCA, if possible.

Missing values can be estimated beforehand by reconstructing the data matrix with the nipals function. Otherwise, rcc handles missing values by casewise deletion.
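The regularization step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code that rcc runs: the simulated data, the dimensions, the lambda values and the Cholesky/SVD route are all choices made for this example. It shows a case with more variables than units, where Cov$(X)$ and Cov$(Y)$ are singular and only their regularized versions can be inverted.

```r
# Illustrative sketch only (not the rcc source): regularized CCA by hand.
set.seed(1)
n <- 10; p <- 15; q <- 12            # hypothetical sizes, more variables than units
X <- matrix(rnorm(n * p), n, p)      # simulated data, for illustration
Y <- matrix(rnorm(n * q), n, q)

lambda1 <- 0.1                       # arbitrary regularization parameters
lambda2 <- 0.1

# Regularized covariance matrices: Cov(X) + lambda1 * I and Cov(Y) + lambda2 * I
Cxx <- cov(X) + lambda1 * diag(p)
Cyy <- cov(Y) + lambda2 * diag(q)
Cxy <- cov(X, Y)

# Cholesky factors: Cxx = t(Rx) %*% Rx and Cyy = t(Ry) %*% Ry
Rx <- chol(Cxx)
Ry <- chol(Cyy)

# The singular values of Rx^{-T} Cxy Ry^{-1} are the canonical correlations
# of the regularized problem.
K <- solve(t(Rx)) %*% Cxy %*% solve(Ry)
rho <- svd(K)$d[1:2]                 # first two canonical correlations
print(rho)
```

With lambda1 = lambda2 = 0 the call to chol would fail in this setting, because the unregularized covariance matrices are not positive definite when there are fewer units than variables.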

References

Leurgans, S. E., Moyeed, R. A. and Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society. Series B 55, 725-740.

Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics 6, 129-137.

Examples

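## Assumption: rcc and the linnerud and nutrimouse data sets come from the mixOmics package
library(mixOmics)

## Classic CCA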
data(linnerud)
X <- linnerud$exercise
Y <- linnerud$physiological
linn.res <- rcc(X, Y)

## Regularized CCA
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene
nutri.res <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)
