MultiCCA.permute: Select tuning parameters for sparse multiple canonical correlation analysis using the penalized matrix decomposition.

Description

This function can be used to automatically select tuning parameters for sparse multiple CCA. Sparse multiple CCA is the analog of sparse CCA for the case where more than two data sets are available. Each data set may have features of type="standard" or type="ordered" (e.g. CGH data). Assume that there are K data sets, called $X1,...,XK$.

The tuning parameters are selected using a permutation scheme. For each candidate tuning parameter value, the following is performed:

(1) Repeat the following nperms times, for nperms large:
(a) The samples in $(X1,...,XK)$ are randomly permuted to obtain data sets $(X1*,...,XK*)$.
(b) Sparse multiple CCA is run on the permuted data sets $(X1*,...,XK*)$ to get canonical variates $(w1*,...,wK*)$.
(c) Record $t* = sum_(i<j) Cor(Xi* wi*, Xj* wj*)$.
(2) Sparse multiple CCA is run on the original data $(X1,...,XK)$ to obtain canonical variates $(w1,...,wK)$.
(3) Record $t = sum_(i<j) Cor(Xi wi, Xj wj)$.
(4) The resulting p-value is given by $mean(t* > t)$; that is, the fraction of permuted totals that exceed the total on the real data.

Then, choose the tuning parameter value that gives the smallest value in Step 4.
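The permutation scheme can be sketched in base R for a single candidate tuning parameter value. This is only an illustration, not the package's implementation: `toy_multicca` is a hypothetical, unpenalized stand-in for sparse multiple CCA, `total_cor` is a hypothetical helper for the sum of pairwise correlations, and the data are simulated.

```r
set.seed(1)

# Hypothetical stand-in for sparse multiple CCA (no sparsity penalty):
# alternately update each weight vector toward the other canonical variates.
toy_multicca <- function(xlist, niter = 3) {
  K <- length(xlist)
  ws <- lapply(xlist, function(x) svd(x, nu = 0, nv = 1)$v)
  for (iter in seq_len(niter)) {
    for (i in seq_len(K)) {
      others <- Reduce(`+`, lapply(setdiff(seq_len(K), i),
                                   function(j) xlist[[j]] %*% ws[[j]]))
      w <- crossprod(xlist[[i]], others)
      ws[[i]] <- w / sqrt(sum(w^2))
    }
  }
  ws
}

# Sum over pairs i < j of Cor(Xi wi, Xj wj).
total_cor <- function(xlist, ws) {
  K <- length(xlist)
  total <- 0
  for (i in seq_len(K - 1)) {
    for (j in (i + 1):K) {
      total <- total + cor(xlist[[i]] %*% ws[[i]], xlist[[j]] %*% ws[[j]])[1, 1]
    }
  }
  total
}

# Simulated data: three data sets that share one latent factor u.
n <- 40
p <- 10
u <- rnorm(n)
make_x <- function() {
  scale(outer(u, c(rep(1, 5), rep(0, 5))) + matrix(rnorm(n * p), n, p))
}
xlist <- list(make_x(), make_x(), make_x())

# Step 1: permute the rows of each data set independently and record t*.
nperms <- 50
t_star <- replicate(nperms, {
  xperm <- lapply(xlist, function(x) x[sample(nrow(x)), , drop = FALSE])
  total_cor(xperm, toy_multicca(xperm))
})

# Steps 2-3: the same statistic on the original data.
t_real <- total_cor(xlist, toy_multicca(xlist))

# Step 4: fraction of permuted totals that exceed the real total.
p_value <- mean(t_star > t_real)
```

In the function itself this calculation is repeated for every candidate value in penalties, which is why the run time grows with both nperms and the number of candidates.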

This function only selects tuning parameters for the FIRST sparse multiple CCA factors.

Usage

MultiCCA.permute(xlist, penalties, ws=NULL,
type="standard", nperms=10, niter=3, trace=TRUE, standardize=TRUE)

Arguments

xlist

A list of length K, where K is the number of data sets on which to perform sparse multiple CCA. Data set k should be a matrix of dimension $n x p_k$ where $p_k$ is the number of features in data set k.

penalties

The penalty terms to be considered in the permutation-based search. If the same penalty term is desired for each data set, then this should be a vector of length equal to the number of penalty terms to be considered. If different penalty terms are des

type

A K-vector containing elements "standard" or "ordered" - or a single value. If a single value, then it is assumed that all elements are the same (either "standard" or "ordered"). If columns of v are ordered (e.g. CGH spots ordered along

niter

How many iterations should be performed each time CCA is called? Default is 3, since an approximate estimate of u and v is acceptable in this case, and otherwise this function can be quite time-consuming.

ws

A list of length K; the kth element contains the first ncomponents columns of the v matrix of the SVD of Xk. If NULL, then the SVD of Xk will be computed inside this function. However, if you plan to run this function multiple times, then save a

trace

Print out progress?

nperms

How many times should the data be permuted? Default is 10, as shown in Usage. A large value of nperms is very important here, since the formula for computing the z-statistics requires a standard deviation estimate for the correlations obtained via permutation, w

standardize

Should the columns of each data set in xlist be centered (to have mean zero) and scaled (to have standard deviation 1)? Default is TRUE.
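As a rough illustration of what column standardization means here, base R's `scale()` centers and scales each column. That the function's internal standardization matches `scale()` exactly is an assumption; the matrix below is simulated.

```r
set.seed(1)

# Simulated data set with non-zero column means and non-unit spread.
x <- matrix(rnorm(20 * 3, mean = 5, sd = 2), nrow = 20)

# Center each column to mean zero and scale it to standard deviation 1.
x_std <- scale(x, center = TRUE, scale = TRUE)

col_means <- colMeans(x_std)
col_sds <- apply(x_std, 2, sd)
```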

Value

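zstat

The vector of z-statistics, one per element of penalties.

pvals

The vector of p-values, one per element of penalties.

bestpenalties

The best set of penalties (the one with the highest zstat).

cors

The value of $sum_(j<k) Cor(Xj wj, Xk wk)$ obtained for each value of penalties.

corperms

The nperms values of $sum_(j<k) Cor(Xj* wj*, Xk* wk*)$ obtained for each value of penalties, where Xk* denotes the Xk matrix with permuted rows.

ws.init

Initial values used for ws in sparse multiple CCA algorithm.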
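To show how the returned quantities relate to one another, the sketch below derives z-statistics and p-values from made-up `cors` and `corperms` values. The z-statistic form (real value minus permutation mean, divided by permutation standard deviation) and the nperms-by-candidates layout of `corperms` are assumptions for illustration; this page does not give the exact formula the function uses.

```r
set.seed(1)

# Made-up values for illustration: four candidate penalties, 25 permutations.
nperms <- 25
cors <- c(1.1, 1.4, 2.4, 1.3)
corperms <- matrix(rnorm(nperms * length(cors), mean = 1, sd = 0.2),
                   nrow = nperms)

# Assumed z-statistic: how far the real total lies above the permuted totals,
# in units of the permutation standard deviation.
zstat <- (cors - colMeans(corperms)) / apply(corperms, 2, sd)

# Permutation p-value: fraction of permuted totals exceeding the real total.
pvals <- colMeans(sweep(corperms, 2, cors, `>`))

# Index of the candidate with the highest z-statistic.
best <- which.max(zstat)
```

The dependence on `sd` of the permuted values is why a small nperms gives unstable z-statistics.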

Details

Note that the data sets $X1,...,XK$ must all have the same number of rows. This function performs just a one-dimensional search in tuning parameter space.

References

Witten, DM, Tibshirani, R and Hastie, T (2008) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Submitted.

Examples


# See examples in MultiCCA function
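A minimal usage sketch on simulated data follows. It assumes the PMA package is installed, that `MultiCCA()` accepts `penalty`, `ncomponents`, `ws` and `trace` arguments, and that for type="standard" each candidate penalty should lie between 1 and the square root of the number of features; none of these are stated on this page, so check them against the MultiCCA help.

```r
library(PMA)  # assumption: the PMA package is installed

set.seed(1)

# Simulated data: three data sets with 50 shared samples and 20 features each;
# the first five features of each data set carry a common latent signal u.
n <- 50
p <- 20
u <- rnorm(n)
make_x <- function() {
  outer(u, c(rep(1, 5), rep(0, p - 5))) + matrix(rnorm(n * p), n, p)
}
xlist <- list(make_x(), make_x(), make_x())

# Candidate penalties, the same value applied to every data set
# (assumed valid range for type = "standard": 1 to sqrt(p)).
penalties <- c(1.5, 2.5, 3.5)

perm.out <- MultiCCA.permute(xlist, penalties = penalties, type = "standard",
                             nperms = 10, niter = 3, trace = FALSE)
print(perm.out$zstat)
print(perm.out$pvals)
print(perm.out$bestpenalties)

# Fit sparse multiple CCA with the selected penalties, reusing the
# initial values returned by the permutation run.
out <- MultiCCA(xlist, type = "standard", penalty = perm.out$bestpenalties,
                ncomponents = 1, ws = perm.out$ws.init, trace = FALSE)
```

Passing `perm.out$ws.init` back in avoids recomputing the SVD of each data set.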
