PMA (version 1.0.4)

MultiCCA.permute: Select tuning parameters for sparse multiple canonical correlation analysis using the penalized matrix decomposition.

Description

This function can be used to automatically select tuning parameters for sparse multiple CCA, the analog of sparse CCA for more than two data sets. Each data set may have features of type="standard" or type="ordered" (e.g. CGH data). Assume that there are K data sets, called $X1,...,XK$.

The tuning parameters are selected using a permutation scheme. For each candidate tuning parameter value, the following is performed:

(1) Repeat the following n times, for n large:
    (a) The samples in $(X1,...,XK)$ are randomly permuted to obtain data sets $(X1*,...,XK*)$.
    (b) Sparse multiple CCA is run on the permuted data sets $(X1*,...,XK*)$ to get canonical variates $(w1*,...,wK*)$.
    (c) Record $t* = sum_(i<j) Cor(Xi* wi*, Xj* wj*)$.
(2) Sparse multiple CCA is run on the original data $(X1,...,XK)$ to obtain canonical variates $(w1,...,wK)$.
(3) Record $t = sum_(i<j) Cor(Xi wi, Xj wj)$.
(4) The resulting p-value is given by $mean(t* > t)$; that is, the fraction of permuted totals that exceed the total on the real data.

Then, choose the tuning parameter value that gives the smallest value in Step 4.
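The permutation scheme can be sketched in base R for a single candidate tuning parameter value. This is only an illustration under stated assumptions: `fit_weights` and `total_cor` are hypothetical helpers, and `fit_weights` is a non-sparse stand-in (a few alternating updates without any penalty), not the penalized matrix decomposition used by the package. The simulated data are also made up for the example.

```r
set.seed(42)

# Hypothetical stand-in for sparse multiple CCA (no sparsity penalty):
# each w_k is updated to X_k^T (sum of the other canonical variates),
# then rescaled to unit length.
fit_weights <- function(xlist, niter = 3) {
  K <- length(xlist)
  ws <- lapply(xlist, function(x) svd(x)$v[, 1])
  for (it in seq_len(niter)) {
    for (k in seq_len(K)) {
      others <- Reduce(`+`, lapply(setdiff(seq_len(K), k),
                                   function(j) xlist[[j]] %*% ws[[j]]))
      w <- crossprod(xlist[[k]], others)
      ws[[k]] <- as.numeric(w / sqrt(sum(w^2)))
    }
  }
  ws
}

# t = sum over pairs i < j of Cor(Xi wi, Xj wj)
total_cor <- function(xlist, ws) {
  variates <- sapply(seq_along(xlist),
                     function(k) as.numeric(xlist[[k]] %*% ws[[k]]))
  cc <- cor(variates)
  sum(cc[upper.tri(cc)])
}

# Simulated data: K = 3 data sets sharing a signal in their first 5 features
n <- 40
u <- rnorm(n)
make_data <- function(p) {
  x <- matrix(rnorm(n * p), n, p)
  x[, 1:5] <- x[, 1:5] + 2 * u
  scale(x)
}
xlist <- list(make_data(15), make_data(20), make_data(25))

# Steps (2)-(3): total correlation on the real data
t_obs <- total_cor(xlist, fit_weights(xlist))

# Step (1): permute the rows of each data set independently and refit
nperms <- 50
t_perm <- replicate(nperms, {
  xperm <- lapply(xlist, function(x) x[sample(nrow(x)), , drop = FALSE])
  total_cor(xperm, fit_weights(xperm))
})

# Step (4): fraction of permuted totals that exceed the real total
p_value <- mean(t_perm > t_obs)
```

In the actual function this loop would be repeated for every candidate penalty, and the candidates compared through their p-values or z-statistics.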

This function only selects tuning parameters for the FIRST sparse multiple CCA factor.

Usage

MultiCCA.permute(xlist, penalties, ws=NULL,
type="standard", nperms=10, niter=3, trace=TRUE, standardize=TRUE)

Arguments

xlist
A list of length K, where K is the number of data sets on which to perform sparse multiple CCA. Data set k should be a matrix of dimension $n x p_k$ where $p_k$ is the number of features in data set k.
penalties
The penalty terms to be considered in the permutation-based search. If the same penalty term is desired for each data set, then this should be a vector of length equal to the number of penalty terms to be considered. If different penalty terms are des
type
A K-vector containing elements "standard" or "ordered" - or a single value. If a single value, then it is assumed that all elements are the same (either "standard" or "ordered"). If columns of v are ordered (e.g. CGH spots ordered along
niter
How many iterations should be performed each time CCA is called? Default is 3, since an approximate estimate of u and v is acceptable in this case, and otherwise this function can be quite time-consuming.
ws
A list of length K; the kth element contains the first ncomponents columns of the v matrix of the SVD of Xk. If NULL, then the SVD of Xk will be computed inside this function. However, if you plan to run this function multiple times, then save a
trace
Print out progress?
nperms
How many times should the data be permuted? Default is 10 (as shown under Usage). A large value of nperms is very important here, since the formula for computing the z-statistics requires a standard deviation estimate for the correlations obtained via permutation, w
standardize
Should the columns of each data set in xlist be centered (to have mean zero) and scaled (to have standard deviation 1)? Default is TRUE.
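As a rough illustration of what standardize=TRUE means, the same column-wise centering and scaling can be done by hand with base R's `scale`. The toy data below are made up for the example, and this is a sketch of the idea rather than a copy of the function's internal code.

```r
set.seed(1)

# Two toy data sets with the same number of rows (samples)
xlist <- list(
  matrix(rnorm(50, mean = 3,  sd = 2), nrow = 10, ncol = 5),
  matrix(rnorm(80, mean = -1, sd = 5), nrow = 10, ncol = 8)
)

# Center each column to mean zero and scale it to standard deviation 1
xstd <- lapply(xlist, scale, center = TRUE, scale = TRUE)

# Column means are numerically zero and column standard deviations are one
max(abs(colMeans(xstd[[1]])))
apply(xstd[[2]], 2, sd)
```

If the data have already been standardized this way, passing standardize=FALSE avoids repeating the work.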

Value

  • zstat: The vector of z-statistics, one per element of penalties.
  • pvals: The vector of p-values, one per element of penalties.
  • bestpenalties: The best set of penalties (the one with the highest zstat).
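  • cors: The value of $sum_(j<k) Cor(Xj wj, Xk wk)$ on the real data, for each element of penalties.
  • corperms: The nperms values of $sum_(j<k) Cor(Xj* wj*, Xk* wk*)$ on the permuted data, for each element of penalties.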
  • ws.init: Initial values used for ws in sparse multiple CCA algorithm.
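To show how these returned quantities relate to each other, here is a small base R sketch with made-up numbers. The z-statistic formula used here (observed total minus the permutation mean, divided by the permutation standard deviation) is an assumption for illustration; the package's exact formula may differ, for example by stabilizing the denominator.

```r
# Illustrative values only: 3 candidate penalties, 4 permutations
penalties <- c(1.5, 2, 3)
cors <- c(0.9, 1.4, 1.1)                      # totals on the real data
corperms <- matrix(c(0.5, 0.6, 0.7, 0.4,      # column 1: penalty 1.5
                     0.6, 0.8, 0.7, 0.9,      # column 2: penalty 2
                     0.9, 1.0, 1.2, 0.8),     # column 3: penalty 3
                   nrow = 4)                  # rows are permutations

# Assumed z-statistic: standardized distance from the permutation distribution
zstat <- (cors - colMeans(corperms)) / apply(corperms, 2, sd)

# p-value: fraction of permuted totals that exceed the real total
pvals <- colMeans(sweep(corperms, 2, cors, ">"))

# Best penalty: the candidate with the highest z-statistic
best <- penalties[which.max(zstat)]
best
# → 2
```

With only four permutations the standard deviation estimates are very noisy, which is why a large nperms matters for the z-statistics.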

Details

Note that $X1,...,XK$ must have the same number of rows. This function performs just a one-dimensional search in tuning parameter space.
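Both points can be checked with a few lines of base R. `same_rows` is a hypothetical helper, and the `candidates` matrix is only one way to picture a one-dimensional search when a single vector of penalties is shared by all data sets: each candidate is tried once, rather than every combination across data sets.

```r
# Hypothetical helper: TRUE if every data set has the same number of rows
same_rows <- function(xlist) {
  length(unique(vapply(xlist, nrow, integer(1)))) == 1
}

ok  <- list(matrix(0, 10, 4), matrix(0, 10, 7), matrix(0, 10, 3))
bad <- list(matrix(0, 10, 4), matrix(0, 9, 7))
same_rows(ok)    # TRUE
same_rows(bad)   # FALSE

# One candidate per column; the same value is applied to all K data sets
K <- length(ok)
penalties <- c(1.5, 2, 3)
candidates <- matrix(penalties, nrow = K, ncol = length(penalties), byrow = TRUE)

ncol(candidates)       # 3 fits in a one-dimensional search
length(penalties)^K    # 27 combinations in a full K-dimensional grid
```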

References

Witten, DM, Tibshirani, R and Hastie, T (2008) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Submitted.

See Also

MultiCCA, CCA.permute, CCA

Examples

# See examples in MultiCCA function
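A minimal end-to-end sketch, assuming the PMA package is installed, might look like the following. The simulated data and the candidate penalties are made up for the example; the assumption that values between 1 and sqrt(p_k) are valid for type="standard" comes from the MultiCCA documentation, not from this page. The code is skipped if PMA is not available.

```r
perm.out <- NULL

if (requireNamespace("PMA", quietly = TRUE)) {
  set.seed(1)
  n <- 30
  u <- rnorm(n)                        # signal shared by all data sets
  make_data <- function(p) {
    x <- matrix(rnorm(n * p), n, p)
    x[, 1:5] <- x[, 1:5] + u           # first five features carry the signal
    x
  }
  xlist <- list(make_data(20), make_data(25), make_data(30))

  # Candidate penalties shared by all data sets (vector form).
  # Assumption: values between 1 and sqrt(p_k) are valid for type="standard".
  penalties <- c(1.5, 2, 3)

  perm.out <- PMA::MultiCCA.permute(xlist, penalties = penalties,
                                    type = "standard", nperms = 10,
                                    niter = 3, trace = FALSE)
  print(perm.out$zstat)
  print(perm.out$pvals)
  print(perm.out$bestpenalties)

  # Fit the first sparse multiple CCA factor with the selected penalties
  out <- PMA::MultiCCA(xlist, penalty = perm.out$bestpenalties,
                       ws = perm.out$ws.init, type = "standard",
                       ncomponents = 1, trace = FALSE)
}
```

Reusing ws.init from the permutation output avoids recomputing the SVD of each data set when MultiCCA is called afterwards.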