ccaGrid: (Robust) CCA via alternating series of grid searches

Description

Perform canoncial correlation analysis via projection pursuit based on alternating series of grid searches in two-dimensional subspaces of each data set, with a focus on robust and nonparametric methods.

Usage

ccaGrid(
  x,
  y,
  k = 1,
  method = c("spearman", "kendall", "quadrant", "M", "pearson"),
  control = list(...),
  nIterations = 10,
  nAlternate = 10,
  nGrid = 25,
  select = NULL,
  tol = 1e-06,
  standardize = TRUE,
  fallback = FALSE,
  seed = NULL,
  ...
)
CCAgrid(
  x,
  y,
  k = 1,
  method = c("spearman", "kendall", "quadrant", "M", "pearson"),
  maxiter = 10,
  maxalter = 10,
  splitcircle = 25,
  select = NULL,
  zero.tol = 1e-06,
  standardize = TRUE,
  fallback = FALSE,
  seed = NULL,
  ...
)

Value

An object of class "cca" with the following components:

cor: a numeric vector giving the canonical correlation measures.
A: a numeric matrix in which the columns contain the canonical vectors for x.
B: a numeric matrix in which the columns contain the canonical vectors for y.
centerX: a numeric vector giving the center estimates used in standardization of x.
centerY: a numeric vector giving the center estimates used in standardization of y.
scaleX: a numeric vector giving the scale estimates used in standardization of x.
scaleY: a numeric vector giving the scale estimates used in standardization of y.
call: the matched function call.

Arguments

x, y: each can be a numeric vector, matrix or data frame.
k: an integer giving the number of canonical variables to compute.
method: a character string specifying the correlation functional to maximize. Possible values are "spearman" for the Spearman correlation, "kendall" for the Kendall correlation, "quadrant" for the quadrant correlation, "M" for the correlation based on a bivariate M-estimator of location and scatter with a Huber loss function, or "pearson" for the classical Pearson correlation (see corFunctions).
control: a list of additional arguments to be passed to the specified correlation functional. If supplied, this takes precedence over additional arguments supplied via the ... argument.
nIterations, maxiter: an integer giving the maximum number of iterations.
nAlternate, maxalter: an integer giving the maximum number of alternate series of grid searches in each iteration.
nGrid, splitcircle: an integer giving the number of equally spaced grid points on the unit circle to use in each grid search.
select: optional; either an integer vector of length two or a list containing two index vectors. In the first case, the first integer gives the number of variables of x to be randomly selected for determining the order of the variables of y in the corresponding series of grid searches, and vice versa for the second integer. In the latter case, the first list element gives the indices of the variables of x to be used for determining the order of the variables of y, and vice versa for the second integer (see “Details”).
tol, zero.tol: a small positive numeric value to be used for determining convergence.
standardize: a logical indicating whether the data should be (robustly) standardized.
fallback: logical indicating whether a fallback mode for robust standardization should be used. If a correlation functional other than the Pearson correlation is maximized, the first attempt for standardizing the data is via median and MAD. In the fallback mode, variables whose MADs are zero (e.g., dummy variables) are standardized via mean and standard deviation. Note that if the Pearson correlation is maximized, standardization is always done via mean and standard deviation.
seed: optional initial seed for the random number generator (see .Random.seed). This is only used if select specifies the numbers of variables of each data set to be randomly selected for determining the order of the variables of the respective other data set.
...: additional arguments to be passed to the specified correlation functional. Currently, this is only relevant for the M-estimator. For Spearman, Kendall and quadrant correlation, consistency at the normal model is always forced.

Author

Andreas Alfons

Details

The algorithm is based on alternating series of grid searches in two-dimensional subspaces of each data set. In each grid search, nGrid grid points on the unit circle in the corresponding plane are obtained, and the directions from the center to each of the grid points are examined. In the first iteration, equispaced grid points in the interval \([-\pi/2, \pi/2)\) are used. In each subsequent iteration, the angles are halved such that the interval \([-\pi/4, \pi/4)\) is used in the second iteration and so on. If only one data set is multivariate, the algorithm simplifies to iterative grid searches in two-dimensional subspaces of the corresponding data set.

In the basic algorithm, the order of the variables in a series of grid searches for each of the data sets is determined by the average absolute correlations with the variables of the respective other data set. Since this requires to compute the full \((p \times q)\) matrix of absolute correlations, where \(p\) denotes the number of variables of x and \(q\) the number of variables of y, a faster modification is available as well. In this modification, the average absolute correlations are computed over only a subset of the variables of the respective other data set. It is thereby possible to use randomly selected subsets of variables, or to specify the subsets of variables directly.

Note that also the data sets are ordered according to the maximum average absolute correlation with the respective other data set to ensure symmetry of the algorithm.

For higher order canonical correlations, the data are first transformed into suitable subspaces. Then the alternate grid algorithm is applied to the reduced data and the results are back-transformed to the original space.

Examples

Run this code

data("diabetes")
x <- diabetes$x
y <- diabetes$y

## Spearman correlation
ccaGrid(x, y, method = "spearman")

## Pearson correlation
ccaGrid(x, y, method = "pearson")

Run the code above in your browser using DataLab