Compute the maximum correlation between two data sets via projection pursuit based on alternating series of grid searches in two-dimensional subspaces of each data set, with a focus on robust and nonparametric methods.
maxCorGrid(
x,
y,
method = c("spearman", "kendall", "quadrant", "M", "pearson"),
control = list(...),
nIterations = 10,
nAlternate = 10,
nGrid = 25,
select = NULL,
tol = 1e-06,
standardize = TRUE,
fallback = FALSE,
seed = NULL,
...
)
An object of class "maxCor" with the following components:
a numeric giving the maximum correlation estimate.
numeric; the weighting vector for x.
numeric; the weighting vector for y.
a numeric vector giving the center estimates used in standardization of x.
a numeric vector giving the center estimates used in standardization of y.
a numeric vector giving the scale estimates used in standardization of x.
a numeric vector giving the scale estimates used in standardization of y.
the matched function call.
x, y: each can be a numeric vector, matrix or data frame.
method: a character string specifying the correlation functional to maximize. Possible values are "spearman" for the Spearman correlation, "kendall" for the Kendall correlation, "quadrant" for the quadrant correlation, "M" for the correlation based on a bivariate M-estimator of location and scatter with a Huber loss function, or "pearson" for the classical Pearson correlation (see corFunctions).
control: a list of additional arguments to be passed to the specified correlation functional. If supplied, this takes precedence over additional arguments supplied via the ... argument.
nIterations: an integer giving the maximum number of iterations.
nAlternate: an integer giving the maximum number of alternating series of grid searches in each iteration.
nGrid: an integer giving the number of equally spaced grid points on the unit circle to use in each grid search.
select: optional; either an integer vector of length two or a list containing two index vectors. In the first case, the first integer gives the number of variables of x to be randomly selected for determining the order of the variables of y in the corresponding series of grid searches, and vice versa for the second integer. In the second case, the first list element gives the indices of the variables of x to be used for determining the order of the variables of y, and vice versa for the second list element (see “Details”).
tol: a small positive numeric value to be used for determining convergence.
standardize: a logical indicating whether the data should be (robustly) standardized.
fallback: a logical indicating whether a fallback mode for robust standardization should be used. If a correlation functional other than the Pearson correlation is maximized, the data are first standardized via median and MAD. In the fallback mode, variables whose MADs are zero (e.g., dummy variables) are instead standardized via mean and standard deviation. Note that if the Pearson correlation is maximized, standardization is always done via mean and standard deviation.
seed: optional initial seed for the random number generator (see .Random.seed). This is only used if select specifies the numbers of variables of each data set to be randomly selected for determining the order of the variables of the respective other data set.
...: additional arguments to be passed to the specified correlation functional.
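The standardization rules described for standardize and fallback can be mirrored in a few lines of base R. The helper standardizeSketch() is hypothetical and not part of ccaPP, and using base R's mad() with its default consistency constant is an assumption; the sketch only reproduces the documented logic (median and MAD first, mean and standard deviation for zero-MAD variables in fallback mode, mean and standard deviation throughout for the Pearson correlation).

```r
## Hypothetical helper (not part of ccaPP) that mirrors the documented
## standardization rules; base R's median(), mad() and sd() are used
standardizeSketch <- function(x, method = "spearman", fallback = FALSE) {
  x <- as.matrix(x)
  if (method == "pearson") {
    ## Pearson correlation: always mean and standard deviation
    center <- colMeans(x)
    scale <- apply(x, 2, sd)
  } else {
    ## other correlation functionals: median and MAD first
    center <- apply(x, 2, median)
    scale <- apply(x, 2, mad)
    zero <- scale == 0
    if (fallback && any(zero)) {
      ## fallback mode: mean and standard deviation for zero-MAD variables
      center[zero] <- colMeans(x[, zero, drop = FALSE])
      scale[zero] <- apply(x[, zero, drop = FALSE], 2, sd)
    }
  }
  ## center and scale each column
  z <- sweep(sweep(x, 2, center, "-"), 2, scale, "/")
  list(x = z, center = center, scale = scale)
}

## usage: a continuous variable and a dummy variable whose MAD is zero
x <- cbind(cont = c(1.2, -0.4, 3.1, 0.7, -1.5, 2.2, 0.1, -0.8),
           dummy = c(0, 0, 0, 0, 0, 0, 1, 1))
standardizeSketch(x, fallback = TRUE)$scale
```

Without the fallback mode, the dummy variable above has a scale estimate of zero and would be divided by zero, which is the situation the fallback mode is meant to handle.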
Andreas Alfons
The algorithm is based on alternating series of grid searches in two-dimensional subspaces of each data set. In each grid search, nGrid grid points on the unit circle in the corresponding plane are obtained, and the directions from the center to each of the grid points are examined. In the first iteration, the grid points correspond to equispaced angles in the interval \([-\pi/2, \pi/2)\). In each subsequent iteration, the angles are halved, such that the interval \([-\pi/4, \pi/4)\) is used in the second iteration, and so on. If only one data set is multivariate, the algorithm simplifies to iterative grid searches in two-dimensional subspaces of the corresponding data set.
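The alternating scheme described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the ccaPP implementation: the functions gridStep() and maxCorGridSketch() are hypothetical; each grid search is assumed to rotate the current unit-length weighting vector towards one canonical basis vector; the absolute correlation is maximized (flipping the sign of a weighting vector flips the sign of the correlation); variables are visited in their original order rather than by average absolute correlation; the starting weights are equal; and the data are not standardized. Base R's cor() only covers the Spearman, Kendall and Pearson correlations.

```r
## Hypothetical sketch (not the ccaPP implementation): one grid search in
## the plane spanned by the current direction 'a' and the j-th basis vector
gridStep <- function(x, v, a, j, angles, method) {
  e <- numeric(ncol(x)); e[j] <- 1
  ## start from the current solution so that the result never gets worse
  bestCor <- abs(cor(drop(x %*% a), v, method = method))
  if (is.na(bestCor)) bestCor <- -Inf
  for (theta in angles) {
    cand <- cos(theta) * a + sin(theta) * e
    nrm <- sqrt(sum(cand^2))
    if (nrm < 1e-12) next              # degenerate candidate, skip it
    cand <- cand / nrm                 # rescale to unit length
    r <- abs(cor(drop(x %*% cand), v, method = method))
    if (!is.na(r) && r > bestCor) {
      bestCor <- r
      a <- cand
    }
  }
  list(cor = bestCor, a = a)
}

## Hypothetical sketch of the outer loops: iterations with halved angles,
## alternating series of grid searches for x and y, stopping rule via tol
maxCorGridSketch <- function(x, y, method = "spearman", nIterations = 10,
                             nAlternate = 10, nGrid = 25, tol = 1e-06) {
  x <- as.matrix(x); y <- as.matrix(y)
  a <- rep(1, ncol(x)) / sqrt(ncol(x)) # equal starting weights (assumption)
  b <- rep(1, ncol(y)) / sqrt(ncol(y))
  current <- abs(cor(drop(x %*% a), drop(y %*% b), method = method))
  for (k in seq_len(nIterations)) {
    ## equispaced angles in [-pi/2, pi/2), halved in each further iteration
    half <- (pi / 2) / 2^(k - 1)
    angles <- seq(-half, half, length.out = nGrid + 1)[seq_len(nGrid)]
    for (i in seq_len(nAlternate)) {
      previous <- current
      ## series of grid searches for x, with the projection of y held fixed
      for (j in seq_len(ncol(x))) {
        a <- gridStep(x, drop(y %*% b), a, j, angles, method)$a
      }
      ## series of grid searches for y, with the projection of x held fixed
      for (j in seq_len(ncol(y))) {
        res <- gridStep(y, drop(x %*% a), b, j, angles, method)
        b <- res$a
      }
      current <- res$cor
      if (current - previous < tol) break
    }
  }
  list(cor = current, a = a, b = b)
}

## usage with simulated data: x[, 2] and y[, 2] share a common signal
set.seed(3)
n <- 200
z <- rnorm(n)
x <- cbind(rnorm(n), z + 0.2 * rnorm(n), rnorm(n))
y <- cbind(rnorm(n), z + 0.2 * rnorm(n))
fit <- maxCorGridSketch(x, y, method = "pearson", nIterations = 5)
fit$cor
```

Because each grid search in this sketch starts from the current solution and only accepts improvements, the correlation never decreases, so tol acts as a stopping rule on the gain of one alternating series.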
In the basic algorithm, the order of the variables in a series of grid searches for each of the data sets is determined by the average absolute correlations with the variables of the respective other data set. Since this requires computing the full \((p \times q)\) matrix of absolute correlations, where \(p\) denotes the number of variables of x and \(q\) the number of variables of y, a faster modification is available as well. In this modification, the average absolute correlations are computed over only a subset of the variables of the respective other data set. It is thereby possible to use randomly selected subsets of variables, or to specify the subsets of variables directly.
Note that the data sets themselves are also ordered according to the maximum average absolute correlation with the respective other data set to ensure symmetry of the algorithm.
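The ordering step and the two forms of select can be illustrated with the base R sketch below. The helpers resolveSelect() and orderVariablesSketch() are hypothetical, the use of sample.int() for the random selection is an assumption, and so is the convention that the data set with the larger maximum average absolute correlation is processed first.

```r
## Hypothetical helper (not part of ccaPP): turns the two documented forms
## of 'select' into index vectors; note that set.seed() changes global state
resolveSelect <- function(select, p, q, seed = NULL) {
  if (is.null(select)) return(list(seq_len(p), seq_len(q)))
  if (is.list(select)) return(select)  # index vectors given directly
  if (!is.null(seed)) set.seed(seed)
  ## integer vector: numbers of variables of x and y to select at random
  list(sort(sample.int(p, select[1])), sort(sample.int(q, select[2])))
}

## Hypothetical helper: order variables by their average absolute
## correlation with (a subset of) the variables of the other data set
orderVariablesSketch <- function(x, y, select = NULL, seed = NULL,
                                 method = "spearman") {
  x <- as.matrix(x); y <- as.matrix(y)
  idx <- resolveSelect(select, ncol(x), ncol(y), seed)
  ## selected variables of x determine the order of the variables of y,
  ## and vice versa
  avgY <- rowMeans(abs(cor(y, x[, idx[[1]], drop = FALSE], method = method)))
  avgX <- rowMeans(abs(cor(x, y[, idx[[2]], drop = FALSE], method = method)))
  list(orderX = order(avgX, decreasing = TRUE),
       orderY = order(avgY, decreasing = TRUE),
       ## data sets compared via their maximum average absolute correlation;
       ## that the larger one goes first is an assumption
       xFirst = max(avgX) >= max(avgY))
}

## usage with simulated data: only the second variable of x is related to y
set.seed(2)
n <- 50
y <- matrix(rnorm(n * 2), n, 2)
x <- cbind(rnorm(n), y[, 1] + 0.1 * rnorm(n), rnorm(n))
orderVariablesSketch(x, y)
orderVariablesSketch(x, y, select = c(2, 1), seed = 1234)
```

Restricting the correlations to a subset of the other data set shrinks the \((p \times q)\) matrix to \((p \times k)\) for \(k\) selected variables, which is where the speed-up of the modification comes from.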
A. Alfons, C. Croux and P. Filzmoser (2016) Robust maximum association between data sets: The R Package ccaPP. Austrian Journal of Statistics, 45(1), 71–79.
A. Alfons, C. Croux and P. Filzmoser (2017) Robust maximum association estimators. Journal of the American Statistical Association, 112(517), 435–445.
maxCorProj, ccaGrid, corFunctions
library("ccaPP")
data("diabetes")
x <- diabetes$x
y <- diabetes$y
## Spearman correlation
maxCorGrid(x, y, method = "spearman")
## additional arguments (here 'consistent') are passed to the correlation functional
maxCorGrid(x, y, method = "spearman", consistent = TRUE)
## Pearson correlation
maxCorGrid(x, y, method = "pearson")