Computes Yanai's Generalized Coefficient of Determination (GCD) for the
similarity of the subspaces spanned by a subset of variables (specified
by indices) and a subset of the full data set's Principal Components
(specified by pcindices).
Input data is expected in the form of a (co)variance or
correlation matrix. If a non-square matrix is given, it is assumed to
be a data matrix, and its correlation matrix is used as input. The
number of variables (k) and the number of PCs (q) need not be equal.
Yanai's GCD is defined as:
$$GCD = \frac{\mathrm{tr}(P_v\cdot P_c)}{\sqrt{k\cdot q}}$$
where \(P_v\) and \(P_c\) are the matrices of orthogonal
projections on the subspaces spanned by the k-variable subset and by
the q-Principal Component subset, respectively.
This definition is equivalent to:
$$GCD = \frac{1}{\sqrt{k\cdot q}} \sum_{i=1}^{q}(r_m)_i^2$$
where \((r_m)_i\) stands for the multiple correlation between the
i-th selected Principal Component and the k-variable subset, and the
sum is carried out over the q PCs (i=1,...,q) selected.
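The multiple-correlation form can be illustrated with base R alone. The sketch below is not the package's code: it assumes the built-in iris measurements as data, an arbitrary choice of variables (idx) and PCs (pcs), and covariance-based PCs from prcomp. Each squared multiple correlation is taken as the R-squared of a linear regression of one PC on the variable subset.

```r
# Illustrative sketch (base R only): GCD as a scaled sum of squared
# multiple correlations. Data and index choices are assumptions.
X   <- as.matrix(iris[, 1:4])   # built-in data set, 4 numeric variables
idx <- c(1, 3)                  # hypothetical variable subset (k = 2)
pcs <- 1:2                      # hypothetical PC subset (q = 2)

# Scores of the selected covariance-based Principal Components
scores <- prcomp(X)$x[, pcs, drop = FALSE]

# Squared multiple correlation of each PC with the variable subset
r2 <- apply(scores, 2, function(pc) summary(lm(pc ~ X[, idx]))$r.squared)

gcd_r2 <- sum(r2) / sqrt(length(idx) * length(pcs))
gcd_r2
```

The value lies between 0 and 1, with 1 reached only when the two subspaces coincide.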
These definitions are also equivalent to the expression used in the
code, which only requires the covariance (or correlation) matrix of
the data under consideration.
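As a rough illustration of why a covariance matrix suffices, note that the squared multiple correlation of the i-th PC with the variable subset equals \(\lambda_i\, v_i' S_K^{-1} v_i\), where \(\lambda_i\) is the i-th eigenvalue of the covariance matrix, \(v_i\) holds the subset's entries of the i-th eigenvector, and \(S_K\) is the subset's covariance submatrix. The function gcd_sketch below is a hypothetical helper built on this identity, not the routine used by the package, and the simulated data are an assumption. The check at the end compares it with the projection-based definition.

```r
# Hypothetical helper (not the package's implementation): GCD computed
# from a covariance or correlation matrix S only.
gcd_sketch <- function(S, indices, pcindices) {
  eig    <- eigen(S, symmetric = TRUE)
  V      <- eig$vectors[indices, pcindices, drop = FALSE]  # k x q loadings
  lambda <- eig$values[pcindices]
  Sinv   <- solve(S[indices, indices, drop = FALSE])
  # Squared multiple correlation of each selected PC with the subset
  r2 <- lambda * colSums(V * (Sinv %*% V))
  sum(r2) / sqrt(length(indices) * length(pcindices))
}

# Check against the projection-based definition on simulated, centred data
set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5) %*% matrix(runif(25), 5, 5)
X <- scale(X, scale = FALSE)
S <- cov(X)

proj <- function(A) A %*% solve(crossprod(A)) %*% t(A)  # orthogonal projector
idx  <- c(1, 3)
pcs  <- 1:3
Pv   <- proj(X[, idx, drop = FALSE])
Pc   <- proj(X %*% eigen(S, symmetric = TRUE)$vectors[, pcs, drop = FALSE])

gcd_def <- sum(diag(Pv %*% Pc)) / sqrt(length(idx) * length(pcs))
all.equal(gcd_def, gcd_sketch(S, idx, pcs))
```

The data matrix is used only to build the projectors for the comparison; gcd_sketch itself never sees it.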
Because indices can be a matrix or 3-d array, the GCD values can be
computed for subsets produced by the search functions anneal, genetic
and improve (whose $subsets output component is a matrix or 3-d array),
even when those subsets were selected using a different criterion (see
the example below).