dcov(x, y, index = 1.0)
dcor(x, y, index = 1.0)
DCOR(x, y, index = 1.0)
dcov returns the sample distance covariance and dcor returns the sample distance correlation.
DCOR returns a list with elements dcov and dcor.
dcov and dcor (or DCOR) compute distance covariance and distance correlation statistics.
DCOR is a self-contained R function that returns a list of statistics; dcor executes faster than DCOR (see the examples).
The sample sizes (number of rows) of the two samples must agree, and the samples must not contain missing values.
The arguments x and y can optionally be dist objects; otherwise these arguments are treated as data.
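For illustration, a minimal sketch of the calling conventions is given below, assuming these functions are provided by the energy package and using small random matrices that are purely illustrative:
## assumed: dcov, dcor, and DCOR come from the energy package
library(energy)
set.seed(1)
x <- matrix(rnorm(30 * 3), ncol = 3)   ## 30 observations in R^3
y <- matrix(rnorm(30 * 2), ncol = 2)   ## 30 observations in R^2, same n
dcov(x, y)                             ## data matrices as input
dcov(dist(x), dist(y))                 ## equivalent: precomputed dist objects
str(DCOR(x, y))                        ## DCOR returns a list of statistics
## dcov(x, y[1:20, ])                  ## would fail: sample sizes differ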
Distance correlation is a new measure of dependence between random
vectors introduced by Szekely, Rizzo, and Bakirov (2007).
For all distributions with finite first moments, distance
correlation $\mathcal R$ generalizes the idea of correlation in two
fundamental ways:
(1) $\mathcal R(X,Y)$ is defined for $X$ and $Y$ in arbitrary dimension.
(2) $\mathcal R(X,Y)=0$ characterizes independence of $X$ and
$Y$.
Distance correlation satisfies $0 \le \mathcal R \le 1$, and
$\mathcal R = 0$ only if $X$ and $Y$ are independent. Distance
covariance $\mathcal V$ provides a new approach to the problem of
testing the joint independence of random vectors. The formal
definitions of the population coefficients $\mathcal V$ and
$\mathcal R$ are given in (SRB 2007). The definitions of the
empirical coefficients are as follows.
The empirical distance covariance $\mathcal{V}_n(\mathbf{X,Y})$
with index 1 is
the nonnegative number defined by
$$\mathcal{V}^2_n (\mathbf{X,Y}) = \frac{1}{n^2} \sum_{k,\,l=1}^n
A_{kl}B_{kl}$$
where $A_{kl}$ and $B_{kl}$ are
$$A_{kl} = a_{kl}-\bar a_{k.}- \bar a_{.l} + \bar a_{..}$$
$$B_{kl} = b_{kl}-\bar b_{k.}- \bar b_{.l} + \bar b_{..}.$$
Here
$$a_{kl} = \|X_k - X_l\|_p, \quad b_{kl} = \|Y_k - Y_l\|_q, \quad
k,l=1,\dots,n,$$
and a dot in the subscript denotes the mean computed over the index it replaces. Similarly,
$\mathcal{V}_n(\mathbf{X})$ is the nonnegative number defined by
$$\mathcal{V}^2_n (\mathbf{X}) = \mathcal{V}^2_n (\mathbf{X,X}) =
\frac{1}{n^2} \sum_{k,\,l=1}^n
A_{kl}^2.$$
The empirical distance correlation $\mathcal{R}_n(\mathbf{X,Y})$ is
the square root of
$$\mathcal{R}^2_n(\mathbf{X,Y})=
\frac {\mathcal{V}^2_n(\mathbf{X,Y})}
{\sqrt{ \mathcal{V}^2_n (\mathbf{X}) \mathcal{V}^2_n(\mathbf{Y})}}.$$
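The double-centering formulas above translate directly into R. The following sketch (dcov_by_hand is a hypothetical helper, not part of the package) computes $\mathcal{V}_n$ and $\mathcal{R}_n$ for index 1 and compares them with dcov and dcor:
## direct implementation of the empirical definitions (index = 1)
dcov_by_hand <- function(x, y) {
  a <- as.matrix(dist(x))   ## a_kl = ||X_k - X_l||
  b <- as.matrix(dist(y))   ## b_kl = ||Y_k - Y_l||
  A <- sweep(sweep(a, 1, rowMeans(a)), 2, colMeans(a)) + mean(a)   ## A_kl
  B <- sweep(sweep(b, 1, rowMeans(b)), 2, colMeans(b)) + mean(b)   ## B_kl
  V2xy <- mean(A * B)       ## V_n^2(X, Y)
  V2x  <- mean(A * A)       ## V_n^2(X)
  V2y  <- mean(B * B)       ## V_n^2(Y)
  list(dCov = sqrt(V2xy), dCor = sqrt(V2xy / sqrt(V2x * V2y)))
}
set.seed(1)
x <- matrix(rnorm(20 * 2), ncol = 2)
y <- matrix(rnorm(20 * 3), ncol = 3)
c(by_hand = dcov_by_hand(x, y)$dCov, energy = dcov(x, y))
c(by_hand = dcov_by_hand(x, y)$dCor, energy = dcor(x, y))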
See dcov.test for a test of multivariate independence based on the distance covariance statistic. See also dcor.ttest.
x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
dcov(x, y)
dcov(dist(x), dist(y))   # same thing
## C implementation
dcov(x, y, 1.5)
dcor(x, y, 1.5)
.dcov(dist(x), dist(y), 1.5)
## R implementation
DCOR(x, y, 1.5)
## compare speed of R version and C version
set.seed(111)
## R version
system.time(replicate(1000, DCOR(x, y)))
set.seed(111)
## C version
system.time(replicate(1000, .dcov(x, y)))