It estimates the Distance Correlation coefficient (dcorr) for a continuous predicted-observed dataset.
dcorr(data = NULL, obs, pred, tidy = FALSE, na.rm = TRUE)
An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).
(Optional) argument to call an existing data frame containing the data.
Vector with observed values (numeric).
Vector with predicted values (numeric).
Logical argument (TRUE/FALSE) that sets the return type. TRUE returns a data.frame; FALSE returns a list (default).
Logical argument to remove rows with missing values (NA). Default is na.rm = TRUE.
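As a rough illustration of the `data` and `tidy` arguments described above, the sketch below builds a small toy data frame and calls `dcorr` both ways. The column names `obs` and `pred` and the simulated values are hypothetical, and passing the column names unquoted is an assumption based on the tidy-evaluation references of this page. The `exists()` guard only lets the snippet run when the package providing `dcorr` is not attached.

```r
# Hypothetical toy data; the column names `obs` and `pred` are illustrative only.
set.seed(42)
df <- data.frame(obs = rnorm(50, mean = 10, sd = 2))
df$pred <- df$obs + rnorm(50, mean = 0, sd = 1)

# Run the calls only if a dcorr() function is available in the session.
if (exists("dcorr", mode = "function")) {
  # Default (tidy = FALSE): the estimate is returned within a list.
  res_list <- dcorr(data = df, obs = obs, pred = pred)
  # tidy = TRUE: the estimate is returned within a data frame.
  res_df <- dcorr(data = df, obs = obs, pred = pred, tidy = TRUE)
  print(res_list)
  print(res_df)
}
```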
The dcorr function is a wrapper for the dcor function from the energy package; see Rizzo & Szekely (2022). The distance correlation (dcorr) coefficient is a measure of dependence between random vectors introduced by Szekely et al. (2007).
The dcorr is symmetric: swapping the predicted and observed values gives the same result, which is relevant for the predicted-observed (PO) case.
For all distributions with finite first moments, distance correlation \(\mathcal R\) generalizes the idea of correlation in two fundamental ways:
(1) \(\mathcal R(P,O)\) is defined for \(P\) and \(O\) in arbitrary dimension.
(2) \(\mathcal R(P,O)=0\) characterizes independence of \(P\) and \(O\).
Distance correlation satisfies \(0 \le \mathcal R \le 1\), and \(\mathcal R = 0\) only if \(P\) and \(O\) are independent. Distance covariance \(\mathcal V\) provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients \(\mathcal V\) and \(\mathcal R\) are given in Szekely et al. (2007).
The empirical distance correlation \(\mathcal{R}_n(\mathbf{P,O})\) is the square root of $$ \mathcal{R}^2_n(\mathbf{P,O})= \frac {\mathcal{V}^2_n(\mathbf{P,O})} {\sqrt{ \mathcal{V}^2_n (\mathbf{P}) \mathcal{V}^2_n(\mathbf{O})}}. $$
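For reference, the sample quantities in this ratio can be written out following the definitions in Szekely et al. (2007). The index names \(k, l\) and the symbols \(a_{kl}\), \(A_{kl}\), \(B_{kl}\) are notation chosen here for illustration. With pairwise distances \(a_{kl} = |P_k - P_l|\), the double-centred distances are $$ A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot}, $$ where \(\bar{a}_{k\cdot}\), \(\bar{a}_{\cdot l}\) and \(\bar{a}_{\cdot\cdot}\) are the row, column and grand means of the distance matrix. \(B_{kl}\) is built in the same way from \(|O_k - O_l|\). The sample distance covariance and distance variance are then $$ \mathcal{V}^2_n(\mathbf{P,O}) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl}, \qquad \mathcal{V}^2_n(\mathbf{P}) = \mathcal{V}^2_n(\mathbf{P,P}) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl}^2, $$ and \(\mathcal{R}_n\) is taken as 0 when the denominator \(\mathcal{V}^2_n(\mathbf{P}) \mathcal{V}^2_n(\mathbf{O})\) is 0.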
For the formula and more details, see the online documentation and the energy package.
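To make the empirical formula concrete, here is a minimal base-R sketch for two numeric vectors. It is not the implementation used by dcorr or by the energy package. The helper names `double_center` and `dcor_sketch` are hypothetical, and the sketch assumes univariate inputs of equal length with no missing values. It double-centres each pairwise distance matrix, averages the elementwise products to obtain the squared distance covariance and variances, and takes the square root of the ratio.

```r
# Double-centre a pairwise distance matrix: subtract row and column means,
# then add back the grand mean.
double_center <- function(d) {
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# Sketch of the empirical distance correlation for two numeric vectors.
# Assumes equal lengths and no missing values (hypothetical helper).
dcor_sketch <- function(p, o) {
  stopifnot(is.numeric(p), is.numeric(o), length(p) == length(o))
  A <- double_center(as.matrix(dist(p)))
  B <- double_center(as.matrix(dist(o)))
  v2_po <- max(mean(A * B), 0)  # squared distance covariance, clipped at 0
  v2_p  <- mean(A * A)          # squared distance variance of p
  v2_o  <- mean(B * B)          # squared distance variance of o
  denom <- sqrt(v2_p * v2_o)
  if (denom == 0) return(0)     # convention when either variance is zero
  sqrt(v2_po / denom)
}

set.seed(1)
P <- rnorm(n = 100, mean = 0, sd = 10)
O <- P + rnorm(n = 100, mean = 0, sd = 3)
dcor_sketch(P, O)
dcor_sketch(O, P)      # same value: the coefficient is symmetric

# A purely nonlinear relation: Pearson correlation is close to zero,
# while the distance correlation stays clearly above zero.
x <- seq(-1, 1, length.out = 101)
cor(x, x^2)
dcor_sketch(x, x^2)
```

The second example illustrates property (2): the quadratic relation is invisible to the Pearson correlation but is still picked up by the distance correlation.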
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007).
Measuring and testing dependence by correlation of distances. Annals of Statistics, Vol. 35(6): 2769-2794.
doi:10.1214/009053607000000505.
Rizzo, M., and Szekely, G. (2022).
energy: E-Statistics: Multivariate Inference via the Energy of Data.
R package version 1.7-10.
https://CRAN.R-project.org/package=energy.
eval_tidy, defusing-advanced
dcor, energy
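set.seed(1)
# Simulated predicted values
P <- rnorm(n = 100, mean = 0, sd = 10)
# Simulated observed values: predictions plus random noise
O <- P + rnorm(n = 100, mean = 0, sd = 3)
# Distance correlation between observed and predicted values
dcorr(obs = O, pred = P)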