Compute the line of organic correlation (LOC) (Helsel and others, 2020, sec. 10.2.2, p. 280). The LOC is estimated by both L-moments and product moments. The LOC has other names in the literature, including reduced major axis and line of diagonal correlation. When describing a functional relation between two variables without trying to predict one from the other, the LOC is more appropriate than ordinary least squares (OLS).
The LOC is a regression line whose slope is computed as the ratio of the respective variations of the predictor variable and the response variable. The intercept of the line is computed such that the line passes through the familiar arithmetic means (first L-moments, \(\lambda_1\)) of the two variables. Relative variation is readily computed by the ratio of standard deviations or, for more robust and less biased estimation, by the ratio of the L-variations (second L-moments, \(\lambda_2\)) of the two variables.
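The product-moment variant described above can be sketched in a few lines. This is a minimal illustration rather than the package's implementation: the function name `loc_product_moment` is hypothetical, the line is oriented as X = m*Y + b to match the formulas in this section, and the sign is taken from the Pearson cross-product sum.

```python
import statistics

def loc_product_moment(x, y):
    """LOC (slope, intercept) for X = m*Y + b from the ratio of standard deviations.

    Hypothetical helper for illustration; not part of any package.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least two values")
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    # Sign of the association from the paired (unsorted) data.
    cross = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sign = (cross > 0) - (cross < 0)
    m = sign * statistics.stdev(x) / statistics.stdev(y)
    b = xbar - m * ybar  # forces the line through the two means
    return m, b

x = [2.0, 4.0, 5.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0]
m_pm, b_pm = loc_product_moment(x, y)
```

Replacing the two standard deviations with the second L-moments of the same variables gives the L-moment variant.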
The \(\lambda_2\) is based on the so-called Gini mean difference statistic (GMD) (\(\mathcal{G}\)) by \(\lambda_2 = \mathcal{G}/2\) (gini.mean.diff). Incidentally, for the normal distribution, the well-known standard deviation is the product \(\lambda_2\sqrt{\pi}\) (see also lmomnor). Mathematically, GMD is defined as the linear combination
$$\mathcal{G} = \frac{2}{n(n-1)}\sum_{i=1}^n (2i - n - 1) x_{i:n}\mbox{,}$$
where \(x_{i:n}\) are the sample ascending order statistics.
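As a check on the definition, the sketch below computes the GMD from the order statistics and compares it with the mean absolute difference over all distinct pairs, which is the same statistic. The function names are hypothetical and the data are made up for illustration.

```python
from itertools import combinations

def gini_mean_diff_sketch(x):
    """GMD as a linear combination of the ascending order statistics."""
    xs = sorted(x)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    # i runs 1..n so the weights match (2i - n - 1) in the formula.
    total = sum((2 * i - n - 1) * v for i, v in enumerate(xs, start=1))
    return 2.0 * total / (n * (n - 1))

def gini_mean_diff_pairwise(x):
    """Same statistic as the mean absolute difference over distinct pairs."""
    pairs = list(combinations(x, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

data = [3.0, 1.0, 4.0, 1.5, 9.0]
g = gini_mean_diff_sketch(data)      # 3.7
g_pairs = gini_mean_diff_pairwise(data)  # 3.7 as well
lam2 = g / 2.0                       # second L-moment, 1.85
```

The order-statistic form needs one sort, whereas the pairwise form visits every pair of values.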
Returning to the LOC slope, the common factor \(2/[n(n-1)]\) cancels between the two variables, so the slope is the ratio of the \(\mathcal{G}\) values as
$$m = \mathrm{sign}[\rho]\cdot\frac{\sum_{i=1}^n (2i - n - 1) X_{i:n}}{\sum_{i=1}^n (2i - n - 1) Y_{i:n}}\mbox{,}$$
where \(X_{i:n}\) is the ordered (ascending) vector of random variable \(X\), \(Y_{i:n}\) is the ordered (ascending) vector of random variable \(Y\), and the sign of the slope is taken from the sign of a correlation coefficient. Pearson R, Kendall Tau (computationally slowest), and Spearman Rho would all work; Spearman Rho (\(\rho\)) is the one implemented for the function. For applications, it is critical that the correlation coefficient is computed using the original paired ordering of \(X\) and \(Y\) and not after the individual vector sorting that is needed for the GMD (L-moments). A developer, therefore, must be mindful of where in the code the two variables are sorted into order statistics for the \(\mathcal{G}\) computations.
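The ordering pitfall can be made concrete with a small sketch. The names are hypothetical, and the sign here comes from the Pearson cross-product sum rather than Spearman Rho, which is a substitution made only to keep the example short. The slope follows the formula as written, that is, for the line X = m*Y + b.

```python
def weighted_order_sum(v):
    """Sum of (2i - n - 1) * v_{i:n}; sorts a copy, the input is untouched."""
    vs = sorted(v)
    n = len(vs)
    return sum((2 * i - n - 1) * val for i, val in enumerate(vs, start=1))

def association_sign(x, y):
    """Sign (-1, 0, or 1) of the Pearson cross-product sum of paired data."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return (cross > 0) - (cross < 0)

def loc_slope(x, y):
    """LOC slope m for X = m*Y + b."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least two values")
    # The sign uses the ORIGINAL pairing; sorting happens only inside
    # weighted_order_sum and never reorders x or y themselves.
    return association_sign(x, y) * weighted_order_sum(x) / weighted_order_sum(y)

x = [1.0, 2.0, 3.0, 4.0]
y = [8.0, 6.0, 4.0, 2.0]   # y falls as x rises
m = loc_slope(x, y)        # -0.5
# Sorting each vector first destroys the pairing and flips the sign:
wrong_sign = association_sign(sorted(x), sorted(y))  # 1
```

A constant `y` makes the denominator zero, so a production version would need to guard against that case.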
The LOC intercept then follows by algebra as
$$b = \frac{1}{n}\biggl(\sum_{i=1}^n X_{i:n} - m \cdot \sum_{i=1}^n Y_{i:n}\biggr)\mbox{.}$$
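Putting the slope and intercept together gives a complete fit. The sketch below is one possible arrangement under the same assumptions as the formulas (line oriented as X = m*Y + b); `loc_fit` is a hypothetical name, and the sign again comes from the Pearson cross-product sum for brevity.

```python
def loc_fit(x, y):
    """LOC (slope, intercept) for X = m*Y + b using L-moment style weights."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and have at least two values")
    xbar, ybar = sum(x) / n, sum(y) / n
    # Sign from the paired data, before any sorting.
    cross = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sign = (cross > 0) - (cross < 0)
    # Weighted sums of the ascending order statistics (sorted copies).
    num = sum((2 * i - n - 1) * v for i, v in enumerate(sorted(x), start=1))
    den = sum((2 * i - n - 1) * v for i, v in enumerate(sorted(y), start=1))
    m = sign * num / den
    b = (sum(x) - m * sum(y)) / n
    return m, b

x = [2.0, 4.0, 5.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0]
m, b = loc_fit(x, y)   # m = 1.375, b = 0.875
```

For these data the means are 5 for `x` and 3 for `y`, and 1.375 * 3 + 0.875 = 5, so the fitted line passes through the point of means as the intercept formula requires.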
Helsel and others (2020, p. 281) enumerate some advantages to the use of the LOC: (1) it minimizes errors in both x and y directions, (2) it provides a single line regardless of which variable (x or y) is used as the response variable, and (3) the cumulative distribution function of its predictions, including the variance and probabilities, is correct (meaning not compressed as in OLS). The LOC is particularly useful for modeling the intrinsic functional relation between two variables, both of which are measured with error and (or) when neither variable is considered an independent variable appropriate to predict the other.
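Advantage (2) can be checked numerically: swapping the roles of the two variables yields the algebraic inverse of the same line, whereas the two OLS fits differ. The sketch below uses hypothetical helper names and made-up data, with the sign taken from the Pearson cross-product sum.

```python
def _weighted_order_sum(v):
    """Sum of (2i - n - 1) * v_{i:n} over a sorted copy of v."""
    vs = sorted(v)
    n = len(vs)
    return sum((2 * i - n - 1) * val for i, val in enumerate(vs, start=1))

def _cross_product_sum(u, v):
    """Sum of products of deviations from the means of paired data."""
    ubar, vbar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))

def loc_line(resp, pred):
    """LOC (slope, intercept) for resp = m*pred + b."""
    c = _cross_product_sum(resp, pred)
    sign = (c > 0) - (c < 0)
    m = sign * _weighted_order_sum(resp) / _weighted_order_sum(pred)
    b = (sum(resp) - m * sum(pred)) / len(resp)
    return m, b

def ols_slope(resp, pred):
    """Ordinary least squares slope of resp regressed on pred."""
    return _cross_product_sum(resp, pred) / _cross_product_sum(pred, pred)

x = [2.0, 4.0, 5.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0]
m_xy, b_xy = loc_line(x, y)   # X = m_xy*Y + b_xy
m_yx, b_yx = loc_line(y, x)   # Y = m_yx*X + b_yx
loc_product = m_xy * m_yx     # 1 up to rounding: one line, two orientations
ols_product = ols_slope(x, y) * ols_slope(y, x)  # r squared, below 1 here
```

Solving X = m_xy*Y + b_xy for Y gives slope 1/m_xy and intercept -b_xy/m_xy, which is what the swapped LOC fit returns. The product of the two OLS slopes equals the squared Pearson correlation, so the OLS lines coincide only when the data are perfectly correlated.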