lorenz: Lorenz Curve Based Thresholds and Partitions

Description

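Computes the Lorenz curve of a vector of nonnegative values and derives thresholds and partitions from it: quantiles, inverse quantiles, and the point where a tangent with slope 1 touches the curve.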

Usage

lorenz(x, n = rep(1, length(x)), na.last = TRUE)
# S3 method for lorenz
quantile(x, probs = seq(0, 1, 0.25),
    type = c("L", "p"), ...)
iquantile(x, ...)
# S3 method for lorenz
iquantile(x, values,
    type = c("L", "p"), ...)
# S3 method for lorenz
plot(x, type = c("L", "x"),
    tangent = NA, h = NA, v = NA, ...)
# S3 method for summary.lorenz
print(x, digits, ...)
# S3 method for lorenz
summary(object, ...)

Value

lorenz returns an object of class lorenz. It is a matrix with m+1 rows (m = length(x)) and 3 columns (p, L, x).

The quantile method finds values of x_i corresponding to quantiles L_i or p_i (depending on the type argument). The iquantile (inverse quantile) method finds quantiles of L_i or p_i corresponding to values of x_i.
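To illustrate the forward and inverse lookups, here is a minimal base-R sketch on a hand-built Lorenz matrix with columns p, L and x. The helper names q_sketch() and iq_sketch() are hypothetical, and the step-function rule (smallest x_i whose cumulative share reaches the probability; cumulative share at the largest x_i not exceeding the value) is an assumption. The package methods may interpolate or treat ties differently.

```r
# Hand-built Lorenz matrix (columns p, L, x); row 1 is the origin p_0 = L_0 = 0.
x <- sort(c(0.2, 5, 1, 0.1, 3))
lc <- cbind(p = c(0, seq_along(x) / length(x)),
            L = c(0, cumsum(x) / sum(x)),
            x = c(0, x))

# Hypothetical quantile-like lookup: for each probability, the smallest x_i
# whose cumulative share (column "p" or "L") reaches that probability.
q_sketch <- function(l, probs, type = c("L", "p")) {
  type <- match.arg(type)
  vals <- l[-1, , drop = FALSE]            # ignore the origin row
  sapply(probs, function(pr) vals[which(vals[, type] >= pr)[1], "x"])
}

# Hypothetical inverse lookup: for each value, the cumulative share
# (column "p" or "L") at the largest x_i that does not exceed it.
iq_sketch <- function(l, values, type = c("L", "p")) {
  type <- match.arg(type)
  sapply(values, function(v) unname(l[max(which(l[, "x"] <= v)), type]))
}

(x_p <- q_sketch(lc, c(0.25, 0.5), type = "p"))   # x values at p = 0.25, 0.5
iq_sketch(lc, x_p, type = "p")                     # back to cumulative p
```

Here the round trip returns 0.4 and 0.6 rather than 0.25 and 0.5, because p only takes the values i / m; the inverse lookup reports the grid point actually reached.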

The plot method draws a Lorenz curve. Because the object is a matrix, lines and points can be used to add further curves to an existing plot.
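As a sketch of why this works: plot(), lines() and points() pass their input through xy.coords(), which for a matrix with two or more columns uses column 1 as x and column 2 as y. With the column order p, L, x, that draws L against p. The matrices below are built by hand with a hypothetical helper make_lc(), not by lorenz.

```r
# Hypothetical helper: a Lorenz matrix with the column layout (p, L, x).
make_lc <- function(x) {
  x <- sort(x)
  cbind(p = c(0, seq_along(x) / length(x)),
        L = c(0, cumsum(x) / sum(x)),
        x = c(0, x))
}
a <- make_lc(c(1, 2, 3, 4))
b <- make_lc(c(0.1, 0.1, 1, 10))

# xy.coords() takes column 1 as x and column 2 as y for a multi-column
# matrix, i.e. L is plotted against p.
xy <- xy.coords(a)

pdf(NULL)                      # off-screen device so the sketch runs anywhere
plot(a, type = "l", xlim = c(0, 1), ylim = c(0, 1))
lines(b, lty = 2)              # add a second curve to the same panel
points(b)
abline(0, 1, col = "grey")     # line of perfect equality
invisible(dev.off())
```

With objects returned by lorenz the same mechanism applies, so a second curve can be added with lines() after plot() has drawn the first.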

The summary method returns characteristics of the Lorenz curve.

Arguments

x: a vector of nonnegative numbers for lorenz; an object of class lorenz for the quantile, iquantile, and plot methods; or an object of class summary.lorenz for print.
n: a vector of frequencies; must be the same length as x.
na.last: logical, for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed (see order).
probs: numeric vector of probabilities with values in [0,1], as in quantile.
values: numeric vector of values for which the corresponding population quantiles are to be returned.
type: character. For the plot method it indicates whether to plot the cumulative distribution ("L", the Lorenz curve) or the ordered, non-cumulative values ("x"). For the quantile and iquantile methods it indicates which of the cumulative portions ("L" or "p") to use.
tangent: color value for the Lorenz-curve tangent when plotted. The default NA value omits the tangent from the plot.
h: color value for the horizontal line for the Lorenz-curve tangent when plotted. The default NA value omits the horizontal line from the plot.
v: color value for the vertical line for the Lorenz-curve tangent when plotted. The default NA value omits the vertical line from the plot.
digits: numeric, number of significant digits in output.
object: object to summarize.
...: other arguments passed to the underlying functions.

Author

Peter Solymos <psolymos@gmail.com>

Details

The Lorenz curve is a continuous piecewise linear function representing the distribution of abundance (e.g. income or wealth). It relates the cumulative portion of the population, \(p_i = i / m\) (\(i = 1, \dots, m\)), to the cumulative portion of abundance, \(L_i = \sum_{j=1}^{i} x_j n_j / \sum_{j=1}^{m} x_j n_j\), where the \(x_i\) are indexed in non-decreasing order (\(x_i \le x_{i+1}\)). By convention, \(p_0 = L_0 = 0\). The frequencies n allow the x values to have unequal weights.
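The definitions above can be sketched in base R as follows. lorenz_sketch() is a hypothetical helper, not the package's implementation; it removes NAs (as na.last = NA would) and fills the x column of the origin row with 0, which is an assumption.

```r
# Hypothetical helper following the definitions in Details:
# p_i = i / m and L_i = sum_{j<=i} x_j n_j / sum_{j<=m} x_j n_j,
# with the origin p_0 = L_0 = 0 as the first row.
lorenz_sketch <- function(x, n = rep(1, length(x))) {
  stopifnot(length(x) == length(n), all(x >= 0, na.rm = TRUE))
  o <- order(x, na.last = NA)      # non-decreasing order, NAs removed
  x <- x[o]
  n <- n[o]
  m <- length(x)
  xn <- x * n                      # abundance contributed by each value
  cbind(p = c(0, seq_len(m) / m),
        L = c(0, cumsum(xn) / sum(xn)),
        x = c(0, x))               # origin row x = 0 is an assumption
}

lorenz_sketch(c(3, 1, 2))
lorenz_sketch(c(1, 2), n = c(3, 1))
```

For example, lorenz_sketch(c(3, 1, 2)) gives p = 0, 1/3, 2/3, 1 and L = 0, 1/6, 1/2, 1: a matrix with m + 1 = 4 rows, as described under Value.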

The summary method calculates the following characteristics of the Lorenz curve:

t: index where the tangent with slope 1 touches the curve.
x[t], p[t], L[t]: values corresponding to index t; x[t] is the unmodified input value.
S: Lorenz asymmetry coefficient, \(S = p_t + L_t\); \(S = 1\) indicates symmetry (the tangent point then lies on the anti-diagonal).
G: Gini coefficient; 0 indicates perfect equality, values close to 1 indicate high inequality.
J: Youden index, the largest vertical distance between the diagonal (line of perfect equality) and the curve; the distance is largest at the tangent point, so \(J = \max(p - L) = p_t - L_t\).
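These characteristics can be computed from the p and L columns, as in the base-R sketch below. lorenz_stats_sketch() is hypothetical; it takes the first maximum of p - L when there are ties, and computes G with the trapezoidal rule as G = 1 - 2A, where A is the area under the piecewise linear curve. The package may differ in these details.

```r
# Hypothetical helper: characteristics from the p and L columns
# (both including the origin p_0 = L_0 = 0 as first element).
lorenz_stats_sketch <- function(p, L) {
  stopifnot(length(p) == length(L))
  d <- p - L                          # vertical gap between diagonal and curve
  i <- which.max(d)                   # first vertex where the gap is largest
  k <- length(L)
  # area under the piecewise linear curve by the trapezoidal rule
  area <- sum(diff(p) * (L[-1] + L[-k]) / 2)
  c(t = i - 1,                        # 0-based, matching p_0, ..., p_m
    "p[t]" = p[i], "L[t]" = L[i],
    S = p[i] + L[i],                  # Lorenz asymmetry coefficient
    G = 1 - 2 * area,                 # Gini coefficient
    J = d[i])                         # Youden index, max(p - L)
}

lorenz_stats_sketch(p = (0:4) / 4, L = c(0, 0, 0, 0, 1))
```

For a population of four in which one member holds all the abundance (p = 0, 0.25, ..., 1 and L = 0, 0, 0, 0, 1), the sketch gives t = 3 and G = J = S = 0.75; with L equal to p, both G and J are 0.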

References

Damgaard, C., & Weiner, J. (2000): Describing inequality in plant size or fecundity. Ecology 81:1139--1142. <doi:10.2307/177185>

Schisterman, E. F., Perkins, N. J., Liu, A., & Bondell, H. (2005): Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 16:73--81. <doi:10.1097/01.ede.0000147512.81966.ba>

Youden, W. J. (1950): Index for rating diagnostic tests. Cancer 3:32--35. <doi:10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3>

Examples


set.seed(1)
x <- c(rexp(100, 10), rexp(200, 1))

l <- lorenz(x)
head(l)
tail(l)
summary(l)
## the same data summarized as a plain matrix
summary(unclass(l))

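## values of x at given population (p) or abundance (L) shares,
## and the inverse mapping back to p or L
(q <- c(0.05, 0.5, 0.95))
(x_p <- quantile(l, probs=q, type="p"))
iquantile(l, values=x_p, type="p")
(x_L <- quantile(l, probs=q, type="L"))
iquantile(l, values=x_L, type="L")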

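## Lorenz curve with tangent (col 2), horizontal (col 3) and vertical (col 4)
## lines; grey dashed lines are the diagonal and the anti-diagonal
op <- par(mfrow=c(2,1))
plot(l, lwd=2, tangent=2, h=3, v=4)
abline(0, 1, lty=2, col="grey")
abline(1, -1, lty=2, col="grey")
## ordered, non-cumulative values
plot(l, type="x", lwd=2, h=3, v=4)
par(op)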

## Lorenz-tangent approach to binarize a multi-level problem
n <- 100
g <- as.factor(sort(sample(LETTERS[1:4], n, replace=TRUE, prob=4:1)))
x <- rpois(n, exp(as.integer(g)))
mu <- aggregate(x, list(g), mean)
(l <- lorenz(mu$x, table(g)))
(s <- summary(l))

plot(l)
abline(0, 1, lty=2)
## vertical segment at p[t] from the diagonal to the curve (length J)
lines(rep(s["p[t]"], 2), c(s["p[t]"], s["L[t]"]), col=2)
