opticut (version 0.1-3)

lorenz: Lorenz Curve Based Thresholds and Partitions

Description

Lorenz curve based thresholds and partitions.

Usage

lorenz(x, n = rep(1, length(x)), na.last = TRUE)

# S3 method for lorenz
quantile(x, probs = seq(0, 1, 0.25), type = c("L", "p"), ...)

iquantile(x, ...)

# S3 method for lorenz
iquantile(x, values, type = c("L", "p"), ...)

# S3 method for lorenz
plot(x, type = c("L", "x"), tangent = NA, h = NA, v = NA, ...)

# S3 method for summary.lorenz
print(x, digits, ...)

# S3 method for lorenz
summary(object, ...)

Value

lorenz returns an object of class lorenz. It is a matrix with m+1 rows (m = length(x)) and 3 columns (p, L, x).

The quantile method finds values of x_i corresponding to quantiles L_i or p_i (depending on the type argument). The iquantile (inverse quantile) method finds quantiles of L_i or p_i corresponding to values of x_i.

The plot method draws a Lorenz curve. Because the object is a matrix, lines and points will work for adding multiple lines.

The summary method returns characteristics of the Lorenz curve.

Arguments

x

a vector of nonnegative numbers for lorenz, or an object to be plotted or summarized.

n

a vector of frequencies; must be the same length as x.

na.last

logical, for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed (see order).

probs

numeric vector of probabilities with values in [0,1], as in quantile.

values

numeric vector of values for which the corresponding population quantiles are to be returned.

type

character. For the plot method it indicates whether to plot the cumulative distribution quantiles ("L") or ordered but not-cumulated values ("x"). For the quantile and iquantile methods it indicates which of the quantiles ("L" or "p") to use.

tangent

color value for the Lorenz-curve tangent when plotted. The default NA value omits the tangent from the plot.

h

color value for the horizontal line for the Lorenz-curve tangent when plotted. The default NA value omits the horizontal line from the plot.

v

color value for the vertical line for the Lorenz-curve tangent when plotted. The default NA value omits the vertical line from the plot.

digits

numeric, number of significant digits in output.

object

object to summarize.

...

other arguments passed to the underlying functions.

Author

Peter Solymos <psolymos@gmail.com>

Details

The Lorenz curve is a continuous piecewise linear function representing the distribution of abundance (income, or wealth). It plots the cumulative portion of the population, \(p_i = i / m\) (\(i=1,...,m\)), against the cumulative portion of abundance, \(L_i = \sum_{j=1}^{i} x_j n_j / \sum_{j=1}^{m} x_j n_j\), where \(x_i\) are indexed in non-decreasing order (\(x_i \le x_{i+1}\)). By convention, \(p_0 = L_0 = 0\). n can represent unequal frequencies.
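The definitions above can be sketched in base R as follows. This is an illustration only, not the package's implementation: lorenz_sketch is a hypothetical helper, it uses \(p_i = i / m\) exactly as stated in the text, and the x value in the padding row for \(p_0 = L_0 = 0\) is set to 0 as an assumption.

```r
# Hypothetical helper, not part of opticut: a base-R sketch of the
# p_i and L_i formulas given in the Details section.
lorenz_sketch <- function(x, n = rep(1, length(x))) {
  o <- order(x)                    # index x in non-decreasing order
  x <- x[o]
  n <- n[o]
  m <- length(x)
  p <- seq_len(m) / m              # cumulative portion of the population
  L <- cumsum(x * n) / sum(x * n)  # cumulative portion of abundance
  # prepend the conventional origin p_0 = L_0 = 0
  # (x = 0 in that row is an assumption made for this sketch)
  cbind(p = c(0, p), L = c(0, L), x = c(0, x))
}

r <- lorenz_sketch(c(3, 1, 2))
r
```

The result has m + 1 rows and the columns p, L, and x, mirroring the layout described under Value.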

The following characteristics of the Lorenz curve are calculated: "t": index where the tangent (slope 1) touches the curve; "x[t]", "p[t]", and "L[t]" are the values corresponding to index t, where x_t is the unmodified input. "S": Lorenz asymmetry coefficient (\(S = p_t + L_t\)); \(S = 1\) indicates symmetry. "G": Gini coefficient; 0 is perfect equality, and values close to 1 indicate high inequality. "J": Youden index, the (largest) distance between the anti-diagonal and the curve; the distance is largest at the tangent point (\(J = max(p - L) = p_t - L_t\)).
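As a rough illustration of these quantities, the base-R sketch below computes them for a small vector with equal frequencies. It is not the package's code: the variable names are hypothetical, t is located as the index that maximizes p - L, and the Gini coefficient is computed with the trapezoid rule on the piecewise linear curve, which is an assumption about the estimator and may differ from what summary reports.

```r
# Illustrative only: base-R versions of the summary quantities,
# assuming equal frequencies and a trapezoid-rule Gini coefficient.
x <- sort(c(1, 2, 3, 10))
m <- length(x)
p <- c(0, seq_len(m) / m)        # includes p_0 = 0
L <- c(0, cumsum(x) / sum(x))    # includes L_0 = 0

d <- p - L                       # vertical gap between p and L
t_idx <- which.max(d) - 1        # minus 1 because element 1 is the origin
J <- max(d)                      # Youden index, J = p_t - L_t
S <- p[t_idx + 1] + L[t_idx + 1] # asymmetry coefficient, S = p_t + L_t
# Gini as 1 minus twice the area under the piecewise linear curve
G <- 1 - sum(diff(p) * (head(L, -1) + tail(L, -1)))

c(t = t_idx, "x[t]" = x[t_idx], S = S, G = G, J = J)
```

For this input the largest gap occurs at the third ordered value, so x[t] is 3.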

References

Damgaard, C., & Weiner, J. (2000): Describing inequality in plant size or fecundity. Ecology 81:1139--1142. <doi:10.2307/177185>

Schisterman, E. F., Perkins, N. J., Liu, A., & Bondell, H. (2005): Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 16:73--81. <doi:10.1097/01.ede.0000147512.81966.ba>

Youden, W. J. (1950): Index for rating diagnostic tests. Cancer 3:32--5. <doi:10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3>

See Also

Examples

library(opticut)
set.seed(1)
x <- c(rexp(100, 10), rexp(200, 1))

l <- lorenz(x)
head(l)
tail(l)
summary(l)
summary(unclass(l))

(q <- c(0.05, 0.5, 0.95))
(p_i <- quantile(l, probs=q, type="p"))
iquantile(l, values=p_i, type="p")
(p_i <- quantile(l, probs=q, type="L"))
iquantile(l, values=p_i, type="L")

op <- par(mfrow=c(2,1))
plot(l, lwd=2, tangent=2, h=3, v=4)
abline(0, 1, lty=2, col="grey")
abline(1, -1, lty=2, col="grey")
plot(l, type="x", lwd=2, h=3, v=4)
par(op)

## Lorenz-tangent approach to binarize a multi-level problem
n <- 100
g <- as.factor(sort(sample(LETTERS[1:4], n, replace=TRUE, prob=4:1)))
x <- rpois(n, exp(as.integer(g)))
mu <- aggregate(x, list(g), mean)
(l <- lorenz(mu$x, table(g)))
(s <- summary(l))

plot(l)
abline(0, 1, lty=2)
lines(rep(s["p[t]"], 2), c(s["p[t]"], s["L[t]"]), col=2)
