getBreaks: Compute break points for categorizing (semi-)continuous variables

Description

Compute break points for categorizing continuous or semi-continuous variables using (weighted) quantiles. This is a utility function that is useful for writing custom wrapper functions such as simEUSILC.

Usage

getBreaks(
  x,
  weights = NULL,
  zeros = TRUE,
  lower = NULL,
  upper = NULL,
  equidist = TRUE,
  probs = NULL,
  strata = NULL
)

Value

A numeric vector of break points.

Arguments

x: a numeric vector to be categorized.
weights: an optional numeric vector containing sample weights.
zeros: a logical indicating whether x is semi-continuous, i.e., contains a considerable number of zeros. See “Details” for how this affects the behavior of the function.
lower, upper: optional numeric values specifying lower and upper bounds other than minimum and maximum of x, respectively.
equidist: a logical indicating whether the (positive) break points should be equidistant or whether there should be refinements in the lower and upper tail (see “Details”).
probs: a numeric vector of probabilities with values in \([0, 1]\) giving quantiles to be used as (positive) break points. If supplied, this is preferred over equidist.
strata: an optional vector specifying a stratification variable (e.g., household IDs). If specified, the mean of x (and of weights, if supplied) is computed within each stratum before the breaks are calculated.

Author

Andreas Alfons and Bernhard Meindl

Details

If equidist is TRUE, the behavior is as follows. If zeros is also TRUE, the 0%, 10%, ..., 90% quantiles of the negative values and the 10%, 20%, ..., 100% quantiles of the positive values are computed and used as break points together with 0. If zeros is not TRUE, the 0%, 10%, ..., 100% quantiles of all values are used instead.
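
The equidistant case described above can be sketched in base R. This is only an illustration under stated assumptions: sketch_breaks is a hypothetical helper, not part of the package, and it uses the unweighted stats::quantile rather than quantileWt, so its results need not match getBreaks exactly.

```r
# Hypothetical helper (not part of the package): equidistant break points
# based on unweighted quantiles from base R.
sketch_breaks <- function(x, zeros = TRUE) {
  p <- seq(0, 1, by = 0.1)
  if (zeros) {
    neg <- x[x < 0]
    pos <- x[x > 0]
    # 0%, 10%, ..., 90% quantiles of the negative values (if any)
    bneg <- if (length(neg) > 0) {
      quantile(neg, probs = p[-length(p)], names = FALSE)
    } else numeric(0)
    # 10%, 20%, ..., 100% quantiles of the positive values (if any)
    bpos <- if (length(pos) > 0) {
      quantile(pos, probs = p[-1], names = FALSE)
    } else numeric(0)
    breaks <- c(bneg, 0, bpos)
  } else {
    # 0%, 10%, ..., 100% quantiles of all values
    breaks <- quantile(x, probs = p, names = FALSE)
  }
  unique(breaks)
}

# Illustrative semi-continuous data: five zeros and the values 1 to 10
x <- c(rep(0, 5), 1:10)
b <- sketch_breaks(x)
print(b)
```

With zeros = TRUE the zeros themselves do not enter any quantile computation; 0 is simply added as a break point between the negative and the positive part.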

If equidist is not TRUE, the behavior is as follows. If zeros is not TRUE, the 1%, 5%, 10%, 20%, 40%, 60%, 80%, 90%, 95% and 99% quantiles of all values are used for the inner part of the data (instead of the equidistant 10%, ..., 90% quantiles). If zeros is TRUE, these quantiles are only used for the positive values while the quantiles of the negative values remain equidistant.
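
As a rough illustration of the non-equidistant case with zeros not TRUE, the following base R sketch uses the probabilities listed above for the inner part and keeps the minimum and maximum as outer break points. The simulated data and the use of the unweighted stats::quantile (instead of quantileWt) are assumptions made for the example.

```r
# Probabilities refined in the lower and upper tail, as listed above
probs_tail <- c(0.01, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99)

# Illustrative right-skewed data (an assumption for this example)
set.seed(1)
x_skew <- rexp(1000)

# Minimum and maximum stay as outer break points; the inner break points
# come from the tail-refined probabilities (unweighted quantiles here)
breaks_tail <- c(min(x_skew),
                 quantile(x_skew, probs = probs_tail, names = FALSE),
                 max(x_skew))
print(breaks_tail)
```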

Note that duplicated values among the quantiles are discarded and that the minimum and maximum are replaced with lower and upper, respectively, if these are specified.
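
The two post-processing steps mentioned above might look roughly like the following base R sketch. The data, the probabilities and the bounds are illustrative assumptions, and the order of the two steps is a guess rather than a statement about the package internals.

```r
# Illustrative data with ties, so that some quantiles coincide
x_ties <- c(0, 0, 0, 1, 2, 3, 10)
q <- quantile(x_ties, probs = seq(0, 1, by = 0.25), names = FALSE)

# Step 1: discard duplicated values among the quantiles
breaks_adj <- unique(q)

# Step 2: replace the minimum and maximum with user-supplied bounds
lower <- -1  # hypothetical lower bound
upper <- 20  # hypothetical upper bound
breaks_adj[1] <- lower
breaks_adj[length(breaks_adj)] <- upper
print(breaks_adj)
```

Because duplicates are dropped, the number of break points returned can be smaller than the number of requested quantiles.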

The (weighted) quantiles are computed with the function quantileWt.

Examples

library(simPop)  # assumed: the package that provides getBreaks and eusilcS
data(eusilcS)

# semi-continuous variable, positive break points equidistant
getBreaks(eusilcS$netIncome, weights = eusilcS$rb050)

# semi-continuous variable, positive break points not equidistant
getBreaks(eusilcS$netIncome, weights = eusilcS$rb050,
    equidist = FALSE)