Hmisc (version 5.2-2)

cut2: Cut a Numeric Variable into Intervals

Description

cut2 works like cut, but left endpoints are inclusive and labels have the form [lower, upper), except that the last interval is [lower, upper]. If cuts are given, cut2 by default ensures that the cuts cover the entire range of x. If cuts are not given, cut2 cuts x into quantile groups (when g is given) or into groups with a target minimum number of observations (m). Whereas cut creates a category object, cut2 creates a factor object. m is a target, not a guarantee.

cutGn guarantees that every group of the grouped variable has at least m observations. It uses an exhaustive algorithm that runs fast because it is coded in Fortran.
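
The interval convention described above can be illustrated with base R alone. The following is a minimal sketch, not a call to cut2 itself: it assumes a small hypothetical vector x and uses cut with right = FALSE and include.lowest = TRUE to produce left-closed intervals with a closed last interval, next to the right-closed default of cut.

```r
# Base-R sketch only; cut2() itself is not called here.
x <- c(0, 10, 20, 30)        # hypothetical data
breaks <- c(0, 10, 20, 30)   # hypothetical cut points

# Default cut(): right-closed intervals (a, b], so the minimum falls outside.
right_closed <- cut(x, breaks)

# Left-closed intervals [a, b), with the last interval closed on both sides,
# matching the labeling convention described for cut2().
left_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

print(data.frame(x, right_closed, left_closed))
```

With the default settings of cut, the value equal to the lowest break is returned as NA, whereas the left-closed form assigns every value of x to an interval.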

Usage

cut2(x, cuts, m=150, g, levels.mean=FALSE, digits, minmax=TRUE,
     oneval=TRUE, onlycuts=FALSE, formatfun=format, ...)

cutGn(x, m, what=c('mean', 'factor', 'summary', 'cuts', 'function'), rcode=FALSE)

Value

a factor variable with levels of the form [a,b) or formatted means (character strings), unless onlycuts is TRUE, in which case a numeric vector is returned

Arguments

x

numeric vector to classify into intervals

cuts

cut points

m

desired minimum number of observations in a group. The algorithm does not guarantee that all groups will have at least m observations.

g

number of quantile groups

levels.mean

set to TRUE to make the new categorical vector have levels attribute that is the group means of x instead of interval endpoint labels

digits

number of significant digits to use in constructing levels. Default is 3 (5 if levels.mean=TRUE)

minmax

if cuts is specified but min(x)<min(cuts) or max(x)>max(cuts), augments cuts to include the minimum and maximum of x

oneval

if an interval contains only one unique value, the interval will be labeled with the formatted version of that value instead of the interval endpoints, unless oneval=FALSE

onlycuts

set to TRUE to only return the vector of computed cuts. This consists of the interior values plus outer ranges.

formatfun

formatting function, supports formula notation (if rlang is installed)

...

additional arguments passed to formatfun

what

specifies the kind of vector values to return from cutGn, the default being like 'levels.mean' of cut2. Specify 'summary' to return a numeric 3-column matrix that summarizes the intervals satisfying the m requirement. Use what='cuts' to only return the vector of computed cutpoints. To create a function that will recode the variable in play using the same intervals as computed by cutGn, specify what='function'. This function will have a what argument to allow the user to decide later whether to recode into interval means or into a factor variable.

rcode

set to TRUE to run the cutgn algorithm in R. This is useful for speed comparisons with the default compiled code.
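
A few of the cut2 arguments above can be tried together as follows. This is a minimal sketch that assumes Hmisc is installed and uses a hypothetical vector 1:100 so that the groups are easy to inspect. No expected output is shown, since labels depend on digits and formatfun.

```r
library(Hmisc)  # assumed to be installed

x <- 1:100  # hypothetical data chosen so that groups are easy to check

# Four quantile groups, returned as a factor with interval labels
grp <- cut2(x, g = 4)
print(table(grp))

# Same grouping, but return only the numeric vector of computed cuts
cuts <- cut2(x, g = 4, onlycuts = TRUE)
print(cuts)

# Use formatted group means as the levels instead of interval labels
grp_mean <- cut2(x, g = 4, levels.mean = TRUE)
print(levels(grp_mean))
```

Returning the cuts with onlycuts = TRUE is convenient when the same cut points need to be reused, for example by passing them as cuts in a later call on new data.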

See Also

cut, quantile, combine.levels

Examples

library(Hmisc)
set.seed(1)
x <- runif(1000, 0, 100)
z <- cut2(x, c(10,20,30))
table(z)
table(cut2(x, g=10))      # quantile groups
table(cut2(x, m=50))      # group x into intervals with a target of at least 50 obs.

table(cutGn(x, m=50, what='factor'))
f <- cutGn(x, m=50, what='function')
f
f(c(-1, 2, 10), what='mean')
f(c(-1, 2, 10), what='factor')
if (FALSE) {
  x <- round(runif(200000), 3)
  system.time(a <- cutGn(x, m=20))              # 0.02s
  system.time(b <- cutGn(x, m=20, rcode=TRUE))  # 1.51s
  identical(a, b)
}