bin_data: Map a vector of numeric values into bins


Takes a vector of values and bin parameters and maps each value to an ordered factor whose levels are a set of bins like [0,1), [1,2), [2,3).

Values may be provided as a vector, or as a data.table object (x) together with the name of the column to bin (binCol).


bin_data(x = NULL, binCol = NULL, bins = 10, binType = "explicit",
  boundaryType = "lcro]", returnDT = FALSE)



x
A vector of values or a data.table object


binCol
The column of x that holds the values to bin (used when x is a data.table object)

bins

  • integer specifying the number of bins to generate

  • numeric vector specifying sequential bin boundaries {(x0, x1), (x1, x2), ..., (xn-1, xn)}

  • 2-column data.frame/data.table in which each row defines a bin

binType

  • "explicit" interpret bins as they are given

  • "quantile" interpret bins as quantiles (empty quantile bins will be discarded)
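The idea behind the "quantile" option can be pictured with base R alone. The sketch below is an assumption about the general approach, not bin_data()'s own implementation: it places boundaries at evenly spaced quantiles and drops duplicate boundaries, which would otherwise describe empty bins. The names vals, q, breaks and binned are illustrative only.

```r
# Base-R sketch of quantile binning (not bin_data()'s internals)
vals <- c(0, 0, 0, 0, 1, 2, 3, 4)

# Boundaries at the 0%, 25%, 50%, 75% and 100% quantiles
q <- quantile(vals, probs = seq(0, 1, length.out = 5))

# Repeated boundaries would define empty bins, so keep each one once
breaks <- unique(unname(q))

# Left-closed, right-open bins, with the last bin closed on the right
binned <- cut(vals, breaks, right = FALSE, include.lowest = TRUE)

print(breaks)
print(table(binned))
```

Four quantile bins were requested here, but the repeated zeros make two boundaries coincide, so only three bins remain.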

boundaryType

  • "lcro]" bins are [left-closed, right-open) except for last bin which is [left-closed, right-closed]

  • "lcro)" bins are [left-closed, right-open)

  • "[lorc" bins are (left-open, right-closed] except for first bin which is [left-closed, right-closed]

  • "(lorc" bins are (left-open, right-closed]
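The four boundary conventions listed above have close analogues in base R's cut(), through its right and include.lowest arguments. The sketch below uses cut() purely as an illustration; it is not how bin_data() is implemented, and bin_data() may treat values that land on an excluded outer edge differently. The variable names are made up for the example.

```r
# Base-R analogue of the four boundary types, using cut()
vals   <- c(0, 1, 2, 3)
breaks <- c(0, 1, 2, 3)

# Like "lcro]": [0,1) [1,2) [2,3], the last bin is also closed on the right
lcro_closed <- cut(vals, breaks, right = FALSE, include.lowest = TRUE)

# Like "lcro)": [0,1) [1,2) [2,3), so cut() returns NA for the value 3
lcro_open <- cut(vals, breaks, right = FALSE, include.lowest = FALSE)

# Like "[lorc": [0,1] (1,2] (2,3], the first bin is also closed on the left
lorc_closed <- cut(vals, breaks, right = TRUE, include.lowest = TRUE)

# Like "(lorc": (0,1] (1,2] (2,3], so cut() returns NA for the value 0
lorc_open <- cut(vals, breaks, right = TRUE, include.lowest = FALSE)

print(data.frame(vals, lcro_closed, lcro_open, lorc_closed, lorc_open))
```

Comparing the columns row by row shows that the conventions differ only for values that sit exactly on a bin boundary.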


returnDT
If FALSE, return an ordered factor of bins corresponding to the values given; if TRUE, return a data.table object that includes all bins and values (a copy is made if x is a data.table object)


This function can return two different types of output, depending on whether returnDT is TRUE or FALSE.

If returnDT=FALSE, returns an ordered factor of bins such as [-3,-2), [1,2), ... with one element per binned value. Its levels are all of the generated bins, so empty bins may be present as unused factor levels.

If returnDT=TRUE, returns a data.table object with all values and all bins (including empty bins). If x is a data.table object rather than a vector, a full copy of it is created and merged with the set of generated bins.
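What "empty bins as unused factor levels" means can be seen with a plain ordered factor. The sketch below builds one with base R's cut() rather than bin_data(), so it only illustrates the shape of the factor output under that assumption; vals and bins are illustrative names.

```r
# An ordered factor whose levels include a bin that no value falls into
vals <- c(0.5, 0.7, 2.5)
bins <- cut(vals, breaks = c(0, 1, 2, 3), right = FALSE,
            include.lowest = TRUE, ordered_result = TRUE)

# All three bins are levels, although no value lies in [1,2)
print(levels(bins))

# Tabulating by level shows a count of zero for the empty bin
print(table(bins))
```

Because the empty bin is still a level, summaries such as table() report it with a zero count instead of silently dropping it.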


library(data.table)
library(mltools)  # assumed: the package that provides bin_data()

iris.dt <- data.table(iris)

# custom bins
bin_data(iris.dt, binCol="Sepal.Length", bins=c(4, 5, 6, 7, 8))

# 10 equally spaced bins
bin_data(iris$Petal.Length, bins=10, returnDT=TRUE)

# make the last bin [left-closed, right-open)
bin_data(c(0,0,1,2), bins=2, boundaryType="lcro)", returnDT=TRUE)

# bin values by quantile
bin_data(c(0,0,0,0,1,2,3,4), bins=4, binType="quantile", returnDT=TRUE)


