
quest (version 0.2.0)

winsors: Winsorize Numeric Data

Description

winsors winsorizes numeric data by recoding extreme values to a user-specified boundary value, which is defined in z-score units. The to.na argument provides the option of recoding the extreme values as missing (NA) instead.
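To make the idea of a boundary in z-score units concrete, here is a minimal base R sketch for a single numeric vector. The helper winsor_sketch is hypothetical and not part of quest, and it assumes the boundaries are computed from the sample mean and standard deviation; the package's internals may differ.

```r
# Hypothetical helper, not part of quest: winsorize one numeric vector
# at boundaries given in z-score units.
# Assumption: boundaries are derived from the sample mean and sd.
winsor_sketch <- function(x, z.min = -3, z.max = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  lower <- m + z.min * s  # lower boundary on the raw scale
  upper <- m + z.max * s  # upper boundary on the raw scale
  pmin(pmax(x, lower), upper)  # clamp values outside [lower, upper]
}

x <- c(rep(10, 20), 100)  # twenty typical values and one extreme value
out <- winsor_sketch(x)
range(x)
range(out)
```

Only the extreme value is pulled in to the upper boundary; values already inside the boundaries are returned unchanged.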

Usage

winsors(
  data,
  vrb.nm,
  z.min = -3,
  z.max = 3,
  rtn.int = FALSE,
  to.na = FALSE,
  suffix = "_win"
)

Value

data.frame of winsorized data, with extreme values recoded as either the boundary values or NA. The column names are paste0(vrb.nm, suffix).
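As a small illustration of the naming rule, the following base R snippet builds the column names that would result from the default suffix. The choice of the two quakes variables is only an example.

```r
# Column names of the return object follow paste0(vrb.nm, suffix)
vrb.nm <- c("mag", "stations")  # example variable names
suffix <- "_win"                # default suffix from the Usage section
new.nm <- paste0(vrb.nm, suffix)
new.nm
```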

Arguments

data

data.frame of data.

vrb.nm

character vector of colnames from data specifying the variables to winsorize.

z.min

numeric vector of length 1 specifying the lower boundary value in z-score units.

z.max

numeric vector of length 1 specifying the upper boundary value in z-score units.

rtn.int

logical vector of length 1 specifying whether the recoded values should be rounded to the nearest integer. This can be useful when working with count data, where decimal values are impossible.

to.na

logical vector of length 1 specifying whether the extreme values should be recoded to NA rather than winsorized to the boundary values.

suffix

character vector of length 1 specifying the string to append to the end of the colnames in the return object.
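The effect of to.na and rtn.int can be sketched in base R on a single vector. This is an illustration of the behavior described above, not the package's code, and it assumes z-scores are computed from the sample mean and standard deviation.

```r
# Illustration only; assumes z-scores use the sample mean and sd.
x <- c(rep(10, 20), 100)        # count-like data with one extreme value
z <- (x - mean(x)) / sd(x)      # z-score of every observation
extreme <- z < -3 | z > 3       # flags values beyond z.min / z.max

# to.na = TRUE style: extreme values become missing
x_na <- replace(x, extreme, NA)

# rtn.int = TRUE style: the boundary value is rounded to an integer
upper <- mean(x) + 3 * sd(x)    # upper boundary on the raw scale
x_int <- replace(x, z > 3, round(upper))

sum(is.na(x_na))
x_int[extreme]
```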

See Also

winsor (quest), winsor (psych package)

Examples


# winsorize
lapply(X = quakes[c("mag","stations")], FUN = table)
new <- winsors(quakes, vrb.nm = names(quakes))
lapply(X = new, FUN = table)

# recode as NA
vecNA(quakes)
new <- winsors(quakes, vrb.nm = names(quakes), to.na = TRUE)
vecNA(new)

# rtn.int = FALSE versus rtn.int = TRUE
winsors(data = cars, vrb.nm = names(cars), z.min = -2, z.max = 2, rtn.int = FALSE)
winsors(data = cars, vrb.nm = names(cars), z.min = -2, z.max = 2, rtn.int = TRUE)
