DescTools (version 0.99.56)

Winsorize: Winsorize (Replace Extreme Values by Less Extreme Ones)

Description

Winsorizing a vector means that a predefined proportion of the smallest and/or largest values is replaced by less extreme values. The substitute values are the most extreme values that are retained.

Usage

Winsorize(x, val = quantile(x, probs = c(0.05, 0.95), na.rm = FALSE))

Value

A vector of the same length as the original data x containing the winsorized data.

Arguments

x

a numeric vector to be winsorized.

val

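a numeric vector of length 2 giving the lower and upper borders. All values lower than the first element are replaced by it, and all values higher than the second element are replaced by it. The default is the 5% and 95% quantiles of x.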
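As a rough illustration of how the two borders in val act, here is a minimal base R sketch. The helper name winsorize_sketch is hypothetical and this is not the DescTools source; it only mirrors the default shown under Usage and assumes x contains no missing values.

```r
# Hypothetical helper, not the DescTools implementation:
# clamp x to the interval [val[1], val[2]].
winsorize_sketch <- function(x, val = quantile(x, probs = c(0.05, 0.95), na.rm = FALSE)) {
  stopifnot(is.numeric(x), length(val) == 2)
  val <- unname(val)
  x[x < val[1]] <- val[1]   # raise everything below the lower border
  x[x > val[2]] <- val[2]   # lower everything above the upper border
  x
}

z <- 0:10
winsorize_sketch(z, val = c(2, 8))
# 2 2 2 3 4 5 6 7 8 8 8
```

The result has the same length as the input; only values outside the two borders change.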

Author

Andri Signorell <andri@signorell.net>

Details

The winsorized vector is obtained by

$$g(x) = \left\{\begin{array}{ll} -c &\textup{for } x \le -c\\ x &\textup{for } |x| < c\\ c &\textup{for } x \ge c \end{array}\right. $$
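
The formula above covers the symmetric case with a single threshold c. Assuming the two elements of val are read as a lower border a and an upper border b (the symbols a and b are introduced here only for illustration), the same idea can be written for asymmetric borders as

$$g(x) = \left\{\begin{array}{ll} a &\textup{for } x < a\\ x &\textup{for } a \le x \le b\\ b &\textup{for } x > b \end{array}\right. $$

Setting a = -c and b = c recovers the symmetric form.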

You may also want to consider standardizing (possibly robustly) the data before you perform a winsorization.
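
One possible way to combine the two steps is sketched below in base R. It assumes the median and the MAD as robust location and scale estimates and a threshold of c = 3; both choices, the variable names and the data are illustrative and not prescribed by DescTools.

```r
# Illustrative data with two gross outliers (9.8 and -4.0); values are made up.
x <- c(2.1, 2.4, 2.2, 2.6, 2.3, 9.8, 2.5, -4.0)

# Robust standardization: centre by the median, scale by the MAD.
ctr <- median(x)
scl <- mad(x)
z <- (x - ctr) / scl

# Symmetric winsorization of the standardized values at the threshold c (assumed: 3).
c_val <- 3
z_w <- pmax(-c_val, pmin(z, c_val))

# Map the winsorized values back to the original scale.
x_w <- z_w * scl + ctr
round(x_w, 2)
```

Only the two outlying observations are pulled in towards the bulk of the data; the remaining values are returned unchanged up to floating-point rounding.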

See Also

winsorize from the package robustHD, which contains an option to winsorize multivariate data

scale, RobScale

Examples

library(DescTools)

## generate data
set.seed(9128)
x <- round(runif(100) * 100, 1)

(d.frm <- DescTools::Sort(data.frame(
  x, 
  default   = Winsorize(x), 
  quantile  = Winsorize(x, quantile(x, probs=c(0.1, 0.8), na.rm = FALSE)), 
  fixed_val = Winsorize(x, val=c(15, 85)),
  fixed_n   = Winsorize(x, val=c(Small(x, k=3)[3], Large(x, k=3)[1])),
  closest   = Winsorize(x, val=unlist(Closest(x, c(30, 70)))) 
)))[c(1:10, 90:100), ]

# use Large and Small if a fixed number of values should be winsorized (here k=3)

PlotLinesA(SetNames(d.frm, rownames=NULL), lwd=2, col=Pal("Tibco"), 
           main="Winsorized Vector")

z <- 0:10
# two-sided (default):
Winsorize(z, val=c(2,8))

# one-sided:
# ... replace all values > 8 with 8
Winsorize(z, val=c(min(z), 8))
# ... replace all values < 4 with 4
Winsorize(z, val=c(4, max(z)))