DescTools (version 0.99.19)

Winsorize: Winsorize

Description

Clean data by means of winsorization, i.e., by shrinking outlying observations to the border of the main part of the data.

Usage

Winsorize(x, minval = NULL, maxval = NULL, probs = c(0.05, 0.95), na.rm = FALSE)

Arguments

x
a numeric vector to be winsorized.
minval
the lower border; all values below it are replaced by this value. If NULL (the default), it is set to the 5% quantile of x, i.e. the first element of the default probs.
maxval
the upper border; all values above it are replaced by this value. If NULL (the default), it is set to the 95% quantile of x, i.e. the second element of the default probs.
probs
numeric vector of probabilities with values in [0,1] as used in quantile.
na.rm
logical; should NAs be omitted when calculating the quantiles? Note that NAs in x are preserved and left unchanged in either case.
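
The interplay of the arguments above can be illustrated with a minimal base R sketch. The helper name winsorize_sketch is hypothetical and the body is an assumption about the behaviour described here, not the DescTools source: borders left as NULL are filled from quantile(x, probs), and values outside the borders are then clipped to them.

```r
# Hypothetical helper for illustration only; not the DescTools implementation.
winsorize_sketch <- function(x, minval = NULL, maxval = NULL,
                             probs = c(0.05, 0.95), na.rm = FALSE) {
  # Fill in any border that was not supplied from the requested quantiles.
  if (is.null(minval) || is.null(maxval)) {
    q <- stats::quantile(x, probs = probs, na.rm = na.rm, names = FALSE)
    if (is.null(minval)) minval <- q[1]
    if (is.null(maxval)) maxval <- q[2]
  }
  # Clip to [minval, maxval]; NAs in x stay NA.
  pmin(pmax(x, minval), maxval)
}

x <- c(-50, 1, 2, 3, 4, 5, 6, 7, 8, 100)
w <- winsorize_sketch(x)

# With NAs present, na.rm = TRUE is needed for the quantiles;
# the NA itself is kept in the result.
w_na <- winsorize_sketch(c(x, NA), na.rm = TRUE)
```

Only the two extreme values are changed in this example; the eight central values pass through untouched, and the result has the same length as the input.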

Value

A vector of the same length as the original data x containing the winsorized data.

Details

Consider standardizing the data (possibly robustly) before winsorizing. See scale and RobScale.
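
As a rough sketch of this advice, the data can be centred by the median and scaled by the MAD in base R before clipping at fixed borders. The cutoff of 2 is an arbitrary assumption chosen for illustration, and the clipping is written with pmin and pmax so that the snippet runs without DescTools; passing the same borders as minval and maxval to Winsorize should presumably give the same result.

```r
set.seed(1234)
x <- rnorm(10)
x[1] <- x[1] * 10                # introduce an outlier

# Robust standardization: centre by the median, scale by the MAD.
z <- (x - median(x)) / mad(x)

# Clip the standardized values at an assumed cutoff (arbitrary choice).
cutoff <- 2
z_w <- pmin(pmax(z, -cutoff), cutoff)
```

Because the median and MAD are barely affected by the outlier, the borders are expressed in units that reflect the bulk of the data rather than its extremes.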

See Also

winsorize in the robustHD package, which contains an option to winsorize multivariate data

Examples

library(DescTools)

## generate data
set.seed(1234)     # for reproducibility
x <- rnorm(10)     # standard normal
x[1] <- x[1] * 10  # introduce outlier

## Winsorize data
x
Winsorize(x)

# use Large and Small if a fixed number of values should be winsorized (here k=3):
Winsorize(x, minval=tail(Small(x, k=3), 1), maxval=head(Large(x, k=3), 1))