A checkFunction to be called from check that identifies outlier values in a numeric/integer/Date variable by use of the Tukey boxplot method (consistent with the boxplot function).
identifyOutliersTBStyle(v, nMax = 10, maxDecimals = 2)
A checkResult with three entries: $problem (a logical indicating whether outliers were found), $message (a message describing which values are outliers) and $problemValues (the outlier values).
v: A numeric, integer or Date variable to check.
nMax: The maximum number of problematic values to report. Default is 10. Set to Inf if all problematic values are to be included in the output message, or to 0 for no output.
maxDecimals: A positive integer or Inf. Number of decimals used when printing numerical values in the data summary and in problematic values from the data checks. If Inf, no rounding is performed.
Outliers are defined in the style of Tukey boxplots (consistent with the boxplot function), i.e. as values that are smaller than the 1st quartile minus 1.5 times the interquartile range (IQR) or greater than the 3rd quartile plus 1.5 times the IQR.
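The rule above can be sketched in base R as follows. This is a minimal illustration, not the package's own code: the helper name tukey_outliers is hypothetical, and the multiplier of 1.5 is an assumption taken from the default of boxplot and boxplot.stats.

```r
# Hypothetical helper (not part of any package): returns the values
# that fall outside the Tukey fences.
tukey_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]                                  # ignore missing values
  qs <- quantile(x, c(0.25, 0.75), names = FALSE)    # 1st and 3rd quartile
  iqr <- qs[2] - qs[1]                               # interquartile range
  x[x < qs[1] - coef * iqr | x > qs[2] + coef * iqr] # values beyond the fences
}

x <- c(1:10, 200, 200, 700)
tukey_outliers(x)

# Base R's own outlier detection, for comparison.
boxplot.stats(x)$out
```

Note that boxplot.stats computes the quartiles as hinges (via fivenum), whereas quantile interpolates, so the two approaches can disagree slightly on other data.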
For Date variables, the calculations are done on their raw numeric format (as obtained by using unclass), after which they are translated back to Dates. Note that no rounding is performed for Dates, no matter the value of maxDecimals.
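The Date handling described above might look roughly like this in base R. The sketch is an assumption about the general approach, not the package's implementation: the example dates and variable names are made up, and the multiplier of 1.5 is again taken from the boxplot default.

```r
# Made-up example data: five consecutive days and one far-off date.
dates <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-01-04", "2020-01-05", "2035-06-30"))

raw <- unclass(dates)                              # days since 1970-01-01
qs <- quantile(raw, c(0.25, 0.75), names = FALSE)  # quartiles on the raw scale
iqr <- qs[2] - qs[1]
is_out <- raw < qs[1] - 1.5 * iqr | raw > qs[2] + 1.5 * iqr

# Translate the flagged raw values back to Dates (no rounding involved).
out_dates <- as.Date(raw[is_out], origin = "1970-01-01")
out_dates
```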
See also: check, allCheckFunctions, checkFunction, checkResult
identifyOutliersTBStyle(c(1:10, 200, 200, 700))
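A slightly longer usage sketch, assuming the dataMaid package that provides this function is installed. It only uses the arguments and result entries documented above; the guard with requireNamespace is added here so the snippet does nothing when the package is missing.

```r
if (requireNamespace("dataMaid", quietly = TRUE)) {
  x <- c(1:10, 200, 200, 700)

  # Report all problematic values instead of at most 10.
  res <- dataMaid::identifyOutliersTBStyle(x, nMax = Inf)

  res$problem        # logical: were outliers found?
  res$message        # text describing the outlier values
  res$problemValues  # the outlier values themselves
}
```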