
xray (version 0.2)

anomalies: Analyze a dataset and search for anomalies

Description

The function checks every column of the dataset for the anomalies listed below. If any problematic columns are found, they are reported in a warning and returned in a data.frame. The anomaly types, with the labels used in the output, are:

  • NA values: NA

  • 0 values: Zero

  • Blank strings: Blank

  • Infinite numbers: Inf
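To make the four anomaly types concrete, here is a minimal sketch that counts each of them in a single column. `count_anomalies()` is a hypothetical helper, not part of xray, and it assumes that Zero and Inf are only checked for numeric columns and that Blank means an exactly empty string (`""`) in a character column; xray's own rules may differ.

```r
# Hypothetical helper (not part of xray): count each anomaly type in one column.
# Assumptions: Zero and Inf are only checked for numeric columns, and Blank
# means an exactly empty string ("") in a character column.
count_anomalies <- function(x) {
  n_na    <- sum(is.na(x))  # is.na() is also TRUE for NaN
  n_zero  <- if (is.numeric(x)) sum(x == 0, na.rm = TRUE) else 0L
  n_blank <- if (is.character(x)) sum(x == "", na.rm = TRUE) else 0L
  n_inf   <- if (is.numeric(x)) sum(is.infinite(x)) else 0L
  # "NA" and "Inf" are reserved words in R, so these names must be quoted
  c("NA" = n_na, Zero = n_zero, Blank = n_blank, "Inf" = n_inf)
}

count_anomalies(c(1, 0, NA, Inf, 0))  # NA = 1, Zero = 2, Blank = 0, Inf = 1
count_anomalies(c("a", "", NA, ""))   # NA = 1, Zero = 0, Blank = 2, Inf = 0
```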

Usage

anomalies(data_analyze, anomaly_threshold = 0.8, distinct_threshold = 2)

Arguments

data_analyze

a data frame or tibble to analyze

anomaly_threshold

the minimum proportion of anomalous rows (a value between 0 and 1, e.g. 0.8 for 80%) for a column to be considered problematic

distinct_threshold

the minimum number of distinct values a column must have to not be considered problematic; usually you want to keep this at its default value.
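One way to read the two thresholds together is sketched below: a column is flagged if its anomaly proportion reaches `anomaly_threshold` or if it has fewer than `distinct_threshold` distinct values. `is_problematic()` is a hypothetical helper, not xray's implementation, and the choice of `>=` for the proportion and `<` for the distinct count is an assumption.

```r
# Hypothetical helper (not part of xray): how the two thresholds could combine.
# Assumption: ">=" for the anomaly proportion and "<" for the distinct count.
is_problematic <- function(p_anomaly, q_distinct,
                           anomaly_threshold = 0.8, distinct_threshold = 2) {
  # p_anomaly:  proportion (0 to 1) of rows with an anomaly in the column
  # q_distinct: number of distinct values in the column
  p_anomaly >= anomaly_threshold | q_distinct < distinct_threshold
}

is_problematic(c(0.9, 0.5, 0.0), c(5, 5, 1))  # TRUE FALSE TRUE
```

With the defaults, a column that is 90% anomalous is flagged, a column that is 50% anomalous with 5 distinct values is not, and a constant column is flagged regardless of its anomaly proportion.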

Details

Each of these anomalies is reported in two columns: one prefixed by q (quantity), giving the number of rows with the anomaly, and one prefixed by p (percentage), giving the percentage of total rows with the anomaly (for example, qNA and pNA).
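The q/p layout can be sketched in base R as one row per column of the input, with a quantity and a percentage for each anomaly type. `anomaly_table()` is a hypothetical helper, not part of xray; it stores percentages as plain numbers from 0 to 100, whereas xray's own formatting of the p columns may differ, and it uses the same assumed anomaly rules (Zero and Inf for numeric columns, Blank as `""` for character columns).

```r
# Hypothetical helper (not part of xray): one row per column, with q (count)
# and p (percentage of total rows) for each anomaly type.
anomaly_table <- function(df) {
  n <- nrow(df)
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    q <- c(
      qNA    = sum(is.na(x)),
      qZero  = if (is.numeric(x)) sum(x == 0, na.rm = TRUE) else 0L,
      qBlank = if (is.character(x)) sum(x == "", na.rm = TRUE) else 0L,
      qInf   = if (is.numeric(x)) sum(is.infinite(x)) else 0L
    )
    p <- 100 * q / n                      # percentage of total rows
    names(p) <- sub("^q", "p", names(q))  # qNA -> pNA, etc.
    data.frame(Variable = col, t(q), t(p), row.names = NULL)
  })
  do.call(rbind, rows)
}

df <- data.frame(a = c(1, 0, NA, 4), b = c("x", "", "y", ""),
                 stringsAsFactors = FALSE)
anomaly_table(df)
```

Here column `a` gets qNA = 1 and qZero = 1 (25% each), and column `b` gets qBlank = 2 (50%).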

The function also reports any column with only one distinct value (with the default distinct_threshold = 2), since such a column carries no information: if all rows are equal, why keep the column? The number of distinct values is reported in qDistinct.
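The distinct-value check can be sketched as follows. `distinct_check()` is a hypothetical helper, not part of xray; it counts distinct values with base R's `unique()`, which treats NA as a value of its own, so an all-NA column counts as one distinct value. Whether xray counts NA this way is an assumption.

```r
# Hypothetical helper (not part of xray): qDistinct per column and a flag for
# columns with fewer than `distinct_threshold` distinct values.
distinct_check <- function(df, distinct_threshold = 2) {
  # unique() treats NA as a value, so an all-NA column has qDistinct == 1
  q_distinct <- vapply(df, function(x) length(unique(x)), integer(1))
  data.frame(
    Variable  = names(df),
    qDistinct = unname(q_distinct),
    constant  = unname(q_distinct) < distinct_threshold
  )
}

distinct_check(data.frame(a = c(1, 1, 1), b = c(1, 2, 3)))
```

Column `a` has a single distinct value and is flagged as constant; column `b` has three and is not.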

Examples

library(xray)
# Lower the anomaly threshold from the default 0.8 to 0.5 (50% of rows)
anomalies(mtcars, anomaly_threshold = 0.5)
