
recipes (version 1.1.0)

check_class: Check variable class

Description

check_class creates a specification of a recipe check that will check if a variable is of a designated class.

Usage

check_class(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  class_nm = NULL,
  allow_additional = FALSE,
  skip = FALSE,
  class_list = NULL,
  id = rand_id("class")
)

Value

An updated version of recipe with the new check added to the sequence of any existing operations.

Arguments

recipe

A recipe object. The check will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables for this check. See selections() for more details.

role

Not used by this check since no new variables are created.

trained

A logical for whether the selectors in ... have been resolved by prep().

class_nm

A character vector of class names that will be passed to inherits() to check the class. It can contain more than one class. If NULL, the classes will be learned in prep().

allow_additional

If TRUE, a variable is allowed to have classes in addition to the one(s) that are checked.

skip

A logical. Should the check be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

class_list

A named list of column classes. This is NULL until computed by prep().

id

A character string that is unique to this check to identify it.

Tidying

When you tidy() this check, a tibble with columns terms (the selectors or variables selected) and value (the type) is returned.
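As a rough illustration of the tidy() output described above, the sketch below builds a small recipe and tidies the check before and after prep(). The data frame df, its columns, and the id "x_is_numeric" are made up for this example; only the terms and value columns are documented here, so any other columns in the result should not be relied on.

```r
library(recipes)

# Hypothetical toy data, not from the package
df <- data.frame(x = c(1.5, 2.5, 3.5), y = c(1L, 2L, 3L))

rec <- recipe(~ ., data = df) %>%
  check_class(x, class_nm = "numeric", id = "x_is_numeric")

# Untrained check: terms holds the selector as written
untrained_tbl <- tidy(rec, number = 1)

# Trained check: terms holds the resolved column name(s)
prepped <- prep(rec, training = df)
trained_tbl <- tidy(prepped, number = 1)

print(untrained_tbl)
print(trained_tbl)
```

Passing number = 1 selects the first operation in the recipe; tidy() can also be pointed at the check through its id.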

Case weights

The underlying operation does not allow for case weights.

Details

This function can check the classes of the variables in two ways. When the class_nm argument is provided, it will check if all the specified variables are of the given class. If this argument is NULL, the check will learn the classes of each of the specified variables in prep(). Both ways will break bake() if the variables are not of the requested class. If a variable has multiple classes in prep(), all the classes are checked.

Please note that in prep() the argument strings_as_factors defaults to TRUE. If the train set contains character variables, the check will break bake() when strings_as_factors is TRUE.
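The learned-class mode can be sketched as follows. This is a minimal example under assumed toy data (train, new_ok, and new_bad are hypothetical names): the class of x is learned as numeric in prep(), so baking data where x has a different class should stop with an error, which is captured here with tryCatch() rather than left to halt the script.

```r
library(recipes)

# Hypothetical toy data: x is numeric in the training set
train <- data.frame(x = c(1.5, 2.5, 3.5))
new_ok <- data.frame(x = c(4.5, 5.5))
new_bad <- data.frame(x = c("4.5", "5.5")) # x is not numeric here

# No class_nm given, so the class of x is learned in prep()
rec <- recipe(~ x, data = train) %>%
  check_class(x) %>%
  prep(training = train)

# Same class as in training: the data passes through unchanged
baked_ok <- bake(rec, new_data = new_ok)

# Different class: bake() is expected to fail; keep the message
bake_error <- tryCatch(
  bake(rec, new_data = new_bad),
  error = function(e) conditionMessage(e)
)
print(bake_error)
```

Catching the error is only for demonstration; in a real pipeline the failure is the point of the check, so it would normally be allowed to propagate.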

See Also

Other checks: check_cols(), check_missing(), check_new_values(), check_range()

Examples

library(recipes)
library(dplyr)
data(Sacramento, package = "modeldata")

# Learn the classes on the train set
train <- Sacramento[1:500, ]
test <- Sacramento[501:nrow(Sacramento), ]
recipe(train, sqft ~ .) %>%
  check_class(everything()) %>%
  prep(train, strings_as_factors = FALSE) %>%
  bake(test)

# Manual specification
recipe(train, sqft ~ .) %>%
  check_class(sqft, class_nm = "integer") %>%
  check_class(city, zip, type, class_nm = "factor") %>%
  check_class(latitude, longitude, class_nm = "numeric") %>%
  prep(train, strings_as_factors = FALSE) %>%
  bake(test)

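# By default only the classes that are specified
#   are allowed. The column below has classes
#   "POSIXct" and "POSIXt", so checking for "POSIXt"
#   alone fails. The block is wrapped in if (FALSE)
#   so that it is not run.
x_df <- tibble(time = c(Sys.time() - 60, Sys.time()))
x_df$time %>% class()
if (FALSE) {
  recipe(x_df) %>%
    check_class(time, class_nm = "POSIXt") %>%
    prep(x_df) %>%
    bake(x_df)
}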

# Use allow_additional = TRUE if you are fine with it
recipe(x_df) %>%
  check_class(time, class_nm = "POSIXt", allow_additional = TRUE) %>%
  prep(x_df) %>%
  bake(x_df)
