default_recipe_blueprint: Default recipe blueprint

Description

This page holds the details for the recipe preprocessing blueprint. This is the blueprint that mold() uses by default if x is a recipe.

Usage

default_recipe_blueprint(
  intercept = FALSE,
  allow_novel_levels = FALSE,
  fresh = TRUE,
  strings_as_factors = TRUE,
  composition = "tibble"
)
# S3 method for recipe
mold(x, data, ..., blueprint = NULL)

Value

For default_recipe_blueprint(), a recipe blueprint.

Arguments

intercept: A logical. Should an intercept be included in the processed data? This information is used by the process function in the mold and forge function list.
allow_novel_levels: A logical. Should novel factor levels be allowed at prediction time? This information is used by the clean function in the forge function list, and is passed on to scream().
fresh: Should already trained operations be re-trained when prep() is called?
strings_as_factors: Should character columns be converted to factors when prep() is called?
composition: Either "tibble", "matrix", or "dgCMatrix" for the format of the processed predictors. If "matrix" or "dgCMatrix" are chosen, all of the predictors must be numeric after the preprocessing method has been applied; otherwise an error is thrown.
x: An unprepped recipe created from recipes::recipe().
data: A data frame or matrix containing the outcomes and predictors.
...: Not used.
blueprint: A preprocessing blueprint. If left as NULL, then a default_recipe_blueprint() is used.

Mold

When mold() is used with the default recipe blueprint:

It calls recipes::prep() to prep the recipe.
It calls recipes::juice() to extract the outcomes and predictors. These are returned as tibbles.
If intercept = TRUE, it adds an intercept column to the predictors.
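
The mold steps listed above can be reproduced by hand, which may help when debugging a recipe. The following is a rough sketch rather than hardhat's actual internals: it assumes that prep() is called with its default retain = TRUE, that juice() is used with the all_predictors() and all_outcomes() selectors, and that hardhat's exported add_intercept_column() helper is what adds the intercept. The object names (prepped, predictors_manual, molded) are chosen only for illustration.

```r
library(hardhat)
library(recipes)

train <- iris[1:100, ]

rec <- recipe(Species ~ Sepal.Length + Sepal.Width, train) %>%
  step_log(Sepal.Length)

# Step 1: prep the recipe on the training data
prepped <- prep(rec, training = train, fresh = TRUE)

# Step 2: extract the processed predictors and outcomes as tibbles
predictors <- juice(prepped, all_predictors())
outcomes <- juice(prepped, all_outcomes())

# Step 3: add an intercept column (only relevant when intercept = TRUE)
predictors_manual <- add_intercept_column(predictors)

# Compare against what mold() returns with intercept = TRUE
molded <- mold(
  rec, train,
  blueprint = default_recipe_blueprint(intercept = TRUE)
)
all.equal(as.data.frame(predictors_manual), as.data.frame(molded$predictors))
```

In practice mold() is preferable, because it also records the blueprint that forge() needs at prediction time.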

Forge

When forge() is used with the default recipe blueprint:

It calls shrink() to trim new_data to only the required columns and coerce new_data to a tibble.
It calls scream() to perform validation on the structure of the columns of new_data.
It calls recipes::bake() on the new_data using the prepped recipe used during training.
It adds an intercept column onto new_data if intercept = TRUE.
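
The forge steps listed above can likewise be walked through manually. This is a hedged sketch, not the package's exact implementation: it assumes that the predictor prototype is stored at blueprint$ptypes$predictors, that shrink() and scream() are called with that prototype, and that the prepped recipe is available at blueprint$recipe. The intercept step is skipped because the default blueprint uses intercept = FALSE. The names bp, ptype, baked, and forged are illustrative only.

```r
library(hardhat)
library(recipes)

train <- iris[1:100, ]
test <- iris[101:150, ]

rec <- recipe(Species ~ Sepal.Length + Sepal.Width, train) %>%
  step_log(Sepal.Length)

processed <- mold(rec, train)
bp <- processed$blueprint

# Prototype of the original (unprocessed) predictors recorded by mold()
ptype <- bp$ptypes$predictors

# Step 1: keep only the required columns and coerce to a tibble
new_data <- shrink(test, ptype)

# Step 2: validate the column structure against the prototype
new_data <- scream(new_data, ptype)

# Step 3: apply the prepped recipe stored in the blueprint
baked <- bake(bp$recipe, new_data = new_data, all_predictors())

# Compare against what forge() returns
forged <- forge(test, bp)$predictors
all.equal(as.data.frame(baked), as.data.frame(forged))
```

Calling forge() directly remains the recommended route, since it also handles outcomes and extra roles.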

Examples

library(hardhat)
library(recipes)

# ---------------------------------------------------------------------------
# Setup

train <- iris[1:100, ]
test <- iris[101:150, ]

# ---------------------------------------------------------------------------
# Recipes example

# Create a recipe that logs a predictor
rec <- recipe(Species ~ Sepal.Length + Sepal.Width, train) %>%
  step_log(Sepal.Length)

processed <- mold(rec, train)

# Sepal.Length has been logged
processed$predictors

processed$outcomes

# The underlying blueprint is a prepped recipe
processed$blueprint$recipe

# Call forge() with the blueprint and the test data
# to have it preprocess the test data in the same way.
# This logs the Sepal.Length column of `new_data`.
forge(test, processed$blueprint)

# Use `outcomes = TRUE` to also extract the preprocessed outcome
forge(test, processed$blueprint, outcomes = TRUE)

# ---------------------------------------------------------------------------
# With an intercept

# You can add an intercept with `intercept = TRUE`
processed <- mold(rec, train, blueprint = default_recipe_blueprint(intercept = TRUE))

processed$predictors

# But you also could have used a recipe step
rec2 <- step_intercept(rec)

mold(rec2, iris)$predictors

# ---------------------------------------------------------------------------
# Matrix output for predictors

# You can change the `composition` of the predictor data set
bp <- default_recipe_blueprint(composition = "dgCMatrix")
processed <- mold(rec, train, blueprint = bp)
class(processed$predictors)

# ---------------------------------------------------------------------------
# Non standard roles

# If you have custom recipe roles, they are assumed to be required at
# `bake()` time when passing in `new_data`. This is an assumption that both
# recipes and hardhat make, meaning that those roles are required at
# `forge()` time as well.
rec_roles <- recipe(train) %>%
  update_role(Sepal.Width, new_role = "predictor") %>%
  update_role(Species, new_role = "outcome") %>%
  update_role(Sepal.Length, new_role = "id") %>%
  update_role(Petal.Length, new_role = "important")

processed_roles <- mold(rec_roles, train)

# The custom roles will be in the `mold()` result in case you need
# them for modeling.
processed_roles$extras

# And they are in the `forge()` result
forge(test, processed_roles$blueprint)$extras

# If you remove a column with a custom role from the test data, then you
# won't be able to `forge()` even though this recipe technically didn't
# use that column in any steps
test2 <- test
test2$Petal.Length <- NULL
try(forge(test2, processed_roles$blueprint))

# Most of the time, if you find yourself in the above scenario, then we
# suggest that you remove `Petal.Length` from the data that is supplied to
# the recipe. If that isn't an option, you can declare that that column
# isn't required at `bake()` time by using `update_role_requirements()`
rec_roles <- update_role_requirements(rec_roles, "important", bake = FALSE)
processed_roles <- mold(rec_roles, train)
forge(test2, processed_roles$blueprint)
