For a recipe with at least one preprocessing step, estimate the required parameters from a training set that can be later applied to other data sets.
prep(x, ...)

# S3 method for recipe
prep(x, training = NULL, fresh = FALSE, verbose = FALSE,
  retain = FALSE, stringsAsFactors = TRUE, ...)
x: an object
...: further arguments passed to or from other methods (not currently used).
training: A data frame or tibble that will be used to estimate parameters for preprocessing.
fresh: A logical indicating whether already trained steps should be re-trained. If TRUE, you should pass in a data set to the argument training.
verbose: A logical that controls whether progress is reported as steps are executed.
retain: A logical: should the preprocessed training set be saved into the template slot of the recipe after training? This is a good idea if you want to add more steps later but want to avoid re-training the existing steps.
stringsAsFactors: A logical: should character columns be converted to factors? This affects the preprocessed training set (when retain = TRUE) as well as the results of bake.recipe.
A recipe whose step objects have been updated with the required quantities (e.g. parameter estimates, model objects, etc). Also, the term_info object is likely to be modified as the steps are executed.
Given a data set, this function estimates the quantities and statistics required by any steps. prep() returns an updated recipe with the estimates.
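As a minimal sketch of this workflow, the example below defines a recipe, estimates its steps on a training set, and applies the result to other rows. The split of mtcars into `train` and `new_rows` and the choice of step_center() and step_scale() are assumptions made for illustration only. The data passed to bake() is given by position because the name of that argument differs between versions of the package.

```r
library(recipes)

# Illustrative split of a built-in data set into training rows and "new" rows.
train <- mtcars[1:24, ]
new_rows <- mtcars[25:32, ]

# Define the recipe; nothing is estimated at this point.
rec <- recipe(mpg ~ ., data = train)
rec <- step_center(rec, all_predictors())
rec <- step_scale(rec, all_predictors())

# Estimate the means and standard deviations from the training set.
trained <- prep(rec, training = train)

# Apply the trained steps to other data (second argument passed by position).
baked <- bake(trained, new_rows)
```

The new rows are centered and scaled with the statistics estimated from `train`, not with statistics recomputed from `new_rows`.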
Note that missing data are handled within the steps; there is no global na.rm option at the recipe level or in prep().
Also, if a recipe has been trained using prep() and then steps are added, prep() will only update the new steps. If fresh = TRUE, all of the steps will be (re)estimated.
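The behavior described above can be sketched as follows. The data set and the particular steps are illustrative assumptions, and the `trained` field of each step object is inspected here only to show which steps have been estimated. The first call uses retain = TRUE so that the processed training set is kept for training steps that are added later.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars)
rec <- step_center(rec, all_predictors())

# Keep the processed training set so that later steps can be trained on it.
trained <- prep(rec, training = mtcars, retain = TRUE)

# Add a step to the already trained recipe; the new step is not yet trained.
extended <- step_scale(trained, all_predictors())
status_before <- vapply(extended$steps, function(s) s$trained, logical(1))

# Default (fresh = FALSE): only the untrained step is estimated.
updated <- prep(extended)

# fresh = TRUE: every step is estimated again from the supplied training set.
refit <- prep(extended, training = mtcars, fresh = TRUE)
```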
As the steps are executed, the training set is updated. For example, if the first step is to center the data and the second is to scale the data, the step for scaling is given the centered data.
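The sketch below makes this sequential updating observable. It substitutes a log transform for the first step (an illustrative choice, not the centering and scaling pair from the text) because centering does not change a standard deviation, so the effect would be invisible there. The means stored by the second step are read back with tidy() and compared with means computed by hand from the logged columns.

```r
library(recipes)

rec <- recipe(mpg ~ disp + hp, data = mtcars)
rec <- step_log(rec, all_predictors())      # step 1: natural log
rec <- step_center(rec, all_predictors())   # step 2: subtract the mean

trained <- prep(rec, training = mtcars)

# Means estimated by the centering step (step number 2).
center_stats <- tidy(trained, number = 2)
stored <- unname(center_stats$value[match(c("disp", "hp"), center_stats$terms)])

# Means of the logged columns, computed directly for comparison.
expected <- c(mean(log(mtcars$disp)), mean(log(mtcars$hp)))
matches_logged <- isTRUE(all.equal(stored, expected))
```

If the centering step had been given the raw training set, `stored` would instead hold the means of the untransformed disp and hp columns.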