step_interact() can create interactions between variables. It is
primarily intended for numeric data; categorical variables should
probably be converted to dummy variables using step_dummy() prior to
being used for interactions.
Unlike other step functions, the terms argument should be a
traditional R model formula but should contain no inline functions
(e.g. log). For example, for predictors A, B, and C, a formula such as
~A:B:C can be used to make a three-way interaction between the
variables. If the formula contains terms other than interactions
(e.g. (A+B+C)^3), only the interaction terms are retained for the
design matrix.
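
As a minimal sketch of the formula interface, the example below builds a three-way interaction and then passes a formula that also contains main effects. The data frame dat and its values are made up for illustration; only recipe(), step_interact(), prep() and bake() come from the recipes package.

```r
library(recipes)

# Hypothetical data: columns A, B and C mirror the predictors named above
dat <- data.frame(
  y = c(1.2, 3.4, 2.2, 5.1),
  A = c(1, 2, 3, 4),
  B = c(2, 1, 0, 1),
  C = c(0.5, 1.5, 2.5, 3.5)
)

# A single three-way interaction term
rec <- recipe(y ~ ., data = dat)
rec <- step_interact(rec, terms = ~ A:B:C)
baked <- bake(prep(rec, training = dat), new_data = NULL)
names(baked)

# A formula that expands to main effects plus interactions;
# only the interaction columns are added by the step
rec_all <- recipe(y ~ ., data = dat)
rec_all <- step_interact(rec_all, terms = ~ (A + B + C)^3)
baked_all <- bake(prep(rec_all, training = dat), new_data = NULL)
names(baked_all)
```

The original columns A, B and C stay in the data because they are already part of the recipe; the step only appends the interaction columns.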
The separator between the variables defaults to "_x_" so that the
three-way interaction shown previously would generate a column named
A_x_B_x_C. This can be changed using the sep argument.
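
A short sketch of changing the separator follows. The data frame and the separator string "_by_" are arbitrary choices for illustration, not values suggested by the package.

```r
library(recipes)

# Hypothetical data with three numeric predictors
dat <- data.frame(
  y = c(1.2, 3.4, 2.2, 5.1),
  A = c(1, 2, 3, 4),
  B = c(2, 1, 0, 1),
  C = c(0.5, 1.5, 2.5, 3.5)
)

# Replace the default "_x_" separator with "_by_"
rec <- recipe(y ~ ., data = dat)
rec <- step_interact(rec, terms = ~ A:B:C, sep = "_by_")
baked <- bake(prep(rec, training = dat), new_data = NULL)

# The interaction column is now named with the custom separator
names(baked)
```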
When dummy variables are created and are used in interactions,
selectors can help specify the interactions succinctly. For example,
suppose a factor column x gets converted to dummy variables x_2, x_3,
..., x_6 using step_dummy(). If you wanted an interaction with numeric
column z, you could create a set of specific interaction effects
(e.g. x_2:z + x_3:z and so on) or you could use starts_with("x_"):z.
When prep() evaluates this step, starts_with("x_") resolves to
(x_2 + x_3 + x_4 + x_5 + x_6) so that the formula is now
(x_2 + x_3 + x_4 + x_5 + x_6):z and all two-way interactions are
created.
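
The selector approach can be sketched as follows. The data are invented, and the factor uses the levels "a", "b" and "c" rather than numbered levels, so the dummy columns here are x_b and x_c; this assumes the default step_dummy() naming of column name, underscore, level.

```r
library(recipes)

# Hypothetical data: one factor predictor x and one numeric predictor z
dat <- data.frame(
  y = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.8),
  x = factor(c("a", "b", "c", "a", "b", "c")),
  z = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)
)

rec <- recipe(y ~ ., data = dat)
# Convert x to dummy variables first (x_b and x_c; "a" is the reference level)
rec <- step_dummy(rec, x)
# The selector is resolved when prep() runs, giving (x_b + x_c):z
rec <- step_interact(rec, terms = ~ starts_with("x_"):z)

baked <- bake(prep(rec, training = dat), new_data = NULL)
names(baked)
```

Because the selector is resolved at prep() time, the same recipe keeps working if the factor gains or loses levels in a different training set.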