Creates a specification of a recipe step that will derive sparse principal components from one or more numeric variables.
step_spca(
  recipe,
  ...,
  num_comp = 5,
  sparsity = 0,
  num_var = NULL,
  shrinkage = 1e-06,
  center = TRUE,
  scale = TRUE,
  max_iter = 200,
  tol = 0.001,
  replace = TRUE,
  prefix = "SPCA",
  role = "predictor",
  skip = FALSE,
  id = recipes::rand_id("spca")
)

tunable.step_spca(x, ...)
recipe: recipe object to which the step will be added.
...: one or more selector functions to choose which variables will be used to compute the components. See selections for more details. These are not currently used by the tidy method.
num_comp: number of components to derive. The value of num_comp will be constrained to a minimum of 1 and a maximum of the number of original variables when prep is run.
sparsity, num_var: sparsity (L1 norm) penalty for each component or number of variables with non-zero component loadings. Larger sparsity values produce more zero loadings. Argument sparsity is ignored if num_var is given. The argument value may be a single number applied to all components or a vector of component-specific numbers.
shrinkage: numeric shrinkage (quadratic) penalty for the components to improve conditioning; larger values produce more shrinkage of component loadings toward zero.
center, scale: logicals indicating whether to mean center and standard deviation scale the original variables prior to deriving components, or functions or names of functions for the centering and scaling.
max_iter: maximum number of algorithm iterations allowed.
tol: numeric tolerance for the convergence criterion.
replace: logical indicating whether to replace the original variables.
prefix: character string prefix added to a sequence of zero-padded integers to generate names for the resulting new variables.
role: analysis role that added step variables should be assigned. By default, they are designated as model predictors.
skip: logical indicating whether to skip the step when the recipe is baked. While all operations are baked when prep is run, some operations may not be applicable to new data (e.g. processing outcome variables). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.
id: unique character string to identify the step.
x: step_spca object.
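The constraint on num_comp and the naming of the new variables described above can be sketched in base R. This is a minimal illustration only: component_names() is a hypothetical helper that is not part of any package, and the zero-padding width (the number of digits in the largest index) is an assumption about how the names are formed.

```r
# Hypothetical helper (not a package function): mimics the documented
# behaviour of num_comp and prefix under the assumptions stated above.
component_names <- function(num_comp, num_vars, prefix = "SPCA") {
  # Constrain the number of components to [1, number of original variables]
  k <- max(1, min(num_comp, num_vars))
  # Assumed padding rule: pad every index to the width of the largest one
  width <- nchar(as.character(k))
  paste0(prefix, formatC(seq_len(k), width = width, flag = "0"))
}

component_names(5, 6)    # five names, no padding needed
component_names(12, 6)   # request exceeds the 6 available variables, so 6 names
component_names(12, 20)  # twelve names, indices padded to two digits
```

With 6 original variables, a request for 12 components is reduced to 6, which is the same clamping that the num_comp description says happens when prep is run.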
Function step_spca creates a new step whose class is of the same name and inherits from step_lincomp, adds it to the sequence of existing steps (if any) in the recipe, and returns the updated recipe. The tidy method returns a tibble with columns terms (selectors or variables selected), weight (loading of each variable in the components), and name (names of the new variables), and with an attribute pev containing the proportions of explained variation.
Sparse principal components analysis (SPCA) is a variant of PCA in which the original variables may have zero loadings in the linear combinations that form the components.
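To make "zero loadings" concrete, the base R sketch below soft-thresholds ordinary PCA loadings. This is only an illustration and is not the algorithm of Zou et al. (2006) or the computation performed by step_spca; the helper soft_threshold() and the penalty values are hypothetical choices made for the example.

```r
# Illustration only: thresholding PCA loadings is NOT the SPCA algorithm;
# it simply shows what a loading matrix with exact zeros looks like.
x <- scale(attitude[, -1])           # six predictors, centered and scaled
dense <- prcomp(x)$rotation[, 1:2]   # ordinary PCA loadings for two components

# Hypothetical helper: shrink each loading toward zero and cut small ones to 0
soft_threshold <- function(w, penalty) sign(w) * pmax(abs(w) - penalty, 0)

sparsify <- function(w, penalty) {
  s <- soft_threshold(w, penalty)
  norms <- sqrt(colSums(s^2))
  # Rescale each non-empty column back to unit length
  sweep(s, 2, ifelse(norms > 0, norms, 1), "/")
}

sparse_lo <- sparsify(dense, penalty = 0.1)
sparse_hi <- sparsify(dense, penalty = 0.3)

# Number of exactly-zero loadings per component; never decreases with the penalty
colSums(sparse_lo == 0)
colSums(sparse_hi == 0)

# Component scores are linear combinations that skip zero-loading variables
scores <- x %*% sparse_hi
dim(scores)
```

A variable with a zero loading contributes nothing to that component's score, which is what makes sparse components easier to interpret than ordinary principal components.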
Zou H, Hastie T and Tibshirani R (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265-286.
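library(recipes)
# step_spca() is not part of recipes; it is assumed here to come from the
# MachineShop package, with the elasticnet package installed.
library(MachineShop)

rec <- recipe(rating ~ ., data = attitude)
spca_rec <- rec %>%
  step_spca(all_predictors(), num_comp = 5, sparsity = 1)
spca_prep <- prep(spca_rec, training = attitude)
spca_data <- bake(spca_prep, attitude)

# Pairwise scatterplots of the baked data
pairs(spca_data, lower.panel = NULL)

# Step summary before and after the recipe is prepped
tidy(spca_rec, number = 1)
tidy(spca_prep, number = 1)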