step_spca: Sparse Principal Components Analysis Variable Reduction

Description

Creates a specification of a recipe step that will derive sparse principal components from one or more numeric variables.

Usage

step_spca(
  recipe,
  ...,
  num_comp = 5,
  sparsity = 0,
  num_var = NULL,
  shrinkage = 1e-06,
  center = TRUE,
  scale = TRUE,
  max_iter = 200,
  tol = 0.001,
  replace = TRUE,
  prefix = "SPCA",
  role = "predictor",
  skip = FALSE,
  id = recipes::rand_id("spca")
)
tunable.step_spca(x, ...)

Arguments

recipe

recipe object to which the step will be added.

...

one or more selector functions to choose which variables will be used to compute the components. See selections for more details. These are not currently used by the tidy method.

num_comp

number of components to derive. The value of num_comp will be constrained to a minimum of 1 and maximum of the number of original variables when prep is run.

sparsity, num_var

sparsity (L1 norm) penalty for each component or number of variables with non-zero component loadings. Larger sparsity values produce more zero loadings. Argument sparsity is ignored if num_var is given. The argument value may be a single number applied to all components or a vector of component-specific numbers.

shrinkage

numeric shrinkage (quadratic) penalty for the components to improve conditioning; larger values produce more shrinkage of component loadings toward zero.

center, scale

logicals indicating whether to mean center and standard deviation scale the original variables prior to deriving components, or functions or names of functions for the centering and scaling.

max_iter

maximum number of algorithm iterations allowed.

tol

numeric tolerance for the convergence criterion.

replace

logical indicating whether to replace the original variables.

prefix

character string prefix added to a sequence of zero-padded integers to generate names for the resulting new variables.

role

analysis role that added step variables should be assigned. By default, they are designated as model predictors.

skip

logical indicating whether to skip the step when the recipe is baked. While all operations are baked when prep is run, some operations may not be applicable to new data (e.g. processing outcome variables). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

unique character string to identify the step.

step_spca object.

Value

Function step_spca creates a new step whose class is of the same name and inherits from step_lincomp, adds it to the sequence of existing steps (if any) in the recipe, and returns the updated recipe. For the tidy method, a tibble with columns terms (selectors or variables selected), weight of each variable loading in the components, and name of the new variable names; and with attribute pev containing the proportions of explained variation.

Details

Sparse principal components analysis (SPCA) is a variant of PCA in which the original variables may have zero loadings in the linear combinations that form the components.

References

Zou H, Hastie T and Tibshirani R (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265--286.

Examples

Run this code

# NOT RUN {
library(recipes)

rec <- recipe(rating ~ ., data = attitude)
spca_rec <- rec %>%
  step_spca(all_predictors(), num_comp = 5, sparsity = 1)
spca_prep <- prep(spca_rec, training = attitude)
spca_data <- bake(spca_prep, attitude)

pairs(spca_data, lower.panel = NULL)

tidy(spca_rec, number = 1)
tidy(spca_prep, number = 1)

# }

Run the code above in your browser using DataLab