Usage
h2o.prcomp(training_frame, x, k, model_id, ignore_const_cols = TRUE,
  max_iterations = 1000,
  transform = c("NONE", "DEMEAN", "DESCALE", "STANDARDIZE"),
  pca_method = c("GramSVD", "Power", "Randomized", "GLRM"),
  use_all_factor_levels = FALSE, compute_metrics = TRUE,
  impute_missing = FALSE, seed, max_runtime_secs = 0)
Arguments
training_frame
An H2OFrame object containing the
variables in the model.
x
(Optional) A vector containing the data columns on which SVD operates.
k
The number of principal components to be computed. This must be
between 1 and min(ncol(training_frame), nrow(training_frame)) inclusive.
model_id
(Optional) The unique hex key assigned to the
resulting model. Automatically generated if none is provided.
ignore_const_cols
A logical value indicating whether or not to ignore
all the constant columns in the training frame.
max_iterations
The maximum number of iterations to run each power
iteration loop. Must be between 1 and 1e6 inclusive.
transform
A character string that indicates how the training data
should be transformed before running PCA. Possible values are "NONE":
for no transformation, "DEMEAN": for subtracting the mean of each
column, "DESCALE": for dividing by the standard deviation of each column
pca_method
A character string that indicates how PCA should be calculated.
Possible values are "GramSVD": distributed computation of the Gram matrix
followed by a local SVD using the JAMA package, "Power": computation of
the SVD using the power iteration method, "Ra
use_all_factor_levels
(Optional) A logical value indicating whether all
factor levels should be included in each categorical column expansion.
If FALSE, the indicator column corresponding to the first factor level
of every categorical variable will be dropped. Defaults to FALSE.
compute_metrics
(Optional) A logical value indicating whether to compute
metrics on the training data, which requires additional calculation time.
Only used if pca_method = "GLRM". Defaults to TRUE.
impute_missing
(Optional) A logical value indicating whether missing values
should be imputed with the mean of the corresponding column. This is necessary
if too many entries are NA when using methods like GramSVD. Defaults to FALSE.
seed
(Optional) Random seed used to initialize the right singular vectors
at the beginning of each power method iteration.
max_runtime_secs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
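Examples
The arguments above can be combined as in the following minimal sketch. It assumes the h2o package is installed and that a local H2O cluster can be started (which needs a Java runtime). The USArrests data set, the variable names, and the seed value are illustrative choices, not part of the original documentation.

```r
library(h2o)

# Start (or connect to) a local H2O cluster; this requires a Java runtime.
h2o.init()

# USArrests ships with base R: 50 rows and 4 numeric columns.
# It is used here only as a small illustrative data set.
arrests <- as.h2o(USArrests)

# Fit a PCA with 2 components. k must lie between 1 and
# min(ncol(arrests), nrow(arrests)), which is 4 for this frame.
# x is omitted, so all columns of the frame are used.
pca_model <- h2o.prcomp(
  training_frame = arrests,
  k = 2,
  transform = "STANDARDIZE",  # one of the values listed under transform
  pca_method = "GramSVD",     # first option listed under pca_method
  seed = 1234                 # arbitrary value, chosen for repeatability
)

# Print the model summary.
print(pca_model)

# Project the training data onto the fitted components.
scores <- h2o.predict(pca_model, arrests)
head(scores)
```

Standardizing is a reasonable choice here because the USArrests columns are measured on different scales. With transform = "NONE" the component directions would be dominated by the column with the largest variance.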