Usage
h2o.glm(x, y, data, key = "", family, link, nfolds = 0, alpha = 0.5,
  nlambda = -1, lambda.min.ratio = -1, lambda = 1e-5, epsilon = 1e-4,
  standardize = TRUE, prior, variable_importances = FALSE,
  use_all_factor_levels = FALSE,
  tweedie.p = ifelse(family == 'tweedie', 1.5, as.numeric(NA)),
  iter.max = 100, higher_accuracy = FALSE, lambda_search = FALSE,
  return_all_lambda = FALSE, max_predictors = -1)
Arguments
x
A vector containing the names of the predictors in the model.
y
The name of the response variable in the model.
data
An H2OParsedData object containing the variables in the model.
key
(Optional) The unique hex key assigned to the resulting model. If none is given, a key will automatically be generated.
family
A description of the error distribution and corresponding link function to be used in the model. Currently, Gaussian, binomial, Poisson, gamma, and Tweedie are supported. When a model is specified as Tweedie, users must also specify the appropriate Tweedi
link
(Optional) The link function relates the linear predictor to the distribution function. Default is the canonical link for the specified family. The full list of supported links:
gaussian: identity, log, inverse
binomial: logit, log
poisson: log, ident
nfolds
(Optional) Number of folds for cross-validation.
alpha
(Optional) The elastic-net mixing parameter, which must be in [0,1]. The penalty is defined to be $$P(\alpha,\beta) = \frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1 = \sum_j \left[\frac{1-\alpha}{2}\beta_j^2 + \alpha|\beta_j|\right]$$ so alpha=1 is the lasso penalty and alpha=0 is the ridge penalty.
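The penalty formula above can be checked numerically with a few lines of base R. This is a minimal sketch for illustration only: the helper name enet_penalty is hypothetical and is not part of the h2o package.

```r
# Elastic-net penalty P(alpha, beta) from the formula above.
# enet_penalty is a hypothetical helper, not an h2o function.
enet_penalty <- function(alpha, beta) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(1, -2, 0)
lasso_pen <- enet_penalty(1, beta)    # pure L1 term: |1| + |-2| + |0| = 3
ridge_pen <- enet_penalty(0, beta)    # pure L2 term: (1 + 4 + 0) / 2 = 2.5
mixed_pen <- enet_penalty(0.5, beta)  # 0.25 * 5 + 0.5 * 3 = 2.75
print(c(lasso = lasso_pen, ridge = ridge_pen, mixed = mixed_pen))
```

Intermediate values of alpha blend the two terms, which is why the default of 0.5 gives both shrinkage and some sparsity.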
nlambda
The number of lambda values when performing a search.
lambda.min.ratio
Smallest value for lambda as a fraction of lambda.max, the entry value, which is the smallest value for which all coefficients in the model are zero.
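To give a feel for how nlambda and lambda.min.ratio could jointly define a search grid, the sketch below builds a decreasing sequence from lambda.max down to lambda.max * lambda.min.ratio. The log spacing and the helper name lambda_grid are assumptions borrowed from common practice in regularization-path software; the source does not state how H2O actually spaces the values.

```r
# Hypothetical helper: log-spaced lambda grid.
# The spacing is an assumption, not a description of H2O's internals.
lambda_grid <- function(lambda_max, min_ratio, n) {
  stopifnot(lambda_max > 0, min_ratio > 0, min_ratio < 1, n >= 2)
  exp(seq(log(lambda_max), log(lambda_max * min_ratio), length.out = n))
}

grid <- lambda_grid(lambda_max = 2, min_ratio = 1e-4, n = 5)
print(grid)  # starts at lambda_max and decreases to lambda_max * min_ratio
```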
lambda
The shrinkage parameter, which multiplies $P(\alpha,\beta)$ in the objective. The larger lambda is, the more the coefficients are shrunk toward zero (and each other).
epsilon
(Optional) Number indicating the cutoff for determining if a coefficient is zero.
standardize
(Optional) Logical value indicating whether the data should be standardized (set to mean = 0, variance = 1) before running GLM.
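What standardization does to each predictor column can be reproduced in base R with scale(). This is only an illustration of the transformation: scale() divides by the sample standard deviation (n - 1 denominator), and whether H2O uses the same denominator is an assumption here.

```r
# Illustration only: center each column to mean 0 and rescale to variance 1.
x <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
x_std <- scale(x)  # subtract column means, divide by column sample sd

col_means <- colMeans(x_std)
col_vars <- apply(x_std, 2, var)
print(col_means)  # numerically zero
print(col_vars)   # one for every column
```

Standardizing matters for penalized fits because the penalty treats all coefficients alike, so predictors on larger scales would otherwise be penalized differently.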
prior
(Optional) Prior probability of class 1. Only used if family = "binomial". When omitted, prior will default to the frequency of class 1 in the response column.
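The default described above is simply the observed proportion of class 1. A minimal base R sketch, using a made-up response vector y for illustration:

```r
# Made-up binary response, for illustration only.
y <- c(0, 1, 1, 0, 1)

# Frequency of class 1 in the response, i.e. the default prior described above.
default_prior <- mean(y == 1)
print(default_prior)  # 3 of 5 observations are class 1
```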
variable_importances
(Optional) A logical value indicating whether variable importances should be computed for the input features. NOTE: If use_all_factor_levels is FALSE, the importance of the base level will NOT be shown.
use_all_factor_levels
(Optional) A logical value indicating whether all factor levels should be used. By default, the first factor level is skipped from the possible set of predictors. Set this flag if you want to use all of the levels. Needs sufficient regulari
tweedie.p
(Optional) The index of the power variance function for the Tweedie distribution. Only used if family = "tweedie".
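In the standard Tweedie family the variance is proportional to a power of the mean, Var(Y) = phi * mu^p, and that power p is the index this argument sets. The sketch below evaluates the variance function for a few values of p; the helper name tweedie_variance and the dispersion argument phi are illustrative and not part of the h2o package.

```r
# Hypothetical helper: Tweedie power variance function, Var(Y) = phi * mu^p.
tweedie_variance <- function(mu, p, phi = 1) {
  stopifnot(all(mu > 0), phi > 0)
  phi * mu^p
}

v_default <- tweedie_variance(4, 1.5)  # 4^1.5 = 8, the default index
v_poisson <- tweedie_variance(4, 1)    # p = 1 gives Poisson-like variance: 4
v_gamma   <- tweedie_variance(4, 2)    # p = 2 gives gamma-like variance: 16
print(c(v_default, v_poisson, v_gamma))
```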
iter.max
(Optional) Maximum number of iterations allowed.
higher_accuracy
(Optional) A logical value indicating whether to use line search. This will cause the algorithm to run slower, so generally, it should only be set to TRUE if GLM does not converge otherwise.
lambda_search
(Optional) A logical value indicating whether to conduct a search over the space of lambda values, starting from lambda_max. When this is set to TRUE, lambda will be interpreted as lambda_min.
return_all_lambda
(Optional) A logical value indicating whether to return every model built during the lambda search. Only used if lambda_search = TRUE. If return_all_lambda = FALSE, then only the model corresponding to the optimal lambda will be
max_predictors
(Optional) When lambda_search = TRUE, the algorithm will stop training if the number of predictors exceeds this value. Ignored when lambda_search = FALSE or max_predictors = -1.