Usage
## Default method:
h2o.glm(x, y, data, family, nfolds = 10, alpha = 0.5, lambda = 1e-5, epsilon = 1e-4,
standardize = TRUE, prior, tweedie.p = ifelse(family == 'tweedie', 1.5,
as.numeric(NA)), thresholds, iter.max, higher_accuracy, lambda_search, version = 2)
## ValueArray implementation (version = 1):
h2o.glm.VA(x, y, data, family, nfolds = 10, alpha = 0.5, lambda = 1e-5, epsilon = 1e-4,
standardize = TRUE, prior, tweedie.p = ifelse(family == 'tweedie', 1.5,
as.numeric(NA)), thresholds = ifelse(family == 'binomial', seq(0, 1, 0.01),
as.numeric(NA)))
## FluidVecs implementation (version = 2):
h2o.glm.FV(x, y, data, family, nfolds = 10, alpha = 0.5, lambda = 1e-5, epsilon = 1e-4,
standardize = TRUE, prior, tweedie.p = ifelse(family == 'tweedie', 1.5,
as.numeric(NA)), iter.max = 100, higher_accuracy = FALSE, lambda_search = FALSE)
Arguments
x
A vector containing the names of the predictors in the model.
y
The name of the response variable in the model.
data
An H2OParsedDataVA (version = 1) or H2OParsedData (version = 2) object containing the variables in the model.
family
A description of the error distribution and corresponding link function to be used in the model. Currently, Gaussian, binomial, Poisson, gamma, and Tweedie are supported. When a model is specified as Tweedie, users must also specify the appropriate Tweedi
nfolds
(Optional) Number of folds for cross-validation. The default is 10.
alpha
(Optional) The elastic-net mixing parameter, which must be in [0,1]. The penalty is defined to be $$P(\alpha,\beta) = \frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1 = \sum_j \left[\frac{1-\alpha}{2}\beta_j^2 + \alpha|\beta_j|\right]$$ so alpha = 1 is the lasso penalty and alpha = 0 is the ridge penalty. The default is 0.5.
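As a rough illustration of the penalty formula above, the following base R sketch evaluates $P(\alpha,\beta)$ for a small coefficient vector. The helper name elastic_net_penalty and the coefficient values are hypothetical and are not part of the h2o package; this only evaluates the formula and does not show how H2O applies it internally.

```r
# Hypothetical helper, not part of h2o: evaluates the elastic-net penalty
# P(alpha, beta) = (1 - alpha)/2 * sum(beta^2) + alpha * sum(abs(beta)).
elastic_net_penalty <- function(alpha, beta) {
  stopifnot(alpha >= 0, alpha <= 1)  # alpha must lie in [0, 1]
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(1, -2, 0.5)                          # made-up coefficients

l1_only <- elastic_net_penalty(1, beta)        # alpha = 1: only the L1 term remains
l2_only <- elastic_net_penalty(0, beta)        # alpha = 0: only the squared L2 term remains
mixed   <- elastic_net_penalty(0.5, beta)      # alpha = 0.5: equal mix of both terms
```

With alpha = 0.5 both terms contribute, so large coefficients are penalized quadratically while small ones are still pushed toward exactly zero by the absolute-value term.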
lambda
The shrinkage parameter, which multiplies $P(\alpha,\beta)$ in the objective. The larger lambda is, the more the coefficients are shrunk toward zero (and each other).
epsilon
(Optional) Number indicating the cutoff for determining if a coefficient is zero.
standardize
(Optional) Logical value indicating whether the data should be standardized (set to mean = 0, variance = 1) before running GLM.
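To make the effect of standardization concrete, here is a minimal base R sketch that centers and scales a small made-up data frame. It assumes the usual sample standard deviation (denominator n - 1) as used by scale(); whether H2O uses the same denominator internally is not stated here, so treat this as an illustration of the idea only.

```r
# Made-up predictor values; in h2o the standardization happens inside the GLM call.
x <- data.frame(age = c(50, 60, 70, 80),
                psa = c(1.5, 3.0, 4.5, 9.0))

# Subtract each column's mean and divide by its sample standard deviation.
x_std <- scale(x, center = TRUE, scale = TRUE)

col_means <- colMeans(x_std)        # approximately 0 for every column
col_vars  <- apply(x_std, 2, var)   # 1 for every column
```

After this transformation the predictors are on a common scale, so a single penalty strength treats them comparably.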
prior
(Optional) Prior probability of class 1. Only used if family = "binomial". When omitted, prior defaults to the frequency of class 1 in the response column.
tweedie.p
(Optional) The index of the power variance function for the Tweedie distribution. Only used if family = "tweedie".
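The default for tweedie.p shown in Usage is the expression ifelse(family == 'tweedie', 1.5, as.numeric(NA)). The sketch below wraps that same expression in a hypothetical helper (default_tweedie_p is not an h2o function) to show what it evaluates to for different families.

```r
# Hypothetical wrapper around the default expression from the Usage section.
default_tweedie_p <- function(family) {
  ifelse(family == "tweedie", 1.5, as.numeric(NA))
}

p_tweedie  <- default_tweedie_p("tweedie")   # 1.5 when the family is Tweedie
p_binomial <- default_tweedie_p("binomial")  # NA for any other family
```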
thresholds
(Optional) Degree to which to weight the sensitivity (the proportion of correctly classified 1s) and specificity (the proportion of correctly classified 0s). The default option is joint optimization for the overall classification rate. Changing this will
iter.max
(Optional) Maximum number of iterations allowed.
higher_accuracy
(Optional) A logical value indicating whether to use line search. Line search slows the algorithm down, so it should generally be set to TRUE only if GLM does not converge otherwise.
lambda_search
(Optional) A logical value indicating whether to conduct a search over the space of lambda values, starting from lambda_max. When this is set to TRUE, lambda will be interpreted as lambda_min.
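One way to picture a search from lambda_max down to lambda_min is a decreasing grid of lambda values. The base R sketch below builds such a grid under loud assumptions: the value of lambda_max, the number of grid points, and the log spacing are all hypothetical choices for illustration, and H2O may construct its search path differently.

```r
lambda_min <- 1e-5   # the value supplied as lambda when lambda_search = TRUE
lambda_max <- 1      # hypothetical; H2O determines its own starting value
n_lambda   <- 6      # hypothetical number of grid points

# Log-spaced, decreasing sequence from lambda_max to lambda_min (an assumption,
# not a description of H2O's actual search strategy).
lambda_path <- exp(seq(log(lambda_max), log(lambda_min), length.out = n_lambda))
```

Starting at a large lambda and decreasing means the search moves from heavily shrunk coefficients toward less regularized fits.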
version
(Optional) The version of GLM to run. If version = 1, this runs the more stable ValueArray implementation, while version = 2 runs the faster but still beta-stage FluidVecs implementation. The default is 2.