h2o.glm(x, y, training_frame, model_id = NULL, validation_frame = NULL, nfolds = 0, seed = -1, keep_cross_validation_predictions = FALSE, keep_cross_validation_fold_assignment = FALSE, fold_assignment = c("AUTO", "Random", "Modulo", "Stratified"), fold_column = NULL, ignore_const_cols = TRUE, score_each_iteration = FALSE, offset_column = NULL, weights_column = NULL, family = c("gaussian", "binomial", "quasibinomial", "multinomial", "poisson", "gamma", "tweedie"), tweedie_variance_power = 0, tweedie_link_power = 1, solver = c("AUTO", "IRLSM", "L_BFGS", "COORDINATE_DESCENT_NAIVE", "COORDINATE_DESCENT"), alpha = NULL, lambda = NULL, lambda_search = FALSE, early_stopping = TRUE, nlambdas = -1, standardize = TRUE, missing_values_handling = c("MeanImputation", "Skip"), compute_p_values = FALSE, remove_collinear_columns = FALSE, intercept = TRUE, non_negative = FALSE, max_iterations = -1, objective_epsilon = -1, beta_epsilon = 1e-04, gradient_epsilon = -1, link = c("family_default", "identity", "logit", "log", "inverse", "tweedie"), prior = -1, lambda_min_ratio = -1, beta_constraints = NULL, max_active_predictors = -1, interactions = NULL, balance_classes = FALSE, class_sampling_factors = NULL, max_after_balance_size = 5, max_hit_ratio_k = 0, max_runtime_secs = 0)
keep_cross_validation_predictions: Logical. Whether to keep the predictions of the cross-validation models. Defaults to FALSE.
keep_cross_validation_fold_assignment: Logical. Whether to keep the cross-validation fold assignment. Defaults to FALSE.
ignore_const_cols: Logical. Ignore constant columns. Defaults to TRUE.
score_each_iteration: Logical. Whether to score during each iteration of model training. Defaults to FALSE.
lambda_search: Logical. Use lambda search starting at lambda max; the given lambda is then interpreted as lambda min. Defaults to FALSE.
early_stopping: Logical. Stop early when there is no more relative improvement on the training or validation set (if provided). Defaults to TRUE.
standardize: Logical. Standardize numeric columns to have zero mean and unit variance. Defaults to TRUE.
compute_p_values: Logical. Request computation of p-values. P-values work only with the IRLSM solver and no regularization. Defaults to FALSE.
remove_collinear_columns: Logical. If columns are linearly dependent, remove some of the dependent columns. Defaults to FALSE.
intercept: Logical. Include a constant term in the model. Defaults to TRUE.
non_negative: Logical. Restrict coefficients (not the intercept) to be non-negative. Defaults to FALSE.
balance_classes: Logical. Balance training data class counts via over/under-sampling (for imbalanced data). Defaults to FALSE.

An H2OModel
is returned. The specific subclass depends on the machine learning task at hand: for binomial classification an H2OBinomialModel is returned, and for regression an H2ORegressionModel is returned. The default printout of the model is shown, but further GLM-specific information can be queried from the object. To access these items, refer to the functions listed in the see-also section below. On completion of the GLM, the resulting object has coefficients, normalized coefficients, residual/null deviance, AIC, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices. Please refer to the more in-depth GLM documentation available here:
https://h2o-release.s3.amazonaws.com/h2o-dev/rel-shannon/2/docs-website/h2o-docs/index.html#Data+Science+Algorithms-GLM
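As a rough illustration of querying a fitted model, the sketch below fits the binomial prostate model and pulls out the coefficients and deviances mentioned above. It assumes a local H2O cluster can be started and that the accessors h2o.coef, h2o.coef_norm, h2o.null_deviance and h2o.residual_deviance are available in the installed h2o version. The object name fit and the conversion of the response to a factor are choices made here for illustration, not part of the original examples.

```r
library(h2o)
h2o.init()

# Bundled prostate data set shipped with the h2o package
prostatePath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(path = prostatePath)

# Assumption: treat the 0/1 response as categorical for a binomial model
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)

# "fit" is a hypothetical name used only in this sketch
fit <- h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"),
               training_frame = prostate.hex, family = "binomial",
               alpha = 0.5, lambda_search = FALSE)

coefs      <- h2o.coef(fit)              # coefficients on the original scale
coefs_norm <- h2o.coef_norm(fit)         # coefficients on the standardized scale
null_dev   <- h2o.null_deviance(fit)     # deviance of the intercept-only model
resid_dev  <- h2o.residual_deviance(fit) # deviance of the fitted model

print(coefs)
print(coefs_norm)
print(c(null = null_dev, residual = resid_dev))
```

The normalized coefficients are the ones to compare across predictors, since they refer to columns rescaled to zero mean and unit variance.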
See also: predict.H2OModel for prediction, h2o.mse, h2o.auc, h2o.confusionMatrix, h2o.performance, h2o.giniCoef, h2o.logloss, h2o.varimp, h2o.scoreHistory
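The metric functions listed above are typically applied to a model or to a performance object computed on held-out data. The following is a minimal sketch under stated assumptions: the 80/20 split, the seed, and the names parts, train, test, fit and perf are illustrative choices made here, and the response is converted to a factor so that binomial metrics are reported.

```r
library(h2o)
h2o.init()

prostatePath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(path = prostatePath)
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)

# Assumption: an 80/20 train/test split with a fixed seed
parts <- h2o.splitFrame(prostate.hex, ratios = 0.8, seed = 1)
train <- parts[[1]]
test  <- parts[[2]]

fit <- h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"),
               training_frame = train, family = "binomial")

# Score the held-out frame and read individual metrics off the result
perf <- h2o.performance(fit, newdata = test)
auc_test     <- h2o.auc(perf)
logloss_test <- h2o.logloss(perf)
mse_test     <- h2o.mse(perf)
print(h2o.confusionMatrix(perf))

# Row-wise predictions for the held-out frame
preds <- h2o.predict(fit, newdata = test)
print(head(preds))
```

Passing validation_frame = test to h2o.glm is an alternative when the validation metrics should be stored with the model itself.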
h2o.init()

# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")
h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS"), training_frame = prostate.hex,
        family = "binomial", nfolds = 0, alpha = 0.5, lambda_search = FALSE)

# Run GLM of VOL ~ CAPSULE + AGE + RACE + PSA + GLEASON
myX = setdiff(colnames(prostate.hex), c("ID", "DPROS", "DCAPS", "VOL"))
h2o.glm(y = "VOL", x = myX, training_frame = prostate.hex, family = "gaussian",
        nfolds = 0, alpha = 0.1, lambda_search = FALSE)

# GLM variable importance
# Also see:
# https://github.com/h2oai/h2o/blob/master/R/tests/testdir_demos/runit_demo_VI_all_algos.R
data.hex = h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv",
  destination_frame = "data.hex")
myX = 1:20
myY = "y"
my.glm = h2o.glm(x = myX, y = myY, training_frame = data.hex, family = "binomial",
                 standardize = TRUE, lambda_search = TRUE)
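The variable importance example above fits a model but does not show how to read the importances. One hedged way to do that is to rank predictors by the magnitude of their standardized coefficients, sketched below on the bundled prostate data so that no download is needed. The names fit, std_coefs and ranking are illustrative, and whether h2o.varimp reports anything for a GLM depends on the installed h2o version, so the sketch relies on h2o.coef_norm instead.

```r
library(h2o)
h2o.init()

prostatePath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(path = prostatePath)
prostate.hex$CAPSULE <- as.factor(prostate.hex$CAPSULE)

fit <- h2o.glm(y = "CAPSULE", x = c("AGE", "RACE", "PSA", "DCAPS", "GLEASON"),
               training_frame = prostate.hex, family = "binomial",
               standardize = TRUE, lambda_search = TRUE)

# Standardized coefficients are comparable across predictors
std_coefs <- h2o.coef_norm(fit)
std_coefs <- std_coefs[names(std_coefs) != "Intercept"]

# Rank predictors by absolute standardized coefficient, largest first
ranking <- sort(abs(std_coefs), decreasing = TRUE)
print(ranking)
```

With lambda_search = TRUE, some coefficients may be shrunk to exactly zero, so a zero entry in the ranking means the predictor was dropped at the selected lambda.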