sparklyr (version 0.4)

ml_generalized_linear_regression: Spark ML -- Generalized Linear Regression

Description

Perform generalized linear regression on a Spark DataFrame.

Usage

ml_generalized_linear_regression(x, response, features, intercept = TRUE,
  family = gaussian(link = "identity"), iter.max = 100L,
  ml.options = ml_options(), ...)

Arguments

x

An object coercible to a Spark DataFrame (typically, a tbl_spark).

response

The name of the response vector (as a length-one character vector), or a formula giving a symbolic description of the model to be fitted. When response is a formula, it is used in preference to other parameters to set the response, features, and intercept parameters (if available). Currently, only simple linear combinations of existing parameters are supported; e.g. response ~ feature1 + feature2 + .... The intercept term can be omitted by using - 1 in the model fit.

features

The names of the features (terms) to use for the model fit.

intercept

Boolean; should the model be fit with an intercept term?

family

The family / link function to use; analogous to those normally passed in to calls to R's own glm.

iter.max

The maximum number of iterations to use.

ml.options

Optional arguments, used to affect the model generated. See ml_options for more details.

...

Optional arguments; currently unused.
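
As a rough illustration of the response and family arguments described above, the sketch below uses only base R to show how a formula of the documented shape breaks down into a response name, feature names, and an intercept flag, and what a family object carries. The column names (mpg, wt, cyl) are placeholders chosen for the example, and the use of terms() is an assumption made for illustration; it is not necessarily how sparklyr processes the formula internally.

```r
# Illustration only, using base R; sparklyr's internal parsing may differ.

# A formula of the documented shape, with the intercept dropped via "- 1".
# The variable names are hypothetical placeholders.
f <- mpg ~ wt + cyl - 1
tt <- terms(f)

response  <- as.character(f[[2]])        # left-hand side of the formula
features  <- attr(tt, "term.labels")     # right-hand side terms
intercept <- attr(tt, "intercept") == 1  # FALSE when "- 1" is present

# A family object of the kind passed to R's own glm();
# it carries both a family name and a link name.
fam <- gaussian(link = "identity")
fam_name  <- fam$family
link_name <- fam$link
```

Passing the formula as response would therefore be expected to correspond to response = "mpg", features = c("wt", "cyl"), and intercept = FALSE when the three are given separately.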

Details

In contrast to ml_linear_regression() and ml_logistic_regression(), this routine does not allow you to tweak the loss function (e.g. for elastic net regression); however, the model fits it returns are generally richer in the information they provide for assessing the quality of fit.
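
A minimal usage sketch follows, assuming a local Spark installation is available (for example, one set up with spark_install()) and that the dplyr package is attached to provide copy_to(). The mtcars dataset, the table name, and the chosen columns are assumptions made for the example and are not part of this help topic.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and reachable with master = "local".
sc <- spark_connect(master = "local")

# Copy a small built-in data set into Spark (example data, not from this page).
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Fit using explicit response and features arguments.
fit <- ml_generalized_linear_regression(
  mtcars_tbl,
  response = "mpg",
  features = c("wt", "cyl"),
  family = gaussian(link = "identity")
)
print(fit)

# The same model specified with a formula as the response argument.
fit_formula <- ml_generalized_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

spark_disconnect(sc)
```

The two calls are intended to describe the same model; the formula form simply sets response, features, and intercept in one argument.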

See Also

Other Spark ML routines: ml_als_factorization, ml_decision_tree, ml_gradient_boosted_trees, ml_kmeans, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression