Fits a generalized linear model, similarly to R's glm().
glm(formula, family = gaussian, data, weights, subset, na.action,
  start = NULL, etastart, mustart, offset, control = list(...),
  model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
  contrasts = NULL, ...)

# S4 method for formula,ANY,SparkDataFrame
glm(formula, family = gaussian, data,
  epsilon = 1e-06, maxit = 25, weightCol = NULL)
a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.
a description of the error distribution and link function to be used in the model.
This can be a character string naming a family function, a family function or
the result of a call to a family function. Refer to R family at
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html.
Currently these families are supported: binomial, gaussian,
Gamma, and poisson.
a SparkDataFrame or, for R's glm, the data used for training.
an optional vector of ‘prior weights’ to be used
    in the fitting process.  Should be NULL or a numeric vector.
an optional vector specifying a subset of observations to be used in the fitting process.
a function which indicates what should happen
    when the data contain NAs.  The default is set by
    the na.action setting of options, and is
    na.fail if that is unset.  The ‘factory-fresh’
    default is na.omit.  Another possible value is
    NULL, no action.  Value na.exclude can be useful.
starting values for the parameters in the linear predictor.
starting values for the linear predictor.
starting values for the vector of means.
this can be used to specify an a priori known
    component to be included in the linear predictor during fitting.
    This should be NULL or a numeric vector of length equal to
    the number of cases.  One or more offset terms can be
    included in the formula instead or as well, and if more than one is
    specified their sum is used.  See model.offset.
a list of parameters for controlling the fitting
    process.  For glm.fit this is passed to
    glm.control.
a logical value indicating whether model frame should be included as a component of the returned value.
the method to be used in fitting the model.  The default
    method "glm.fit" uses iteratively reweighted least squares
    (IWLS): the alternative "model.frame" returns the model frame
    and does no fitting.
User-supplied fitting functions can be supplied either as a function
    or a character string naming a function, with a function which takes
    the same arguments as glm.fit.  If specified as a character
    string it is looked up from within the stats namespace.
For glm: logical values indicating whether the response vector
and model matrix used in the fitting process should be returned as
components of the returned value.
an optional list. See the contrasts.arg
    of model.matrix.default.
For glm: arguments to be used to form the default
    control argument if it is not supplied directly.
For weights: further arguments passed to or from other methods.
positive convergence tolerance of iterations.
integer giving the maximal number of IRLS iterations.
the weight column name. If this is not set or NULL, we treat all instance
weights as 1.0.
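The convergence settings described above can be tried out without a Spark session. The following is a minimal sketch using base R's stats::glm, not the SparkDataFrame method: there the tolerance and iteration cap are set through glm.control, either via the control argument or by passing epsilon and maxit through `...`. The data frame `d` and its values are illustrative assumptions.

```r
# Base R (stats) only; no Spark session is needed for this sketch.
# Small illustrative count dataset (assumed values, for demonstration).
d <- data.frame(
  counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
  outcome   = gl(3, 1, 9),
  treatment = gl(3, 3)
)

# Convergence settings supplied explicitly through the control argument.
fit_control <- glm(counts ~ outcome + treatment, family = poisson, data = d,
                   control = glm.control(epsilon = 1e-06, maxit = 25))

# The same settings supplied through `...`, which stats::glm uses to build
# the default control list when control is not given directly.
fit_dots <- glm(counts ~ outcome + treatment, family = poisson, data = d,
                epsilon = 1e-06, maxit = 25)

fit_control$converged  # whether the IWLS iterations converged
fit_control$iter       # number of iterations actually used
```

Both calls should produce the same coefficients, since they differ only in how the control list is built.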
glm returns a fitted generalized linear model.
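As a rough illustration of the family argument and the returned object, the sketch below uses base R's stats::glm (not the SparkDataFrame method) and passes the family in the three forms the argument description allows: a character string, a family function, and the result of a call to a family function. The data frame `d` is a made-up example.

```r
# Base R (stats) only; the data values are illustrative assumptions.
d <- data.frame(
  y = c(2, 3, 6, 7, 8, 9, 10, 12, 15, 20),
  x = 1:10
)

# Three equivalent ways of specifying the same family.
fit_chr  <- glm(y ~ x, family = "poisson", data = d)             # character string
fit_fun  <- glm(y ~ x, family = poisson, data = d)               # family function
fit_call <- glm(y ~ x, family = poisson(link = "log"), data = d) # result of a call

class(fit_chr)  # the fitted-model class returned by stats::glm
all.equal(coef(fit_chr), coef(fit_fun))
all.equal(coef(fit_fun), coef(fit_call))
```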
# NOT RUN {
sparkR.session()
data(iris)
df <- createDataFrame(iris)
model <- glm(Sepal_Length ~ Sepal_Width, data = df, family = "gaussian")
summary(model)
# }
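The offset argument documented under the arguments can be supplied either directly or as an offset() term in the formula. The following is a small sketch with base R's stats::glm (not the SparkDataFrame method); the column names `events`, `exposure`, and `x` and all values are hypothetical.

```r
# Base R (stats) only; hypothetical data for a Poisson rate model.
d <- data.frame(
  events   = c(2, 5, 9, 4, 12, 7),
  exposure = c(10, 20, 30, 15, 40, 25),
  x        = c(0, 1, 1, 0, 1, 0)
)

# Offset given through the offset argument (evaluated within `data`).
fit_arg <- glm(events ~ x, family = poisson, data = d,
               offset = log(exposure))

# The same offset given as a term in the formula.
fit_formula <- glm(events ~ x + offset(log(exposure)),
                   family = poisson, data = d)

# Both specifications describe the same model, so the estimates agree.
all.equal(unname(coef(fit_arg)), unname(coef(fit_formula)))
```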