Fits a generalized linear model, similarly to R's glm().
glm(formula, family = gaussian, data, weights, subset, na.action,
    start = NULL, etastart, mustart, offset, control = list(...),
    model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
    contrasts = NULL, ...)

# S4 method for formula,ANY,SparkDataFrame
glm(formula, family = gaussian, data,
    epsilon = 1e-06, maxit = 25, weightCol = NULL)
formula
a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.
family
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. See R's family documentation at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html. Currently these families are supported: binomial, gaussian, Gamma, and poisson.
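The three ways of specifying a family can be compared directly. The sketch below uses base R's stats::glm on a local data.frame rather than a SparkDataFrame, so that it runs without a Spark session; the simulated data and the variable names are assumptions made for illustration only.

```r
# Simulated count data (hypothetical, for illustration only).
set.seed(1)
d <- data.frame(x = runif(50))
d$y <- rpois(50, lambda = exp(0.5 + d$x))

# 1. A character string naming a family function.
fit_chr <- stats::glm(y ~ x, family = "poisson", data = d)
# 2. The family function itself.
fit_fun <- stats::glm(y ~ x, family = poisson, data = d)
# 3. The result of a call to the family function.
fit_call <- stats::glm(y ~ x, family = poisson(link = "log"), data = d)

# All three specifications describe the same model, so the
# estimated coefficients agree.
same <- isTRUE(all.equal(coef(fit_chr), coef(fit_fun))) &&
  isTRUE(all.equal(coef(fit_fun), coef(fit_call)))
print(same)
```

The SparkDataFrame method in the usage above also takes family, so the same three forms should carry over there, restricted to the supported families listed.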
data
a SparkDataFrame or R's glm data for training.
weights
an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.
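A common use of prior weights in base R is a binomial model fitted to proportions, where the weights give the number of trials behind each proportion. The sketch below uses stats::glm on a small, made-up dose-response table (all values are hypothetical) and checks that it matches the equivalent two-column response form.

```r
# Hypothetical grouped binomial data: 20 trials per dose level.
trials <- data.frame(
  dose      = c(1, 2, 3, 4, 5),
  n         = c(20, 20, 20, 20, 20),
  successes = c(2, 5, 9, 14, 18)
)

# Response given as a proportion, with the number of trials as prior weights.
fit_w <- stats::glm(successes / n ~ dose, family = binomial,
                    data = trials, weights = n)

# Equivalent form: two-column response of successes and failures.
fit_c <- stats::glm(cbind(successes, n - successes) ~ dose,
                    family = binomial, data = trials)

same_coef <- isTRUE(all.equal(coef(fit_w), coef(fit_c)))
print(same_coef)
```

For a SparkDataFrame, the weights are instead named through weightCol, as shown in the usage above.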
subset
an optional vector specifying a subset of observations to be used in the fitting process.
na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL (no action). The value na.exclude can also be useful.
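The practical difference between na.omit and na.exclude shows up in residuals and fitted values. A minimal sketch with base R's stats::glm on a made-up data.frame (the values are hypothetical):

```r
# Six observations, two of which have a missing response.
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(1.1, NA, 2.9, 4.2, NA, 6.1)
)

fit_omit <- stats::glm(y ~ x, data = d, na.action = na.omit)
fit_excl <- stats::glm(y ~ x, data = d, na.action = na.exclude)

# Both fits use the same four complete rows, so the coefficients agree.
same_coef <- isTRUE(all.equal(coef(fit_omit), coef(fit_excl)))

# na.omit drops the incomplete rows from the residuals;
# na.exclude pads them with NA so they line up with the original rows.
n_omit <- length(residuals(fit_omit))  # 4
n_excl <- length(residuals(fit_excl))  # 6
print(c(n_omit, n_excl))
```

This is why na.exclude is convenient when residuals need to be attached back to the original data.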
start
starting values for the parameters in the linear predictor.
etastart
starting values for the linear predictor.
mustart
starting values for the vector of means.
offset
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.
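The two ways of supplying an offset can be checked against each other. The sketch below fits a Poisson rate model with base R's stats::glm, once with the offset argument and once with an offset() term in the formula; the simulated data and the column names (exposure, events) are assumptions for illustration.

```r
# Simulated event counts with a known exposure per row (hypothetical data).
set.seed(2)
d <- data.frame(x = runif(40), exposure = runif(40, 1, 10))
d$events <- rpois(40, lambda = d$exposure * exp(-1 + 0.8 * d$x))

# Offset supplied through the offset argument.
fit_arg <- stats::glm(events ~ x, family = poisson, data = d,
                      offset = log(exposure))

# Offset supplied as a term in the formula.
fit_term <- stats::glm(events ~ x + offset(log(exposure)),
                       family = poisson, data = d)

# Both forms add log(exposure) to the linear predictor with a fixed
# coefficient of 1, so the estimated coefficients agree.
same_coef <- isTRUE(all.equal(coef(fit_arg), coef(fit_term)))
print(same_coef)
```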
control
a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control.
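As a sketch of how control is used in base R, the convergence settings can be given either as an explicit glm.control() list or as extra named arguments, which stats::glm collects into the default control list. The tolerance and iteration values below are arbitrary choices for illustration.

```r
# Settings passed explicitly through control.
fit_ctrl <- stats::glm(mpg ~ wt, data = mtcars,
                       control = glm.control(epsilon = 1e-10, maxit = 50))

# The same settings passed as extra arguments; glm() gathers them
# into control = list(...) and hands that list to glm.control().
fit_dots <- stats::glm(mpg ~ wt, data = mtcars, epsilon = 1e-10, maxit = 50)

# The fitted object records the control settings that were used.
print(fit_ctrl$control$epsilon)
print(fit_dots$control$maxit)
```

The SparkDataFrame method shown in the usage above exposes epsilon and maxit as direct arguments instead of taking a control list.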
model
a logical value indicating whether the model frame should be included as a component of the returned value.
method
the method to be used in fitting the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS); the alternative "model.frame" returns the model frame and does no fitting. User-supplied fitting functions can be given either as a function or as a character string naming a function; the function must take the same arguments as glm.fit. If specified as a character string, it is looked up from within the stats namespace.
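The "model.frame" alternative is a convenient way to inspect the data a fit would use without running IWLS. A minimal sketch with base R's stats::glm and the built-in mtcars data set (the choice of variables is arbitrary):

```r
# No fitting happens here: the call returns the model frame only.
mf <- stats::glm(mpg ~ wt + hp, data = mtcars, method = "model.frame")

print(is.data.frame(mf))  # the model frame is a data.frame
print(names(mf))          # response first, then the predictors
print(nrow(mf))           # one row per observation used
```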
x, y
For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.
contrasts
an optional list. See the contrasts.arg of model.matrix.default.
...
For glm: arguments to be used to form the default control argument if it is not supplied directly. For weights: further arguments passed to or from other methods.
epsilon
positive convergence tolerance of iterations.
maxit
integer giving the maximal number of IRLS iterations.
weightCol
the weight column name. If this is not set or NULL, we treat all instance weights as 1.0.
glm returns a fitted generalized linear model.
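# NOT RUN {
# Start (or connect to) a Spark session
sparkR.session()
data(iris)
# createDataFrame() replaces the dots in the iris column names with underscores
df <- createDataFrame(iris)
model <- glm(Sepal_Length ~ Sepal_Width, df, family = "gaussian")
summary(model)
# }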