lm
is used to fit linear models.
It can be used to carry out regression,
single stratum analysis of variance and
analysis of covariance (although aov
may provide a more
convenient interface for these).lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
"formula"
(or one that
can be coerced to that class): a symbolic description of the
model to be fitted. The details of model specification are given
under as.data.frame
to a data frame) containing
the variables in the model. If not found in data
, the
variables are taken from environment(formula)
,
typically the environment from which lm
is called.NULL
or a numeric vector.
If non-NULL, weighted least squares is used with weights
weights
(that is, minimizing sum(w*e^2)
); otherwise
ordinary least squares is used. See also NA
s. The default is set by
the na.action
setting of options
, and is
na.fail
if that is unset. The na.omit
. Another possible value is
NULL
, no action. Value na.exclude
can be useful.method = "qr"
is supported; method = "model.frame"
returns
the model frame (the same as with model = TRUE
, see below).TRUE
the corresponding
components of the fit (the model frame, the model matrix, the
response, the QR decomposition) are returned.FALSE
(the default in S but
not in R) a singular fit is an error.contrasts.arg
of model.matrix.default
.NULL
or a numeric vector of length equal to
the number of cases. One or more offset
terms can be
included in the formula instead or as well, and if more than one are
specified their sum is used. See model.offset
.lm
returns an object of class
"lm"
or for
multiple responses of class c("mlm", "lm")
. The functions summary
and anova
are used to
obtain and print a summary and analysis of variance table of the
results. The generic accessor functions coefficients
,
effects
, fitted.values
and residuals
extract
various useful features of the value returned by lm
.
An object of class "lm"
is a list containing at least the
following components:
terms
object used.model.frame
on the special handling of NA
s.assign
,
effects
and (unless not requested) qr
relating to the linear
fit, for use by extractor functions such as summary
and
effects
.lm
with time series. Unless na.action = NULL
, the time series attributes are
stripped from the variables before the regression is done. (This is
necessary as omitting NA
s would invalidate the time series
attributes, and if NA
s are omitted in the middle of the series
the result would no longer be a regular time series.)
Even if the time series attributes are retained, they are not used to
line up series, so that the time shift of a lagged or differenced
regressor would be ignored. It is good practice to prepare a
data
argument by ts.intersect(..., dframe = TRUE)
,
then apply a suitable na.action
to that data frame and call
lm
with na.action = NULL
so that residuals and fitted
values are time series.
lm
are specified symbolically. A typical model has
the form response ~ terms
where response
is the (numeric)
response vector and terms
is a series of terms which specifies a
linear predictor for response
. A terms specification of the form
first + second
indicates all the terms in first
together
with all the terms in second
with duplicates removed. A
specification of the form first:second
indicates the set of
terms obtained by taking the interactions of all terms in first
with all terms in second
. The specification first*second
indicates the cross of first
and second
. This is
the same as first + second + first:second
. If the formula includes an offset
, this is evaluated and
subtracted from the response.
If response
is a matrix a linear model is fitted separately by
least-squares to each column of the matrix.
See model.matrix
for some further details. The terms in
the formula will be re-ordered so that main effects come first,
followed by the interactions, all second-order, all third-order and so
on: to avoid this pass a terms
object as the formula (see
aov
and demo(glm.vr)
for an example).
A formula has an implied intercept term. To remove this use either
y ~ x - 1
or y ~ 0 + x
. See formula
for
more details of allowed formulae.
Non-NULL
weights
can be used to indicate that different
observations have different variances (with the values in
weights
being inversely proportional to the variances); or
equivalently, when the elements of weights
are positive
integers $w_i$, that each response $y_i$ is the mean of
$w_i$ unit-weight observations (including the case that there are
$w_i$ observations equal to $y_i$ and the data have been
summarized).
lm
calls the lower level functions lm.fit
, etc,
see below, for the actual numerical computations. For programming
only, you may consider doing likewise.
All of weights
, subset
and offset
are evaluated
in the same way as variables in formula
, that is first in
data
and then in the environment of formula
.
Wilkinson, G. N. and Rogers, C. E. (1973) Symbolic descriptions of factorial models for analysis of variance. Applied Statistics, 22, 392--9.
summary.lm
for summaries and anova.lm
for
the ANOVA table; aov
for a different interface. The generic functions coef
, effects
,
residuals
, fitted
, vcov
.
predict.lm
(via predict
) for prediction,
including confidence and prediction intervals;
confint
for confidence intervals of parameters.
lm.influence
for regression diagnostics, and
glm
for generalized linear models.
The underlying low level functions,
lm.fit
for plain, and lm.wfit
for weighted
regression fitting.
More lm()
examples are available e.g., in
anscombe
, attitude
, freeny
,
LifeCycleSavings
, longley
,
stackloss
, swiss
.
biglm
in package
require(graphics)
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1) # omitting intercept
anova(lm.D9)
summary(lm.D90)
opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1) # Residuals, Fitted, ...
par(opar)
## model frame :
stopifnot(identical(lm(weight ~ group, method = "model.frame"),
model.frame(lm.D9)))
### less simple examples in "See Also" above
Run the code above in your browser using DataLab