Fits the usual weighted or unweighted linear regression model using the
same fitting routines used by lm
, but also storing the variance-covariance
matrix var
and using traditional dummy-variable coding for categorical
factors.
Also fits unweighted models using penalized least squares, with the same
penalization options as in the lrm
function. For penalized estimation,
there is a fitter function call lm.pfit
.
ols(formula, data=environment(formula), weights, subset, na.action=na.delete,
method="qr", model=FALSE,
x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE,
penalty=0, penalty.matrix, tol=.Machine$double.eps, sigma,
var.penalty=c('simple','sandwich'), ...)
the same objects returned from lm
(unless penalty
or penalty.matrix
are given - then an
abbreviated list is returned since lm.pfit
is used as a fitter)
plus the design attributes
(see rms
).
Predicted values are always returned, in the element linear.predictors
.
The vectors or matrix stored if y=TRUE
or x=TRUE
have rows deleted according to subset
and
to missing data, and have names or row names that come from the
data frame used as input data. If penalty
or penalty.matrix
is given,
the var
matrix
returned is an improved variance-covariance matrix
for the penalized regression coefficient estimates. If
var.penalty="sandwich"
(not the default, as limited simulation
studies have found it provides variance estimates that are too low) it
is defined as
penalty factors * penalty.matrix
, with a column and row of zeros
added for the
intercept. When var.penalty="simple"
(the default), var
is
stats
with named elements
n, Model L.R., d.f., R2, g, Sigma
. Model L.R.
is the model
likelihood ratio R2
is
d.f.
is the
effective degrees of freedom, which is the sum of the elements of another
vector returned, effective.df.diagonal
, minus one for the
intercept.
g
is the Sigma
is the penalized maximum likelihood estimate (see below).
an S formula object, e.g.
Y ~ rcs(x1,5)*lsp(x2,c(10,20))
name of an S data frame containing all needed variables. Omit this to use a data frame already in the S ``search list''.
an optional vector of weights to be used in the fitting
process. If specified, weighted least squares is used with
weights weights
(that is, minimizing
an expression defining a subset of the observations to use in the fit. The default
is to use all observations. Specify for example age>50 & sex="male"
or
c(1:100,200:300)
respectively to use the observations satisfying a logical expression or those having
row numbers in the given vector.
specifies an S function to handle missing data. The default is the function na.delete
,
which causes observations with any variable missing to be deleted. The main difference
between na.delete
and the S-supplied function na.omit
is that
na.delete
makes a list
of the number of observations that are missing on each variable in the model.
The na.action
is usally specified by e.g. options(na.action="na.delete")
.
specifies a particular fitting method, or "model.frame"
instead to return the model frame
of the predictor and response variables satisfying any subset or missing value
checks.
default is FALSE
. Set to TRUE
to return the model frame
as element model
of the fit object.
default is FALSE
. Set to TRUE
to return the expanded design matrix as element x
(without intercept indicators) of the
returned fit object. Set both x=TRUE
if you are going to use
the residuals
function later to return anything other than ordinary residuals.
default is FALSE
. Set to TRUE
to return the vector of response values
as element y
of the fit.
default is FALSE
. Set to TRUE
to compute the estimated standard errors of
the estimate of se.fit
of the fit.
set to FALSE
to cause predicted values not to be stored
see lrm
see lrm
tolerance for information matrix singularity
If sigma
is given, it is taken as the actual root mean squared error parameter for the model. Otherwise sigma
is estimated from the data using the usual formulas (except for penalized models). It is often convenient to specify
sigma=1
for models with no error, when using fastbw
to find an
approximate model that predicts predicted values from the full model with
a given accuracy.
the type of variance-covariance matrix to be stored in the var
component of the fit when penalization is used. The default is the
inverse of the penalized information matrix. Specify
var.penalty="sandwich"
to use the sandwich estimator (see below
under var
), which limited simulation studies have shown yields
variances estimates that are too low.
Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com
For penalized estimation, the penalty factor on the log likelihood is
sse
is the sum of squared errors (residuals).
The effective.df.diagonal
vector is the
diagonal of the matrix
rms
, rms.trans
, anova.rms
,
summary.rms
, predict.rms
,
fastbw
, validate
, calibrate
,
Predict
, specs.rms
, cph
,
lrm
, which.influence
, lm
,
summary.lm
, print.ols
,
residuals.ols
, latex.ols
,
na.delete
, na.detail.response
,
datadist
, pentrace
, vif
,
abs.error.pred
set.seed(1)
x1 <- runif(200)
x2 <- sample(0:3, 200, TRUE)
distance <- (x1 + x2/3 + rnorm(200))^2
d <- datadist(x1,x2)
options(datadist="d") # No d -> no summary, plot without giving all details
f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE)
# could use d <- datadist(f); options(datadist="d") at this point,
# but predictor summaries would not be stored in the fit object for
# use with Predict, summary.rms. In that case, the original
# dataset or d would need to be accessed later, or all variable values
# would have to be specified to summary, plot
anova(f)
which.influence(f)
summary(f)
summary.lm(f) # will only work if penalty and penalty.matrix not used
# Fit a complex model and approximate it with a simple one
x1 <- runif(200)
x2 <- runif(200)
x3 <- runif(200)
x4 <- runif(200)
y <- x1 + x2 + rnorm(200)
f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4)
pred <- fitted(f) # or predict(f) or f$linear.predictors
f2 <- ols(pred ~ rcs(x1,4) + x2 + x3 + x4, sigma=1)
# sigma=1 prevents numerical problems resulting from R2=1
fastbw(f2, aics=100000)
# This will find the best 1-variable model, best 2-variable model, etc.
# in predicting the predicted values from the original model
options(datadist=NULL)
Run the code above in your browser using DataLab