Diagnostic plots for fitted regression models: residuals versus fit (Tukey-Anscombe plot) and/or target variable versus fit; absolute residuals versus fit to assess equality of error variances; normal Q-Q plot (for ordinary regression models); residuals versus leverages to identify influential observations; residuals versus sequence (if requested); and residuals versus explanatory variables. These plots are adjusted to the type of regression model.
plregr(x, data = NULL, plotselect = NULL, xvar = TRUE,
transformed = NULL, sequence = FALSE, weights = NULL,
addcomp = NULL, smooth = 2, smooth.legend = FALSE, markextremes = NA,
plargs = NULL, ploptions = NULL, assign = TRUE, ...)

plresx(x, data = NULL, xvar = TRUE, transformed = NULL,
sequence = FALSE, weights = NULL,
addcomp = NULL, smooth = 2, smooth.legend = FALSE, markextremes = NA,
plargs = NULL, ploptions = NULL, assign = TRUE, ...)
A list containing the evaluated arguments and some further useful items is returned invisibly.
x: "regr" (or also lm or glm) object, the result of a call to regr() from package regr. This is the only argument needed. All others have useful defaults.
data: data set where explanatory variables and the following possible arguments are found: weights, plweights, pch, plabs.
plotselect: which plots should be shown? See Details.
xvar: if TRUE, residuals will be plotted versus all explanatory variables (or terms, according to argument 'transformed') in the model (plregr will call plresx). If it is a character vector, it contains the variables to be used. If it is a formula, its right-hand side contains these variables. The model formula is updated by such a formula. Hence, the use of ~ . + adds variables to those in the model. If any variables are not contained in the model, the argument data is needed.
transformed: logical: should residuals be shown against transformed explanatory variables? If TRUE, the variables are transformed as implied by the model.
sequence: if TRUE, residuals will be plotted versus the sequence as they appear in the data. If another explanatory variable is monotone increasing or decreasing, the plot is not shown, but a warning is given.
weights: if TRUE, residuals will be plotted versus x$weights. Alternatively, a vector of weights can be specified.
addcomp: logical: should component effects be added to residuals for residuals versus input variables plots?
smooth: logical: should a smooth line be added?
smooth.legend: when a grouping factor is used (argument smooth.group, see below), this argument determines whether and where the legend for identifying the groups should be shown, see Details.
markextremes: proportion of extreme residuals to be labeled. If all points should be labeled, let markextremes=1.
plargs: result of calling pl.control. If NULL, pl.control will be called to generate it. If not NULL, arguments given in ... will be ignored.
ploptions: list of pl options.
assign: logical: should the plargs be stored in the pl.envir environment?
...: Many further arguments are available to customize the plots, see below for some of the most useful ones, and plregr.control for a complete list.
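The way a formula given as xvar extends the model formula can be illustrated with base R's update(). This is only a sketch of the documented behaviour, not the code used inside plregr; the variable names are taken from the LifeCycleSavings example, and the object names are hypothetical.

```r
# Base-R illustration of how a formula such as  ~ . + x  extends a model formula.
# This mirrors the documented behaviour of 'xvar'; it is not plregr's own code.
model_formula <- sr ~ pop15 + pop75
xvar_formula  <- ~ . + dpi   # "." stands for the terms already in the model

# update() replaces "." by the existing right-hand side and appends 'dpi'
updated <- update(model_formula, xvar_formula)
print(updated)

# Explanatory variables after updating (first entry of all.vars is the response)
rhs_vars <- all.vars(updated)[-1]
print(rhs_vars)
```

Since dpi is not part of the original model formula here, a call using such an xvar would also need the data argument, as stated above.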
Werner A. Stahel, ETH Zurich
Argument plotselect is used to determine which plots will be shown. It should be a named vector of numbers indicating
0: do not show
1: show without smooth
2: show with smooth (not for qq nor leverage)
3: show with smooth and smooth band (only for resfit in plregr and in plresx)
The default is
c(yfit = 0, resfit = smdef, absresfit = NA, absresweights = NA, qq = NA,
  leverage = 2, resmatrix = 1, qqmult = 3),
where smdef is 3 (actually argument smooth of plregr.control plus 1) for normal random deviations and one less (no band) for others.
Modify this vector to change the selection and the sequence in which the plots appear. Alternatively, provide a named vector defining all plots that should be shown on a different level than the default indicates, like plotselect = c(resfit = 2, leverage = 1). Adding an element default = 0 suppresses all plots not mentioned. This is useful to select single plots, like plotselect = c(resfit = 3, default = 0).
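The two ways of specifying plotselect can be tried as follows. This is a hedged sketch: it assumes that plregr is available from the plgraphics package (an assumption not stated on this page) and skips the calls if that package is not installed. The object names sel_single and sel_partial are hypothetical.

```r
# Assumption: plregr() is provided by the 'plgraphics' package.
r.savings <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Only the Tukey-Anscombe plot, with smooth and band; all other plots suppressed
sel_single <- c(resfit = 3, default = 0)

# Change the level of two plots and keep the defaults for the rest
sel_partial <- c(resfit = 2, leverage = 1)

if (requireNamespace("plgraphics", quietly = TRUE)) {
  plgraphics::plregr(r.savings, plotselect = sel_single)
  plgraphics::plregr(r.savings, plotselect = sel_partial)
}
```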
The names of plotselect refer to:
yfit: response versus fitted values
resfit: residuals versus fitted values (Tukey-Anscombe plot)
absresfit: absolute residuals versus fitted values; defaults to TRUE for ordinary regression, FALSE for glm and others
absresweights: residuals versus weights
qq: normal Q-Q plot; defaults to TRUE for ordinary regression, FALSE for glm and others
leverage: residuals versus leverage (hat diagonal)
resmatrix: scatterplot matrix of residuals for multivariate regression
qqmult: Q-Q plot of Mahalanobis lengths versus square roots of chi-square quantiles.
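The idea behind the multivariate Q-Q plot in the list above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: it uses the iris data, estimates the residual covariance with cov() (divisor n - 1), and all object names are hypothetical.

```r
# Multivariate regression on a built-in data set (illustrative choice)
fit <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species, data = iris)
res <- residuals(fit)          # n x p matrix of residuals
n <- nrow(res)
p <- ncol(res)

# Mahalanobis lengths of the residual vectors (square roots of squared distances)
md <- sqrt(mahalanobis(res, center = rep(0, p), cov = cov(res)))

# Under multivariate normality the squared distances are roughly chi-square(p),
# so the lengths are compared with square roots of chi-square quantiles
theo <- sqrt(qchisq(ppoints(n), df = p))

plot(theo, sort(md),
     xlab = "sqrt of chi-square quantiles",
     ylab = "ordered Mahalanobis lengths")
abline(0, 1, lty = 2)          # reference line for perfect agreement
```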
In the 'resfit' (Tukey-Anscombe) plot, the reference line indicates a "contour" line with constant values of the response variable, \(Y = \widehat{y} + r = \text{constant}\). It has slope -1. It is useful to judge whether any curvature shown by the smooth might disappear after a nonlinear, monotone transformation of the response.
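Why the reference line has slope -1 follows from rearranging the contour condition to \(r = \text{constant} - \widehat{y}\). A minimal base-R sketch, which draws one such line through the median response (an arbitrary choice made for this illustration, not necessarily the one plregr makes):

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
fv <- fitted(fit)
r  <- residuals(fit)
y  <- fv + r                    # the response is fit plus residual

plot(fv, r, xlab = "fitted values", ylab = "residuals")
abline(h = 0, lty = 3)

# Contour of constant response y0:  r = y0 - fitted, i.e. intercept y0, slope -1
y0 <- median(y)
abline(a = y0, b = -1, lty = 2)
```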
If smresid is true, the 'absresfit' plot uses modified residuals: differences between the ordinary residuals and the smooth appearing in the 'resfit' plot. Analogously, the 'qq' plot is then based on yet another modification of these modified residuals: they are scaled by the smoothed scale shown in the 'absresfit' plot, after these scales have been standardized to have a median of 0.674 (=qnorm(0.75)).
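The two modifications described above can be mimicked in base R. This is a rough sketch under assumptions: it calls loess() directly with its default settings rather than the package's smoothRegr, it interprets "scaled by" as division by the standardized scale, and all object names are hypothetical.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
fv <- fitted(fit)
r  <- residuals(fit)

# Step 1: smooth of residuals versus fit, as shown in the 'resfit' plot
sm_loc <- fitted(loess(r ~ fv))
r_mod  <- r - sm_loc            # modified residuals used for 'absresfit'

# Step 2: smooth of absolute modified residuals, as shown in 'absresfit'
sm_scale <- fitted(loess(abs(r_mod) ~ fv))
sm_scale <- pmax(sm_scale, .Machine$double.eps)   # guard against non-positive values

# Standardize the smoothed scales to have median qnorm(0.75), about 0.674
sm_scale <- sm_scale / median(sm_scale) * qnorm(0.75)

# Step 3: residuals scaled by the standardized smooth, used for the Q-Q plot
r_scaled <- r_mod / sm_scale
qqnorm(r_scaled)
qqline(r_scaled)
```

The constant qnorm(0.75) is the median of the absolute value of a standard normal variable, which is why it serves as the target for the standardized scales.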
The smoothing function used by default is smoothRegr, which calls loess. This can be changed by setting ploptions(smooth.function=<func>), which must have the same arguments as smoothRegr.
The arguments lty, lwd, colors characterize how the graphical elements in the plot are shown. They should be three vectors of length 9 each, defining the line types, line widths, and colors to be used for:
observations;
reference lines;
smooth;
simulated smooths;
component effects in plresx;
confidence bands of component effects.
In the case of glm.restype="cond.quant":
(random) observations;
conditional medians;
bars showing conditional quantiles.
If smooths are shown according to groups (given in smooth.group), then a legend can be requested and positioned in the respective plots by using the argument smooth.legend. If it is TRUE, then the legend will be placed in the "bottomright" corner. Alternatively, the corner can be specified as "bottomright", "bottomleft", "topleft", or "topright". A coordinate pair may also be given. These possibilities can be used individually for each plot by giving a named vector or a named list, where the names are one of "yfit", "resfit", "absresfit", "absresweight", ".xvar." or names of x variables provided by the xvar argument. A component ".xvar." selects the first x variable.
There is a hidden argument innerrange.fit that allows for fixing an inner range for plotting the fitted values.
plregr.control, plot.lm
data(LifeCycleSavings, package="datasets")
r.savings <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
plregr(r.savings)
## --- *transformed* linear model
data(d.blast)
r.blast <-
  lm(log10(tremor) ~ location + log10(distance) + log10(charge),
     data = d.blast)
plregr(r.blast, sequence=TRUE, transformed=TRUE)
plregr(r.blast, xvar=FALSE, innerrange.fit=c(0.3,1.2))
## --- multivariate regression
data(d.fossileSamples)
r.foss <-
  lm(cbind(sAngle, lLength, rWidth) ~ SST + Salinity + lChlorophyll + Region + N,
     data = d.fossileSamples)
plregr(r.foss, plotselect = c(resfit = 3, resmatrix = 1, qqmult = 1))
## --- logistic regression
data(d.babysurvival)
rr <- glm(Survival ~ Weight+Age+Apgar1, data=d.babysurvival, family=binomial)
plregr(rr, xvar= ~Weight, cex.plab=0.7, ylim=c(-5,5))
plregr(rr, condquant=FALSE)
## --- ordinal regression
if (requireNamespace("MASS")) {
  data(housing, package = "MASS")
  rr <- MASS::polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
  plregr(rr, factor.show = "jitter")
}