plregr: Diagnostic Plots for Regr Objects

Description

Diagnostic plots for fitted regression models: Residuals versus fit (Tukey-Anscombe plot) and/or target variable versus fit; Absolute residuals versus fit to assess equality of error variances; Normal Q-Q plot (for ordinary regression models); Residuals versus leverages to identify influential observations; Residuals versus sequence (if requested); and residuals versus explanatory variables. These plots are adjusted to the type of regression model.

Usage

plregr(x, data = NULL, plotselect = NULL, xvar = TRUE,
  transformed = NULL, sequence = FALSE, weights = NULL,
  addcomp = NULL, smooth = 2, smooth.legend = FALSE, markextremes = NA,
  plargs = NULL, ploptions = NULL, assign = TRUE, ...)
plresx(x, data = NULL, xvar = TRUE, transformed = NULL,
  sequence = FALSE, weights = NULL,
  addcomp = NULL, smooth = 2, smooth.legend = FALSE, markextremes = NA,
  plargs = NULL, ploptions = NULL, assign = TRUE, ...)

Value

A list containing the evaluated arguments and some further useful items is returned invisibly.

Arguments

x: "regr" (or also lm or glm) object, result of a call to regr() from package regr. This is the only argument needed. All others have useful defaults.
data: data set where explanatory variables and the following possible arguments are found: weights, plweights, pch, plabs
plotselect: which plots should be shown? See Details
xvar: if TRUE, residuals will be plotted versus all explanatory variables (or terms, according to argument 'transformed') in the model (plregr will call plresx).
If it is a character vector, it contains the variables to be used.
If it is a formula, its right hand side contains these variables. The model formula is updated by such a formula. Hence, using ~ . + adds variables to those in the model.
If any of these variables are not contained in the model, the argument data is needed.

transformed: logical: should residuals be shown against transformed explanatory variables? If TRUE, the variables are transformed as implied by the model.
sequence: if TRUE, residuals will be plotted versus the sequence as they appear in the data. If another explanatory variable is monotone increasing or decreasing, the plot is not shown, but a warning is given.
weights: if TRUE, residuals will be plotted versus x$weights. Alternatively, a vector of weights can be specified.
addcomp: logical: should component effects be added to residuals for residuals versus input variables plots?
smooth: logical: should a smooth line be added?
smooth.legend: when a grouping factor is used (argument smooth.group, see below), this argument determines whether and where the legend for identifying the groups should be shown; see Details.
markextremes: proportion of extreme residuals to be labeled. If all points should be labeled, let markextremes=1.
plargs: result of calling pl.control. If NULL, pl.control will be called to generate it. If not NULL, arguments given in ... will be ignored.
ploptions: list of pl options.
assign: logical: Should the plargs be stored in the pl.envir environment?
...: Many further arguments are available to customize the plots, see below for some of the most useful ones, and plregr.control for a complete list.
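
The formula form of xvar can be pictured with base R's update(), which expands a dot to what is already in the model. The following is a minimal sketch of that updating rule only. It assumes that plregr combines the model formula and xvar in the same way as update(); the reduced model fit_small is chosen purely for illustration, and the call to plregr is guarded so that the code also runs when the function is not available.

```r
# Illustrative model that deliberately leaves out dpi
fit_small <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)

# One-sided formula as it could be passed to xvar:
# keep the model terms (the dot) and add dpi
xvar_formula <- ~ . + dpi

# Base R update() expands the dot to the right hand side of the model formula
f_new <- update(formula(fit_small), xvar_formula)
print(f_new)

# Variables requested in addition to those in the model;
# these are the ones for which the data argument is needed
extra_vars <- setdiff(all.vars(f_new), all.vars(formula(fit_small)))
print(extra_vars)

# Guarded call: only runs if plregr is available in the session
if (exists("plregr", mode = "function")) {
  plregr(fit_small, data = LifeCycleSavings, xvar = xvar_formula)
}
```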

Author

Werner A. Stahel, ETH Zurich

Details

Argument plotselect determines which plots will be shown. It should be a named vector of numbers with the following meaning:

0: do not show
1: show without smooth
2: show with smooth (not for qq nor leverage)
3: show with smooth and smooth band (only for resfit in plregr and in plresx)

The default is c( yfit=0, resfit=smdef, absresfit = NA, absresweights = NA, qq = NA, leverage = 2, resmatrix = 1, qqmult = 3), where smdef is 3 (actually argument smooth of plregr.control plus 1) for normal random deviations and one less (no band) for others.

Modify this vector to change the selection and the sequence in which the plots appear. Alternatively, provide a named vector defining all plots that should be shown at a level different from the default, like plotselect = c(resfit = 2, leverage = 1). Adding an element default = 0 suppresses all plots not mentioned. This is useful to select single plots, like plotselect = c(resfit = 3, default = 0).
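
The three ways of specifying plotselect can be written down as follows. This is a sketch under the assumption that the names and levels behave as described above; the object names ps_partial, ps_single and ps_full are illustrative, and the plregr calls are guarded so that the code runs even if the function is not available.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Override two entries only; the other plots keep their default level
ps_partial <- c(resfit = 2, leverage = 1)

# Single plot: resfit with smooth and band, all other plots suppressed
ps_single <- c(resfit = 3, default = 0)

# Full vector in a different order; the order of the names is meant to
# set the sequence of the plots. NA entries are copied from the
# documented default.
ps_full <- c(leverage = 2, resfit = 3, yfit = 1, absresfit = NA,
             absresweights = NA, qq = NA, resmatrix = 1, qqmult = 3)

if (exists("plregr", mode = "function")) {
  plregr(fit, plotselect = ps_partial, xvar = FALSE)
  plregr(fit, plotselect = ps_single, xvar = FALSE)
  plregr(fit, plotselect = ps_full, xvar = FALSE)
}
```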

The names of plotselect refer to:

yfit: response versus fitted values
resfit: residuals versus fitted values (Tukey-Anscombe plot)
absresfit: absolute residuals versus fitted values, defaults to TRUE for ordinary regression, FALSE for glm and others
absresweights: absolute residuals versus weights
qq: normal Q-Q plot, defaults to TRUE for ordinary regression, FALSE for glm and others
leverage: residuals versus leverage (hat diagonal)
resmatrix: scatterplot matrix of residuals for multivariate regression
qqmult: Q-Q plot of Mahalanobis lengths versus square roots of chi-square quantiles.

In the 'resfit' (Tukey-Anscombe) plot, the reference line indicates a "contour" line with constant values of the response variable, $Y = \widehat{y} + r = \text{constant}$. It has slope -1. It is useful to judge whether any curvature shown by the smooth might disappear after a nonlinear, monotone transformation of the response.
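
The geometry of this reference line can be reproduced with base graphics. The sketch below does not use plregr; the level y0 of the response is an arbitrary illustrative choice (the median), not necessarily the level that plregr uses.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
yhat <- fitted(fit)
r <- residuals(fit)

# Assumed level of the response for the contour (illustrative choice)
y0 <- median(LifeCycleSavings$sr)

# Plain Tukey-Anscombe plot
plot(yhat, r, xlab = "fitted values", ylab = "residuals")
abline(h = 0, lty = 3)

# Points with yhat + r = y0 satisfy r = y0 - yhat:
# a line with intercept y0 and slope -1
abline(a = y0, b = -1, lty = 2)

# Observations above the line have a response larger than y0
above <- (yhat + r) > y0
points(yhat[above], r[above], pch = 16)
```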

If smresid is true, the 'absresfit' plot uses modified residuals: differences between the ordinary residuals and the smooth appearing in the 'resfit' plot. Analogously, the 'qq' plot is then based on yet another modification of these modified residuals: they are scaled by the smoothed scale shown in the 'absresfit' plot, after these scales have been standardized to have a median of 0.674 (=qnorm(0.75)).
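
The two modifications can be traced step by step with loess from base R. This is only a literal reading of the paragraph above, not the code used by plregr: the smoother settings, the use of absolute values for the scale smooth, and all object names are assumptions.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
yhat <- fitted(fit)
r <- residuals(fit)

# Step 1: smooth of the residuals against the fitted values,
# playing the role of the smooth in the 'resfit' plot
sm_loc <- fitted(loess(r ~ yhat))

# Modified residuals: ordinary residuals minus that smooth
r_mod <- r - sm_loc

# Step 2: smooth of the absolute modified residuals,
# playing the role of the smooth in the 'absresfit' plot
sm_scale <- fitted(loess(abs(r_mod) ~ yhat))

# Step 3: standardize the smoothed scales to a median of qnorm(0.75)
scale_std <- sm_scale * qnorm(0.75) / median(sm_scale)

# Step 4: residuals for the Q-Q plot, scaled by the standardized scales
r_qq <- r_mod / scale_std

qqnorm(r_qq)
qqline(r_qq)
```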

The smoothing function used by default is smoothRegr, which calls loess. This can be changed by setting ploptions(smooth.function=<func>), which must have the same arguments as smoothRegr.

The arguments lty, lwd, colors characterize how the graphical elements in the plot are shown. They should be three vectors of length 9 each, defining the line types, line widths, and colors to be used for ...

[1]: observations;
[2]: reference lines;
[3]: smooth;
[4]: simulated smooths;
[5]: component effects in plresx;
[6]: confidence bands of component effects;
[7]: (random) observations;
[8]: conditional medians;
[9]: bars showing conditional quantiles.

If smooths are shown according to groups (given in smooth.group), then a legend can be requested and positioned in the respective plots by using the argument smooth.legend. If it is TRUE, then the legend will be placed in the "bottomright" corner. Alternatively, the corner can be specified as "bottomright", "bottomleft", "topleft", or "topright". A coordinate pair may also be given. These possibilities can be used individually for each plot by giving a named vector or a named list, where the names are one of "yfit", "resfit", "absresfit", "absresweight", ".xvar." or names of x variables provided by the xvar argument. A component ".xvar." selects the first x variable.

There is a hidden argument innerrange.fit that allows for fixing an inner range for plotting the fitted values.

Examples

data(LifeCycleSavings, package="datasets")
r.savings <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
plregr(r.savings)

## --- *transformed* linear model
data(d.blast)
r.blast <-
  lm(log10(tremor) ~ location+log10(distance)+log10(charge),
     data=d.blast)
plregr(r.blast, sequence=TRUE, transformed=TRUE)
plregr(r.blast, xvar=FALSE, innerrange.fit=c(0.3,1.2))

## --- multivariate regression
data(d.fossileSamples)
r.foss <-
  lm(cbind(sAngle,lLength,rWidth) ~ SST+Salinity+lChlorophyll+Region+N,
     data=d.fossileSamples)
plregr(r.foss, plotselect=c(resfit=3, resmatrix=1, qqmult=1))

## --- logistic regression
data(d.babysurvival)
rr <- glm(Survival ~ Weight+Age+Apgar1, data=d.babysurvival, family=binomial)
plregr(rr, xvar= ~Weight, cex.plab=0.7, ylim=c(-5,5))
plregr(rr, condquant=FALSE)

## --- ordinal regression
if(requireNamespace("MASS")) {
  data(housing, package="MASS")
  rr <- MASS::polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
  plregr(rr, factor.show="jitter")
}
