Returns diagnostic measures for a binary regression model by covariate pattern
dx(x, ...)
# S3 method for glm
dx(x, ..., byCov = TRUE)
A regression model with class glm and x$family$family == "binomial".
Additional arguments which can be passed to stats::model.matrix (see ?stats::model.matrix), e.g. contrasts.arg, which can be used for factor coding.
Return values by covariate pattern, rather than by individual observation.
A data.table, with rows sorted by \(\Delta \hat{\beta}_i\).
If byCov==TRUE, there is one row per covariate pattern with at least one observation.
The initial columns give the predictor variables \(1 \ldots p\).
Subsequent columns are labelled as follows:
The actual number of observations with \(y=1\) in the model data.
Predicted probability of \(y=1\) for this covariate pattern.
This is given by the inverse of the link function, x$family$linkinv, applied to the linear predictor. See: ?stats::family
Number of observations with these covariates.
If byCov=FALSE then this will be \(=1\) for all observations.
The predicted number of observations having a response of \(y=1\), according to the model. This is: $$\hat{y}_i = n_i P_i$$
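The first few columns can be reproduced by hand with base R. The following is a minimal sketch, not the code used by dx(): the toy data, the variable names (ones, obs, pat) and the model formula are all hypothetical, chosen only to show how \(y\), \(n\), \(P\) and \(\hat{y}\) relate for each covariate pattern.

```r
# Hypothetical toy data (not from the package): 6 covariate patterns,
# 10 binary observations each.
ones <- c(2, 4, 7, 3, 6, 8)                 # number of y = 1 per pattern
obs <- data.frame(
  dose = rep(c(0, 1, 2, 0, 1, 2), each = 10),
  trt  = rep(c(0, 1), each = 30),
  y    = unlist(lapply(ones, function(k) rep(c(1, 0), c(k, 10 - k)))),
  n    = 1                                  # each row is one observation
)

fit <- glm(y ~ dose + trt, family = binomial, data = obs)

# Collapse to one row per covariate pattern:
# y = observed count of y = 1, n = observations sharing these covariates.
pat <- aggregate(cbind(y, n) ~ dose + trt, data = obs, FUN = sum)

# P: inverse link applied to the linear predictor of each pattern.
eta   <- predict(fit, newdata = pat, type = "link")
pat$P <- fit$family$linkinv(eta)

# yhat: predicted number of observations with y = 1.
pat$yhat <- pat$n * pat$P

print(pat)
```

Because the model contains an intercept, the predicted counts sum to the observed counts (up to the convergence tolerance of glm), which is a quick sanity check on any such table.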
Leverage, the diagonal of the hat matrix used to
generate the model:
$$H = \sqrt{V} X (X^T V X)^{-1} X^T \sqrt{V}$$
Here \(^{-1}\) is the inverse and
\(^T\) is the transpose of a matrix.
\(X\) is the matrix of predictors, given by stats::model.matrix.
\(V\) is an \(N \times N\) sparse matrix. All elements are
\(=0\) except for the diagonal, which is:
$$v_{ii} = n_iP_i (1 - P_i)$$
The leverage \(h_i\) is the \(i\)th diagonal element of \(H\). The inner term
\((X^T V X)^{-1}\) is the estimated covariance matrix of \(\hat{\beta}\).
Leverage is a measure of the influence of this
covariate pattern on the model and is approximately
$$h_i \approx x_i - \bar{x} \quad \mathrm{for} \quad 0.1 < P_i < 0.9$$
That is, leverage is approximately equal to the distance of
the covariate pattern \(i\) from the mean \(\bar{x}\).
For values of \(P_i\) which are large (\(>0.9\)) or
small (\(<0.1\)) this relationship no longer holds.
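The hat matrix formula can be checked numerically. This is a sketch under stated assumptions rather than the package's implementation: the toy data and all object names are hypothetical, the fit is per observation (so \(n_i = 1\)), and the convergence tolerance is tightened so that the hand-computed values agree closely with stats::hatvalues.

```r
# Hypothetical toy data: 6 covariate patterns, 10 observations each.
ones <- c(2, 4, 7, 3, 6, 8)
obs <- data.frame(
  dose = rep(c(0, 1, 2, 0, 1, 2), each = 10),
  trt  = rep(c(0, 1), each = 30),
  y    = unlist(lapply(ones, function(k) rep(c(1, 0), c(k, 10 - k))))
)
fit <- glm(y ~ dose + trt, family = binomial, data = obs,
           control = glm.control(epsilon = 1e-12, maxit = 50))

## Leverage by individual observation (n_i = 1)
X  <- model.matrix(fit)            # matrix of predictors
P  <- fitted(fit)
v  <- P * (1 - P)                  # diagonal of V
Xw <- sqrt(v) * X                  # sqrt(V) X: row i scaled by sqrt(v_ii)
XtVXinv <- solve(crossprod(Xw))    # (X^T V X)^{-1}
h  <- diag(Xw %*% XtVXinv %*% t(Xw))

# Agrees with the built-in leverage, and the inner matrix with vcov(fit).
all.equal(unname(h), unname(hatvalues(fit)), tolerance = 1e-6)
all.equal(unname(XtVXinv), unname(vcov(fit)), tolerance = 1e-6)

## Leverage by covariate pattern (here n_i = 10 for every pattern)
pat   <- unique(obs[, c("dose", "trt")])
pat$n <- 10
Xp  <- model.matrix(~ dose + trt, data = pat)
Pp  <- predict(fit, newdata = pat, type = "response")
vp  <- pat$n * Pp * (1 - Pp)       # v_ii = n_i P_i (1 - P_i)
Xpw <- sqrt(vp) * Xp
hp  <- diag(Xpw %*% solve(crossprod(Xpw)) %*% t(Xpw))

# Leverages sum to the number of coefficients in both layouts.
c(by_observation = sum(h), by_pattern = sum(hp))
```

In this sketch the leverage of a pattern equals the sum of the leverages of the observations it contains, which is why both totals equal the number of model coefficients.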
The Pearson residual, a measure of influence. This is: $$Pr_i = \frac{y_i - \mu_y}{\sigma_y}$$ where \(\mu_y\) and \(\sigma_y\) refer to the mean and standard deviation of a binomial distribution, and \(\sigma^2_y = Var_y\) is the variance. $$E(y) = \mu_y = \hat{y} = nP \quad \mathrm{and} \quad \sigma_y=\sqrt{nP(1 - P)}$$ Thus: $$Pr_i = \frac{y_i - n_i P_i}{\sqrt{n_i P_i (1 - P_i)}}$$
The deviance residual, a measure of influence: $$dr_i = \mathrm{sign}(y_i - \hat{y}_i) \sqrt{d_i}$$ \(d_i\) is the contribution of observation \(i\) to the model deviance. The \(\mathrm{sign}\) above is:
\(y_i > \hat{y}_i \quad \rightarrow \mathrm{sign}(i)=1\)
\(y_i = \hat{y}_i \quad \rightarrow \mathrm{sign}(i)=0\)
\(y_i < \hat{y}_i \quad \rightarrow \mathrm{sign}(i)=-1\)
The standardized Pearson residual. The residual is standardized by the leverage \(h_i\): $$sPr_i = \frac{Pr_i}{\sqrt{(1 - h_i)}}$$
The standardized deviance residual. The residual is standardized by the leverage, as above: $$sdr_i = \frac{dr_i}{\sqrt{(1 - h_i)}}$$
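The four residuals above can be computed directly and compared with the base R equivalents. This is a minimal sketch on hypothetical toy data, fitted per observation (so \(n_i = 1\)); the expression used for \(d_i\) is the standard deviance contribution of a single binary observation, which is an assumption about what dx() uses internally.

```r
# Hypothetical toy data: 6 covariate patterns, 10 observations each.
ones <- c(2, 4, 7, 3, 6, 8)
obs <- data.frame(
  dose = rep(c(0, 1, 2, 0, 1, 2), each = 10),
  trt  = rep(c(0, 1), each = 30),
  y    = unlist(lapply(ones, function(k) rep(c(1, 0), c(k, 10 - k))))
)
fit <- glm(y ~ dose + trt, family = binomial, data = obs)

y <- obs$y
P <- fitted(fit)
h <- hatvalues(fit)                      # leverage

# Pearson residual with n_i = 1: (y - P) / sqrt(P (1 - P))
Pr <- (y - P) / sqrt(P * (1 - P))

# Deviance residual: sign(y - yhat) * sqrt(d_i), where d_i is the
# contribution of a single binary observation to the deviance.
d  <- -2 * (y * log(P) + (1 - y) * log(1 - P))
dr <- sign(y - P) * sqrt(d)

# Standardized by the leverage
sPr <- Pr / sqrt(1 - h)
sdr <- dr / sqrt(1 - h)

# Compare with the built-in residual functions
all.equal(unname(Pr),  unname(residuals(fit, type = "pearson")))
all.equal(unname(dr),  unname(residuals(fit, type = "deviance")))
all.equal(unname(sPr), unname(rstandard(fit, type = "pearson")))
all.equal(unname(sdr), unname(rstandard(fit, type = "deviance")))
```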
The change in the Pearson chi-square statistic with observation \(i\) removed. Given by: $$\Delta P\chi^2_i = sPr_i^2 = \frac{Pr_i^2}{1 - h_i}$$ where \(sPr_i\) is the standardized Pearson residual, \(Pr_i\) is the Pearson residual and \(h_i\) is the leverage. \(\Delta P\chi^2_i\) should be \(<4\) if the observation has little influence on the model.
The change in the deviance statistic \(D = \sum_{i=1}^n dr_i^2\) with observation \(i\) excluded. It is scaled by the leverage \(h_i\) as above: $$\Delta D_i = sdr_i^2 = \frac{dr_i^2}{1 - h_i}$$
The change in \(\hat{\beta}\) with observation \(i\) excluded. This is scaled by the leverage as above: $$\Delta \hat{\beta}_i = \frac{sPr_i^2 h_i}{1 - h_i}$$ where \(sPr_i\) is the standardized Pearson residual. \(\Delta \hat{\beta}_i\) should be \(<1\) if the observation has little influence on the model coefficients.
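The three change statistics follow from the residuals and the leverage. The sketch below uses hypothetical toy data fitted per observation; the object names (dChisq, dDev, dBhat, res, flagged) are illustrative, and sorting in decreasing order is an assumption about the direction. It also checks that \(\Delta \hat{\beta}_i\) is Cook's distance multiplied by the number of coefficients, as computed by stats::cooks.distance.

```r
# Hypothetical toy data: 6 covariate patterns, 10 observations each.
ones <- c(2, 4, 7, 3, 6, 8)
obs <- data.frame(
  dose = rep(c(0, 1, 2, 0, 1, 2), each = 10),
  trt  = rep(c(0, 1), each = 30),
  y    = unlist(lapply(ones, function(k) rep(c(1, 0), c(k, 10 - k))))
)
fit <- glm(y ~ dose + trt, family = binomial, data = obs)

Pr <- residuals(fit, type = "pearson")
dr <- residuals(fit, type = "deviance")
h  <- hatvalues(fit)

dChisq <- Pr^2 / (1 - h)          # change in Pearson chi-square
dDev   <- dr^2 / (1 - h)          # change in deviance
dBhat  <- Pr^2 * h / (1 - h)^2    # sPr^2 * h / (1 - h)

# The model deviance is the sum of squared deviance residuals.
all.equal(sum(dr^2), deviance(fit))

# dBhat equals Cook's distance times the number of coefficients.
all.equal(unname(dBhat / length(coef(fit))), unname(cooks.distance(fit)))

# Sort by dBhat (largest first) and flag rows above the rule-of-thumb cutoffs.
res <- data.frame(obs, h, dChisq, dDev, dBhat)
res <- res[order(res$dBhat, decreasing = TRUE), ]
flagged <- subset(res, dChisq >= 4 | dBhat >= 1)
head(res)
```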
# NOT RUN {
## H&L 2nd ed. Table 5.8. Page 182.
## Pattern nos. 31, 477, 468
data(uis)
uis <- within(uis, {
    NDRGFP1 <- 10 / (NDRGTX + 1)
    NDRGFP2 <- NDRGFP1 * log((NDRGTX + 1) / 10)
})
(d1 <- dx(g1 <- glm(DFREE ~ AGE + NDRGFP1 + NDRGFP2 + IVHX +
                    RACE + TREAT + SITE +
                    AGE:NDRGFP1 + RACE:SITE,
                    family=binomial, data=uis)))
d1[519:521, ]
# }