Goodness of fit tests for binomial regression
Usage:

gof(x, ...)

# S3 method for glm
gof(x, ..., g = 10, plotROC = TRUE)
Arguments:

x: A regression model with class glm and x$family$family == "binomial".

...: Additional arguments when plotting the receiver-operating curve. See ?pROC::roc and ?pROC::plot.roc.

g: Number of groups (quantiles) into which to split observations for the Hosmer-Lemeshow and the modified Hosmer-Lemeshow tests.

plotROC: Plot a receiver operating curve?
Value: A list of data.tables as follows:

ct: Contingency table.
chiSq: \(\chi^2\) tests of the significance of the model. The tests are:
PrI | test of the Pearson residuals, calculated by individual |
drI | test of the deviance residuals, calculated by individual |
PrG | test of the Pearson residuals, calculated by covariate group |
drG | test of the deviance residuals, calculated by covariate group |
PrCT | test of the Pearson residuals, calculated from the contingency table |
ctHL: Contingency table for the Hosmer-Lemeshow test.
gof: Goodness-of-fit tests. These are:
HL | Hosmer and Lemeshow's \(C\) statistic |
mHL | the modified Hosmer-Lemeshow test |
OsRo | Osius and Rojek's test of the link function |
and Stukel's tests:
SstPgeq0.5 | score test for addition of vector \(z1\) |
SstPl0.5 | score test for addition of vector \(z2\) |
SstBoth | score test for addition of vectors \(z1\) and \(z2\) |
SllPgeq0.5 | log-likelihood test for addition of vector \(z1\) |
SllPl0.5 | log-likelihood test for addition of vector \(z2\) |
R2: R-squared like tests:
ssI | sum-of-squares, by individual |
ssG | sum-of-squares, by covariate group |
llI | log-likelihood, by individual |
auc: Area under the receiver-operating curve (ROC) with 95% CIs.
Details of the elements in the returned list
follow below:
ct:
A contingency table, similar to the output of dx.
The following are given per covariate group:
n | number of observations |
y1hat | predicted number of observations with \(y=1\) |
y1 | actual number of observations with \(y=1\) |
y0hat | predicted number of observations with \(y=0\) |
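As an illustration, a minimal sketch (not the package's internal code) of how such a table could be assembled from a fitted binomial glm, assumed here to be called fit:

## sketch only: per-covariate-group contingency table from a fitted glm `fit`
P   <- fitted(fit)                              # predicted probabilities
y   <- fit$y                                    # observed outcomes (0/1)
key <- interaction(fit$model[-1], drop = TRUE)  # one level per covariate pattern
n     <- tapply(y, key, length)                 # observations per group
y1    <- tapply(y, key, sum)                    # observed y = 1 per group
y1hat <- tapply(P, key, sum)                    # predicted y = 1 per group
data.frame(n = n, y1hat = y1hat, y1 = y1, y0hat = n - y1hat, y0 = n - y1)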
chiSq:
\(P \chi^2\) tests of the significance of the model.
Pearson's test and the deviance \(D\) test are given.
These are calculated by individual (I), by covariate group (G) and also from the contingency table (CT) above.
They are calculated as:
$$P \chi^2 = \sum_{i=1}^n Pr_i^2$$
and
$$D = \sum_{i=1}^n dr_i^2$$
The statistics should follow a \(\chi^2\) distribution with \(n - p\) degrees of freedom.
Here, \(n\) is the number of observations (taken individually or by covariate group) and \(p\) is the number of predictors in the model.
A low \(p\) value for the test suggests that the model is a poor fit.
The assumption of a \(\chi^2\) distribution is most valid when observations are considered by covariate group.
The statistics from the contingency table should be similar to those obtained when calculated by group.
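For example, the individual-level statistics can be computed directly from a fitted binomial glm (assumed here to be called fit); a minimal sketch:

## sketch only: Pearson and deviance chi-square statistics by individual
Pr <- residuals(fit, type = "pearson")
dr <- residuals(fit, type = "deviance")
PChiSq <- sum(Pr^2)                  # Pearson chi-square (PrI above)
D      <- sum(dr^2)                  # deviance; equals fit$deviance (drI above)
df     <- fit$df.residual            # n - p
pchisq(c(PChiSq, D), df = df, lower.tail = FALSE)   # p values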
ctHL:
The contingency table for the Hosmer-Lemeshow test.
The observations are ordered by probability, then grouped into g groups of approximately equal size.
The columns are:
P | the probability |
y1 | the actual number of observations with \(y=1\) |
y1hat | the predicted number of observations with \(y=1\) |
y0 | the actual number of observations with \(y=0\) |
y0hat | the predicted number of observations with \(y=0\) |
n | the number of observations |
gof:
All of these tests rely on assessing the effect of adding an additional variable to the model.
Thus a low \(p\) value for any of these tests implies that the model is a poor fit.
Hosmer and Lemeshow's \(C\) statistic is based on: \(y_k\), the number of observations where \(y=1\); \(n_k\), the number of observations; and \(\bar{P_k}\), the average probability in group \(k\):
$$\bar{P_k} = \sum_{i=1}^{n_k} \frac{n_iP_i}{n_k}, \quad k=1,2, \ldots ,g$$
The statistic is:
$$C = \sum_{k=1}^g \frac{(y_k - n_k \bar{P_k})^2}{
n_k \bar{P_k} (1 - \bar{P_k})}$$
This should follow a \(\chi^2\) distribution with \(g - 2\) degrees of freedom.
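A minimal sketch of this calculation for individual binary data (all \(n_i = 1\)), assuming a fitted binomial glm called fit; note that ties in the probabilities can make the quantile breaks non-unique, which the package handles more robustly:

## sketch only: Hosmer-Lemeshow C statistic with g = 10 groups
P   <- fitted(fit)
y   <- fit$y
g   <- 10
grp <- cut(P, quantile(P, probs = seq(0, 1, length.out = g + 1)),
           include.lowest = TRUE)   # deciles of risk
yk   <- tapply(y, grp, sum)         # observed y = 1 in group k
nk   <- tapply(y, grp, length)      # n_k
Pbar <- tapply(P, grp, mean)        # average probability in group k
C <- sum((yk - nk * Pbar)^2 / (nk * Pbar * (1 - Pbar)))
pchisq(C, df = g - 2, lower.tail = FALSE)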
The modified Hosmer-Lemeshow test assesses the change in model deviance \(D\) when \(G\), the grouping factor, is added as a predictor.
That is, a linear model is fit as:
$$dr_i \sim G, \quad dr_i \equiv \mathrm{deviance\ residual}$$
and the effect of adding \(G\) is assessed with anova(lm(dr ~ G)), as sketched below.
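Continuing the sketch above (reusing grp as the grouping factor \(G\)):

## sketch only: modified Hosmer-Lemeshow test
dr <- residuals(fit, type = "deviance")
G  <- grp                           # decile-of-risk grouping from the sketch above
anova(lm(dr ~ G))                   # F test for the effect of the grouping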
Osius and Rojek's test is based on a power-divergence statistic \(PD_{\lambda}\) (\(\lambda = 1\) for Pearson's \(\chi^2\) test) and the standard deviation (herein, of a binomial distribution) \(\sigma\). The statistic is: $$Z_{OR} = \frac{PD_{\lambda} - \mu_{\lambda}}{\sigma_{\lambda}}$$
For logistic regression, it is calculated as: $$Z_{OR} = \frac{P \chi^2 - (n - p)}{\sqrt{2(n - \sum_{i=1}^n \frac{1}{n_i}) + RSS}}$$ where \(RSS\) is the residual sum-of-squares from a weighted linear regression: $$\frac{1 - 2P_i}{\sigma_i} \sim X, \quad \mathrm{weights} = \sigma_i$$ Here \(X\) is the matrix of model predictors.
A two-tailed test against a standard normal distribution \(N(0, 1)\) should not be significant.
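For individual binary data (all \(n_i = 1\)) the term \(2(n - \sum 1/n_i)\) vanishes, so the denominator reduces to \(\sqrt{RSS}\). One reading of the formula can then be sketched as below; taking the weights as the binomial variance \(P_i(1 - P_i)\) is an assumption about the notation above, not the package's exact code:

## sketch only: Osius-Rojek Z statistic for binary data
P  <- fitted(fit)
v  <- P * (1 - P)                          # binomial variance
X  <- model.matrix(fit)                    # includes the intercept column
wls <- lm((1 - 2 * P) / v ~ X - 1, weights = v)
RSS <- deviance(wls)                       # weighted residual sum-of-squares
PChiSq <- sum(residuals(fit, type = "pearson")^2)
Z_OR <- (PChiSq - fit$df.residual) / sqrt(RSS)
2 * pnorm(-abs(Z_OR))                      # two-tailed p value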
Stukel's tests are based on the addition of the vectors: $$z_1 = \mathrm{Pgeq0.5} = \mathrm{sign}(P_i \geq 0.5)$$ and / or $$z_2 = \mathrm{Pl0.5} = \mathrm{sign}(P_i < 0.5)$$ to the existing model predictors.
The model fit is compared to the original using the score (e.g. SstPgeq0.5) and likelihood-ratio (e.g. SllPl0.5) tests.
These extended models should not be a significantly better fit to the data.
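A minimal sketch following the indicator definition given above (assuming fit was fitted with no rows dropped for missingness, so z1 aligns with the model frame):

## sketch only: score and likelihood-ratio tests for adding z1
z1 <- as.numeric(fitted(fit) >= 0.5)
f1 <- update(fit, . ~ . + z1)
anova(fit, f1, test = "Rao")         # score test (cf. SstPgeq0.5)
anova(fit, f1, test = "LRT")         # likelihood-ratio test (cf. SllPgeq0.5)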
R2:
Pseudo-\(R^2\) comparisons of the predicted values from the fitted model vs. an intercept-only model.
The sum-of-squares (linear-regression) measure based on the squared Pearson correlation coefficient by individual is based on the mean probability: $$\bar{P} = \frac{\sum y_i}{n}$$ and is given by: $$R^2_{ssI} = 1 - \frac{\sum (y_i - P_i)^2}{\sum (y_i - \bar{P})^2}$$ The same measure, by covariate group, is: $$R^2_{ssG} = 1 - \frac{\sum (y_i - n_iP_i)^2}{\sum (y_i - n_i\bar{P})^2}$$
The log-likelihood based \(R^2\) measure per individual is based on:
\(ll_0\), the log-likelihood of the intercept-only model, and
\(ll_p\), the log-likelihood of the model with \(p\) covariates,
and is given by: $$R^2_{llI} = \frac{ll_0 - ll_p}{ll_0}$$
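A minimal sketch of the individual-level measures, again assuming a fitted binomial glm called fit:

## sketch only: individual-level pseudo-R-squared measures
P <- fitted(fit)
y <- fit$y
R2_ssI <- 1 - sum((y - P)^2) / sum((y - mean(y))^2)
ll_p <- as.numeric(logLik(fit))
ll_0 <- as.numeric(logLik(update(fit, . ~ 1)))   # intercept-only model
R2_llI <- (ll_0 - ll_p) / ll_0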
auc:
The area under the receiver-operating curve.
This may broadly be interpreted as:
auc | Discrimination |
\(\mathrm{auc} = 0.5\) | useless |
\(0.7 \leq \mathrm{auc} < 0.8\) | acceptable |
\(0.8 \leq \mathrm{auc} < 0.9\) | excellent |
\(\mathrm{auc} \geq 0.9\) occurs rarely, as this requires almost complete separation / perfect classification.
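The AUC and its confidence interval can also be obtained directly with the pROC package (the same package the plotROC argument is documented to use); a minimal sketch:

## sketch only: AUC with a 95% confidence interval via pROC
library(pROC)
r <- roc(fit$y, fitted(fit))
auc(r)
ci.auc(r)                            # DeLong 95% CI by default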
References:

Osius G & Rojek D (1992). Normal goodness-of-fit tests for multinomial models with large degrees of freedom. Journal of the American Statistical Association 87(420):1145-52. doi: 10.1080/01621459.1992.10476271. Also available at JSTOR: https://www.jstor.org/stable/2290653

Hosmer D, Hosmer T, Le Cessie S & Lemeshow S (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine 16(9):965-80. doi: 10.1002/(SICI)1097-0258(19970515)16:9<965::AID-SIM509>3.0.CO;2-O

Mittlböck M & Schemper M (1996). Explained variation for logistic regression. Statistics in Medicine 15(19):1987-97. doi: 10.1002/(SICI)1097-0258(19961015)15:19<1987::AID-SIM318>3.0.CO;2-9. Also available from CiteSeerX / Penn State University (free).
Examples:

## H&L 2nd ed. Sections 5.2.2, 5.2.4, 5.2.5. Pages 147-167.
## (assumes the package providing gof() and the uis data is loaded)
data(uis)
uis$NDRGFP1 <- 10 / (uis$NDRGTX + 1)
uis$NDRGFP2 <- uis$NDRGFP1 * log((uis$NDRGTX + 1) / 10)
g1 <- glm(DFREE ~ AGE + NDRGFP1 + NDRGFP2 + IVHX +
          RACE + TREAT + SITE +
          AGE:NDRGFP1 + RACE:SITE,
          family = binomial, data = uis)
gof1 <- gof(g1, plotROC = FALSE)   # store the result to inspect its elements
unclass(gof1)
attributes(gof1)