Goodness of fit tests for binomial regression
Usage:

gof(x, ...)

# S3 method for glm
gof(x, ..., g = 10, plotROC = TRUE)
Arguments:

x: A regression model with class glm and x$family$family == "binomial".

...: Additional arguments when plotting the receiver-operating curve. See ?pROC::roc and ?pROC::plot.roc.

g: Number of groups (quantiles) into which to split observations for the Hosmer-Lemeshow and the modified Hosmer-Lemeshow tests.

plotROC: Plot a receiver operating curve?
Value: A list of data.tables as follows:

ct: Contingency table.
chiSq: \(\chi^2\) tests of the significance of the model. The tests are:
PrI | test of the Pearson residuals, calculated by individual |
drI | test of the deviance residuals, calculated by individual |
PrG | test of the Pearson residuals, calculated by covariate group |
drG | test of the deviance residuals, calculated by covariate group |
PrCT | test of the Pearson residuals, calculated from the contingency table |
ctHL: Contingency table for the Hosmer-Lemeshow test.
gof: Goodness-of-fit tests. These are:
HL | Hosmer and Lemeshow's \(C\) statistic |
mHL | the modified Hosmer-Lemeshow test |
OsRo | Osius and Rojek's test of the link function |
and Stukel's tests:
SstPgeq0.5 | score test for addition of vector \(z1\) |
SstPl0.5 | score test for addition of vector \(z2\) |
SstBoth | score test for addition of vectors \(z1\) and \(z2\) |
SllPgeq0.5 | log-likelihood test for addition of vector \(z1\) |
SllPl0.5 | log-likelihood test for addition of vector \(z2\) |
R2: R-squared like tests:
ssI | sum-of-squares, by individual |
ssG | sum-of-squares, by covariate group |
llI | log-likelihood, by individual |
auc: Area under the receiver-operating curve (ROC) with 95% CIs.
Details of the elements in the returned list
follow below:
ct:
A contingency table, similar to the output of dx.
The following are given per covariate group:
n | number of observations |
y1hat | predicted number of observations with \(y=1\) |
y1 | actual number of observations with \(y=1\) |
y0hat | predicted number of observations with \(y=0\) |
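As an illustration, a minimal sketch (not the package's internal code) of how such a table could be assembled from a fitted binomial glm, assumed here to be called fit:

## sketch only: per-covariate-group contingency table from a fitted glm `fit`
P   <- fitted(fit)                              # predicted probabilities
y   <- fit$y                                    # observed outcomes (0/1)
key <- interaction(fit$model[-1], drop = TRUE)  # one level per covariate pattern
n     <- tapply(y, key, length)                 # observations per group
y1    <- tapply(y, key, sum)                    # observed y = 1 per group
y1hat <- tapply(P, key, sum)                    # predicted y = 1 per group
data.frame(n = n, y1hat = y1hat, y1 = y1, y0hat = n - y1hat, y0 = n - y1)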
chiSq:
\(P \chi^2\) tests of the significance of the model.
Pearson's test and the deviance \(D\) test are given.
These are calculated by individual (I), by covariate group (G) and also from the contingency table (CT) above.
They are calculated as:
$$P \chi^2 = \sum_{i=1}^n Pr_i^2$$
and
$$D = \sum_{i=1}^n dr_i^2$$
The statistics should follow a \(\chi^2\) distribution with \(n - p\) degrees of freedom.
Here, \(n\) is the number of observations (taken individually or by covariate group) and \(p\) is the number of predictors in the model.
A low \(p\) value for the test suggests that the model is a poor fit.
The assumption of a \(\chi^2\) distribution is most valid when observations are considered by covariate group.
The statistics from the contingency table should be similar to those obtained when calculated by group.
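For example, the individual-level statistics can be computed directly from a fitted binomial glm (assumed here to be called fit); a minimal sketch:

## sketch only: Pearson and deviance chi-square statistics by individual
Pr <- residuals(fit, type = "pearson")
dr <- residuals(fit, type = "deviance")
PChiSq <- sum(Pr^2)                  # Pearson chi-square (PrI above)
D      <- sum(dr^2)                  # deviance; equals fit$deviance (drI above)
df     <- fit$df.residual            # n - p
pchisq(c(PChiSq, D), df = df, lower.tail = FALSE)   # p values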
ctHL:
The contingency table for the Hosmer-Lemeshow test.
The observations are ordered by probability, then grouped into g groups of approximately equal size.
The columns are:
P | the probability |
y1 | the actual number of observations with \(y=1\) |
y1hat | the predicted number of observations with \(y=1\) |
y0 | the actual number of observations with \(y=0\) |
y0hat | the predicted number of observations with \(y=0\) |
n | the number of observations |
gof:
All of these tests rely on assessing the effect of adding an additional variable to the model.
Thus a low \(p\) value for any of these tests implies that the model is a poor fit.
Hosmer and Lemeshow's \(C\) statistic is based on: \(y_k\), the number of observations where \(y=1\); \(n_k\), the number of observations; and \(\bar{P_k}\), the average probability in group \(k\):
$$\bar{P_k} = \sum_{i=1}^{n_k} \frac{n_iP_i}{n_k}, \quad k=1,2, \ldots ,g$$
The statistic is:
$$C = \sum_{k=1}^g \frac{(y_k - n_k \bar{P_k})^2}{
n_k \bar{P_k} (1 - \bar{P_k})}$$
This should follow a \(\chi^2\) distribution with \(g - 2\) degrees of freedom.
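A minimal sketch of this calculation for individual binary data (all \(n_i = 1\)), assuming a fitted binomial glm called fit; note that ties in the probabilities can make the quantile breaks non-unique, which the package handles more robustly:

## sketch only: Hosmer-Lemeshow C statistic with g = 10 groups
P   <- fitted(fit)
y   <- fit$y
g   <- 10
grp <- cut(P, quantile(P, probs = seq(0, 1, length.out = g + 1)),
           include.lowest = TRUE)   # deciles of risk
yk   <- tapply(y, grp, sum)         # observed y = 1 in group k
nk   <- tapply(y, grp, length)      # n_k
Pbar <- tapply(P, grp, mean)        # average probability in group k
C <- sum((yk - nk * Pbar)^2 / (nk * Pbar * (1 - Pbar)))
pchisq(C, df = g - 2, lower.tail = FALSE)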
The modified Hosmer-Lemeshow test assesses the change in model deviance \(D\) when \(G\), the grouping factor, is added as a predictor.
That is, a linear model is fit as:
$$dr_i \sim G, \quad dr_i \equiv \mathrm{deviance\ residual}$$
and the effect of adding \(G\) is assessed with anova(lm(dr ~ G)), as sketched below.
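Continuing the sketch above (reusing grp as the grouping factor \(G\)):

## sketch only: modified Hosmer-Lemeshow test
dr <- residuals(fit, type = "deviance")
G  <- grp                           # decile-of-risk grouping from the sketch above
anova(lm(dr ~ G))                   # F test for the effect of the grouping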
Osius and Rojek's test is based on a power-divergence statistic \(PD_{\lambda}\) (\(\lambda = 1\) for Pearson's \(\chi^2\) test) and the standard deviation (herein, of a binomial distribution) \(\sigma\). The statistic is: $$Z_{OR} = \frac{PD_{\lambda} - \mu_{\lambda}}{\sigma_{\lambda}}$$
For logistic regression, it is calculated as: $$Z_{OR} = \frac{P \chi^2 - (n - p)}{\sqrt{2(n - \sum_{i=1}^n \frac{1}{n_i}) + RSS}}$$ where \(RSS\) is the residual sum-of-squares from a weighted linear regression: $$\frac{1 - 2P_i}{\sigma_i} \sim X, \quad \mathrm{weights} = \sigma_i$$ Here \(X\) is the matrix of model predictors.
A two-tailed test against a standard normal distribution \(N(0, 1)\) should not be significant.
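For individual binary data (all \(n_i = 1\)) the term \(2(n - \sum 1/n_i)\) vanishes, so the denominator reduces to \(\sqrt{RSS}\). One reading of the formula can then be sketched as below; taking the weights as the binomial variance \(P_i(1 - P_i)\) is an assumption about the notation above, not the package's exact code:

## sketch only: Osius-Rojek Z statistic for binary data
P  <- fitted(fit)
v  <- P * (1 - P)                          # binomial variance
X  <- model.matrix(fit)                    # includes the intercept column
wls <- lm((1 - 2 * P) / v ~ X - 1, weights = v)
RSS <- deviance(wls)                       # weighted residual sum-of-squares
PChiSq <- sum(residuals(fit, type = "pearson")^2)
Z_OR <- (PChiSq - fit$df.residual) / sqrt(RSS)
2 * pnorm(-abs(Z_OR))                      # two-tailed p value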
Stukel's tests are based on the addition of the vectors: $$z_1 = \mathrm{Pgeq0.5} = \mathrm{sign}(P_i \geq 0.5)$$ and / or $$z_2 = \mathrm{Pl0.5} = \mathrm{sign}(P_i < 0.5)$$ to the existing model predictors.
The model fit is compared to the original using the score (e.g. SstPgeq0.5) and likelihood-ratio (e.g. SllPl0.5) tests.
These extended models should not be a significantly better fit to the data.
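A minimal sketch following the indicator definition given above (assuming fit was fitted with no rows dropped for missingness, so z1 aligns with the model frame):

## sketch only: score and likelihood-ratio tests for adding z1
z1 <- as.numeric(fitted(fit) >= 0.5)
f1 <- update(fit, . ~ . + z1)
anova(fit, f1, test = "Rao")         # score test (cf. SstPgeq0.5)
anova(fit, f1, test = "LRT")         # likelihood-ratio test (cf. SllPgeq0.5)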
R2:
Pseudo-\(R^2\) comparisons of the predicted values from the fitted model vs. an intercept-only model.
The sum-of-squares (linear-regression) measure based on the squared Pearson correlation coefficient by individual is based on the mean probability: $$\bar{P} = \frac{\sum y_i}{n}$$ and is given by: $$R^2_{ssI} = 1 - \frac{\sum (y_i - P_i)^2}{\sum (y_i - \bar{P})^2}$$ The same measure, by covariate group, is: $$R^2_{ssG} = 1 - \frac{\sum (y_i - n_iP_i)^2}{\sum (y_i - n_i\bar{P})^2}$$
The log-likelihood based \(R^2\) measure per individual is based on:
\(ll_0\), the log-likelihood of the intercept-only model, and
\(ll_p\), the log-likelihood of the model with \(p\) covariates,
and is given by: $$R^2_{llI} = \frac{ll_0 - ll_p}{ll_0}$$
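A minimal sketch of the individual-level measures, again assuming a fitted binomial glm called fit:

## sketch only: individual-level pseudo-R-squared measures
P <- fitted(fit)
y <- fit$y
R2_ssI <- 1 - sum((y - P)^2) / sum((y - mean(y))^2)
ll_p <- as.numeric(logLik(fit))
ll_0 <- as.numeric(logLik(update(fit, . ~ 1)))   # intercept-only model
R2_llI <- (ll_0 - ll_p) / ll_0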
auc:
The area under the receiver-operating curve.
This may broadly be interpreted as:
auc | Discrimination |
\(\mathrm{auc} = 0.5\) | useless |
\(0.7 \leq \mathrm{auc} < 0.8\) | acceptable |
\(0.8 \leq \mathrm{auc} < 0.9\) | excellent |
\(\mathrm{auc} \geq 0.9\) occurs rarely, as this requires almost complete separation / perfect classification.
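The AUC and its confidence interval can also be obtained directly with the pROC package (the same package the plotROC argument is documented to use); a minimal sketch:

## sketch only: AUC with a 95% confidence interval via pROC
library(pROC)
r <- roc(fit$y, fitted(fit))
auc(r)
ci.auc(r)                            # DeLong 95% CI by default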
References:

Osius G & Rojek D (1992). Normal goodness-of-fit tests for multinomial models with large degrees of freedom. Journal of the American Statistical Association 87(420):1145-52. doi: 10.1080/01621459.1992.10476271. Also available at JSTOR: https://www.jstor.org/stable/2290653

Hosmer D, Hosmer T, Le Cessie S & Lemeshow S (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine 16(9):965-80. doi: 10.1002/(SICI)1097-0258(19970515)16:9<965::AID-SIM509>3.0.CO;2-O

Mittlböck M & Schemper M (1996). Explained variation for logistic regression. Statistics in Medicine 15(19):1987-97. doi: 10.1002/(SICI)1097-0258(19961015)15:19<1987::AID-SIM318>3.0.CO;2-9. Also available from CiteSeerX / Penn State University (free).
Examples:

## H&L 2nd ed. Sections 5.2.2, 5.2.4, 5.2.5. Pages 147-167.
## (assumes the package providing gof() and the uis data is loaded)
data(uis)
uis$NDRGFP1 <- 10 / (uis$NDRGTX + 1)
uis$NDRGFP2 <- uis$NDRGFP1 * log((uis$NDRGTX + 1) / 10)
g1 <- glm(DFREE ~ AGE + NDRGFP1 + NDRGFP2 + IVHX +
          RACE + TREAT + SITE +
          AGE:NDRGFP1 + RACE:SITE,
          family = binomial, data = uis)
gof1 <- gof(g1, plotROC = FALSE)   # store the result to inspect its elements
unclass(gof1)
attributes(gof1)