PseudoR2: Pseudo R2 Statistics

Description

There is no generally accepted way to assess the fit of a logistic regression, but several approaches exist. The goodness of fit of a logistic regression model can be expressed by several variants of pseudo R-squared statistics, most of which are based on the deviance of the model.

Usage

PseudoR2(x, which = NULL)

Arguments

x

the glm, polr or multinom model object to be evaluated.

which

character, one or more of "McFadden", "McFaddenAdj", "CoxSnell", "Nagelkerke", "AldrichNelson", "VeallZimmermann", "Efron", "McKelveyZavoina", "Tjur", "all". Partial matching is supported.

Value

the value of the specific statistic. AIC, LogLik, LogLikNull and G2 will only be reported with option "all".

McFadden

McFadden pseudo-\(R^2\)

McFaddenAdj

McFadden adjusted pseudo-\(R^2\)

CoxSnell

Cox and Snell pseudo-\(R^2\) (also known as ML pseudo-\(R^2\))

Nagelkerke

Nagelkerke pseudo-\(R^2\) (also known as Cragg-Uhler \(R^2\))

AldrichNelson

Aldrich-Nelson pseudo-\(R^2\)

VeallZimmermann

Veall-Zimmermann pseudo-\(R^2\)

McKelveyZavoina

McKelvey and Zavoina pseudo-\(R^2\)

Efron

Efron pseudo-\(R^2\)

Tjur

Tjur's pseudo-\(R^2\)

AIC

Akaike's information criterion

LogLik

log-Likelihood for the fitted model (by maximum likelihood)

LogLikNull

log-Likelihood for the null model. The null model will include the offset, and an intercept if there is one in the model.

G2

difference between the null deviance and the model deviance

Details

Cox and Snell's \(R^2\) is based on the log likelihood for the model compared to the log likelihood for a baseline model. However, with categorical outcomes, it has a theoretical maximum value of less than 1, even for a "perfect" model.

Nagelkerke's \(R^2\) (also sometimes called Cragg-Uhler) is an adjusted version of Cox and Snell's \(R^2\) that rescales the statistic to cover the full range from 0 to 1.

McFadden's \(R^2\) is another version, based on the log-likelihood kernels for the intercept-only model and the full estimated model.
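
For orientation, the three measures described above are commonly written as follows. This is a sketch of the usual textbook definitions, not a statement of how the function computes them internally. The symbols are assumptions introduced here: \(L_0\) is the likelihood of the null model, \(L_M\) the likelihood of the fitted model, and \(n\) the number of observations.

```latex
\[
R^2_{\mathrm{McFadden}} = 1 - \frac{\ln L_M}{\ln L_0}, \qquad
R^2_{\mathrm{CoxSnell}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n}, \qquad
R^2_{\mathrm{Nagelkerke}} = \frac{R^2_{\mathrm{CoxSnell}}}{1 - L_0^{2/n}}
\]
```

With a categorical outcome \(L_M \le 1\), so Cox and Snell's measure cannot exceed \(1 - L_0^{2/n}\), which is below 1. Dividing by that bound is what lets Nagelkerke's version reach 1.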

Veall and Zimmermann concluded that, among six widely used measures, the one suggested by McKelvey and Zavoina corresponds most closely to the ordinary least squares \(R^2\). The Aldrich-Nelson pseudo-\(R^2\) with the Veall-Zimmermann correction is the best approximation of the McKelvey-Zavoina pseudo-\(R^2\). The Efron, Aldrich-Nelson, McFadden and Nagelkerke approaches severely underestimate the "true \(R^2\)".
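
To make these definitions concrete, a few of the statistics can be computed by hand with base R. This is a minimal sketch using the usual textbook formulas, not the package's own code. The mtcars model and all variable names are illustrative assumptions, and the results may differ from PseudoR2() in edge cases.

```r
# Base R only; mtcars ships with R. The model is purely illustrative.
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
null <- update(fit, . ~ 1)            # intercept-only baseline model

ll  <- as.numeric(logLik(fit))        # log-likelihood of the fitted model
ll0 <- as.numeric(logLik(null))       # log-likelihood of the null model
n   <- nobs(fit)                      # number of observations
g2  <- 2 * (ll - ll0)                 # likelihood-ratio statistic (G2)

mcfadden   <- 1 - ll / ll0
coxsnell   <- 1 - exp(-g2 / n)
nagelkerke <- coxsnell / (1 - exp(2 * ll0 / n))

# Measures based on observed outcomes and fitted probabilities
y <- fit$y
p <- fitted(fit)
efron <- 1 - sum((y - p)^2) / sum((y - mean(y))^2)
tjur  <- mean(p[y == 1]) - mean(p[y == 0])

round(c(McFadden = mcfadden, CoxSnell = coxsnell,
        Nagelkerke = nagelkerke, Efron = efron, Tjur = tjur), 3)
```

For ungrouped binary data such as this, g2 equals the null deviance minus the model deviance, which matches the G2 entry described under Value.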

References

Aldrich, J. H., & Nelson, F. D. (1984). Linear Probability, Logit, and Probit Models. Beverly Hills: Sage University Press.

Cox, D. R., & Snell, E. J. (1989). The Analysis of Binary Data (2nd ed.). London: Chapman and Hall.

Efron, B. (1978). Regression and ANOVA with zero-one data: Measures of residual variation. Journal of the American Statistical Association, 73(361), 113--121.

Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). Hoboken, NJ: Wiley.

McFadden, D. (1979). Quantitative methods for analysing travel behavior of individuals: Some recent developments. In D. A. Hensher & P. R. Stopher (Eds.), Behavioural travel modelling (pp. 279-318). London: Croom Helm.

McKelvey, R. D., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. The Journal of Mathematical Sociology, 4(1), 103--120.

Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78(3), 691--692.

Tjur, T. (2009). Coefficients of determination in logistic regression models - a new proposal: The coefficient of discrimination. The American Statistician, 63(4), 366--372.

Veall, M. R., & Zimmermann, K. F. (1992). Evaluating pseudo-R2's for binary probit models. Quality & Quantity, 28, 151--164.

Examples

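library(DescTools)  # provides PseudoR2() and Untable()

# Logistic regression on the Titanic data, expanded to one row per passenger
r.glm <- glm(Survived ~ ., data=Untable(Titanic), family=binomial)
PseudoR2(r.glm)

# Request several statistics at once; names may be abbreviated
PseudoR2(r.glm, c("McFadden", "Nagel"))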
