impactPO: Impact of Proportional Odds Assumpton

Description

Checks the impact of the proportional odds assumption by comparing predicted cell probabilities from a PO model with those from a multinomial or partial proportional odds logistic model that relax assumptions. For a given model formula, fits the model with both lrm and either nnet::multinom or VGAM::vglm or both, and obtains predicted cell probabilities for the PO and relaxed models on the newdata data frame. A print method formats the output.

Usage

impactPO(
  formula,
  relax = if (missing(nonpo)) "multinomial" else "both",
  nonpo,
  newdata,
  data = environment(formula),
  minfreq = 15,
  B = 0,
  ...
)

Value

an impactPO object which is a list with elements estimates, stats, mad, newdata, nboot, and boot. estimates is a data frame containing the variables and values in newdata in a tall and thin format with additional variable method ("PO", "Multinomial", "PPO"), y (current level of the dependent variable), and Probability (predicted cell probability for covariate values and value of y in the current row). stats is a data frame containing Deviance the model deviance, d.f. the total number of parameters counting intercepts, AIC, p the number of regression coefficients, LR chi^2 the likelihood ratio chi-square statistic for testing the predictors, LR - p a chance-corrected LR chi-square, LR chi^2 test for PO the likelihood ratio chi-square test statistic for testing the PO assumption (by comparing -2 log likelihood for a relaxed model to that of a fully PO model), d.f. the degrees of freedom for this test, Pr(>chi^2) the P-value for this test, MCS R2 the Maddala-Cox-Snell R2 using the actual sample size, MCS R2 adj (MCS R2 adjusted for estimating p regression coefficients by subtracting p from LR), McFadden R2, McFadden R2 adj (an AIC-like adjustment proposed by McFadden without full justification), Mean |difference} from PO the overall mean absolute difference between predicted probabilities over all categories of Y and over all covariate settings. mad contains newdata and separately by rows in newdata the mean absolute difference (over Y categories) between estimated probabilities by the indicated relaxed model and those from the PO model. nboot is the number of successful bootstrap repetitions, and boot is a 4-way array with dimensions represented by the nboot resamples, the number of rows in newdata, the number of outcome levels, and elements for PPO and multinomial. For the modifications of the Maddala-Cox-Snell indexes see Hmisc::R2Measures.

Arguments

formula: a model formula. To work properly with multinom or vglm the terms should have completely specified knot locations if a spline function is being used.
relax: defaults to "both" if nonpo is given, resulting in fitting two relaxed models. Set relax to "multinomial" or "ppo" to fit only one relaxed model. The multinomial model does not assume PO for any predictor.
nonpo: a formula with no left hand side variable, specifying the variable or variables for which PO is not assumed. Specifying nonpo results in a relaxed fit that is a partial PO model fitted with VGAM::vglm.
newdata: a data frame or data table with one row per covariate setting for which predictions are to be made
data: data frame containing variables to fit; default is the frame in which formula is found
minfreq: minimum sample size to allow for the least frequent category of the dependent variable. If the observed minimum frequency is less than this, the combine.levels() function will be called to combine enough consecutive levels so that this minimum frequency is achieved.
B: number of bootstrap resamples to do to get confidence intervals for differences in predicted probabilities for relaxed methods vs. PO model fits. Default is not to run the bootstrap. When running the bootstrap make sure that all model variables are explicitly in data= so that selection of random subsets of data will call along the correct rows for all predictors.
...: other parameters to pass to lrm and multinom

Author

Frank Harrell fh@fharrell.com

Details

Since partial proportional odds models and especially multinomial logistic models can have many parameters, it is not feasible to use this model comparison approach when the number of levels of the dependent variable Y is large. By default, the function will use combine.levels() to combine consecutive levels if the lowest frequency category of Y has fewer than minfreq observations.

References

Adjusted R-square note

Examples

Run this code


if (FALSE) {
set.seed(1)
age <- rnorm(500, 50, 10)
sex <- sample(c('female', 'male'), 500, TRUE)
y   <- sample(0:4, 500, TRUE)
d   <- expand.grid(age=50, sex=c('female', 'male'))
w   <- impactPO(y ~ age + sex, nonpo = ~ sex, newdata=d)
w
# Note that PO model is a better model than multinomial (lower AIC)
# since multinomial model's improvement in fit is low in comparison
# with number of additional parameters estimated.  Same for PO model
# in comparison with partial PO model.

# Reverse levels of y so stacked bars have higher y located higher
revo <- function(z) {
  z <- as.factor(z)
  factor(z, levels=rev(levels(as.factor(z))))
}

require(ggplot2)
ggplot(w$estimates, aes(x=method, y=Probability, fill=revo(y))) +
  facet_wrap(~ sex) + geom_col() +
  xlab('') + guides(fill=guide_legend(title=''))

# Now vary 2 predictors

d <- expand.grid(sex=c('female', 'male'), age=c(40, 60))
w <- impactPO(y ~ age + sex, nonpo = ~ sex, newdata=d)
w
ggplot(w$estimates, aes(x=method, y=Probability, fill=revo(y))) +
  facet_grid(age ~ sex) + geom_col() +
 xlab('') + guides(fill=guide_legend(title=''))
}

Run the code above in your browser using DataLab