cumulative: Ordinal Regression with Cumulative Probabilities

Description

Fits a cumulative logit/probit/cloglog/cauchit/... regression model to a (preferably ordered) factor response.

Usage

cumulative(link = "logit", earg = list(),
           parallel = FALSE, reverse = FALSE,
           mv = FALSE, intercept.apply = FALSE)
scumulative(link="logit", earg = list(),
            lscale="loge", escale = list(),
            parallel=FALSE, sparallel=TRUE, reverse=FALSE, iscale = 1)

Arguments

link

Link function applied to the $J$ cumulative probabilities. See Links for more choices.

lscale

Link function applied to the $J$ scaling parameters. See Links for more choices.

earg, escale

List. Extra argument for the respective link functions. See earg in Links for general information.

parallel

A logical or formula specifying which terms have equal/unequal coefficients. See below for more information about the parallelism assumption.

sparallel

For the scaling parameters. A logical, or formula specifying which terms have equal/unequal coefficients. This argument is not applied to the intercept. The scumulative() function requires covariates; for intercept models use

reverse

Logical. By default, the cumulative probabilities used are $P(Y\leq 1)$, $P(Y\leq 2)$, ..., $P(Y\leq J)$. If reverse is TRUE, then $P(Y\geq 2)$, $P(Y\geq 3)$, ..., $P(Y\geq J+1)$ will be used.

This should be set t

Logical. Multivariate response? If TRUE then the input should be a matrix with values $1,2,\dots,L$, where $L=J+1$ is the number of levels. Each column of the matrix is a response, i.e., multivariate response. A suitable matrix can

intercept.apply

Logical. Whether the parallel argument should be applied to the intercept term. This should be set to TRUE for link= golf, polf

iscale

Numeric. Initial values for the scale parameters.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

Warning

No check is made to verify that the response is ordinal; see ordered.

Details

By default, the non-parallel cumulative logit model is fitted, i.e., $$\eta_j = logit(P[Y \leq j])$$ where $j=1,2,\dots,M$ and the $\eta_j$ are not constrained to be parallel. This is also known as the non-proportional odds model. If the logit link is replaced by a complementary log-log link (cloglog) then this is known as the proportional-hazards model.

In almost all the literature, the constraint matrices associated with this family of models are known. For example, setting parallel=TRUE will make all constraint matrices (except for the intercept) equal to a vector of $M$ 1's. If the constraint matrices are equal, unknown and to be estimated, then this can be achieved by fitting the model as a reduced-rank vector generalized linear model (RR-VGLM; see rrvglm). Currently, reduced-rank vector generalized additive models (RR-VGAMs) have not been implemented here.

The scaled version of cumulative(), called scumulative(), has $J$ positive scaling factors. They are described in pages 154 and 177 of McCullagh and Nelder (1989); see their equation (5.4) in particular, which they call the generalized rational model.

References

Agresti, A. (2002) Categorical Data Analysis, 2nd ed. New York: Wiley.

Dobson, A. J. (2001) An Introduction to Generalized Linear Models, 2nd ed. Boca Raton: Chapman & Hall/CRC Press.

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Simonoff, J. S. (2003) Analyzing Categorical Data, New York: Springer-Verlag.

Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481--493.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

Examples

Run this code

# Fit the proportional odds model, p.179, in McCullagh and Nelder (1989)
data(pneumo)
pneumo = transform(pneumo, let=log(exposure.time))
(fit = vglm(cbind(normal, mild, severe) ~ let,
            cumulative(parallel=TRUE, reverse=TRUE), pneumo))
fit@y   # Sample proportions
weights(fit, type="prior")   # Number of observations
coef(fit, matrix=TRUE)
constraints(fit)   # Constraint matrices

# Check that the model is linear in let ----------------------
fit2 = vgam(cbind(normal, mild, severe) ~ s(let, df=2),
            cumulative(reverse=TRUE), pneumo)
plot(fit2, se=TRUE, overlay=TRUE, lcol=1:2, scol=1:2)

# Check the proportional odds assumption with a LRT ----------
(fit3 = vglm(cbind(normal, mild, severe) ~ let,
             cumulative(parallel=FALSE, reverse=TRUE), pneumo))
1 - pchisq(2*(logLik(fit3)-logLik(fit)),
           df=length(coef(fit3))-length(coef(fit)))

# A factor() version of fit ----------------------------------
nobs = round(fit@y * c(weights(fit, type="prior")))
sumnobs = colSums(nobs) # apply(nobs, 2, sum)
mydat = data.frame(
    response = ordered(c(rep("normal", times=sumnobs["normal"]),
                         rep("mild", times=sumnobs["mild"]),
                         rep("severe", times=sumnobs["severe"])),
                       levels = c("normal","mild","severe")),
    LET = c(with(pneumo, rep(let, times=nobs[,"normal"])),
            with(pneumo, rep(let, times=nobs[,"mild"])),
            with(pneumo, rep(let, times=nobs[,"severe"]))))
(fit4 = vglm(response ~ LET, data=mydat,
             cumulative(parallel=TRUE, reverse=TRUE), trace=TRUE))


# Long format (cf. wide format above) ----------------------------------
longdata = data.frame(symptoms=ordered(rep(rep(colnames(nobs), nrow(nobs)),
                                           times=c(t(nobs))),
                                       levels = colnames(nobs)),
                      let = rep(rep(with(pneumo, let), each=ncol(nobs)),
                                times=c(t(nobs))))
with(longdata, table(let, symptoms)) # check it; should be same as pneumo
mymustart = matrix(sumnobs / sum(sumnobs),
                   nrow(longdata), ncol(nobs), byrow=TRUE)
fit.long = vglm(symptoms ~ let,
                fam = cumulative(parallel=TRUE, reverse=TRUE),
                mustart=mymustart, data = longdata, trace=TRUE)
coef(fit.long, matrix=TRUE) # Should be same as coef(fit, matrix=TRUE)

Run the code above in your browser using DataLab