
Family for use with gam
or bam
, implementing regression for ordered categorical data.
A linear predictor provides the expected value of a latent variable following a logistic distribution. The
probability of this latent variable lying between certain cut-points provides the probability of the ordered
categorical variable being of the corresponding category. The cut-points are estimated along side the model
smoothing parameters (using the same criterion). The observed categories are coded 1, 2, 3, ... up to the
number of categories.
ocat(theta=NULL,link="identity",R=NULL)
An object of class extended.family
.
cut point parameter vector (dimension R-2
). If supplied and all positive, then taken to be the cut point increments (first cut point is fixed at -1). If any are negative then absolute values are taken as starting values for cutpoint increments.
The link function: only "identity"
allowed at present (possibly for ever).
the number of catergories.
Simon N. Wood simon.wood@r-project.org
Such cumulative threshold models are only identifiable up to an intercept, or one of the cut points.
Rather than remove the intercept, ocat
simply sets the first cut point to -1. Use predict.gam
with
type="response"
to get the predicted probabilities in each category.
Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 tools:::Rd_expr_doi("10.1080/01621459.2016.1180986")
library(mgcv)
## Simulate some ordered categorical data...
set.seed(3);n<-400
dat <- gamSim(1,n=n)
dat$f <- dat$f - mean(dat$f)
alpha <- c(-Inf,-1,0,5,Inf)
R <- length(alpha)-1
y <- dat$f
u <- runif(n)
u <- dat$f + log(u/(1-u))
for (i in 1:R) {
y[u > alpha[i]&u <= alpha[i+1]] <- i
}
dat$y <- y
## plot the data...
par(mfrow=c(2,2))
with(dat,plot(x0,y));with(dat,plot(x1,y))
with(dat,plot(x2,y));with(dat,plot(x3,y))
## fit ocat model to data...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ocat(R=R),data=dat)
b
plot(b,pages=1)
gam.check(b)
summary(b)
b$family$getTheta(TRUE) ## the estimated cut points
## predict probabilities of being in each category
predict(b,dat[1:2,],type="response",se=TRUE)
Run the code above in your browser using DataLab