ssllrm: Fitting Smoothing Spline Log-Linear Regression Models

Description

Fit smoothing spline log-linear regression models. The symbolic model specification via formula follows the same rules as in lm.

Usage

ssllrm(formula, response, type=NULL, data=list(), weights, subset,
       na.action=na.omit, alpha=1, id.basis=NULL, nbasis=NULL,
       seed=NULL, random=NULL, prec=1e-7, maxiter=30, skip.iter=FALSE)

Value

ssllrm returns a list object of class "ssllrm".

The method predict.ssllrm can be used to evaluate

f(y|x) at arbitrary x, or contrasts of log{f(y|x)}

such as the odds ratio along with standard errors. The method

project.ssllrm can be used to calculate the Kullback-Leibler projection for model selection.

Arguments

formula: Symbolic description of the model to be fit.
response: Formula listing response variables.
type: List specifying the type of spline for each variable. See mkterm for details.
data: Optional data frame containing the variables in the model.
weights: Optional vector of weights to be used in the fitting process.
subset: Optional vector specifying a subset of observations to be used in the fitting process.
na.action: Function which indicates what should happen when the data contain NAs.
alpha: Parameter modifying GCV or Mallows' CL; larger absolute values yield smoother fits; negative value invokes a stable and more accurate GCV/CL evaluation algorithm but may take two to five times as long. Ignored when method="m" are specified.
id.basis: Index designating selected "knots".
nbasis: Number of "knots" to be selected. Ignored when id.basis is supplied.
seed: Seed to be used for the random generation of "knots". Ignored when id.basis is supplied.
random: Input for parametric random effects in nonparametric mixed-effect models. See mkran for details.
prec: Precision requirement for internal iterations.
maxiter: Maximum number of iterations allowed for internal iterations.
skip.iter: Flag indicating whether to use initial values of theta and skip theta iteration. See ssanova for notes on skipping theta iteration.

Details

The model is specified via formula and response, where response lists the response variables. For example, ssllrm(~y1*y2*x,~y1+y2) prescribe a model of the form $$ log f(y1,y2|x) = g_{1}(y1) + g_{2}(y2) + g_{12}(y1,y2) + g_{x1}(x,y1) + g_{x2}(x,y2) + g_{x12}(x,y1,y2) + C(x) $$ with the terms denoted by "y1", "y2", "y1:y2", "y1:x", "y2:x", and "y1:y2:x"; the term(s) not involving response(s) are removed and the constant C(x) is determined by the fact that a conditional density integrates (adds) to one on the y axis.

The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.

A subset of the observations are selected as "knots." Unless specified via id.basis or nbasis, the number of "knots" $q$ is determined by $max(30,10n^{2/9})$, which is appropriate for the default cubic splines for numerical vectors.

References

Gu, C. and Ma, P. (2011), Nonparametric regression with cross-classified responses. The Canadian Journal of Statistics, 39, 591--609.

Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.

Examples

Run this code

## Simulate data
test <- function(x)
        {.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
x <- (0:100)/100
p <- 1-1/(1+exp(test(x)))
y <- rbinom(x,3,p)
y1 <- as.ordered(y)
y2 <- as.factor(rbinom(x,1,p))
## Fit model
fit <- ssllrm(~y1*y2*x,~y1+y2)

## Evaluate f(y|x)
est <- predict(fit,data.frame(x=x),
               data.frame(y1=as.factor(0:3),y2=as.factor(rep(0,4))))
## f(y|x) at all y values (fit$qd.pt)
est <- predict(fit,data.frame(x=x))

## Evaluate contrast of log f(y|x)
est <- predict(fit,data.frame(x=x),odds=c(-1,.5,.5,0),
               data.frame(y1=as.factor(0:3),y2=as.factor(rep(0,4))),se=TRUE)
## Odds ratio log{f(0,0|x)/f(3,0|x)}
est <- predict(fit,data.frame(x=x),odds=c(1,-1),
               data.frame(y1=as.factor(c(0,3)),y2=as.factor(c(0,1))),se=TRUE)

## KL projection
kl <- project(fit,include=c("y2:x","y1:y2","y1:x","y2:x"))

## Clean up
if (FALSE) rm(test,x,p,y,y1,y2,fit,est,kl)
dev.off()

Run the code above in your browser using DataLab