Estimate relative risk using smoothing spline ANOVA models. The
symbolic model specification via formula
follows the same
rules as in lm
, but with the response of a special
form.
sscox(formula, type=NULL, data=list(), weights=NULL, subset,
na.action=na.omit, partial=NULL, alpha=1.4, id.basis=NULL,
nbasis=NULL, seed=NULL, random=NULL, prec=1e-7, maxiter=30,
skip.iter=FALSE)
sscox
returns a list object of class "sscox"
.
The method predict.sscox
can be used to evaluate the
fits at arbitrary points along with standard errors. The method
project.sscox
can be used to calculate the
Kullback-Leibler projection for model selection.
Symbolic description of the model to be fit, where
the response is of the form Surv(futime,status,start=0)
.
List specifying the type of spline for each variable.
See mkterm
for details.
Optional data frame containing the variables in the model.
Optional vector of counts for duplicated data.
Optional vector specifying a subset of observations to be used in the fitting process.
Function which indicates what should happen when the data contain NAs.
Optional symbolic description of parametric terms in partial spline models.
Parameter defining cross-validation score for smoothing parameter selection.
Index of observations to be used as "knots."
Number of "knots" to be used. Ignored when
id.basis
is specified.
Seed to be used for the random generation of "knots."
Ignored when id.basis
is specified.
Input for parametric random effects (frailty) in
nonparametric mixed-effect models. See mkran
for
details.
Precision requirement for internal iterations.
Maximum number of iterations allowed for internal iterations.
Flag indicating whether to use initial values of
theta and skip theta iteration. See ssanova
for
notes on skipping theta iteration.
A proportional hazard model is assumed, and the relative risk is
estimated via penalized partial likelihood. The model specification
via formula
is for the log relative risk. For example,
Suve(t,d)~u*v
prescribes a model of the form
$$
log f(u,v) = g_{u}(u) + g_{v}(v) + g_{u,v}(u,v)
$$
with the terms denoted by "u"
, "v"
, and "u:v"
;
relative risk is defined only up to a multiplicative constant, so
the constant term is not included in the model.
sscox
takes standard right-censored lifetime data, with
possible left-truncation and covariates; in
Surv(futime,status,start=0)~...
, futime
is the
follow-up time, status
is the censoring indicator, and
start
is the optional left-truncation time.
Parallel to those in a ssanova
object, the model terms
are sums of unpenalized and penalized terms. Attached to every
penalized term there is a smoothing parameter, and the model
complexity is largely determined by the number of smoothing
parameters.
The selection of smoothing parameters is through a cross-validation
mechanism designed for density estimation under biased sampling,
with a fudge factor alpha
; alpha=1
is "unbiased" for
the minimization of Kullback-Leibler loss but may yield severe
undersmoothing, whereas larger alpha
yields smoother
estimates.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" \(q\) is determined by \(max(30,10n^{2/9})\), which is
appropriate for the default cubic splines for numerical vectors.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
## Relative Risk
data(stan)
fit.rr <- sscox(Surv(futime,status)~age,data=stan)
est.rr <- predict(fit.rr,data.frame(age=c(35,40)),se=TRUE)
## Base Hazard
risk <- predict(fit.rr,stan)
fit.bh <- sshzd(Surv(futime,status)~futime,data=stan,offset=log(risk))
tt <- seq(0,max(stan$futime),length=51)
est.bh <- hzdcurve.sshzd(fit.bh,tt,se=TRUE)
## Clean up
if (FALSE) rm(stan,fit.rr,est.rr,risk,fit.bh,tt,est.bh)
Run the code above in your browser using DataLab