flexsurvreg(formula, anc=NULL, data, weights, bhazard, subset, na.action, dist, inits, fixedpars=NULL, dfns=NULL, aux=NULL, cl=0.95, integ.opts, sr.control=survreg.control(), ...)
Surv
function, and any covariates are given on the
right-hand side. For example, Surv(time, dead) ~ age + sex
Surv
objects of type="right"
,"counting"
,
"interval1"
or "interval2"
are supported, corresponding to
right-censored, left-truncated or interval-censored observations.
If there are no covariates, specify 1
on the right hand side,
for example Surv(time, dead) ~ 1
.
By default, covariates are placed on the ``location'' parameter of
the distribution, typically the "scale" or "rate" parameter, through
a linear model, or a log-linear model if this parameter must be
positive. This gives an accelerated failure time model or a
proportional hazards model (see dist
below) depending on how the
distribution is parameterised.
Covariates can be placed on other (``ancillary'') parameters by using
the name of the parameter as a ``function'' in the formula. For
example, in a Weibull model, the following expresses the scale
parameter in terms of age and a treatment variable treat
, and
the shape parameter in terms of sex and treatment.
Surv(time, dead) ~ age + treat + shape(sex) + shape(treat)
However, if the names of the ancillary parameters clash with any
real functions that might be used in formulae (such as I()
, or
factor()
), then those functions will not work in the
formula. A safer way to model covariates on ancillary parameters is
through the anc
argument to flexsurvreg
.
survreg
users should also note that the function
strata()
is ignored, so that any covariates surrounded by
strata()
are applied to the location parameter.
Surv(time, dead) ~ age + treat, anc = list(shape = ~ sex + treat)
formula
.
If not given, the variables should be in the working environment.
options()$na.action
.
"gengamma" |
Generalized gamma (stable) | mu |
AFT |
"gengamma.orig" |
Generalized gamma (original) |
scale | AFT |
"genf" |
Generalized F (stable) | mu | AFT |
"genf.orig" |
Generalized F (original) | mu |
AFT |
"weibull" |
Weibull |
scale | AFT |
"gamma" |
Gamma | rate | AFT |
"exp" |
Exponential | rate |
PH |
"llogis" |
Log-logistic |
scale | AFT |
"lnorm" |
Log-normal | meanlog | AFT |
"gompertz" |
Gompertz | rate |
PH |
"gengamma" |
Generalized gamma (stable) |
"exponential"
and "lognormal"
can be used as aliases
for "exp"
and "lnorm"
, for compatibility with
survreg
.
Alternatively, dist
can be a list specifying a custom
distribution. See section ``Custom distributions'' below for how to
construct this list.
Very flexible spline-based distributions can also be fitted with
flexsurvspline
.
The parameterisations of the built-in distributions used here are
the same as in their built-in distribution functions:
dgengamma
, dgengamma.orig
,
dgenf
, dgenf.orig
,
dweibull
, dgamma
, dexp
,
dlnorm
, dgompertz
, respectively. The
functions in base R are used where available, otherwise, they are
provided in this package.
For the Weibull, exponential and log-normal distributions,
flexsurvreg
simply works by calling
survreg
to obtain the maximum likelihood estimates,
then calling optim
to double-check convergence and
obtain the covariance matrix for flexsurvreg
's
preferred parameterisation.
The Weibull parameterisation is
different from that in survreg
, instead it
is consistent with dweibull
. The "scale"
reported by survreg
is equivalent to
1/shape
as defined by dweibull
and hence
flexsurvreg
. The first coefficient (Intercept)
reported by survreg
is equivalent to
log(scale)
in dweibull
and
flexsurvreg
.
Similarly in the exponential distribution, the rate, rather than the mean, is modelled on covariates.
flexsurv.dists
in the source for the exact methods used.
If the likelihood surface may be uneven, it is advised to run
the optimisation starting from various different initial values
to ensure convergence to the true global maximum.
inits
.
For example, in a stable
generalized Gamma model with two covariates, to fix the third
of three generalized gamma parameters (the shape Q
,
see the help for GenGamma
) and the second covariate,
specify fixedpars = c(3, 5)
"d"
, "p"
, "h"
,
or "H"
containing the probability density, cumulative
distribution, hazard, or cumulative hazard functions of the
distribution. For example, list(d=dllogis, p=pllogis)
.
If dfns
is used, a custom dlist
must still be
provided, but dllogis
and pllogis
need not be visible
from the global environment. This is useful if flexsurvreg
is called within other functions or environments where the
distribution functions are also defined dynamically.
flexsurvspline
to supply the knot locations and
modelling scale (e.g. hazard or odds). This cannot be used to fix
parameters of a distribution --- use fixedpars
for that.
integrate
, if a custom density or hazard is provided without its
cumulative version. For example, integ.opts = list(rel.tol=1e-12)
optim
. For example, the BFGS
optimisation algorithm is the default in flexsurvreg
,
but this can be changed, for example to method="Nelder-Mead"
which can be
more robust to poor initial values.
If the optimisation fails to converge, consider normalising the
problem using, for example, control=list(fnscale = 2500)
, for
example, replacing 2500 by a number of the order of magnitude of the
likelihood. If 'false' convergence is reported with a non-positive-definite Hessian,
then consider tightening the tolerance criteria for convergence. If
the optimisation takes a long time, intermediate steps can be
printed using the trace
argument of the control list. See
optim
for details.
"flexsurvreg"
containing information about the
fitted model. Components of interest to users may include:coef
,
vcov
and confint
methods for
flexsurvreg
objects work on this scale.vcov
.model.frame.flexsurvreg
or model.matrix.flexsurvreg
.flexsurvreg
is intended to be easy to extend to handle
new distributions. To define a new distribution for use in
flexsurvreg
, construct a list with the following
elements: name
:"dist"
, for example, then there must be
visible in the working environment, at least, either a) a function called ddist
which defines the probability
density, or b) a function called hdist
which defines the hazard. Ideally, in case a) there should also be a function called
pdist
which defines the probability distribution or
cumulative density, and in case b) there should be a function
called Hdist
defining the cumulative hazard. If these
additional functions are not provided, flexsurv attempts to
automatically create them by numerically integrating the density
or hazard function. However, model fitting will be much slower,
or may not even work at all, if the analytic versions of these
functions are not available. The functions must accept vector arguments (representing different
times, or alternative values for each parameter) and return the results
as a vector. The function Vectorize
may be helpful
for doing this: see the example below. These functions may be in an add-on package (see below for an
example) or may be user-written. If they are user-written they
must be defined in the global environment, or supplied explicitly
through the dfns
argument to flexsurvreg
.
The latter may be useful if the functions are created dynamically
(as in the source of flexsurvspline
) and thus not visible
through R's scoping rules. Arguments other than parameters must be named in the conventional
way -- for example x
for the first argument of the density
function or hazard, as in dnorm(x, ...)
and q
for the first argument of the probability function. Density
functions should also have an argument log
, after the
parameters, which when TRUE
, computes the log density,
using a numerically stable additive formula if possible. Additional functions with names beginning with "DLd"
and "DLS"
may be
defined to calculate the derivatives of the
log density and log survival probability, with respect to the
parameters of the distribution. The parameters are expressed on
the real line, for example after log transformation if they are
defined as positive. The first argument must be named
t
, representing the time, and the remaining arguments must be
named as the parameters of the density function. The function must return a
matrix with rows corresponding to times, and columns
corresponding to the parameters of the distribution. The derivatives are used, if
available, to speed up the model fitting with optim
.
pars
:location
:formula
supplied to
flexsurvreg
. transforms
:c(log, log)
for a distribution with two positive parameters. inv.transforms
:c(exp)
or list(exp)
. inits
:t
(including right-censoring times, and using the halfway point for
interval-censored times)
which returns a vector of reasonable initial values for maximum
likelihood estimation of each parameter. For example,
function(t){ c(1, mean(t)) }
will always initialize the first
of two parameters at 1, and the second (a scale
parameter, for instance) at the mean of t
.
dEV
and pEV
. See the
Examples below for the custom list in this case, and the
subsequent command to fit the model.optim
function.
Parameters defined to be positive are estimated on the log scale.
Confidence intervals are estimated from the Hessian at the maximum,
and transformed back to the original scale of the parameters. The usage of flexsurvreg
is intended to be similar to
survreg
in the survival package.
Cox, C. (2008) The generalized $F$ distribution: An umbrella for parametric survival analysis. Statistics in Medicine 27:4301-4312.
Cox, C., Chu, H., Schneider, M. F. and Muñoz, A. (2007) Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine 26:4252-4374
Jackson, C. H. and Sharples, L. D. and Thompson, S. G. (2010) Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. International Journal of Biostatistics 6(1):Article 34.
flexsurvspline
for flexible survival modelling using the
spline model of Royston and Parmar. plot.flexsurvreg
and lines.flexsurvreg
to
plot fitted survival, hazards and cumulative hazards from models fitted
by flexsurvreg
and flexsurvspline
.
data(ovarian)
## Compare generalized gamma fit with Weibull
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="gengamma")
fitg
fitw <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist="weibull")
fitw
plot(fitg)
lines(fitw, col="blue", lwd.ci=1, lty.ci=1)
## Identical AIC, probably not enough data in this simple example for a
## very flexible model to be worthwhile.
## Custom distribution
## make "dEV" and "pEV" from eha package (if installed)
## available to the working environment
if (require("eha")) {
custom.ev <- list(name="EV",
pars=c("shape","scale"),
location="scale",
transforms=c(log, log),
inv.transforms=c(exp, exp),
inits=function(t){ c(1, median(t)) })
fitev <- flexsurvreg(formula = Surv(futime, fustat) ~ 1, data = ovarian,
dist=custom.ev)
fitev
lines(fitev, col="purple", col.ci="purple")
}
## Custom distribution: supply the hazard function only
hexp2 <- function(x, rate=1){ rate } # exponential distribution
hexp2 <- Vectorize(hexp2)
custom.exp2 <- list(name="exp2", pars=c("rate"), location="rate",
transforms=c(log), inv.transforms=c(exp),
inits=function(t)1/mean(t))
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist=custom.exp2)
flexsurvreg(Surv(futime, fustat) ~ 1, data = ovarian, dist="exp")
## should give same answer
Run the code above in your browser using DataLab