LCa.fit: Fit Lee-Carter-type models for rates to arbitrarily shaped observations of rates in a Lexis diagram.

Description

The Lee-Carter model is originally defined as a model for rates observed in A-sets (age by period) of a Lexis diagram, as log(rate(x,t)) = a(x) + b(x)k(t), using one parameter per age(x) and period(t). This function uses natural splines for a(), b() and k(), placing knots for each effect such that the number of events is the same between knots. Also fits the continuous time counterparts of all models supported by the lca.rh function from the ilc package (see details).

Usage

LCa.fit( data, A, P, D, Y,
         model = "APa",    # or one of "ACa", "APaC", "APCa" or "APaCa" 
         a.ref,            # age reference for the interactions
        pi.ref = a.ref,    # age reference for the period interaction
        ci.ref = a.ref,    # age reference for the cohort interaction
         p.ref,            # period reference for the interaction
         c.ref,            # cohort reference for the interactions
          npar = c(a = 6,  # no. knots for main age-effect
                   p = 6,  # no. knots for period-effect
                   c = 6,  # no. knots for cohort-effect
                  pi = 6,  # no. knots for age in the period interaction
                  ci = 6), # no. knots for age in the cohort interaction
            VC = TRUE,     # numerical calculation of the Hessian?
         alpha = 0.05,     # 1 minus confidence level
           eps = 1e-6,     # convergence criterion
         maxit = 100,      # max. no iterations
         quiet = TRUE )    # cut the crap
# S3 method for LCa
print( x, ... )
# S3 method for LCa
summary( object, show.est=FALSE, ... )
# S3 method for LCa
plot( x, ... )
# S3 method for LCa
predict( object, newdata,
                        alpha = 0.05,
                        level = 1-alpha,
                          sim = ( "vcov" %in% names(object) ),
                          ... )

Arguments

data

A data frame. Must have columns A(age), P(period, that is calendar time), D(no. of events) and Y(person-time, exposure). Alternatively these four quantities can be given as separate vectors:

Vector of ages (midpoint of observation).

Vector of period (midpoint of observation).

Vector of no. of events.

Vector of person-time. Demographers would say "exposure", bewildering epidemiologists.

a.ref

Reference age for the age-interaction term(s) pi(x) and/or pi(x), where pi(a.ref)=1 and ci(a.ref)=1.

pi.ref

Same, but specifically for the interaction with period.

ci.ref

Same, but specifically for the interaction with cohort.

p.ref

Reference period for the time-interaction term kp(t) where kp(p.ref)=0.

c.ref

Reference period for the time-interaction term kp(t) where kc(c.ref)=0.

model

Character, either "APa" which is the classical Lee-Carter model for log-rates, other possibilities are "ACa", "APCa", "APaC" or "APaCa", see details.

npar

A (possibly named) vector or list, with either the number of knots or the actual vectors of knots for each term. If unnamed, components are taken to be in the order (a,b,t), if the model is "APaCa" in the order (a,p,c,pi,ci). If a vector, the three integers indicate the number of knots for each term; these will be placed so that there is an equal number of events (D) between each, and half as many below the first and above the last knot. If npar is a list of scalars the behavior is the same. If npar is a list of vectors, these are taken as the knots for the natural splines.

Logical. Should the variance-covariance matrix of the parameters be computed by numerical differentiation? See details.

alpha

1 minus the confidence level used when calculating confidence intervals for estimates in LCa.fit and for predictions by predict.LCa.

eps

Convergence criterion for the deviance, we use the the relative difference between deviance from the two models fitted.

maxit

Maximal number of iterations.

quiet

Shall I shut up or talk extensively to you about iteration progression etc.?

object

An LCa object, see under "Value".

show.est

Logical. Should the estimates be printed?

An LCa object, see under "Value".

newdata

Prediction data frame, must have columns A and P. Any Y column is ignored, predictions are given in units of the Y supplied for the call that generated the LCa object.

level

Confidence level.

sim

Logical or numeric. If TRUE, prediction c.i.s will be based on 1000 simulations from the posterior parameters. If numeric, it will be based on that number of simulations.

...

Additional parameters passed on to the method.

Value

LCa.fit returns an object of class LCa (smooth effects Lee-Carter model); it is a list with the following components:

model

Character, either APa, ACa, APaC, APCa or APaCa, indicating the variable(s) interacting with age.

3-column matrix of age-effects, c.i. from the age-time model. Rownames are the unique occurring ages in the dataset. Estimates are rates.

3-column matrix of age-period interaction effects, c.i. from the age model. Rownames are the actually occurring ages in the dataset. Estimates are multipliers of the log-RRs in kt, centered at 1 at ci.ref.

3-column matrix of period-effects, with c.i.s from the age-time model. Rownames are the actually occurring times in the dataset. Estimates are rate-ratios centered at 1 at p.ref.

3-column matrix of age-cohort interaction effects, c.i. from the age model. Rownames are the actually occurring ages in the dataset. Estimates are multipliers of the log-RRs in kt, centered at 1 at ci.ref.

3-column matrix of period-effects, with c.i.s from the age-time model. Rownames are the actually occurring times in the dataset. Estimates are rate-ratios centered at 1 at p.ref.

mod.at

glm object with the final age-time model. Gives the same fit as the mod.b model.

mod.b

glm object with the final age model. Gives the same fit as the mod.at model.

coef

All coefficients from both models, in the order ax, kp, kc, pi, ci. Only present if LCa.fit were called with VC=TRUE (the default).

vcov

Variance-covariance matrix of coefficients from both models, in the same order as in the coef. Only present if LCa.fit were called with VC=TRUE.

knots

List of vectors of knots used in for the age, period and cohort effects.

refs

List of reference points used for the age, period and cohort terms in the interactions.

deviance

Deviance of the model

df.residual

Residual degrees of freedom

iter

Number of iterations used to reach convergence.

plot.LCa plots the estimated effects in separate panels, using a log-scale for the baseline rates (ax) and the time-RR (kt). For the APaCa model 5 panels are plotted. summary.LCa returns (invisibly) a matrix with the parameters from the models and a column of the conditional se.s and of the se.s derived from the numerically computed Hessian (if LCa.fit were called with VC=TRUE.)

predict.LCa returns a matrix with one row per row in newdata. If LCa.fit were called with VC=TRUE there will be 3 columns, namely prediction (1st column) and c.i.s based on a simulation of parameters from a multivariate normal with mean coef and variance vcov using the median and alpha/2 quantiles from the sim simulations. If LCa.fit were called with VC=FALSE there will be 6 columns, namely estimates and c.i.s from age-time model (mod.at), and from the age-interaction model (mod.b), both using conditional variances.

Details

The Lee-Carter model is non-linear in age and time so does not fit in the classical glm-Poisson framework. But for fixed b(x) it is a glm, and also for fixed a(x), k(t). The function alternately fits the two versions until the same fit is produced (same deviance).

The multiplicative age by period term could equally well have been a multiplicative age by cohort or even both. Thus the most extensive model is:

$$\log(\lambda(a,p)) = f(a) + b_p(a)k_p(a) + b_c(a)k_c(a)$$

The naming convention for the models is a capital P and/or C if the effect is in the model followed by a lower case a if there is an interaction with age. Thus there are 5 different models that can be fitted: APa, ACa, APaC APCa and APaCa.

The standard errors of the parameters from the two model fits are however wrong; they are conditional on some of terms having a fixed value. And the symbolic calculation of the Hessian is a nightmare, so this is done numerically using the hessian function from the numDeriv package if VC=TRUE.

The coefficients and the variance-covariance matrix of these are used in predict.LCa for a parametric bootstrap (that is, a simulation from a multivariate normal with mean equal to the parameter estimates and variance as the estimated variance-covariance) to get confidence intervals for the predictions if sim is TRUE --- which it is by default if they are part of the object.

The plot for LCa objects merely produces between 3 and 5 panels showing each of the terms in the model. These are mainly for preliminary inspection; real reporting of the effects should use proper relative scaling of the effects.

Examples

Run this code

# NOT RUN {
library( Epi )
# Load the testis cancer data by Lexis triangles
data( testisDK )
tc <- subset( testisDK, A>14 & A<60 )
head( tc )

# We want to see rates per 100,000 PY
tc$Y <- tc$Y / 10^5

# Fit the Lee-Carter model with age-period interaction (default)
LCa.tc <- LCa.fit( tc, model="ACa", a.ref=30, p.ref=1980, quiet=FALSE, eps=10e-4, maxit=50 )

LCa.tc
summary( LCa.tc )

# Inspect what we got
names( LCa.tc )

# show the estimated effects
par( mfrow=c(1,3) )
plot( LCa.tc )

# Prediction data frame for ages 15 to 60 for three time points: 
nd <- data.frame( A=15:60 )

p50 <- predict.LCa( LCa.tc, newdata=cbind(nd,P=1950), sim=10000 )
p70 <- predict.LCa( LCa.tc, newdata=cbind(nd,P=1970), sim=10000 )
p90 <- predict.LCa( LCa.tc, newdata=cbind(nd,P=1990), sim=10000 )

# Inspect the curves from the parametric bootstrap (simulation):
par( mfrow=c(1,1) )
matplot( nd$A, cbind(p50,p70,p90), type="l", lwd=c(6,3,3), lty=1,
         col=rep( c("red","green","blue"), each=3 ), log="y",
         ylab="Testis cancer incidence per 100,000 PY, 1970, 80, 90", xlab="Age" )
# }

Run the code above in your browser using DataLab