lms.bcn: LMS Quantile/Expectile Regression with a Box-Cox Transformation to Normality

Description

LMS quantile/expectile regression with the Box-Cox transformation to normality.

Usage

lms.bcn(percentiles = c(25, 50, 75), zero = c(1, 3),
        llambda = "identity", lmu = "identity", lsigma = "loge",
        elambda = list(), emu = list(), esigma = list(),
        dfmu.init = 4, dfsigma.init = 2, ilambda = 1,
        isigma = NULL, expectiles = FALSE)

Arguments

percentiles

A numerical vector containing values between 0 and 100, which are the quantiles or expectiles. They will be returned as `fitted values'.

zero

An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,3}. The default value usually increases the chance of successful convergence. Setting zero = NULL

llambda, lmu, lsigma

Parameter link functions applied to the first, second and third linear/additive predictors. See Links for more choices, and CommonVGAMffArguments<

elambda, emu, esigma

List. Extra argument for each of the links. See earg in Links for general information, as well as CommonVGAMffArguments.

dfmu.init

Degrees of freedom for the cubic smoothing spline fit applied to get an initial estimate of mu. See vsmooth.spline.

dfsigma.init

Degrees of freedom for the cubic smoothing spline fit applied to get an initial estimate of sigma. See vsmooth.spline. This argument may be assigned NULL to get an initial value

ilambda

Initial value for lambda. If necessary, it is recycled to be a vector of length $n$ where $n$ is the number of (independent) observations.

isigma

Optional initial value for sigma. If necessary, it is recycled to be a vector of length $n$. The default value, NULL, means an initial value is computed in the @initialize slot of the family function.

expectiles

A single logical. If TRUE then the method is LMS-expectile regression; expectiles are returned rather than quantiles. The default is LMS quantile regression based on the normal distribution.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

Warning

The computations are not simple, therefore convergence may fail. In that case, try different starting values. Also, the estimate may diverge quickly near the solution, in which case try prematurely stopping the iterations by assigning maxits to be the iteration number corresponding to the highest likelihood value.

One trick is to fit a simple model and use it to provide initial values for a more complex model; see in the examples below.

Details

Given a value of the covariate, this function applies a Box-Cox transformation to the response to best obtain normality. The parameters chosen to do this are estimated by maximum likelihood or penalized maximum likelihood.

In more detail, the basic idea behind this method is that, for a fixed value of $x$, a Box-Cox transformation of the response $Y$ is applied to obtain standard normality. The 3 parameters ($\lambda$, $\mu$, $\sigma$, which start with the letters ``L-M-S'' respectively, hence its name) are chosen to maximize a penalized log-likelihood (with vgam). Then the appropriate quantiles of the standard normal distribution are back-transformed onto the original scale to get the desired quantiles. The three parameters may vary as a smooth function of $x$.

The Box-Cox power transformation here of the $Y$, given $x$, is $$Z = [(Y / \mu(x))^{\lambda(x)} - 1] / ( \sigma(x) \, \lambda(x) )$$ for $\lambda(x) \neq 0$. (The singularity at $\lambda(x) = 0$ is handled by a simple function involving a logarithm.) Then $Z$ is assumed to have a standard normal distribution. The parameter $\sigma(x)$ must be positive, therefore VGAM chooses $\eta(x)^T = (\lambda(x), \mu(x), \log(\sigma(x)))$ by default. The parameter $\mu$ is also positive, but while $\log(\mu)$ is available, it is not the default because $\mu$ is more directly interpretable. Given the estimated linear/additive predictors, the $100\alpha$ percentile can be estimated by inverting the Box-Cox power transformation at the $100\alpha$ percentile of the standard normal distribution.

Of the three functions, it is often a good idea to allow $\mu(x)$ to be more flexible because the functions $\lambda(x)$ and $\sigma(x)$ usually vary more smoothly with $x$. This is somewhat reflected in the default value for the argument zero, viz. zero = c(1,3).

References

Cole, T. J. and Green, P. J. (1992) Smoothing Reference Centile Curves: The LMS Method and Penalized Likelihood. Statistics in Medicine, 11, 1305--1319.

Green, P. J. and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, London: Chapman & Hall.

Yee, T. W. (2004) Quantile regression via vector generalized additive models. Statistics in Medicine, 23, 2295--2315.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

Examples

Run this code

fit = vgam(BMI ~ s(age, df = c(4,2)), lms.bcn(zero = 1), bminz, trace = TRUE)
head(predict(fit))
head(fitted(fit))
head(bminz)
head(cdf(fit)) # Person 1 is near lower BMI quartile amongst his age group
colMeans(c(fit@y) < fitted(fit)) # Sample proportions below the quantiles

# Convergence problems? Try this trick: fit0 is a simpler model used for fit1
fit0 = vgam(BMI ~ s(age, df = 4), lms.bcn(zero = c(1,3)), bminz, trace = TRUE)
fit1 = vgam(BMI ~ s(age, df = c(4, 2)), lms.bcn(zero = 1), bminz, 
            etastart = predict(fit0), trace = TRUE)

# Quantile plot
par(bty = "l", mar = c(5, 4, 4, 3) + 0.1, xpd = TRUE)
qtplot(fit, percentiles = c(5, 50, 90, 99), main = "Quantiles",
       xlim = c(15, 90), las = 1, ylab = "BMI", lwd = 2, lcol = 4)

# Density plot
ygrid = seq(15, 43, len = 100)  # BMI ranges
par(mfrow=c(1, 1), lwd = 2)
(aa = deplot(fit, x0 = 20, y = ygrid, xlab = "BMI", col = "black",
  main = "Density functions at Age = 20 (black), 42 (red) and 55 (blue)"))
aa = deplot(fit, x0 = 42, y = ygrid, add = TRUE, llty = 2, col = "red")
aa = deplot(fit, x0 = 55, y = ygrid, add = TRUE, llty = 4, col = "blue",
            Attach = TRUE)
aa@post$deplot  # Contains density function values

Run the code above in your browser using DataLab