vgam: Fitting Vector Generalized Additive Models

Description

Fit a vector generalized additive model (VGAM). This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.

Usage

vgam(formula, family, data = list(), weights = NULL, subset = NULL, 
     na.action = na.fail, etastart = NULL, mustart = NULL, 
     coefstart = NULL, control = vgam.control(...), offset = NULL, 
     method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, 
     contrasts = NULL, constraints = NULL, 
     extra = list(), form2 = NULL, qr.arg = FALSE, smart = TRUE, ...)

Arguments

formula

a symbolic description of the model to be fit. The RHS of the formula is applied to each linear/additive predictor, and usually includes at least one s term. Different variables in each linear/additive

family

Same as for vglm.

data

an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which vgam is called.

weights, subset, na.action

Same as for vglm. Note that subset may be unreliable and to get around this problem it is best to use subset to create a new smaller data

etastart, mustart, coefstart

Same as for vglm.

control

a list of parameters for controlling the fitting process. See vgam.control for details.

method

the method to be used in fitting the model. The default (and presently only) method vgam.fit uses iteratively reweighted least squares (IRLS).

constraints, model, offset

Same as for vglm.

x.arg, y.arg

logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the x and y slots. Note the model matrix is the LM model matrix; to get the VGAM model matrix

contrasts, extra, form2, qr.arg, smart

Same as for vglm.

...

further arguments passed into vgam.control.

Value

An object of class "vgam" (see vgam-class for further information).

WARNING

Currently vgam can only handle constraint matrices cmat, say, such that crossprod(cmat) is diagonal. This is a bug that I will try to fix up soon; see is.buggy.

See warnings in vglm.control.

Details

A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of $M$ additive predictors. The central formula is given by $$\eta_j = \sum_{k=1}^p f_{(j)k}(x_k)$$ where $x_k$ is the $k$th explanatory variable (almost always $x_1=1$ for the intercept term), and $f_{(j)k}$ are smooth functions of $x_k$ that are estimated by smoothers. The first term in the summation is just the intercept. Currently only one type of smoother is implemented and this is called a vector (cubic smoothing spline) smoother. Here, $j=1,\ldots,M$ where $M$ is finite. If all the functions are constrained to be linear then the resulting model is a vector generalized linear model (VGLM). VGLMs are best fitted with vglm.

Vector (cubic smoothing spline) smoothers are represented by s() (see s). Local regression via lo() is not supported. The results of vgam will differ from the gam() (in the gam) because vgam() uses a different knot selection algorithm. In general, fewer knots are chosen because the computation becomes expensive when the number of additive predictors $M$ is large.

The underlying algorithm of VGAMs is iteratively reweighted least squares (IRLS) and modified vector backfitting using vector splines. B-splines are used as the basis functions for the vector (smoothing) splines. vgam.fit() is the function that actually does the work. The smoothing code is based on F. O'Sullivan's BART code.

A closely related methodology based on VGAMs called constrained additive ordination (CAO) first forms a linear combination of the explanatory variables (called latent variables) and then fits a GAM to these. This is implemented in the function cao for a very limited choice of family functions.

References

Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481--493.

Yee, T. W. (2008) The VGAM Package. R News, 8, 28--39.

Examples

Run this code

# Nonparametric proportional odds model 
pneumo <- transform(pneumo, let = log(exposure.time))
vgam(cbind(normal, mild, severe) ~ s(let),
     cumulative(parallel = TRUE), data = pneumo, trace = TRUE)

# Nonparametric logistic regression 
fit <- vgam(agaaus ~ s(altitude, df = 2), binomialff, data = hunua)
plot(fit, se = TRUE)
pfit <- predict(fit, type = "terms", raw = TRUE, se = TRUE)
names(pfit)
head(pfit$fitted)
head(pfit$se.fit)
pfit$df
pfit$sigma

# Fit two species simultaneously 
fit2 <- vgam(cbind(agaaus, kniexc) ~ s(altitude, df = c(2, 3)),
             binomialff(multiple.responses = TRUE), data = hunua)
coef(fit2, matrix = TRUE)  # Not really interpretable 
plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2)

ooo <- with(hunua, order(altitude))
with(hunua, matplot(altitude[ooo], fitted(fit2)[ooo,], ylim = c(0, 0.8),
     xlab = "Altitude (m)", ylab = "Probability of presence", las = 1,
     main = "Two plant species' response curves", type = "l", lwd = 2))
with(hunua, rug(altitude))

# The subset argument does not work here. Use subset() instead.
set.seed(1)
zdata <- data.frame(x2 = runif(nn <- 100))
zdata <- transform(zdata, y = rbinom(nn, 1, 0.5))
zdata <- transform(zdata, subS = runif(nn) < 0.7)
sub.zdata <- subset(zdata, subS)  # Use this instead
if (FALSE)
  fit4a <- vgam(cbind(y, y) ~ s(x2, df = 2),
                binomialff(multiple.responses = TRUE), 
                data = zdata, subset = subS)  # This fails!!!
fit4b <- vgam(cbind(y, y) ~ s(x2, df = 2),
              binomialff(multiple.responses = TRUE), 
              data = sub.zdata)  # This succeeds!!!

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

WARNING

Details

References

See Also

Examples