rrvglm: Fitting Reduced-Rank Vector Generalized Linear Models (RR-VGLMs)

Description

A reduced-rank vector generalized linear model (RR-VGLM) is fitted. RR-VGLMs are VGLMs but some of the constraint matrices are estimated. In this documentation, $M$ is the number of linear predictors.

Usage

rrvglm(formula, family, data = list(), weights = NULL, subset = NULL,
       na.action = na.fail, etastart = NULL, mustart = NULL,
       coefstart = NULL, control = rrvglm.control(...), offset = NULL,
       method = "rrvglm.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
       contrasts = NULL, constraints = NULL, extra = NULL,
       qr.arg = FALSE, smart = TRUE, ...)

Arguments

formula

a symbolic description of the model to be fit. The RHS of the formula is applied to each linear predictor. Different variables in each linear predictor can be chosen by specifying constraint matrices.

family

a function of class "vglmff" (see vglmff-class) describing what statistical model is to be fitted. This is called a ``VGAM family function''. See

data

an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which rrvglm is called.

weights

an optional vector or matrix of (prior) weights to be used in the fitting process. If weights is a matrix, then it must be in matrix-band form, whereby the first $M$ columns of the matrix are the diagonals, followed by th

subset

an optional logical vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is un

etastart

starting values for the linear predictors. It is a $M$-column matrix. If $M=1$ then it may be a vector.

mustart

starting values for the fitted values. It can be a vector or a matrix. Some family functions do not make use of this argument.

coefstart

starting values for the coefficient vector.

control

a list of parameters for controlling the fitting process. See rrvglm.control for details.

offset

a vector or $M$-column matrix of offset values. These are a priori known and are added to the linear predictors during fitting.

method

the method to be used in fitting the model. The default (and presently only) method rrvglm.fit uses iteratively reweighted least squares (IRLS).

model

a logical value indicating whether the model frame should be assigned in the model slot.

x.arg, y.arg

logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the x and y slots. Note the model matrix is the LM model matrix; to get the VGLM model matrix

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

constraints

an optional list of constraint matrices. The components of the list must be named with the term it corresponds to (and it must match in character format). Each constraint matrix must have $M$ rows, and be of full-column rank. By default, const

extra

an optional list with any extra information that might be needed by the family function.

qr.arg

logical value indicating whether the slot qr, which returns the QR decomposition of the VLM model matrix, is returned on the object.

smart

logical value indicating whether smart prediction (smartpred) will be used.

...

further arguments passed into rrvglm.control.

Value

An object of class "rrvglm", which has the the same slots as a "vglm" object. The only difference is that the some of the constraint matrices are estimates rather than known. But VGAM stores the models the same internally. The slots of "vglm" objects are described in vglm-class.

Details

The central formula is given by $$\eta = B_1^T x_1 + A \nu$$ where $x_1$ is a vector (usually just a 1 for an intercept), $x_2$ is another vector of explanatory variables, and $\nu = C^T x_2$ is an $R$-vector of latent variables. Here, $\eta$ is a vector of linear predictors, e.g., the $m$th element is $\eta_m = \log(E[Y_m])$ for the $m$th Poisson response. The matrices $B_1$, $A$ and $C$ are estimated from the data, i.e., contain the regression coefficients. For ecologists, the central formula represents a constrained linear ordination (CLO) since it is linear in the latent variables. It means that the response is a monotonically increasing or decreasing function of the latent variables.

The underlying algorithm of RR-VGLMs is iteratively reweighted least squares (IRLS) with an optimizing algorithm applied within each IRLS iteration (e.g., alternating algorithm).

In theory, any VGAM family function that works for vglm and vgam should work for rrvglm too.

rrvglm.fit is the function that actually does the work. It is vglm.fit with some extra code.

References

Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15--41.

Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685--701.

Anderson, J. A. (1984) Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1--30.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

Examples

Run this code

# Example 1: negative binomial with Var(Y) = mu + mu^s2, s2 unknown
nn = 500
s2 = 1.5    # Specify this
c2 = 2 - s2
ndata = data.frame(x2 = runif(nn), x3 = runif(nn))
ndata = transform(ndata, mu = exp(2 + 1 * x2 + 0 * x3))
ndata = transform(ndata, y2 = rnbinom(nn, mu=mu, size=mu^c2))
plot(y2 ~ x2, data = ndata, pch = "+", col = 'blue',
     main=paste("Var(Y) = mu + mu^", s2, sep=""))
Fit2 = rrvglm(y2 ~ x2 + x3, negbinomial(zero = NULL),
              data = ndata, Norrr = NULL)
c2hat = (Coef(Fit2)@A)["log(k)", 1]
s2hat = 2 - c2hat
s2hat # Estimate of s2


# Example 2
data(car.all)
index = with(car.all, Country == "Germany" | Country == "USA" |
                      Country == "Japan"   | Country == "Korea")
scar = car.all[index, ]  # standardized car data
fcols = c(13,14,18:20,22:26,29:31,33,34,36)  # These are factors
scar[,-fcols] = scale(scar[,-fcols]) # Standardize all numerical vars
ones = matrix(1, 3, 1)
cms = list("(Intercept)" = diag(3), Width = ones, Weight = ones,
           Disp. = diag(3), Tank = diag(3), Price = diag(3), 
           Frt.Leg.Room = diag(3))
set.seed(111)
fit = rrvglm(Country ~ Width + Weight + Disp. + Tank + Price + Frt.Leg.Room,
             multinomial, data =  scar, Rank = 2, trace = TRUE,
             constraints = cms, Norrr = ~ 1 + Width + Weight,
             Uncor = TRUE, Corner = FALSE, Bestof = 2)
fit@misc$deviance  # A history of the fits
Coef(fit)
biplot(fit, chull = TRUE, scores = TRUE, clty = 2, Ccex = 2,
       ccol = "blue", scol = "red", Ccol = "darkgreen", Clwd = 2,
       main = "1=Germany, 2=Japan, 3=Korea, 4=USA")

Run the code above in your browser using DataLab