rrvglm: Fitting Reduced-Rank Vector Generalized Linear Models (RR-VGLMs) and Doubly Constrained RR-VGLMs (DRR-VGLMs)

Description

A reduced-rank vector generalized linear model (RR-VGLM) is fitted. RR-VGLMs are VGLMs but some of the constraint matrices are estimated. Doubly constrained RR-VGLMs (DRR-VGLMs) can also be fitted, and these provide structure for the two other outer product matrices.

Usage

rrvglm(formula, family = stop("'family' is unassigned"),
       data = list(), weights = NULL, subset = NULL,
       na.action = na.fail, etastart = NULL, mustart = NULL,
       coefstart = NULL, control = rrvglm.control(...),
       offset = NULL, method = "rrvglm.fit", model = FALSE,
       x.arg = TRUE, y.arg = TRUE, contrasts = NULL,
       constraints = NULL, extra = NULL, qr.arg = FALSE,
       smart = TRUE, ...)

Value

For RR-VGLMs, an object of class "rrvglm", which has the the same slots as a "vglm"

object. The only difference is that the some of the constraint matrices are estimates rather than known. But VGAM stores the models the same internally. The slots of "vglm" objects are described in

vglm-class.

For DRR-VGLMs, an object of class "drrvglm".

Arguments

formula, family

See vglm.

weights, data

See vglm.

subset, na.action

See vglm.

etastart, mustart, coefstart

See vglm.

control

a list of parameters for controlling the fitting process. See rrvglm.control for details.

offset, model, contrasts

See vglm.

method

the method to be used in fitting the model. The default (and presently only) method rrvglm.fit uses iteratively reweighted least squares (IRLS).

x.arg, y.arg

logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the x and y slots. Note the model matrix is the LM model matrix; to get the VGLM model matrix type model.matrix(vglmfit) where vglmfit is a vglm object.

constraints

See vglm.

extra, smart, qr.arg

See vglm.

...

further arguments passed into rrvglm.control.

Author

Thomas W. Yee

Details

In this documentation, $M$ is the number of linear predictors. For RR-VGLMs, the central formula is given by $$\eta = B_1^T x_1 + A \nu$$ where $x_1$ is a vector (usually just a 1 for an intercept), $x_2$ is another vector of explanatory variables, and $\nu = C^T x_2$ is an $R$-vector of latent variables. Here, $\eta$ is a vector of linear predictors, e.g., the $m$th element is $\eta_m = \log(E[Y_m])$ for the $m$th Poisson response. The dimension of $\eta$ is $M$ by definition. The matrices $B_1$, $A$ and $C$ are estimated from the data, i.e., contain the regression coefficients. For ecologists, the central formula represents a constrained linear ordination (CLO) since it is linear in the latent variables. It means that the response is a monotonically increasing or decreasing function of the latent variables.

For identifiability it is common to enforce corner constraints on A: by default, for RR-VGLMs, the top $R$ by $R$ submatrix is fixed to be the order-$R$ identity matrix and the remainder of A is estimated. And by default, for DRR-VGLMs, there is also an order-$R$ identity matrix embedded in A because the RRR must be separable (this is so that any existing structure in A is preserved).

The underlying algorithm of RR-VGLMs is iteratively reweighted least squares (IRLS) with an optimizing algorithm applied within each IRLS iteration (e.g., alternating algorithm).

In theory, any VGAM family function that works for vglm and vgam should work for rrvglm too. The function that actually does the work is rrvglm.fit; it is essentially vglm.fit with some extra code.

References

Yee, T. W. and Hastie, T. J. (2003). Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15--41.

Yee, T. W. (2004). A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685--701.

Anderson, J. A. (1984). Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1--30.

Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889--902.

Yee, T. W., Frigau, L. and Ma, C. (2024). Heaping and seeping, GAITD regression and doubly constrained reduced rank vector generalized linear models, in smoking studies. In preparation.

Examples

Run this code

if (FALSE) {
# Example 1: RR NB with Var(Y) = mu + delta1 * mu^delta2
nn <- 1000       # Number of observations
delta1 <- 3.0    # Specify this
delta2 <- 1.5    # Specify this; should be greater than 1
a21 <- 2 - delta2
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata,
    y2 = rnbinom(nn, mu = mu, size = (1/delta1)*mu^a21))
plot(y2 ~ x2, mydata, pch = "+", col = 'blue', las = 1,
  main = paste0("Var(Y) = mu + ", delta1, " * mu^", delta2))
rrnb2 <- rrvglm(y2 ~ x2 + x3, negbinomial(zero = NULL),
                data = mydata, trace = TRUE)

a21.hat <- (Coef(rrnb2)@A)["loglink(size)", 1]
beta11.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(mu)"]
beta21.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(size)"]
(delta1.hat <- exp(a21.hat * beta11.hat - beta21.hat))
(delta2.hat <- 2 - a21.hat)
# delta1.hat:
# exp(a21.hat * predict(rrnb2)[1,1] - predict(rrnb2)[1,2])
summary(rrnb2)

# Obtain a 95 percent CI for delta2:
se.a21.hat <- sqrt(vcov(rrnb2)["I(latvar.mat)", "I(latvar.mat)"])
ci.a21 <- a21.hat +  c(-1, 1) * 1.96 * se.a21.hat
(ci.delta2 <- 2 - rev(ci.a21))  # The 95 percent CI

Confint.rrnb(rrnb2)  # Quick way to get it

# Plot the abundances and fitted values vs the latent variable
plot(y2 ~ latvar(rrnb2), data = mydata, col = "blue",
     xlab = "Latent variable", las = 1)
ooo <- order(latvar(rrnb2))
lines(fitted(rrnb2)[ooo] ~ latvar(rrnb2)[ooo], col = "red")

# Example 2: stereotype model (RR multinomial logit model)
data(car.all)
scar <- subset(car.all,
    is.element(Country, c("Germany", "USA", "Japan", "Korea")))
fcols <- c(13,14,18:20,22:26,29:31,33,34,36)  # These are factors
scar[, -fcols] <- scale(scar[, -fcols])  # Stdze all numerical vars
ones <- CM.ones(3)  # matrix(1, 3, 1)
clist <- list("(Intercept)" = diag(3), Width = ones, Weight = ones,
              Disp. = diag(3), Tank = diag(3), Price = diag(3),
              Frt.Leg.Room = diag(3))
set.seed(111)
fit <- rrvglm(Country ~ Width + Weight + Disp. + Tank +
              Price + Frt.Leg.Room,
              multinomial, data = scar, Rank = 2, trace = TRUE,
              constraints = clist, noRRR = ~ 1 + Width + Weight,
#             Uncor = TRUE, Corner = FALSE,  # orig.
              Index.corner = c(1, 3),  # Less correlation
              Bestof = 3)
fit@misc$deviance  # A history of the fits
Coef(fit)
biplot(fit, chull = TRUE, scores = TRUE, clty = 2, Ccex = 2,
       ccol = "blue", scol = "orange", Ccol = "darkgreen",
       Clwd = 2, main = "1=Germany, 2=Japan, 3=Korea, 4=USA")
}

Run the code above in your browser using DataLab

Description

Usage

Value

Arguments

Author

Details

References

See Also

Examples