grc: Row-Column Association Models including Goodman's RC Association Model

Description

Fits a Goodman's RC association model to a matrix of counts, and more generally, a sub-class of row-column association models.

Usage

grc(y, Rank = 1, Index.corner = 2:(1 + Rank),
    szero = 1, summary.arg = FALSE, h.step = 1e-04, ...)
rcam(y, family = poissonff, Rank = 0, Musual = NULL,
     weights = NULL, which.lp = 1,
     Index.corner = if (!Rank) NULL else 1 + Musual * (1:Rank),
     rprefix = "Row.", cprefix = "Col.", offset = 0,
     szero = if (!Rank) NULL else {
       if (Musual == 1) 1 else setdiff(1:(Musual * ncol(y)),
                    c(1 + (1:ncol(y)) * Musual, Index.corner))
     },
     summary.arg = FALSE, h.step = 0.0001,
     rbaseline = 1, cbaseline = 1, ...)

Arguments

For grc a matrix of counts. For rcam a general matrix response depending on family. Output from table() is acceptable; it is converted into a matrix. Note that y should be at least 3 b

family

A VGAM family function. By default, the first linear/additive predictor is fitted using main effects plus an optional rank-Rank interaction term. Not all family functions are suitable or make sense. All other linear/addit

Rank

An integer from the set {0,...,min(nrow(y), ncol(y))}. This is the dimension of the fit in terms of the interaction. For grc() this argument must be positive. A value of 0 means no interactions (i.e., main effects only);

weights

Prior weights. Fed into rrvglm or vglm.

which.lp

Single integer. Specifies which linear predictor is modelled as the sum of an intercept, row effect, column effect plus an optional interaction term. It should be one value from the set 1:Musual.

Index.corner

A vector of Rank integers. These are used to store the Rank by Rank identity matrix in the A matrix; corner constraints are used.

rprefix, cprefix

Character, for rows and columns resp. For labelling the indicator variables.

offset

Numeric. Either a matrix of the right dimension, else a single numeric expanded into such a matrix.

szero

An integer from the set {1,...,min(nrow(y), ncol(y))}, specifying the row that is used as the structural zero.

summary.arg

Logical. If TRUE, a summary is returned. If TRUE, y may be the output (fitted object) of grc().

h.step

A small positive value that is passed into summary.rrvglm(). Only used when summary.arg = TRUE.

...

Arguments that are passed into rrvglm.control().

Musual

The number of linear predictors of the VGAM family function for an ordinary (univariate) response. Then the number of linear predictors of the rcam() fit is usually the number of columns of y multipl

rbaseline, cbaseline

Baseline reference levels for the rows and columns. Currently stored on the object but not used.

Value

An object of class "grc", which currently is the same as an "rrvglm" object. Currently, a rank-0 rcam() object is of class rcam0-class, else of class "rcam" (this may change in the future).

Warning

The function rcam() is experimental at this stage and may have bugs. Quite a lot of expertise is needed when fitting and in its interpretion thereof. For example, the constraint matrices applies the reduced-rank regression to the first (see which.lp) linear predictor and the other linear predictors are intercept-only and have a common value throughout the entire data set. This means that, by default, family = zipoissonff is appropriate but not family = zipoisson. Else set family = zipoisson and which.lp = 2. To understand what is going on, do examine the constraint matrices of the fitted object, and reconcile this with Equations (4.3) to (4.5) of Yee and Hastie (2003).

The functions temporarily create a permanent data frame called .grc.df or .rcam.df, which used to be needed by summary.rrvglm(). Then these data frames are deleted before exiting the function. If an error occurs, then the data frames may be present in the workspace.

Details

Goodman's RC association model fits a reduced-rank approximation to a table of counts. The log of each cell mean is decomposed as an intercept plus a row effect plus a column effect plus a reduced-rank component. The latter can be collectively written A %*% t(C), the product of two `thin' matrices. Indeed, A and C have Rank columns. By default, the first column and row of the interaction matrix A %*% t(C) is chosen to be structural zeros, because szero = 1. This means the first row of A are all zeros.

This function uses options()$contrasts to set up the row and column indicator variables. In particular, Equation (4.5) of Yee and Hastie (2003) is used. These are called Row. and Col. (by default) followed by the row or column number.

The function rcam() is more general than grc(). Its default is a no-interaction model of grc(), i.e., rank-0 and a Poisson distribution. This means that each row and column has a dummy variable associated with it. The first row and column is baseline. The power of rcam() is that many VGAM family functions can be assigned to its family argument. For example, normal1 fits something in between a 2-way ANOVA with and without interactions, alaplace2 with Rank = 0 is something like medpolish. Others include zipoissonff, negbinomial. Hopefully one day all VGAM family functions will work when assigned to the family argument, although the result may not have meaning.

References

Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15--41.

Yee, T. W. and Hadi, A. F. (2012) Row-column association models In preparation.

Goodman, L. A. (1981) Association models and canonical correlation in the analysis of cross-classifications having ordered categories. Journal of the American Statistical Association, 76, 320--334.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information about the setting up of the indicator variables.

Examples

Run this code

grc1 <- grc(auuc) # Undergraduate enrolments at Auckland University in 1990
fitted(grc1)
summary(grc1)

grc2 <- grc(auuc, Rank = 2, Index.corner = c(2, 5))
fitted(grc2)
summary(grc2)


# 2008 Summer Olympic Games in Beijing
top10 <- head(olympic, n = 10)
oly1 <- with(top10, grc(cbind(gold, silver, bronze)))
round(fitted(oly1))
round(resid(oly1, type = "response"), dig = 1) # Response residuals
summary(oly1)
Coef(oly1)


# Roughly median polish
rcam0 <- rcam(auuc, fam = alaplace2(tau = 0.5, intparloc = TRUE), trace = TRUE)
round(fitted(rcam0), dig = 0)
round(100 * (fitted(rcam0) - auuc) / auuc, dig = 0) # Discrepancy
rcam0@y
round(coef(rcam0, matrix = TRUE), dig = 2)
print(Coef(rcam0, matrix = TRUE), dig = 3)
# constraints(rcam0)
names(constraints(rcam0))

# Compare with medpolish():
(med.a <- medpolish(auuc))
fv <- med.a$overall + outer(med.a$row, med.a$col, "+")
round(100 * (fitted(rcam0) - fv) / fv) # Hopefully should be all 0s

Run the code above in your browser using DataLab