Fits a Goodman's RC association model (GRC) to a matrix of counts, and more generally, row-column interaction models (RCIMs). RCIMs allow for unconstrained quadratic ordination (UQO).
grc(y, Rank = 1, Index.corner = 2:(1 + Rank),
str0 = 1, summary.arg = FALSE, h.step = 1e-04, ...)
rcim(y, family = poissonff, Rank = 0, M1 = NULL,
weights = NULL, which.linpred = 1,
Index.corner = ifelse(is.null(str0), 0, max(str0)) + 1:Rank,
rprefix = "Row.", cprefix = "Col.", iprefix = "X2.",
offset = 0, str0 = if (Rank) 1 else NULL,
summary.arg = FALSE, h.step = 0.0001,
rbaseline = 1, cbaseline = 1,
has.intercept = TRUE, M = NULL,
rindex = 2:nrow(y), cindex = 2:ncol(y), iindex = 2:nrow(y), ...)
For grc()
: a matrix of counts.
For rcim()
: a general matrix response depending on family
.
Output from table()
is acceptable; it is converted into a matrix.
Note that y
should be at least 3 by 3 in dimension.
A VGAM family function.
By default, the first linear/additive predictor
is fitted
using main effects plus an optional rank-Rank
interaction term.
Not all family functions are suitable or make sense.
All other linear/additive predictors are fitted using an intercept-only,
so it has a common value over all rows and columns.
For example,
zipoissonff
may be suitable for counts but not
zipoisson
because of the ordering of the
linear/additive predictors.
If the VGAM family function does not have an infos
slot then M1
needs to be inputted (the number of
linear predictors for an ordinary (usually univariate) response,
aka
An integer from the set
{0,…,min(nrow(y), ncol(y))
}.
This is the dimension of the fit in terms of the interaction.
For grc()
this argument must be positive.
A value of 0 means no interactions (i.e., main effects only);
each row and column is represented by an indicator variable.
Single integer.
Specifies which linear predictor is modelled as the sum of an
intercept, row effect, column effect plus an optional interaction
term. It should be one value from the set 1:M1
.
A vector of Rank
integers.
These are used to store the Rank
by Rank
identity matrix in the
A
matrix; corner constraints are used.
Character, for rows and columns and interactions respectively. For labelling the indicator variables.
Numeric. Either a matrix of the right dimension, else a single numeric expanded into such a matrix.
Ignored if Rank = 0
, else
an integer from the set {1,…,min(nrow(y), ncol(y))
},
specifying the row that is used as the structural zero.
Passed into rrvglm.control
if Rank > 0
.
Set str0 = NULL
for none.
Logical. If TRUE
then a summary is returned.
If TRUE
then y
may be the output (fitted
object) of grc()
.
A small positive value that is passed into
summary.rrvglm()
. Only used when summary.arg = TRUE
.
Arguments that are passed
into rrvglm.control()
.
The number of linear predictors of the VGAM family
function
for an ordinary (univariate) response.
Then the number of linear predictors of the rcim()
fit is
usually the number of columns of y
multiplied by M1
.
The default is to evaluate the infos
slot of the
VGAM family
function to try to evaluate it;
see vglmff-class
.
If this information is not yet supplied by the family function then
the value needs to be inputted manually using this argument.
Baseline reference levels for the rows and columns. Currently stored on the object but not used.
Logical. Include an intercept?
cindex
means column index, and these point to the columns
of y
which are part of the vector of linear/additive predictor
main effects.
For family = multinomial
it is necessary to input these arguments
as M = ncol(y)-1
and
cindex = 2:(ncol(y)-1)
.
rindex
means row index, and these are similar to cindex
.
iindex
means interaction index, and these are similar to cindex
.
An object of class "grc"
, which currently is the same as
an "rrvglm"
object.
Currently,
a rank-0 rcim()
object is of class rcim0-class
,
else of class "rcim"
(this may change in the future).
The function rcim()
is experimental at this stage and
may have bugs.
Quite a lot of expertise is needed when fitting and in its
interpretion thereof. For example, the constraint
matrices applies the reduced-rank regression to the first
(see which.linpred
)
linear predictor and the other linear predictors are intercept-only and
have a common value throughout the entire data set.
This means that, by default, family =
zipoissonff
is
appropriate but not
family =
zipoisson
.
Else set family =
zipoisson
and which.linpred = 2
.
To understand what is going on, do examine the constraint
matrices of the fitted object, and reconcile this with
Equations (4.3) to (4.5) of Yee and Hastie (2003).
The functions temporarily create a permanent data frame
called .grc.df
or .rcim.df
, which used
to be needed by summary.rrvglm()
. Then these
data frames are deleted before exiting the function.
If an error occurs then the data frames may be present
in the workspace.
Goodman's RC association model fits a reduced-rank approximation
to a table of counts.
A Poisson model is assumed.
The log of each cell mean is decomposed as an
intercept plus a row effect plus a column effect plus a reduced-rank
component. The latter can be collectively written A %*% t(C)
,
the product of two `thin' matrices.
Indeed, A
and C
have Rank
columns.
By default, the first column and row of the interaction matrix
A %*% t(C)
is chosen
to be structural zeros, because str0 = 1
.
This means the first row of A
are all zeros.
This function uses options()$contrasts
to set up the row and
column indicator variables.
In particular, Equation (4.5) of Yee and Hastie (2003) is used.
These are called Row.
and Col.
(by default) followed
by the row or column number.
The function rcim()
is more general than grc()
.
Its default is a no-interaction model of grc()
, i.e.,
rank-0 and a Poisson distribution. This means that each
row and column has a dummy variable associated with it.
The first row and first column are baseline.
The power of rcim()
is that many VGAM family functions
can be assigned to its family
argument.
For example,
uninormal
fits something in between a 2-way
ANOVA with and without interactions,
alaplace2
with Rank = 0
is something like
medpolish
.
Others include
zipoissonff
and
negbinomial
.
Hopefully one day all VGAM family functions will
work when assigned to the family
argument, although the
result may not have meaning.
Unconstrained quadratic ordination (UQO) can be performed
using rcim()
and grc()
.
This has been called unconstrained Gaussian ordination
in the literature, however the word Gaussian has two
meanings which is confusing; it is better to use
quadratic because the bell-shape response surface is meant.
UQO is similar to CQO (cqo
) except there are
no environmental/explanatory variables.
Here, a GLM is fitted to each column (species)
that is a quadratic function of hypothetical latent variables
or gradients.
Thus each row of the response has an associated site score,
and each column of the response has an associated optimum
and tolerance matrix.
UQO can be performed here under the assumption that all species
have the same tolerance matrices.
See Yee and Hadi (2014) for details.
It is not recommended that presence/absence data be inputted
because the information content is so low for each site-species
cell.
The example below uses Poisson counts.
Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15--41.
Yee, T. W. and Hadi, A. F. (2014) Row-column interaction models, with an R implementation. Computational Statistics, 29, 1427--1445.
Goodman, L. A. (1981) Association models and canonical correlation in the analysis of cross-classifications having ordered categories. Journal of the American Statistical Association, 76, 320--334.
rrvglm
,
rrvglm.control
,
rrvglm-class
,
summary.grc
,
moffset
,
Rcim
,
Select
,
Qvar
,
plotrcim0
,
cqo
,
multinomial
,
alcoff
,
crashi
,
auuc
,
olym08
,
olym12
,
poissonff
,
medpolish
.
# NOT RUN {
# Example 1: Undergraduate enrolments at Auckland University in 1990
fitted(grc1 <- grc(auuc))
summary(grc1)
grc2 <- grc(auuc, Rank = 2, Index.corner = c(2, 5))
fitted(grc2)
summary(grc2)
model3 <- rcim(auuc, Rank = 1, fam = multinomial,
M = ncol(auuc)-1, cindex = 2:(ncol(auuc)-1), trace = TRUE)
fitted(model3)
summary(model3)
# Median polish but not 100 percent reliable. Maybe call alaplace2()...
# }
# NOT RUN {
rcim0 <- rcim(auuc, fam = alaplace1(tau = 0.5), trace = FALSE, maxit = 500)
round(fitted(rcim0), digits = 0)
round(100 * (fitted(rcim0) - auuc) / auuc, digits = 0) # Discrepancy
depvar(rcim0)
round(coef(rcim0, matrix = TRUE), digits = 2)
Coef(rcim0, matrix = TRUE)
# constraints(rcim0)
names(constraints(rcim0))
# Compare with medpolish():
(med.a <- medpolish(auuc))
fv <- med.a$overall + outer(med.a$row, med.a$col, "+")
round(100 * (fitted(rcim0) - fv) / fv) # Hopefully should be all 0s
# }
# NOT RUN {
# Example 2: 2012 Summer Olympic Games in London
# }
# NOT RUN {
top10 <- head(olym12, 10)
grc1.oly12 <- with(top10, grc(cbind(gold, silver, bronze)))
round(fitted(grc1.oly12))
round(resid(grc1.oly12, type = "response"), digits = 1) # Response residuals
summary(grc1.oly12)
Coef(grc1.oly12)
# }
# NOT RUN {
# Example 3: Unconstrained quadratic ordination (UQO); see Yee and Hadi (2014)
# }
# NOT RUN {
n <- 100; p <- 5; S <- 10
pdata <- rcqo(n, p, S, es.opt = FALSE, eq.max = FALSE,
eq.tol = TRUE, sd.latvar = 0.75) # Poisson counts
true.nu <- attr(pdata, "latvar") # The 'truth'; site scores
attr(pdata, "tolerances") # The 'truth'; tolerances
Y <- Select(pdata, "y", sort = FALSE) # Y matrix (n x S); the "y" vars
uqo.rcim1 <- rcim(Y, Rank = 1,
str0 = NULL, # Delta covers entire n x M matrix
iindex = 1:nrow(Y), # RRR covers the entire Y
has.intercept = FALSE) # Suppress the intercept
# Plot 1
par(mfrow = c(2, 2))
plot(attr(pdata, "optimums"), Coef(uqo.rcim1)@A,
col = "blue", type = "p", main = "(a) UQO optimums",
xlab = "True optimums", ylab = "Estimated (UQO) optimums")
mylm <- lm(Coef(uqo.rcim1)@A ~ attr(pdata, "optimums"))
abline(coef = coef(mylm), col = "orange", lty = "dashed")
# Plot 2
fill.val <- NULL # Choose this for the new parameterization
plot(attr(pdata, "latvar"), c(fill.val, concoef(uqo.rcim1)),
las = 1, col = "blue", type = "p", main = "(b) UQO site scores",
xlab = "True site scores", ylab = "Estimated (UQO) site scores" )
mylm <- lm(c(fill.val, concoef(uqo.rcim1)) ~ attr(pdata, "latvar"))
abline(coef = coef(mylm), col = "orange", lty = "dashed")
# Plots 3 and 4
myform <- attr(pdata, "formula")
p1ut <- cqo(myform, family = poissonff,
eq.tol = FALSE, trace = FALSE, data = pdata)
c1ut <- cqo(Select(pdata, "y", sort = FALSE) ~ scale(latvar(uqo.rcim1)),
family = poissonff, eq.tol = FALSE, trace = FALSE, data = pdata)
lvplot(p1ut, lcol = 1:S, y = TRUE, pcol = 1:S, pch = 1:S, pcex = 0.5,
main = "(c) CQO fitted to the original data",
xlab = "Estimated (CQO) site scores")
lvplot(c1ut, lcol = 1:S, y = TRUE, pcol = 1:S, pch = 1:S, pcex = 0.5,
main = "(d) CQO fitted to the scaled UQO site scores",
xlab = "Estimated (UQO) site scores")
# }
Run the code above in your browser using DataLab