Fits a skew-symmetric association model proposed in van der Heijden & Mooijaart (1995) to describe asymmetry of square tables. Skew-symmetric association can be combined with quasi-symmetry (the default), quasi-independence, or symmetric (homogeneous) RC(M) associations.
hmskew(tab, nd.symm = NA, diagonal = FALSE,
weighting = c("marginal", "uniform", "none"),
rowsup = NULL, colsup = NULL,
se = c("none", "jackknife", "bootstrap"),
nreplicates = 100, ncpus = getOption("boot.ncpus"),
family = poisson, weights = NULL,
start = NULL, etastart = NULL, tolerance = 1e-8,
iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
a square two-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above two will be collapsed.
the number of dimensions to include in the symmetric RC(M) association. Cannot exceed
2 * min(nrow(tab) - 1, ncol(tab) - 1)
(quasi-symmetry model).
If NA
(the default), a full quasi-symmetric association is used instead of a RC(M) model; if 0
,
quasi-independence is used.
should the model include parameters specific to each diagonal cell? This amounts to taking quasi-independence, rather than independence, as the baseline model.
what weights should be used when normalizing the scores.
if present, a matrix with the same columns as tab
and rows corresponding to the columns
of colsup
, giving supplementary (passive) rows.
if present, a matrix with the same rows as tab
and columns corresponding to the rows of
colsup
, giving supplementary (passive) columns.
which method to use to compute standard errors for parameters.
the number of bootstrap replicates, if enabled.
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see detectCores
), with a maximum of 5, but falls back to 1
(no parallelization) if package parallel
is not available.
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See family
details of family functions.
an optional vector of weights to be used in the fitting process.
either NA
to use optimal starting values, NULL
to use
random starting values, or a vector of starting values for the parameters in the model.
starting values for the linear predictor; set to NULL
to use either default
starting values (if start = NA
), or random starting values (in all other cases).
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores!
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge.
a logical value indicating whether the deviance should be printed after each iteration.
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts.
more arguments to be passed to gnm
A hmskew
object, which is a subclass of an rc.symm
object (see rc
) if
nd.symm
is strictly positive. In addition to this class, it contains a assoc.hmskew
component
holding information about the skew-symmetric association:
The intrisic association parameters, one per dimension.
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null.
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null.
The row weights used for the identification of scores, as specified by the
weighting
argument.
The column weights used for the identification of scores, as specified by the
weighting
argument.
The variance-covariance matrix for phi coefficients and normalized row and column
scores. Only present if se
was not “none”.
An array stacking on its third dimension one variance-covariance matrix for
the adjusted scores of each layer in the model (used for plotting). Only present if se
was not “none”.
The method used to compute the variance-covariance matrix (corresponding to the
se
argument.
The original model presented by van der Heijden & Mooijaart (1995), called “quasi-symmetry plus
skew-symmetry”, combines a skew-symmetric association with a quasi-symmetry baseline; it is the variant
fitted by default by this function. If nd.symm
is set to a positive integer value, though, variants
using a RC(M) model to describe the symmetric association are used, with our without
diagonal-specific parameters (depending on the value of the diagonal
argument).
These models follow the equation:
$$ log F_{ij} = q_{ij} + \phi (\nu_i \mu_j - \mu_i \nu_j) $$
where \(F_{ij}\) is the expected frequency for the cell at the intersection of row i and column j of
tab
, and \(q_{ij}\) a quasi-symmetric specification, with either full interaction parameters, or
a RC(M) association. See reference for detailed information about the degrees of freedom and the identification
constraints applied to the scores.
Another model presented in the paper, the “symmetry plus skew-symmetry model” is not currently supported
out of the box, but should be relatively straightforward to implement using the underlying assoc.hmskew
function combined with a symmetric association model.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values for skew association parameters are computed using an eigen value decomposition from the
results of the model without skew association component (“base model”); if nd.symm
is not NA
and
strictly positive, random starting values are used. In some complex cases, using start = NULL
to start with
random values can be more efficient, but it is also less stable and can converge to non-optimal solutions.
van der Heijden, P.G.M., and Mooijaart, A. (1995). Some new log bilinear models for the analysis of asymmetry in a square contingency table. Sociol. Methods and Research 24, 7-29.
# NOT RUN {
## van der Heijden & Mooijaart (1995), Table 2c, p. 23
data(ocg1973)
# 5:1 is here to take "Farmers" as reference category (angle 0)
model <- hmskew(ocg1973[5:1, 5:1], weighting="uniform")
model
ass <- model$assoc.hmskew
# First column of the table
round(ass$row[,,1] * sqrt(ass$phi[1,1]), d=2)[5:1,]
# Right part of the table
round(ass$phi[1] * (ass$row[,2,1] %o% ass$row[,1,1] -
ass$row[,1,1] %o% ass$row[,2,1]), d=3)[5:1, 5:1]
# Plot
plot(model, coords="cartesian")
# }
Run the code above in your browser using DataLab