Fit a skew-symmetric association model proposed in Yamaguchi (1990) to describe asymmetry of square tables. This model can be combined with symmetric association models like a quasi-symmetry (the default) or symmetric (homogeneous) RC(M) models.
yrcskew(tab, nd.symm = NA, nd.skew = 1, diagonal = FALSE,
weighting = c("marginal", "uniform", "none"),
se = c("none", "jackknife", "bootstrap"),
nreplicates = 100, ncpus = getOption("boot.ncpus"),
family = poisson, weights = NULL,
start = NA, etastart = NULL, tolerance = 1e-8,
iterMax = 15000, trace = FALSE, verbose = TRUE, ...)
a two-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above two will be collapsed.
the number of dimensions to include in the symmetric RC(M) association. Cannot exceed
2 * min(nrow(tab) - 1, ncol(tab) - 1)
.
If NA
, a quasi-symmetric association is used instead of a RC(M) model.
the number of dimensions to include in the skew-symmetric RC(M) association.
should the model include parameters specific to each diagonal cell? This amounts to taking quasi-independence, rather than independence, as the baseline model.
what weights should be used when normalizing the scores.
which method to use to compute standard errors for parameters.
the number of bootstrap replicates, if enabled.
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see detectCores
), with a maximum of 5, but falls back to 1
(no parallelization) if package parallel
is not available.
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See family
details of family functions.
an optional vector of weights to be used in the fitting process.
either NA
(the default) to use reasonable starting values, NULL
to use
random starting values, or a vector of starting values for the parameters in the model.
starting values for the linear predictor; set to NULL
to use either default
starting values (if start = NA
), or random starting values (in all other cases).
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores!
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge.
a logical value indicating whether the deviance should be printed after each iteration.
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts.
more arguments to be passed to gnm
A yrcskew
object, which is a subclass of an rc.symm
object (seerc
) unless
nd.symm
is NA
. In addition to this class, it contains a assoc.yrcskew
component
holding information about the skew-symmetric association:
The intrisic association parameters, one per dimension.
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null.
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null.
The row weights used for the identification of scores, as specified by the
weighting
argument.
The column weights used for the identification of scores, as specified by the
weighting
argument.
The variance-covariance matrix for phi coefficients and normalized row and column
scores. Only present if se
was not “none”.
An array stacking on its third dimension one variance-covariance matrix for
the adjusted scores of each layer in the model (used for plotting). Only present if se
was not “none”.
The method used to compute the variance-covariance matrix (corresponding to the
se
argument.
This family of model sometimes converges to a non-optimal solution, in which case the reported scores are wrong. To protect yourself from this problem, you are advised to run the models several times to find out which convergence point is the true one. Furthermore, when model converges slowly, restarting the fitting procedure may produce much better random starting values.
The original presented by Yamaguchi (1990), called “row-column-effect skew-symmetric
association (logbilinear) model with full quasi-symmetry (QS+RC_SK)”, combines a skew-symmetric association
with a quasi-symmetry baseline; it is the variant fitted by default by this function. If nd.symm
is
set to a positive integer value, though, variants using a RC(M) model to describe the symmetric association
are used, with our without diagonal-specific parameters (depending on the value of the diagonal
argument);
among them is the HM_RC+RC_SK variant, when nd.symm
is 1.
These models follow the equation:
$$ log F_{ij} = q_{ij} + \delta_{i<j} \nu_i (\nu_j - \nu_i) - \delta_{i>j} \nu_j (\nu_i - \nu_j) $$
where \(F_{ij}\) is the expected frequency for the cell at the intersection of row i and column j of
tab
, and \(q_{ij}\) a quasi-symmetric or a RC(M) association. See reference for detailed information
about the degrees of freedom and the identification constraints applied to the scores.
Please note that contrary to other association models, this model is sensitive to reorderings of rows and columns. You have to take care of passing a table whose categories follow a hierarchical order with a substantive meaning.
Another model presented in the paper, the homogeneous symmetric and skew-symmetric associations models (HM_(S+SK)) is not currently supported.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values for main parameters are taken from the model without association parameters
(“base model”); association parameters start with random starting values. In some complex
cases, using start = NULL
to get completely random starting values can be more efficient, but it is also
less stable and can converge to non-optimal solutions.
Yamaguchi, K. (1990). Some Models for the Analysis of Asymmetric Association in Square Contingency Tables with Ordered Categories. Sociol. Methodology 20, 181-212.
# NOT RUN {
## Yamaguchi (1990), Table 5, p. 202, and Table 6B, p. 205
data(ocg1973)
# Simple symmetric RC(1) model ("Null skew-symmetry")
rc.model <- rc(ocg1973, diagonal=TRUE, symmetric=TRUE, weighting="none")
# Reported phi is slightly different, coefficients agree
rc.model
# Note model does not always converge, several attempts may be needed
# Here we set known starting values to be sure it works
set.seed(5)
model <- yrcskew(ocg1973, nd.symm=1, nd.skew=1, diagonal=TRUE, weighting="none")
# We do not get the same results as the author, but the smaller deviance
# indicates a better fit in our version (!)
model
# }
Run the code above in your browser using DataLab