hmskewL: Fitting van der Heijen & Mooijaart Skew-Symmetric Association Model With Layer Effect

Description

Fits an extension of the skew-symmetric association model proposed in van der Heijden & Mooijaart (1995) to describe asymmetry of square tables. This model introduces a layer effect by which the strength of skew-symmetric association, and optionnally scores, can vary over the levels of the third dimension of the table. Skew-symmetric association can be combined with quasi-symmetry (the default), quasi-independence, or symmetric (homogeneous) RC(M) associations, with or without layer effect.

Usage

hmskewL(tab, nd.symm = NA,
        layer.effect.skew = c("homogeneous.scores", "heterogeneous",
                              "none"),
        layer.effect.symm = c("heterogeneous", "uniform",
                              "regression.type",
                              "homogeneous.scores", "none"),
        diagonal = c("none", "heterogeneous", "homogeneous"),
        weighting = c("marginal", "uniform", "none"),
        se = c("none", "jackknife", "bootstrap"),
        nreplicates = 100, ncpus = getOption("boot.ncpus"),
        family = poisson, weights = NULL,
        start = NULL, etastart = NULL, tolerance = 1e-8,
        iterMax = 5000, trace = FALSE, verbose = TRUE, ...)

Value

A hmskewL object, which is a subclass of an rcL.symm object (see rcL) if

nd.symm is strictly positive. In addition to this class, it contains a assoc.hmskew component holding information about the skew-symmetric association:

phi: The intrisic association parameters, one per dimension and per layer.
row: Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null.
col: Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null.
weighting: The name of the weighting method used, reflected by row.weights and col.weights.
row.weights: The row weights used for the identification of scores, as specified by the weighting argument.
col.weights: The column weights used for the identification of scores, as specified by the weighting argument.
covmat: The variance-covariance matrix for phi coefficients and normalized row and column scores. Only present if se was not “none”.
adj.covmats: An array stacking on its third dimension one variance-covariance matrix for the adjusted scores of each layer in the model (used for plotting). Only present if se was not “none”.
covtype: The method used to compute the variance-covariance matrix (corresponding to the se argument.

Arguments

tab: a three-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above three will be collapsed. First two dimensions must be symmetric (i.e. of the same length).
nd.symm: the number of dimensions to include in the symmetric RC(M) association. Cannot exceed 2 * min(nrow(tab) - 1, ncol(tab) - 1) (quasi-symmetry model). If NA (the default), a full quasi-symmetric association is used instead of a RC(M) model; if 0, quasi-independence is used.
layer.effect.skew: determines the form of the interaction between skew-symmetric association and layers. See “Details” below.
layer.effect.symm: determines the form of the interaction between symmetric row-column association, or quasi-symmetric association (if nd.symm = NA) and layers. See “Details” below.
diagonal: what type of diagonal-specific parameters to include in the model, if any. Only makes sense when nd.symm is not NA (else, diagonal parameters are already included).
weighting: what weights should be used when normalizing the scores.
se: which method to use to compute standard errors for parameters.
nreplicates: the number of bootstrap replicates, if enabled.
ncpus: the number of processes to use for jackknife or bootstrap parallel computing. Defaults to the number of cores (see detectCores), with a maximum of 5, but falls back to 1 (no parallelization) if package parallel is not available.
family: a specification of the error distribution and link function to be used in the model. This can be a character string naming a family function; a family function, or the result of a call to a family function. See family details of family functions.
weights: an optional vector of weights to be used in the fitting process.
start: either NA to use optimal starting values, NULL to use random starting values, or a vector of starting values for the parameters in the model.
etastart: starting values for the linear predictor; set to NULL to use either default starting values (if start = NA), or random starting values (in all other cases).
tolerance: a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores!
iterMax: a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge.
trace: a logical value indicating whether the deviance should be printed after each iteration.
verbose: a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts.
...: more arguments to be passed to gnm

Author

Milan Bouchet-Valat

Details

This model follows an equation inspired from that presented by van der Heijden & Mooijaart (1995) for two-way tables (see hmskew): $$ log F_{ijk} = q_{ijk} + \phi_k (\nu_{ik} \mu_{jk} - \mu_{ik} \nu_{jk}) $$ where $F_{ijk}$ is the expected frequency for the cell at the intersection of row i, column j and layer k of tab, and $q_{ij}$ a quasi-symmetric specification, with either full interaction parameters, or a RC(M) association. See reference for detailed information about the degrees of freedom and the identification constraints applied to the scores.

If layer.effect.skew is set to ‘heterogeneous’, different scores will be computed for each level, which is equivalent to fitting separate models using hmskew on the k two-way tables. If it is set to ‘homogeneous.scores’, then $\mu_{ik} = \mu_i$ and $\nu_{ik} = \nu_i$ for all layers k: only the $\phi_k$ are allowed to vary across layers. If it is set to ‘none’, then in addition to the previous conditions all $\phi_{mk}$ are forced to be equal for all layers k, which amounts to a stability of the association across layers.

When nd.symm is different from NA, the symmetric association works exactly like a call to rcL, with parameters nd.symm and layer.effect.symm translated respectively to nd and layer.effect. When nd.symm == NA, symmetric association parameters are either stable across layers, are multiplied by a layer coefficient (UNIDIFF model, see unidiff), follow a regression-type (Goodman-Hout) specification, or are different for each layer, when layer.effect.symm is respectively none, uniform, regression.type and heterogeneous.

Actual model fitting is performed using gnm, which implements the Newton-Raphson algorithm. This function simply ensures correct start values are used, in addition to allowing for identification of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting. The default starting values for skew association parameters are computed using an eigen value decomposition from the results of the model without skew association component (“base model”); if nd.symm is not NA and strictly positive, random starting values are used. In some complex cases, using start = NULL to start with random values can be more efficient, but it is also less stable and can converge to non-optimal solutions.

References

van der Heijden, P.G.M., and Mooijaart, A. (1995). Some new log bilinear models for the analysis of asymmetry in a square contingency table. Sociol. Methods and Research 24, 7-29.