Compute and test the share of discrepancy (defined from a dissimilarity matrix) explained by a categorical variable.
dissassoc(diss, group, weights=NULL, R=1000,
weight.permutation="replicate", squared=FALSE)
An object of class dissassoc with the following components:
A data frame with the number of cases and the discrepancy of each group
The pseudo ANOVA table
The value of the statistics (Pseudo F, Pseudo Fbf, Pseudo R2, Bartlett, and Levene) and their p-values
The permutation object, containing the values computed for each permutation
diss: A dissimilarity matrix or a dist object (see dist).
group: A categorical variable. For a numerical variable, use dissmfacw.
weights: Optional numerical vector containing weights.
R: Number of permutations for computing the p-value. If equal to 1, no permutation test is performed.
weight.permutation: Weighted permutation method: "diss" (attach weights to the dissimilarity matrix), "replicate" (replicate cases using weights), "rounded-replicate" (replicate cases using rounded weights), "random-sampling" (random assignment of covariate profiles to the objects using distributions defined by the weights).
squared: Logical. If TRUE, the dissimilarities diss are squared.
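The idea behind the "replicate" and "rounded-replicate" options can be illustrated with a few lines of base R. This is only a sketch of the concept of replicating cases according to their weights, not the code used internally by dissassoc. The toy data, the weight values, and the use of round() for non-integer weights are assumptions made for the illustration.

```r
# Toy dissimilarity matrix for three objects (hypothetical data)
d <- as.matrix(dist(c(0, 1, 4)))
group <- c("a", "a", "b")
w <- c(2, 1, 3)  # integer case weights (hypothetical values)

# Repeat each case index as many times as its weight
idx <- rep(seq_along(w), times = w)

# Expanded dissimilarity matrix and group vector, one row per replicated case
d_rep <- d[idx, idx]
group_rep <- group[idx]

# Non-integer weights have to be turned into counts first;
# rounding is one possible choice (an assumption of this sketch)
w_frac <- c(1.6, 0.7, 2.2)
idx_rounded <- rep(seq_along(w_frac), times = round(w_frac))
```

Copies of the same case are at dissimilarity zero from each other in d_rep, so the expanded matrix can be permuted like an unweighted one, at the cost of a larger matrix.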
Matthias Studer (with Gilbert Ritschard for the help page)
The dissassoc function assesses the association
between objects characterized by their dissimilarity matrix and a
discrete covariate. It provides a generalization of the ANOVA
principle to any kind of distance metric. The function returns a pseudo F statistic,
a pseudo Brown-Forsythe Fbf statistic, and
a pseudo R-square that can be interpreted as a usual R-square. The
statistical significance of the association is computed by means of
permutation tests. The function also performs a test of discrepancy
homogeneity (equality of within-group variances) using generalizations of
the Levene statistic and the Bartlett statistic.
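The ANOVA-like decomposition behind the pseudo F and pseudo R-square can be sketched in base R. The code is a minimal illustration under stated assumptions, not the TraMineR implementation: the helper names pseudo_ss, pseudo_stats and perm_test are hypothetical, weights are ignored, and the p-value convention (counting the observed value among the permuted ones) is a choice made for this sketch. The sum of squares is taken as the sum of pairwise dissimilarities divided by the number of objects, which reproduces the classical sum of squares when the dissimilarities are squared Euclidean distances.

```r
# Sum-of-squares analogue: SS = (1/n) * sum over pairs i<j of d_ij
pseudo_ss <- function(d) {
  d <- as.matrix(d)
  sum(d[upper.tri(d)]) / nrow(d)
}

# Pseudo F and pseudo R2 for one grouping of the objects
pseudo_stats <- function(d, group) {
  d <- as.matrix(d)
  group <- factor(group)
  n <- nrow(d)
  m <- nlevels(group)
  ss_total <- pseudo_ss(d)
  ss_within <- sum(vapply(levels(group), function(g) {
    idx <- which(group == g)
    pseudo_ss(d[idx, idx, drop = FALSE])
  }, numeric(1)))
  ss_between <- ss_total - ss_within
  c(pseudo_F = (ss_between / (m - 1)) / (ss_within / (n - m)),
    pseudo_R2 = ss_between / ss_total)
}

# Permutation test: shuffle the group labels and recompute the pseudo F
perm_test <- function(d, group, R = 1000) {
  observed <- pseudo_stats(d, group)
  f_obs <- observed[["pseudo_F"]]
  permuted <- replicate(R, pseudo_stats(d, sample(group))[["pseudo_F"]])
  list(stat = observed,
       p_value = mean(c(f_obs, permuted) >= f_obs),
       permuted = permuted)
}

# Toy data: two well-separated groups, squared Euclidean distances
set.seed(1)
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 5))
g <- rep(c("a", "b"), each = 10)
d <- as.matrix(dist(x))^2
res <- perm_test(d, g, R = 199)
```

With squared Euclidean distances on a single numeric variable, the pseudo F from this sketch coincides with the F statistic of anova(lm(x ~ g)), which is a convenient sanity check. With other dissimilarity measures only the permutation distribution is available for assessing significance.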
There are print and hist methods (the latter producing a
histogram of the permuted values used for testing the significance).
If a numeric group variable is provided, it is treated as categorical, i.e., each distinct value is considered a separate category. To measure the "linear" effect of a numerical variable, use dissmfacw.
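A short base R illustration of what treating a numeric variable as categorical means. The age values and the band limits are hypothetical, and the binning with cut() is only one possible way to obtain a few broad categories when that is what is wanted.

```r
# A numeric covariate with four distinct values (hypothetical data)
age <- c(18, 18, 25, 25, 40, 62)

# Treated as categorical, each distinct value becomes its own level
age_levels <- levels(factor(age))

# To compare a few age bands instead, bin the variable explicitly
age_band <- cut(age, breaks = c(0, 30, 50, Inf),
                labels = c("<=30", "31-50", ">50"))
band_counts <- table(age_band)
```

Passing age directly as group would yield four categories, two of them with a single case, whereas age_band yields three broader groups.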
Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2011). Discrepancy analysis of state sequences. Sociological Methods and Research, Vol. 40(3), 471-510, doi:10.1177/0049124111415372.
Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2010). Discrepancy analysis of complex objects using dissimilarities. In F. Guillet, G. Ritschard, H. Briand, and D. A. Zighed (Eds.), Advances in Knowledge Discovery and Management, Studies in Computational Intelligence, Volume 292, pp. 3-19. Berlin: Springer.
Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009). Analyse de dissimilarités par arbre d'induction. In EGC 2009, Revue des Nouvelles Technologies de l'Information, Vol. E-15, pp. 7-18.
Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32-46.
Batagelj, V. (1988). Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and related methods of data analysis, Amsterdam: North-Holland, pp. 67-74.
dissvar to compute the pseudo variance from dissimilarities and for a basic introduction to the concepts of pseudo variance analysis.
disstree for an induction tree analysis of objects characterized by a dissimilarity matrix.
disscenter to compute the distance of each object to its group center from pairwise dissimilarities.
dissmfacw to perform multi-factor analysis of variance from pairwise dissimilarities.
## Loading TraMineR, which provides the mvad data and the functions used here
library(TraMineR)
## Defining a state sequence object
data(mvad)
mvad.seq <- seqdef(mvad[, 17:86])
## Building dissimilarities (any dissimilarity measure can be used)
mvad.ham <- seqdist(mvad.seq, method="HAM")
## Only R=10 permutations are used here to keep the example fast;
## R=1 would imply no permutation test
da <- dissassoc(mvad.ham, group=mvad$gcse5eq, R=10)
print(da)
hist(da)