seqimplic: Position-wise group-typical states

Description

Visualization and identification of the states that best characterize a group of sequences versus the others at each position (time point). At each position, the typical states of a group are those with a high implication strength for the rule "belonging to the group implies being in this state".

Usage

seqimplic(seqdata, group, with.missing = FALSE, weighted = TRUE, na.rm = TRUE)
# S3 method for seqimplic
plot(x, main = NULL, ylim = NULL, xaxis = TRUE,
    ylab = "Implication", yaxis = TRUE, axes = "all", xtlab = NULL,
    xtstep = NULL, tick.last = NULL, cex.axis = 1, with.legend = "auto",
    ltext = NULL, cex.legend = 1, legend.prop = NA, rows = NA, cols = NA,
    conf.level = 0.95, lwd = 1, only.levels = NULL, ...)
# S3 method for seqimplic
print(x, xtstep = NULL, tick.last = NULL, round = NULL,
    conf.level = NULL, na.print = "", ...)

Value

seqimplic returns a "seqimplic" object that can be plotted and printed. The values of the implicative statistic at each time point are stored in the indices element of the object.

Arguments

seqdata: a state sequence object (see seqdef).
group: a factor giving the group membership of each sequence in seqdata.
with.missing: Logical. If FALSE (default), missing values are discarded. If TRUE, missing values are coded as a specific state.
weighted: Logical. If TRUE (default), the implicative strength of the rules is computed using the weights assigned to the state sequence object (see seqdef). Set as FALSE to ignore the weights.
na.rm: Logical. If TRUE (default), observations with missing values on the group variable are discarded. If FALSE, the missing group value defines a specific group.
x: A "seqimplic" object as generated by seqimplic.
xtstep: Integer. Optional interval at which the tick marks and labels of the x-axis are displayed. For example, with xtstep=3 a tick mark is drawn at positions 1, 4, 7, etc. The display of the corresponding labels depends on the available space and is handled automatically. If unspecified, the xtstep attribute of the x object is used.
tick.last: Logical. Should a tick mark be enforced at the last position on the x-axis? If unspecified, the tick.last attribute of the x object is used.
main: title for the graphic. Default is NULL.
ylim: the y limits of the plot.
xaxis: Logical. Should the x-axis (time) be plotted?
ylab: Optional label for the y-axis. If set as NA, no label is drawn.
yaxis: Logical. Should the y-axis (implication values) be plotted?
axes: If set as "all" (default value), x-axes are drawn for each plot in the graphic. If set as "bottom", axes are drawn only under the plots located at the bottom of the graphic area. If FALSE, no x-axis is drawn.
xtlab: optional labels for the x-axis tick marks. If unspecified, the column names of the seqdata sequence object are used (see seqdef).
cex.axis: expansion factor for setting the size of the font for the axis labels and names. The default value is 1. Values smaller than 1 reduce the size of the font, values greater than 1 increase the size.
with.legend: One of "auto" (default), "right" or FALSE. Defines if and where the legend of the state colors is plotted. With "auto", the position of the legend is set automatically. The obsolete value TRUE is equivalent to "auto".
ltext: optional description of the states to appear in the legend. Must be a vector of character strings with number of elements equal to the size of the alphabet. If unspecified, the label attribute of the seqdata sequence object is used (see seqdef).
cex.legend: expansion factor for setting the size of the font for the labels in the legend. The default value is 1. Values smaller than 1 reduce the size of the font, values greater than 1 increase the size.
legend.prop: Proportion (between 0 and 1) of the graphic area used for plotting the legend when use.layout=TRUE and with.legend=TRUE. The default value is set according to the place (bottom or right of the graphic area) where the legend is plotted.
rows,cols: optional arguments to arrange plots when use.layout=TRUE.
lwd: The line width, a positive number. See lines.
only.levels: Optional list of levels of the group variable to be plotted. By default all levels are plotted.
round: Optional number of decimals when printing a seqimplic object.
conf.level: Confidence level thresholds (default is 0.95 in the plot method).
na.print: Character string (or NULL) used for NA values in printed output, see print.default.
...: further arguments passed to print.default (for print method) or lines (for plot method).
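The weighted and na.rm arguments determine which counts enter the contingency tables from which the statistic is computed. The base R sketch below, which is not TraMineR's internal code, cross-tabulates group membership against the state observed at a single position. The helper name group_state_table is hypothetical, and labeling missing group values "NA" when na.rm = FALSE is an assumption made for illustration.

```r
# Hypothetical helper (not part of TraMineR): weighted cross-table of
# group membership by the state observed at a single position.
group_state_table <- function(state, group, weights = NULL,
                              weighted = TRUE, na.rm = TRUE) {
  # weighted = FALSE (or no weights given): every sequence counts once
  if (!weighted || is.null(weights)) weights <- rep(1, length(state))
  group <- as.character(group)
  if (na.rm) {
    keep <- !is.na(group)            # discard cases with a missing group
    state <- state[keep]
    group <- group[keep]
    weights <- weights[keep]
  } else {
    group[is.na(group)] <- "NA"      # assumed: missing group becomes its own group
  }
  # Cell (g, s) = sum of the weights of the cases in group g and state s;
  # empty cells are set to 0 instead of NA
  tapply(weights, list(group = group, state = state), sum, default = 0)
}
```

With weighted = FALSE each cell is simply the number of sequences, so the weights only change the counts, not the way the statistic is computed from them.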

Author

Matthias Studer.

Details

The seqimplic function builds an object with the position-wise typical states. It can be used to visualize or identify the differences between each group of trajectories and the others. At each time point, it presents the typical states of a subpopulation (for instance women, as opposed to men). A state at a given time point is considered typical of a group if the rule "Being in this group implies being in that state at this time point" is relevant according to the implicative statistic.
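For a single group g, state s and time point t, assessing this rule only requires a few counts: the premise A is "the sequence belongs to g" and the conclusion B is "the sequence is in s at t". The following base R sketch computes them, assuming no missing values in the inputs; rule_counts is a hypothetical helper, not part of TraMineR.

```r
# Hypothetical helper (not part of TraMineR): counts needed to assess the
# rule "belonging to group g implies being in state s at this position".
rule_counts <- function(state_t, group, g, s,
                        weights = rep(1, length(state_t))) {
  A <- group == g        # premise: the sequence belongs to group g
  B <- state_t == s      # conclusion: the sequence is in state s at time t
  c(n       = sum(weights),            # total number of cases
    n.A     = sum(weights[A]),         # cases satisfying the premise
    n.B     = sum(weights[B]),         # cases satisfying the conclusion
    n.BbarA = sum(weights[A & !B]))    # counter-examples: in g but not in s
}
```

For example, rule_counts(state_t, group, g = "yes", s = "EM") returns the four counts for the rule "group yes implies state EM" at the position stored in state_t.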

The implicative statistic assesses the statistical relevance of a rule of the form "A implies B" (Gras and Kuntz, 2008). It does so by measuring the gap between the expected and observed numbers of counter-examples. The rule is considered strongly implicative if we observe far fewer counter-examples than expected under the independence assumption. This gap and its significance are computed using adjusted residuals of a contingency table with continuity correction, as proposed by Ritschard (2005). The implicative statistic itself is strongly negative for relevant rules; to improve the readability of the graphs, we use its opposite here, so that relevant rules get high positive values. The statistic $I(A\rightarrow B)$ measuring the relevance of the rule "A implies B" reads as follows:

$$I(A\rightarrow B)=-\frac{n_{\bar{B}A}+0.5-n^e_{\bar{B}A}}{\sqrt{n^{e}_{\bar{B}A}(n_{B.}/n)(1-n_{.A}/n)}}$$

where $n_{\bar{B}A}$ is the observed number of counter-examples, $n^{e}_{\bar{B}A}$ the expected number of counter-examples under the independence assumption, $n_{B.}$ the number of times that B is observed, $n_{.A}$ the number of times that A is observed and $n$ the total number of cases.
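Under independence, the expected number of counter-examples is $n^e_{\bar{B}A}=n_{\bar{B}.}\,n_{.A}/n$ with $n_{\bar{B}.}=n-n_{B.}$. Since $1-n_{\bar{B}.}/n=n_{B.}/n$, the denominator is the usual standard error of the adjusted residual of cell $(\bar{B},A)$. The base R function below is a minimal sketch of the formula computed from these four counts; the name implicative_stat is hypothetical, and TraMineR's own implementation may differ in details.

```r
# Sketch (not TraMineR's code) of the sign-reversed implicative statistic
# with continuity correction, following the formula above.
implicative_stat <- function(n.BbarA, n.A, n.B, n) {
  n.Bbar   <- n - n.B              # cases not satisfying B
  expected <- n.Bbar * n.A / n     # expected counter-examples under independence
  # minus sign: fewer counter-examples than expected gives a positive value
  -(n.BbarA + 0.5 - expected) /
    sqrt(expected * (n.B / n) * (1 - n.A / n))
}
```

For instance, with n = 100, n.A = 40 and n.B = 50, the expected number of counter-examples is 20; observing only 5 gives implicative_stat(5, 40, 50, 100), which equals 14.5 / sqrt(6), about 5.92. Observing exactly 20 counter-examples yields a small negative value, because of the continuity correction.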

The plot function can be used to visualize the results. It produces a separate plot for each level of the group variable. In each plot, it presents at each time point $t$ the relevance of the rule "Being in this group implies being in this state at this time point". The higher the plotted value, the more relevant the rule. The horizontal dashed lines indicate the confidence thresholds. A rule is considered statistically significant at the 5% level if it exceeds the 95% confidence line. Rules with a negative value of $I(A\rightarrow B)$ are not displayed, because they have no meaningful interpretation.
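The quantities drawn in each panel can be sketched in base R as a state-by-position matrix for one group. This is an illustration, not TraMineR's code: positionwise_implic is hypothetical, the sequences are given as a plain character matrix without missing values, and taking the threshold for a confidence level as the one-sided standard normal quantile qnorm(conf.level) is an assumption.

```r
# Hypothetical sketch (not TraMineR's code): position-wise implicative
# statistics of every state for one group g.
positionwise_implic <- function(seqmat, group, g, conf.level = 0.95) {
  # Sign-reversed implicative statistic with continuity correction
  istat <- function(n.BbarA, n.A, n.B, n) {
    expected <- (n - n.B) * n.A / n
    -(n.BbarA + 0.5 - expected) / sqrt(expected * (n.B / n) * (1 - n.A / n))
  }
  states <- sort(unique(as.vector(seqmat)))
  A <- group == g                          # premise: membership in group g
  res <- matrix(NA_real_, nrow = length(states), ncol = ncol(seqmat),
                dimnames = list(states, colnames(seqmat)))
  for (t in seq_len(ncol(seqmat))) {       # each position (time point)
    for (s in states) {                    # each state of the alphabet
      B <- seqmat[, t] == s                # conclusion: in state s at t
      res[s, t] <- istat(sum(A & !B), sum(A), sum(B), length(A))
    }
  }
  res[res < 0] <- NA                       # negative values are not displayed
  # Assumed threshold: one-sided standard normal quantile per confidence level
  list(stat = res, threshold = qnorm(conf.level))
}
```

For a single conf.level, which(out$stat >= out$threshold, arr.ind = TRUE) lists the state and position pairs whose rule lies above the corresponding dashed line.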

References

Studer, Matthias (2015), Comment: On the Use of Globally Interdependent Multiple Sequence Analysis, Sociological Methodology 45, doi:10.1177/0081175015588095.

Gras, Régis and Kuntz, Pascale (2008), An overview of the Statistical Implicative Analysis (SIA) development, in Gras, R., Suzuki, E., Guillet, F. and Spagnolo, F. (eds), Statistical Implicative Analysis: Theory and application, Series Studies in Computational Intelligence, Vol 127, Berlin: Springer-Verlag, pp 11-40.

Ritschard, Gilbert (2005), De l'usage de la statistique implicative dans les arbres de classification, in Gras, R., Spagnolo, F. and David, J. (eds), Actes des Troisièmes Rencontres Internationale ASI Analyse Statistique Implicative, Quaderni di Ricerca in Didattica, Secondo supplemento al N.15, Palermo: Università degli Studi di Palermo, pp 305-314.

Examples


library(TraMineR)
data(mvad)

## Building a state sequence object
mvad.seq <- seqdef(mvad, 17:86)
## Sequence of typical states
mvad.si.gcse5eq <- seqimplic(mvad.seq, group=mvad$gcse5eq)

## Plotting the typical states
plot(mvad.si.gcse5eq, lwd=3, conf.level=c(0.95, 0.99))

## Printing the results
print(mvad.si.gcse5eq, xtstep=12)
