TraMineRextras (version 0.6.8)

seqimplic: Position wise group-typical states

Description

Visualization and identification of the states that best characterize a group of sequences versus the others at each position (time point). At each position, the typical states are those for which belonging to the group strongly implies being in the state, as measured by the implication strength.

Usage

seqimplic(seqdata, group, with.missing = FALSE, weighted = TRUE, na.rm = TRUE)
# S3 method for seqimplic
plot(x, main = NULL, ylim = NULL, xaxis = TRUE,
    ylab = "Implication", yaxis = TRUE, axes = "all", xtlab = NULL,
    xtstep = NULL, tick.last = NULL, cex.axis = 1, with.legend = "auto",
    ltext = NULL, cex.legend = 1, legend.prop = NA, rows = NA, cols = NA,
    conf.level = 0.95, lwd = 1, only.levels = NULL, ...)
# S3 method for seqimplic
print(x, xtstep = NULL, tick.last = NULL, round = NULL,
    conf.level = NULL, na.print = "", ...)

Value

seqimplic returns a "seqimplic" object that can be plotted and printed. The values of the implicative statistics at each time point are in the element indices of the object.

Arguments

seqdata

a state sequence object (see seqdef).

group

a factor giving the group membership of each sequence in seqdata.

with.missing

Logical. If FALSE (default), missing values are discarded. If TRUE, missing values are coded as a specific state.

weighted

Logical. If TRUE (default), the implicative strength of the rules is computed using the weights assigned to the state sequence object (see seqdef). Set as FALSE to ignore the weights.

na.rm

Logical. If TRUE (default), observations with missing values on the group variable are discarded. If FALSE, the missing group value defines a specific group.

x

An object of position-wise typical states as generated by seqimplic.

xtstep

Integer. Optional interval at which the tick-marks and labels of the x-axis are displayed. For example, with xtstep=3 a tick-mark is drawn at positions 1, 4, 7, etc. The display of the corresponding labels depends on the available space and is dealt with automatically. If unspecified, the xtstep attribute of the x object is used.

tick.last

Logical. Should a tick mark be enforced at the last position on the x-axis? If unspecified, the tick.last attribute of the x object is used.

main

title for the graphic. Default is NULL.

ylim

the y limits of the plot.

xaxis

Logical. Should the x-axis (time) be plotted?

ylab

Optional label for the y-axis. If set as NA, no label is drawn.

yaxis

Logical. Should the y-axis be plotted? When set as TRUE, sequence indexes are displayed.

axes

If set as "all" (default value) x-axes are drawn for each plot in the graphic. If set as "bottom", axes are drawn only under the plots located at the bottom of the graphic area. If FALSE, no x-axis is drawn.

xtlab

optional labels for the x-axis tick marks. If unspecified, the column names of the seqdata sequence object are used (see seqdef).

cex.axis

expansion factor for setting the size of the font for the axis labels and names. The default value is 1. Values smaller than 1 reduce the size of the font, values greater than 1 increase the size.

with.legend

One of "auto" (default), "right" or FALSE. Defines if and where the legend of the state colors is plotted. With "auto", the position of the legend is set automatically. The obsolete value TRUE is equivalent to "auto".

ltext

optional description of the states to appear in the legend. Must be a vector of character strings with number of elements equal to the size of the alphabet. If unspecified, the label attribute of the seqdata sequence object is used (see seqdef).

cex.legend

expansion factor for setting the size of the font for the labels in the legend. The default value is 1. Values smaller than 1 reduce the size of the font, values greater than 1 increase the size.

legend.prop

Proportion (between 0 and 1) of the graphic area used for plotting the legend when use.layout=TRUE and with.legend=TRUE. The default value is set according to the place (bottom or right of the graphic area) where the legend is plotted.

rows,cols

optional arguments to arrange plots when use.layout=TRUE.

lwd

The line width, a positive number. See lines.

only.levels

Optional list of levels of the group variable to be plotted. By default all levels are plotted.

round

Optional number of decimals when printing a seqimplic object.

conf.level

Confidence level thresholds (default is 0.95).

na.print

Character string (or NULL) used for NA values in printed output, see print.default.

...

further arguments passed to print.default (for print method) or lines (for plot method).

Author

Matthias Studer.

Details

The seqimplic function builds an object with the position-wise typical states. It can be used to visualize or identify the differences between each group of trajectories and the other ones. It presents at each time point the typical states of a subpopulation (for instance women, as opposed to men). A state at a given time point is considered to be typical of a group if the rule "Being in this group implies being in that state at this time point" is relevant according to the implicative statistic.
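The counts behind such a rule can be illustrated with a minimal base R sketch. It does not reproduce the internals of seqimplic: the toy data and the helper rule_counts are hypothetical, and weights and missing values are ignored for simplicity.

```r
# Toy data (hypothetical): 6 sequences observed at 3 positions, one row per sequence.
states <- matrix(c("E", "E", "U",
                   "E", "E", "E",
                   "E", "U", "U",
                   "U", "U", "E",
                   "U", "U", "U",
                   "U", "E", "U"),
                 ncol = 3, byrow = TRUE)
group <- factor(c("w", "w", "w", "m", "m", "m"))

# Counts for the rule "being in group g implies being in state s at position pos".
# rule_counts is an illustrative helper, not a TraMineRextras function.
rule_counts <- function(states, group, g, s, pos) {
  in_group <- group == g            # A: the sequence belongs to the group
  in_state <- states[, pos] == s    # B: the sequence is in state s at this position
  c(n       = length(group),              # total number of cases
    n_A     = sum(in_group),              # times A is observed
    n_B     = sum(in_state),              # times B is observed
    counter = sum(in_group & !in_state))  # counter-examples: A without B
}

counts <- rule_counts(states, group, g = "w", s = "E", pos = 1)
print(counts)
```

In this toy example every sequence of group "w" is in state "E" at the first position, so the rule has no counter-example there.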

The implicative statistic assesses the statistical relevance of a rule of the form "A implies B" (Gras et al., 2008). It does so by measuring the gap between the expected and observed numbers of counter-examples. The rule is considered to be strongly implicative if we observe far fewer counter-examples than expected under the independence assumption. This gap and its significance are computed using adjusted residuals of a contingency table with continuity correction as proposed by Ritschard (2005). In order to improve the readability of the graphs, we use here the opposite of the implicative statistic, which is highly negative for significant rules. The statistic \(I(A\rightarrow B)\) measuring the relevance of the rule "A implies B" reads as follows:

$$I(A\rightarrow B)=-\frac{n_{\bar{B}A}+0.5-n^{e}_{\bar{B}A}}{\sqrt{n^{e}_{\bar{B}A}(n_{B.}/n)(1-n_{.A}/n)}}$$

where \(n_{\bar{B}A}\) is the observed number of counter-examples, \(n^{e}_{\bar{B}A}\) the expected number of counter-examples under the independence assumption, \(n_{B.}\) the number of times that B is observed, \(n_{.A}\) the number of times that A is observed and \(n\) the total number of cases.
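As a worked illustration, the formula can be evaluated directly in base R. This is a sketch under stated assumptions rather than the package's implementation: the function name implic_stat and the counts are hypothetical, and the expected number of counter-examples is taken as \(n_{.A}(n-n_{B.})/n\), which is what the independence assumption gives.

```r
# Hypothetical helper evaluating I(A -> B) as written in the formula above.
implic_stat <- function(n_counter, n_A, n_B, n) {
  # Expected number of counter-examples (A without B) under independence.
  n_exp <- n_A * (n - n_B) / n
  # Opposite of the adjusted residual, with the 0.5 continuity correction.
  -(n_counter + 0.5 - n_exp) / sqrt(n_exp * (n_B / n) * (1 - n_A / n))
}

# Hypothetical counts: 100 cases, A observed 40 times, B observed 60 times,
# and 5 cases where A holds without B.
res <- implic_stat(n_counter = 5, n_A = 40, n_B = 60, n = 100)
print(res)  # 4.375
```

Here 16 counter-examples are expected under independence but only 5 are observed, which yields a large positive value.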

The plot function can be used to visualize the results. It produces a separate plot for each level of the group variable. In each plot, it presents at each time point \(t\) the relevance of the rule "Being in this group implies being in this state at this time point". The higher the plotted value, the higher the relevance of the rule. The horizontal dashed lines indicate the confidence thresholds. A rule is considered statistically significant at the 5% level if it exceeds the 95% confidence horizontal line. The strength of rules with a negative implicative statistic is not displayed because it has no meaningful interpretation.
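The comparison with the confidence thresholds can be sketched as follows. The sketch assumes that the thresholds are one-sided standard normal quantiles, which is a natural choice because adjusted residuals are approximately standard normal under independence; this page does not confirm that plot.seqimplic computes them this way. The statistic values are hypothetical.

```r
# Assumption: thresholds taken as one-sided standard normal quantiles.
conf_level <- c(0.95, 0.99)
thresholds <- qnorm(conf_level)

# Hypothetical values of the implicative statistic at three time points.
stat <- c(0.8, 1.9, 2.6)

# One row per time point, one column per confidence level;
# TRUE means the rule exceeds the corresponding threshold.
significant <- outer(stat, thresholds, ">")
print(round(thresholds, 3))
print(significant)
```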

References

Studer, Matthias (2015), Comment: On the Use of Globally Interdependent Multiple Sequence Analysis, Sociological Methodology 45, doi:10.1177/0081175015588095.

Gras, Régis and Kuntz, Pascale. (2008), An overview of the Statistical Implicative Analysis (SIA) development, in Gras, R., Suzuki, E., Guillet, F. and Spagnolo, F. (eds), Statistical Implicative Analysis: Theory and application, Series Studies in Computational Intelligence, Vol 127, Berlin: Springer-Verlag, pp 11-40.

Ritschard, G. (2005). De l'usage de la statistique implicative dans les arbres de classification. In Gras, R., Spagnolo, F., and David, J., editors, Actes des Troisièmes Rencontres Internationale ASI Analyse Statistique Implicative, volume Secondo supplemento al N.15 of Quaderni di Ricerca in Didattica, pages 305–314. Università degli Studi di Palermo, Palermo.

Examples

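library(TraMineR)        # provides seqdef() and the mvad data
library(TraMineRextras)  # provides seqimplic()
data(mvad)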

## Building a state sequence object
mvad.seq <- seqdef(mvad, 17:86)
## Sequence of typical states
mvad.si.gcse5eq <- seqimplic(mvad.seq, group=mvad$gcse5eq)

## Plotting the typical states
plot(mvad.si.gcse5eq, lwd=3, conf.level=c(0.95, 0.99))

## Printing the results
print(mvad.si.gcse5eq, xtstep=12)