seqsamm: Sequence Analysis Multistate Model (SAMM) procedure

Description

The Sequence Analysis Multistate Model (SAMM) procedure simultaneously studies the occurrence of a transition out of (an exit from) a spell in a given state along trajectories and the subsequence (or subtrajectory) that immediately follows it over a predefined period of time. This strategy makes it possible to include time-varying covariates in the sequence analysis framework.

Usage

seqsamm(seqdata, sublength, covar = NULL)
# S3 method for SAMM
plot(x, type="d", ...)
seqsammseq(samm, spell)
seqsammeha(samm, spell, typology, persper = TRUE)

Value

A SAMM object (data.frame) storing the reorganized data in person-period format. The columns are:

id: Numeric. The ID of the observation as the row number in the original seqdata.
time: Numeric. The time unit of the current observation (from the beginning of the original sequence).
begin: Numeric. The time of the beginning of the current spell (from the beginning of the original sequence).
spell.time: Numeric. The time elapsed from the beginning of the current spell.
transition: Logical. Whether a transition out of the current spell occurred within this time unit.
s.1 to s.sublength: The state sequence following the current observation, starting from s.1 (the current state) until sublength time units after the current observation.
lastobs: Logical. Whether this is the last observation of the current spell, censored or not. This is useful when one wants only one row per individual, for instance to plot survival curves (see example).
Optional covariate list: The covariates provided with the covar argument.

The function seqsammseq returns an stslist sequence object (see seqdef) of the trajectories following an ending spell.

The function seqsammeha returns a data.frame storing the person-period data of a specific ending spell (see the spell argument), treating the types of the given typology as competing risks (see the typology argument). Several variables are added to those of the SAMM object (see above):

SAMMtypology: Factor. The events ending the specified spell using "None" when no event occurs.
SAMM...: Logical. A logical vector specifying whether the current observation ends the spell with the following ... type of trajectory.
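
The relation between SAMMtypology and the SAMM... indicator columns can be sketched on toy data. This is a base R illustration only, not the code used by seqsammeha: the toy factor, the helper names types and indicators, and the type labels (borrowed from the example below) are assumptions made for the illustration.

```r
## Toy illustration (assumed data, not output of seqsammeha).
## One value per person-period row: "None" when the spell does not end.
SAMMtypology <- factor(
  c("None", "None", "joblessness1", "None", "None", "joblessness2"),
  levels = c("None", "joblessness1", "joblessness2")
)

## One logical indicator per type of following trajectory.
types <- setdiff(levels(SAMMtypology), "None")
indicators <- sapply(types, function(ty) SAMMtypology == ty)
colnames(indicators) <- paste0("SAMM", types)

toy <- data.frame(SAMMtypology, indicators)
print(toy)
```

Each indicator can then serve as the event variable of a type-specific model, as the example does with SAMMjoblessness1.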

Arguments

seqdata: State sequence object created with the seqdef function. The sequences may represent any temporal process and may be of different lengths.
sublength: Numeric. The length of the subsequence (or subtrajectory) following a transition to be considered.
covar: Optional data.frame storing covariates of interest. These covariates are added to the final data set and can be used in subsequent analyses.
x: A SAMM object produced by seqsamm.
samm: A SAMM object produced by seqsamm.
type: The type of plot, passed to seqplot. Default "d" for state distribution plots (chronograms).
spell: Character. The state of the (ending) spell to consider. It should be one of the states of the alphabet of the sequences. A spell is a series of consecutive time points in the same state.
typology: Factor or character. The typology of the trajectories out of the specified ending spell, generated by a cluster analysis (see example). It should contain one observation per observed ending spell.
persper: Logical. If TRUE, the data are returned in person-period format. Otherwise, only one line per observed spell is returned.
...: Additional plot parameters passed to seqplot.

Author

Matthias Studer

Details

The Sequence Analysis Multistate Model (SAMM) procedure works in three steps. First, the substrings of length sublength following any transition out of (exit from) a spell in a given state of the alphabet are extracted from the trajectories in seqdata. This step is achieved using the seqsamm function. Each substring starts with the last time point of the spell in that state. Second, these substrings are clustered using sequence analysis (SA) to identify typical patterns of medium-term change. This is done separately for each ending spell (see the spell argument). The seqsammseq function can be used to retrieve the subtrajectories following each ending spell. Third, multistate models are used to estimate the chance (or risk) of ending a spell in a given state while distinguishing the type of trajectory that follows, as identified by the cluster analysis. This allows estimating the effect of covariates on the chances of starting each type of subsequence. The seqsammeha function prepares the data needed to estimate the competing risks models for each ending spell. The usual competing risks models can then be used.

Generally speaking, the SAMM procedure allows studying the time spent in each state as well as the patterns of medium-term change that follow an exit from that state along the trajectories. The example section below provides a step-by-step example of how to use it.
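
The person-period layout produced in the first step can be sketched in base R for a single toy sequence. This is a hedged illustration, not the implementation used by seqsamm: the toy states, the helper names (runs, spell.id, spell.start, sub, pp), the choice to count time and spell.time from 1, and the NA padding of subsequences that run past the end of the sequence are all assumptions made here.

```r
## Toy illustration only (base R); not the TraMineRextras implementation.
## One individual observed over 8 time units.
states <- c("education", "education", "education",
            "joblessness", "joblessness",
            "employment", "employment", "employment")
sublength <- 3

## Identify spells, i.e. runs of consecutive identical states.
runs <- rle(states)
spell.id <- rep(seq_along(runs$lengths), runs$lengths)
spell.start <- cumsum(c(1, head(runs$lengths, -1)))

time <- seq_along(states)
begin <- spell.start[spell.id]       # start time of the current spell
spell.time <- time - begin + 1       # assumption: counted from 1

## A transition occurs when the next time unit is in a different state.
transition <- c(states[-length(states)] != states[-1], FALSE)
## Last row of each spell, whether it ends by a transition or by censoring.
lastobs <- transition | time == length(states)

## Subsequence of length sublength starting at the current state
## (assumption: padded with NA beyond the end of the sequence).
sub <- t(sapply(time, function(tt) states[tt + 0:(sublength - 1)]))
colnames(sub) <- paste0("s.", seq_len(sublength))

pp <- data.frame(id = 1, time, begin, spell.time, transition, sub, lastobs)
print(pp)
```

In this sketch, the rows with transition equal to TRUE are the ones whose subsequences s.1 to s.3 would be clustered in the second step.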

References

Studer, M., Struffolino, E., & Fasang, A. E. (2018). Estimating the Relationship between Time-varying Covariates and Trajectories: The Sequence Analysis Multistate Model Procedure. Sociological Methodology, 48(1), 103–135. doi:10.1177/0081175017747122

Examples

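## seqsamm() is provided by TraMineRextras, which also attaches TraMineR (mvad, seqdef)
library(TraMineRextras)
data(mvad)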
mvad.seq <- seqdef(mvad, 17:86)

## For the sake of simplicity, we recode all "education" states into a single common state.
mvad.seq  <- seqrecode(mvad.seq, list("education"=c("FE", "HE", "school", "training")))
## We now have three states
seqdplot(mvad.seq)

###########################################################################
##  STEP I: Subsequence extraction
###########################################################################

## We start by extracting all subsequences of length 6
## We also add covariates from the mvad data frame
mvad.samm <- seqsamm(mvad.seq, 6, covar=mvad[, c("Grammar", "funemp", "gcse5eq")])
## Plot the results to visualize the transitions out of each state.
plot(mvad.samm)
## Descriptive information on the seqsamm object
summary(mvad.samm)


###########################################################################
### STEP II: Typology of trajectory out of joblessness
###########################################################################
## We retrieve the subsequences following a transition out of a joblessness spell
jlseq <- seqsammseq(mvad.samm, "joblessness")


## Now we create a typology of these subsequences.

## Compute the clustering using LCS
jldist <- seqdist(jlseq, method="LCS")
## For the sake of simplicity, use only 2 groups
library(cluster)
jlclust <- pam(jldist, diss=TRUE, k=2, cluster.only=TRUE)
## Specify the names of the types in the 2-cluster typology (here joblessness1 or joblessness2).
jltype <- paste0("joblessness", jlclust)


###########################################################################
### STEP III: Competing risks model of trajectories out of joblessness
###########################################################################

## Get the data to estimate competing risks models of the kind of trajectory
## out of joblessness
## We specify the SAMM object, the ending spell (joblessness) and our typology.
jleha <- seqsammeha(mvad.samm, "joblessness", jltype)

if (FALSE) {
## Now jleha stores the data in person period format for competing risks
## Discrete time model using multinomial regression
## SAMMtypology and spell.time are variables created and stored in the jleha dataset
library(nnet)
multinom(SAMMtypology~spell.time+Grammar+funemp+gcse5eq, data=jleha)

## We can also have only one line per ending spell
## Plot the results
library(survival)
jleha <- seqsammeha(mvad.samm, "joblessness", jltype, persper=FALSE)
plot(survfit(Surv(spell.time, SAMMjoblessness1)~1, data=jleha))
## Cox model
summary(coxph(Surv(spell.time, SAMMjoblessness1)~gcse5eq+Grammar+funemp, data=jleha))
## Most of the time, methods for recurrent events should be used.
## See for instance the coxme package to do so.

library(coxme)
summary(coxme(Surv(spell.time, SAMMjoblessness1)~gcse5eq+Grammar+funemp+(1|id), data=jleha))
}

###########################################################################
### Now repeat steps II and III for employment and then education
### (Not shown here)
###########################################################################
