TraMineRextras (version 0.6.8)

seqsamm: Sequence Analysis Multistate Model (SAMM) procedure

Description

The Sequence Analysis Multistate Model (SAMM) procedure simultaneously studies the occurrence of a transition out of (an exit from) a spell in a given state along a trajectory and the subsequence (or subtrajectory) that immediately follows it over a predefined period of time. This strategy makes it possible to include time-varying covariates in the sequence analysis framework.

Usage

seqsamm(seqdata, sublength, covar = NULL)
# S3 method for SAMM
plot(x, type="d", ...)
seqsammseq(samm, spell)
seqsammeha(samm, spell, typology, persper = TRUE)

Value

A SAMM object (data.frame) storing the reorganized data in person-period form. The column variables are:

id

Numeric. The ID of the observation as the row number in the original seqdata.

time

Numeric. The time unit of the current observation (from the beginning of the original sequence).

begin

Numeric. The time of the beginning of the current spell (from the beginning of the original sequence).

spell.time

Numeric. The time elapsed from the beginning of the current spell.

transition

Logical. Whether a transition out of the current spell occurred within this time unit.

s.1 until s.sublength

The state sequence following the current observation starting from 1 (current state) until sublength time units after the current observation.

lastobs

Logical. Whether this is the last observation of the current spell, censored or not. This is useful when one wants only one row per individual, for instance to plot survival curves (see example).

Optional covariate list

The covariates provided with the covar argument.
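The person-period columns described above can be illustrated on a toy sequence. The following is only a minimal base R sketch, not the code used by seqsamm: the toy states, the state column, and the choice to start spell.time at 1 are assumptions made for illustration.

```r
# Toy sequence for one hypothetical individual (not taken from any real data set).
states <- c("education", "education", "joblessness", "joblessness",
            "joblessness", "employment", "employment")

runs <- rle(states)                                   # consecutive spells in the same state
spell.id <- rep(seq_along(runs$lengths), runs$lengths)
spell.len <- rep(runs$lengths, runs$lengths)
begin <- rep(cumsum(c(1, head(runs$lengths, -1))), runs$lengths)
time <- seq_along(states)

pp <- data.frame(
  id = 1,
  time = time,
  begin = begin,
  spell.time = time - begin + 1,                      # assumption: counting starts at 1
  state = states,                                     # hypothetical helper column
  lastobs = time == begin + spell.len - 1             # last time unit of each spell
)
# A transition is observed at the last time unit of every spell except the
# final one, which is censored (no exit is observed).
pp$transition <- pp$lastobs & spell.id < length(runs$lengths)
print(pp)

# Keeping only the last row of each spell.
print(pp[pp$lastobs, ])
```

In this sketch, the rows flagged by lastobs include the final, censored spell, for which transition is FALSE.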

The function seqsammseq returns an stslist sequence object (see seqdef) of the trajectories following an ending spell.

The function seqsammeha returns a data.frame storing the person-period data of a specific ending spell (see the spell argument), treating the given typology as competing risks (see the typology argument). Several variables are added to those of the SAMM object (see above):

SAMMtypology

Factor. The events ending the specified spell using "None" when no event occurs.

SAMM...

Logical. A logical vector specifying whether the current observation ends the spell with the following ... type of trajectory.
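To make the added variables concrete, the sketch below builds a SAMMtypology factor and SAMM... indicators from hypothetical person-period rows in base R. It is an illustration under stated assumptions, not the implementation of seqsammeha: the toy rows and labels are invented, and the indicator naming (the prefix SAMM followed by the type label) is inferred from the SAMMjoblessness1 variable used in the Examples.

```r
# Hypothetical person-period rows for joblessness spells of two individuals.
pp <- data.frame(
  id         = c(1, 1, 1, 2, 2),
  spell.time = c(1, 2, 3, 1, 2),
  transition = c(FALSE, FALSE, TRUE, FALSE, TRUE)
)
# One label per observed ending spell, in row order (e.g. from a cluster analysis).
typology <- c("joblessness1", "joblessness2")
stopifnot(length(typology) == sum(pp$transition))

# "None" when no event occurs, otherwise the type of trajectory that follows.
event <- rep("None", nrow(pp))
event[pp$transition] <- typology
pp$SAMMtypology <- factor(event, levels = c("None", sort(unique(typology))))

# One logical indicator per type of following trajectory.
for (type in sort(unique(typology))) {
  pp[[paste0("SAMM", type)]] <- event == type
}
print(pp)
```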

Arguments

seqdata

State sequence object created with the seqdef function. The sequences may represent any temporal process and can be of different lengths.

sublength

Numeric. The length of the subsequence (or subtrajectory) following a transition to be considered.

covar

Optional data.frame storing covariates of interest. These covariates are added to the final data set and can be used in subsequent analyses.

x

A SAMM object produced by seqsamm.

samm

A SAMM object produced by seqsamm.

type

The type of plot, passed to seqplot. Default "d" for state distribution plots (chronograms).

spell

Character. The state of the (ending) spell to consider. It should be one of the states of the alphabet of the sequences. A spell is a series of consecutive time points in the same state.

typology

Factor or character. The typology of the trajectories out of the specified ending spell, typically generated by a cluster analysis (see example). It should contain one observation per observed ending spell.

persper

Logical. If TRUE, the data are returned in person-period format. Otherwise, only one line per observed spell is returned.

...

Additional plot parameters passed to seqplot.

Author

Matthias Studer

Details

The Sequence Analysis Multistate Model (SAMM) procedure works in three steps. First, the substrings of length sublength that follow any transition out of (exit from) a spell in a given state of the alphabet are extracted from the trajectories in seqdata. This step is performed by the seqsamm function. Each substring starts with the last time point of the spell in that state. Second, these substrings are clustered using sequence analysis (SA) to identify typical patterns of medium-term change. This is done separately for each ending spell (see the spell argument). The seqsammseq function retrieves the subtrajectories following each ending spell. Third, multistate models are used to estimate the chance (or risk) of ending a spell in a given state while distinguishing the type of trajectory that follows, as identified by the cluster analysis. This makes it possible to estimate the effect of covariates on the chances of starting each type of subsequence. The seqsammeha function prepares the data needed to estimate the competing risks models for each ending spell, after which the usual competing risks models can be used.

Generally speaking, the SAMM procedure makes it possible to study the time spent in each state as well as the patterns of medium-term change that follow an exit from that state. The Examples section below provides a step-by-step illustration of how to use it.
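The first step, extracting the substring that starts at the last time point of each ending spell, can be sketched in base R on a single toy sequence. The helper following_substrings is hypothetical and is not part of TraMineRextras. Padding with NA when the window runs past the end of the sequence is an assumption of this sketch, since the text above does not state how seqsamm treats such cases.

```r
# Hypothetical helper (not part of TraMineRextras): for one sequence, return the
# substring of length `sublength` starting at the last time point of each spell
# that is followed by a transition.
following_substrings <- function(states, sublength) {
  runs <- rle(states)
  ends <- head(cumsum(runs$lengths), -1)   # drop the final spell: no exit observed
  out <- matrix(NA_character_, nrow = length(ends), ncol = sublength,
                dimnames = list(NULL, paste0("s.", seq_len(sublength))))
  for (i in seq_along(ends)) {
    # Indexing past the end of the vector yields NA (assumed padding).
    out[i, ] <- states[ends[i] + seq_len(sublength) - 1]
  }
  data.frame(exit.from = states[ends], out)
}

states <- c("education", "education", "joblessness", "joblessness",
            "employment", "employment", "employment")
res <- following_substrings(states, sublength = 5)
print(res)
```

Each row starts in the state being left (column s.1), matching the description of s.1 as the current state. The rows for a given value of exit.from are the ones that would then be clustered in the second step.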

References

Studer, M., Struffolino, E., & Fasang, A. E. (2018). Estimating the Relationship between Time-varying Covariates and Trajectories: The Sequence Analysis Multistate Model Procedure. Sociological Methodology, 48(1), 103–135. doi:10.1177/0081175017747122

See Also

seqcta, seqsha

Examples

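## Load the packages: TraMineR provides mvad and seqdef, TraMineRextras provides seqsamm
library(TraMineR)
library(TraMineRextras)
data(mvad)
mvad.seq <- seqdef(mvad, 17:86)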

## For the sake of simplicity, we recode all "education" states into one common state.
mvad.seq  <- seqrecode(mvad.seq, list("education"=c("FE", "HE", "school", "training")))
## We now have three states
seqdplot(mvad.seq)

###########################################################################
##  STEP I: Subsequence extraction
###########################################################################

## We start by extracting all subsequences of length 6
## We also add covariates from the mvad data frame
mvad.samm <- seqsamm(mvad.seq, 6, covar=mvad[, c("Grammar", "funemp", "gcse5eq")])
## Plot the results to visualize the transitions out of each state.
plot(mvad.samm)
## Descriptive information on the seqsamm object
summary(mvad.samm)


###########################################################################
### STEP II: Typology of trajectory out of joblessness
###########################################################################
## We retrieve the subsequences following a transition out of a joblessness spell
jlseq <- seqsammseq(mvad.samm, "joblessness")


## Now we create a typology of these subsequences.

## Compute the clustering using LCS
jldist <- seqdist(jlseq, method="LCS")
## For the sake of simplicity, use only 2 groups
library(cluster)
jlclust <- pam(jldist, diss=TRUE, k=2, cluster.only=TRUE)
## Specify the names of the types in the 2-cluster typology (here joblessness1 or joblessness2).
jltype <- paste0("joblessness", jlclust)


###########################################################################
### STEP III: Competing risks model of trajectories out of joblessness
###########################################################################

## Get the data to estimate competing risks models of the kind of trajectory
## out of joblessness
## We specify the SAMM object, the ending spell (joblessness) and our typology.
jleha <- seqsammeha(mvad.samm, "joblessness", jltype)

if (FALSE) {
## Now jleha stores the data in person-period format for competing risks
## Discrete time model using multinomial regression
## SAMMtypology and spell.time are variables created and stored in the jleha dataset
library(nnet)
multinom(SAMMtypology~spell.time+Grammar+funemp+gcse5eq, data=jleha)

## We can also have only one line per ending spell
## Plot the results
library(survival)
jleha <- seqsammeha(mvad.samm, "joblessness", jltype, persper=FALSE)
plot(survfit(Surv(spell.time, SAMMjoblessness1)~1, data=jleha))
## Cox model
summary(coxph(Surv(spell.time, SAMMjoblessness1)~gcse5eq+Grammar+funemp, data=jleha))
## Most of the time methods for recurrent events should be used.
## See for instance the coxme library to do so.

library(coxme)
summary(coxme(Surv(spell.time, SAMMjoblessness1)~gcse5eq+Grammar+funemp+(1|id), data=jleha))
}

###########################################################################
### Now repeat steps II and III for employment and then education
### (Not shown here)
###########################################################################