WeightedCluster (version 1.8-0)

seqnull: Generate nonclustered sequence data according to different null models.

Description

This function generates sequence data that is similar to the original sequence data but nonclustered on specific aspects related to the sequencing, the timing, or the time spent in the different states. The function is typically used by specifying only a model among "combined", "duration", "sequencing", "stateindep" or "Markov". The "userpos" model allows a sequence-generating model to be fully specified using a starting distribution and a transition rate matrix.

Usage

seqnull(seqdata, model = c("combined", "duration", "sequencing",
        "stateindep", "Markov", "userpos"), imp.trans = NULL,
        imp.trans.limit = -1, trate = "trate", begin = "freq",
        time.varying = TRUE, weighted = TRUE)

Value

A state sequence object of class stslist.

Arguments

seqdata

State sequence object of class stslist. The sequence data to use. Use seqdef to create such an object.

model

String. The model used to generate the nonclustered data. It can be one of "combined", "duration", "sequencing", "stateindep", "Markov" or "userpos". See the Details section.

imp.trans

Optional named character vector listing impossible transitions. Names indicate the starting states and values the destination states. Only used for the "combined", "duration" and "sequencing" models.

imp.trans.limit

Numeric. Optional. All transitions with a transition rate below or equal to this value are considered impossible. Only used for the "combined", "duration" and "sequencing" models.

trate

String, matrix or array. Only used to specify the "userpos" model. It can be a method to compute the time-varying transition rates, a matrix of transition rates used for all time points, or time-varying transition rates specified as an array. Accepted string values are "freq", to use the state distribution, and "trate", to use the transition rates.

begin

String or vector. Only used to specify the "userpos" model. Either a vector of probabilities for the first state in the sequence, or a method to compute it. Accepted string values are "freq", to use the state distribution at the first time point, and "ofreq", to use the overall (time-independent) state distribution.

time.varying

Logical. If TRUE, the state distribution or the transition rates specified by the trate argument (using a string) are computed separately for each time point.

weighted

Logical. If TRUE, the state distribution and the transition rates are computed using the weights specified in seqdata.
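
As a hedged illustration of how the "userpos" arguments fit together, the sketch below requests a first state drawn from the overall state distribution and a single set of transition rates shared by all time points. It assumes that the TraMineR and WeightedCluster packages are installed, and it uses a biofam subset only as convenient sample data; the particular combination of options is chosen for illustration and is not a recommended setting.

```r
library(TraMineR)         # provides biofam, seqdef and seqdplot
library(WeightedCluster)  # provides seqnull

data(biofam)
bf.seq <- seqdef(biofam[1:200, 10:25])

# "userpos" model built from string options only (illustrative choice):
#  - begin = "ofreq": first state drawn from the overall state distribution
#  - trate = "trate": later states drawn using transition rates
#  - time.varying = FALSE: rates are not recomputed for each time point
null.seq <- seqnull(bf.seq, model = "userpos", trate = "trate",
                    begin = "ofreq", time.varying = FALSE)

seqdplot(null.seq)
```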

Details

This function generates sequence data that is similar to the original sequence data but nonclustered on specific aspects related to the sequencing, the timing, or the time spent in the different states. The function is typically used by specifying only a model among "combined", "duration", "sequencing", "stateindep" or "Markov". The models are briefly described below. More information about their usefulness can be found in Studer (2021) (see References).

The "combined", "duration" and "sequencing" models generate sequences in spell format, that is, as a vector of states with their attached durations. The "combined" model randomizes both sequencing and duration. The "duration" model only randomizes the durations, keeping the original sequencing of the states found in the data. Finally, the "sequencing" model only randomizes the sequencing of the states and keeps the time spent in each state as found in the data.
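
To make the spell format concrete, the following base R sketch decomposes a toy sequence into states and durations with rle() and then randomizes one aspect at a time. It only illustrates the idea: the toy sequence and the simple within-sequence shuffling are assumptions made for this example and are not the sampling scheme used by seqnull.

```r
# Toy yearly sequence (hypothetical data, not taken from biofam)
s <- c("P", "P", "P", "L", "L", "M", "M", "M", "M", "C")

# Spell format: successive distinct states and the time spent in each
spells <- rle(s)
spells$values   # "P" "L" "M" "C"
spells$lengths  # 3 2 4 1

set.seed(1)

# "duration"-like idea: keep the order of the states, randomize the durations
dur <- spells
dur$lengths <- sample(spells$lengths)
s_duration <- inverse.rle(dur)

# "sequencing"-like idea: reorder whole spells, so each state keeps its duration
idx <- sample(seq_along(spells$values))
sq <- spells
sq$values  <- spells$values[idx]
sq$lengths <- spells$lengths[idx]
s_sequencing <- inverse.rle(sq)
```

Both randomized sequences keep the original length of 10 time points; only the aspect named in the comment changes.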

The "stateindep" model generates sequences by randomly selecting a state at each time point without taking the previous state into account. It can generate highly unlikely sequences because it does not account for the coherence of trajectories over time.
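
The idea of state-independent generation can be sketched in base R as follows. The alphabet, the number of sequences and the per-time-point state distributions are all hypothetical values chosen for illustration; seqnull derives its distributions from seqdata instead, and its internal implementation may differ.

```r
set.seed(42)

states <- c("A", "B", "C")  # hypothetical alphabet
n_seq  <- 5                 # number of sequences to generate
n_time <- 8                 # sequence length

# Hypothetical state distribution at each time point
# (rows = time points, columns = states; each row sums to 1)
p <- matrix(c(0.8, 0.1, 0.1), nrow = n_time, ncol = 3, byrow = TRUE)
p[5:8, ] <- matrix(c(0.1, 0.3, 0.6), nrow = 4, ncol = 3, byrow = TRUE)

# Draw the state at each time point independently of the previous one
sim <- sapply(seq_len(n_time), function(t) {
  sample(states, n_seq, replace = TRUE, prob = p[t, ])
})

dim(sim)  # n_seq rows (sequences) by n_time columns (time points)
```

Because each column is drawn on its own, a simulated sequence can jump back and forth between states in ways that rarely occur in real trajectories.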

The "Markov" model uses a time-invariant (homogeneous) transition rate matrix to generate the sequences. It can reveal differences in the timing of transitions.
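
A minimal base R sketch of generating sequences from a homogeneous transition matrix is given below. The alphabet, the transition matrix P, the starting distribution and the helper sim_markov are hypothetical and introduced only for this example; seqnull estimates its rates from seqdata, and this is not a copy of its implementation.

```r
set.seed(7)

states <- c("A", "B", "C")  # hypothetical alphabet

# Hypothetical time-invariant transition matrix
# (rows = origin state, columns = destination state; each row sums to 1)
P <- matrix(c(0.90, 0.08, 0.02,
              0.05, 0.85, 0.10,
              0.00, 0.05, 0.95),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# Hypothetical distribution of the first state
start <- c(A = 0.7, B = 0.2, C = 0.1)

# Illustrative helper: simulate one sequence of length n_time (n_time >= 2)
sim_markov <- function(n_time) {
  s <- character(n_time)
  s[1] <- sample(states, 1, prob = start)
  for (t in 2:n_time) {
    # The same row of P is used whatever the time point t
    s[t] <- sample(states, 1, prob = P[s[t - 1], ])
  }
  s
}

# Five simulated sequences of length 8, one per row
sims <- t(replicate(5, sim_markov(8)))
```

Since the same matrix applies at every time point, any systematic timing of transitions present in the original data is absent from the simulated sequences.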

References

Studer, M. (2021). Validating Sequence Analysis Typologies Using Parametric Bootstrap. Sociological Methodology. doi:10.1177/00811750211014232

See Also

seqnullcqi.

Examples

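library(TraMineR)         # provides biofam, seqdef and seqdplot
library(WeightedCluster)  # provides seqnull

data(biofam)

bf.seq <- seqdef(biofam[1:200, 10:25])

## Plot the sequences generated by different null models.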
seqdplot(seqnull(bf.seq, model="combined"))

seqdplot(seqnull(bf.seq, model="duration"))

seqdplot(seqnull(bf.seq, model="sequencing"))

seqdplot(seqnull(bf.seq, model="stateindep"))

seqdplot(seqnull(bf.seq, model="Markov"))
