Logarithm of the probabilities of state sequences. The probability of a sequence is defined as the product of the probabilities of the successive states in the sequence. State probabilities can either be provided or be computed with one of a few basic models.
seqlogp(seqdata, prob="trate", time.varying=TRUE,
begin="freq", weighted=TRUE, with.missing=FALSE)
Vector of the negative logarithms \(-\log P(s)\) of the sequence probabilities.
seqdata: A state sequence object as produced by seqdef.
prob: String or numeric array. If a string, either "trate" or "freq", selecting the model used to compute the state probabilities. If a numeric array, a matrix or 3-dimensional array of transition probabilities. See details.
time.varying: Logical. If TRUE, the probabilities (transitions or frequencies) are computed separately for each time point \(t\).
begin: String or numeric vector. Distribution used to determine the probability of the first state. If a vector, the probabilities to use. If a string, either "freq" or "global.freq". With "freq", the observed distribution at the first position is used. With "global.freq", the overall distribution is used. Default is "freq".
weighted: Logical. Should the weights be accounted for when present in seqdata? Default is TRUE.
with.missing: Logical. Should non-void missing states be treated as regular values? Default is FALSE.
Matthias Studer, Alexis Gabadinho, and Gilbert Ritschard
The sequence likelihood \(P(s)\) is defined as the product of the probabilities with which each of its successive observed states is expected to occur at its position. Let \(s=s_{1}s_{2} \cdots s_{\ell}\) be a sequence of length \(\ell\). Then $$ P(s)=P(s_{1},1) \cdot P(s_{2},2) \cdots P(s_{\ell},\ell) $$ with \(P(s_{t},t)\) the probability of observing state \(s_t\) at position \(t\).
There are different ways to determine the state probabilities \(P(s_t,t)\). The method is chosen by means of the prob argument.
With prob = "freq", the probability \(P(s_{t},t)\) is set to the observed relative frequency of state \(s_t\) at position \(t\). In that case, the probability does not depend on the transition probabilities. By default (time.varying=TRUE), the relative frequencies are computed separately for each position \(t\). With time.varying=FALSE, the relative frequencies are computed over the entire covered period, i.e. the same frequencies are used at each \(t\).
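The frequency-based model can be sketched by hand in base R. The toy sequences and the helper neg_log_p below are made up for illustration; the sketch ignores weights and missing states and is not meant to reproduce the internals of seqlogp.

```r
# Toy data (hypothetical): 4 sequences of length 3 over the alphabet {A, B}.
seqs <- matrix(c("A", "A", "B",
                 "A", "B", "B",
                 "B", "B", "B",
                 "A", "A", "A"),
               nrow = 4, byrow = TRUE)
alphabet <- c("A", "B")

# Time-varying case: relative frequency of each state at each position
# (one row per state, one column per position).
freq_by_pos <- sapply(seq_len(ncol(seqs)), function(t) {
  as.numeric(table(factor(seqs[, t], levels = alphabet))) / nrow(seqs)
})
rownames(freq_by_pos) <- alphabet

# Time-constant case: a single distribution pooled over all positions.
global_freq <- as.numeric(table(factor(seqs, levels = alphabet))) / length(seqs)

# Hypothetical helper: -log P(s) as the negated sum of log state probabilities.
neg_log_p <- function(s, freq) {
  idx <- match(s, alphabet)
  p <- if (is.matrix(freq)) freq[cbind(idx, seq_along(s))] else freq[idx]
  -sum(log(p))
}

nlp_varying  <- apply(seqs, 1, neg_log_p, freq = freq_by_pos)
nlp_constant <- apply(seqs, 1, neg_log_p, freq = global_freq)
```

For the first toy sequence A-A-B, the position-wise frequencies are 0.75, 0.5 and 0.75, so nlp_varying[1] equals -log(0.28125). With the pooled frequencies every state has probability 0.5, so all four sequences get the same value, -log(0.125).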
Option prob = "trate" sets each \(P(s_t,t)\), \(t>1\), to the transition probability \(p(s_t|s_{t-1})\). The state distribution used to determine the probability of the first state \(s_1\) is set by means of the begin argument (see below). With the default (time.varying=TRUE), the transition probabilities are estimated separately at each position, yielding an array of transition matrices. With time.varying=FALSE, the transition probabilities are assumed to be constant over the successive positions and are estimated over the entire sequence duration, i.e. from all observed transitions.
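The time-constant transition model can be illustrated with a minimal base R sketch. The toy data and the helper neg_log_p_trate are hypothetical, the first-state distribution is taken as the observed one at position 1, and weights and missing states are ignored; seqlogp itself may handle these cases differently.

```r
# Toy data (hypothetical): 4 sequences of length 3 over the alphabet {A, B}.
seqs <- matrix(c("A", "A", "B",
                 "A", "B", "B",
                 "B", "B", "B",
                 "A", "A", "A"),
               nrow = 4, byrow = TRUE)
alphabet <- c("A", "B")

# Pool all observed transitions (state at t-1 -> state at t) over positions.
from <- factor(seqs[, -ncol(seqs)], levels = alphabet)
to   <- factor(seqs[, -1], levels = alphabet)
counts <- matrix(as.numeric(table(from, to)), nrow = length(alphabet),
                 dimnames = list(alphabet, alphabet))

# Row-normalise the counts so that each row sums to 1.
# (A state never observed as an origin would give a row of NaN.)
trans <- counts / rowSums(counts)

# Distribution of the first state: observed frequencies at position 1.
first <- as.numeric(table(factor(seqs[, 1], levels = alphabet))) / nrow(seqs)

# Hypothetical helper: -log of P(s_1) * p(s_2|s_1) * ... * p(s_l|s_{l-1}).
neg_log_p_trate <- function(s, first, trans) {
  idx <- match(s, alphabet)
  p_first <- first[idx[1]]
  p_trans <- trans[cbind(idx[-length(idx)], idx[-1])]
  -sum(log(c(p_first, p_trans)))
}

nlp_trate <- apply(seqs, 1, neg_log_p_trate, first = first, trans = trans)
```

Here the pooled transition rates are 0.6 for A to A, 0.4 for A to B and 1 for B to B, so the sequence A-A-B has probability 0.75 * 0.6 * 0.4 = 0.18.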
Custom transition probabilities can be provided by passing a matrix or a 3-dimensional array as the prob argument.
The distribution used at the first position is set by means of the begin argument. You can either pass the distribution (probabilities of the states in the alphabet, including the missing value when with.missing=TRUE), or specify "freq" for the observed distribution at the first position, or "global.freq" for the overall state distribution.
The likelihood \(P(s)\) being generally very small, seqlogp returns \(-\log P(s)\). The latter quantity reaches its minimum, 0, when \(P(s)\) is equal to \(1\).
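A short base R illustration of why working on the log scale helps: with made-up state probabilities of 0.01 over 200 positions, the raw product underflows to zero in double precision, while the negative log-likelihood remains an ordinary number.

```r
# Hypothetical state probabilities: 200 positions, each with probability 0.01.
p <- rep(0.01, 200)

# The raw likelihood is 1e-400, which is too small for a double and becomes 0.
likelihood <- prod(p)

# Summing the logs avoids the underflow; the result is 200 * log(100).
neg_log_lik <- -sum(log(p))

# A sequence with probability 1 gives the minimal value, 0.
neg_log_certain <- -sum(log(rep(1, 200)))
```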
## Loading TraMineR, which provides seqdef, seqlogp and the biofam data
library(TraMineR)
## Creating the sequence object using weights
data(biofam)
biofam.seq <- seqdef(biofam, 10:25, weights=biofam$wp00tbgs)
## Computing -log P(s) for each sequence
biofam.prob <- seqlogp(biofam.seq)
## Comparing the distribution of -log P(s) between birth cohorts
cohort <- biofam$birthyr>1940
boxplot(biofam.prob~cohort)