The dataset simulates a labour market programme. People entering the dataset are without a job.
They experience two hazards, i.e. probabilities per time period. They can either get a job and exit from
the dataset, or they can enter a labour market programme, e.g. a subsidised job or similar, and remain
in the dataset and possibly get a job later.
In the terms of this package, there are two transitions, "job" and "program".
The two hazards are influenced by covariates observed by the researcher, called "x1" and "x2".
In addition there are unobserved characteristics influencing the hazards. Being
on a programme also influences the hazard to get a job. In the generated dataset, being on
a programme is the indicator variable alpha. While on a programme, the only transition that can
be made is "job".
The dataset is organized as a series of rows for each individual. Each row is a time period
with constant covariates.
The length of the time period is in the covariate duration.
The transition being made at the end of the period is coded in the covariate d. This
is an integer which is 0 if no transition occurs (e.g. if a covariate changes), 1 for
the first transition, and 2 for the second transition. It can also be a factor, in which case the
level marking no transition must be called "none".
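The row layout described above can be sketched as follows. This is only an illustration in Python, not output from the package: the values are invented, the "id" column is an assumed individual identifier, and the mapping of "job" to 1 and "program" to 2 assumes the transitions are numbered in the order they were introduced.

```python
# Hypothetical rows for one individual (all values invented for illustration).
rows = [
    # x2 changes after this period, so no transition is recorded.
    {"id": 1, "x1": 0.3, "x2": 1.0, "alpha": 0, "duration": 4.0, "d": "none"},
    # The individual enters a programme at the end of this period.
    {"id": 1, "x1": 0.3, "x2": 0.0, "alpha": 0, "duration": 2.5, "d": "program"},
    # While on the programme (alpha = 1) the individual gets a job and exits.
    {"id": 1, "x1": 0.3, "x2": 0.0, "alpha": 1, "duration": 6.0, "d": "job"},
]

# Assumed integer coding: 0 = no transition, 1 = first transition, 2 = second.
codes = {"none": 0, "job": 1, "program": 2}
d_int = [codes[r["d"]] for r in rows]

# Total time this individual spends in the dataset.
total_time = sum(r["duration"] for r in rows)
```

Each row has constant covariates, so a change in a covariate starts a new row with d equal to "none" (or 0) on the row that ends.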
The covariate alpha is zero when unemployed, and 1 if on a programme. It is used
for two purposes. First, it is used as an explanatory variable for the transition to job; this yields
a coefficient which can be interpreted as the effect of being on the programme. Second, it is
used as a "state variable", an index into a "risk set": when estimating, the
mphcrm function must be told which risks/hazards are present.
When on a programme the "program" transition cannot be made. This is implemented
by specifying a list of risksets and using alpha+1 as an index into this list.
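The indexing idea can be sketched in a few lines. This is a Python illustration of the concept, not the package's interface; the names `risksets` and `active_risks` are hypothetical. The text uses alpha+1 because the index there is 1-based (as in R); Python lists are 0-based, so alpha itself serves as the index here.

```python
# One risk set per state; the position in the list corresponds to the state.
risksets = [
    ["job", "program"],  # alpha == 0: unemployed, both transitions are possible
    ["job"],             # alpha == 1: on a programme, only "job" is possible
]

def active_risks(alpha):
    """Return the transitions that can occur in the state given by alpha."""
    # 0-based lookup; the 1-based equivalent would use alpha + 1.
    return risksets[alpha]
```

For example, `active_risks(1)` contains only "job", which encodes that nobody on a programme can enter a programme again.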
The two hazards are modeled as \(\exp(X \beta + \mu)\), where \(X\) is a matrix of covariates,
\(\beta\) is a vector of coefficients to be estimated, and \(\mu\) is an intercept. All of
these quantities are transition specific. This yields an individual likelihood which we call
\(M_i(\mu)\). The idea behind the mixed proportional hazard model is to model the
individual heterogeneity as a probability distribution of intercepts. We obtain the individual
likelihood \(L_i = \sum_j p_j M_i(\mu_j)\), and, thus, the likelihood \(L = \prod_i L_i\).
The likelihood is to be maximized over the parameter vectors \(\beta\) (one for each transition),
the masspoints \(\mu_j\), and the probabilities \(p_j\).
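To make the mixture concrete, here is a minimal Python sketch of \(M_i(\mu)\) and \(L_i\). It rests on assumptions the text does not state: time is continuous, transition times are observed exactly, and hazards are constant within each row. The function names `m_i` and `l_i` and all numbers are invented for illustration; this is not the package's implementation.

```python
import math

# Risk sets indexed by alpha (0-based): both transitions when unemployed,
# only "job" while on a programme.
RISKSETS = [["job", "program"], ["job"]]

def m_i(rows, beta, mu):
    """M_i(mu): likelihood of one individual's rows for given intercepts mu."""
    lik = 1.0
    for r in rows:
        # Hazard exp(x'beta + mu) for each transition in the current risk set.
        haz = {
            k: math.exp(sum(b * r[name] for name, b in beta[k].items()) + mu[k])
            for k in RISKSETS[r["alpha"]]
        }
        # Survive the whole period under all active hazards ...
        lik *= math.exp(-r["duration"] * sum(haz.values()))
        # ... and multiply by the hazard of the transition made at its end, if any.
        if r["d"] != "none":
            lik *= haz[r["d"]]
    return lik

def l_i(rows, beta, masspoints, probs):
    """L_i = sum_j p_j * M_i(mu_j)."""
    return sum(p * m_i(rows, beta, mu) for p, mu in zip(probs, masspoints))

# Invented example: one individual, one row, all coefficients zero.
rows = [{"x1": 0.0, "x2": 0.0, "alpha": 0, "duration": 2.0, "d": "job"}]
beta = {"job": {"x1": 0.0, "x2": 0.0, "alpha": 0.0},
        "program": {"x1": 0.0, "x2": 0.0}}
masspoints = [{"job": 0.0, "program": 0.0},
              {"job": math.log(2.0), "program": 0.0}]
probs = [0.5, 0.5]

individual_lik = l_i(rows, beta, masspoints, probs)
# With several individuals, the log-likelihood is the sum of log L_i over them.
log_lik = math.log(individual_lik)
```

Note that alpha enters twice, as in the text: as a covariate in the "job" hazard and as the selector of the risk set.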
The probability distribution is built up in steps. We start with a single masspoint, with
probability 1. Then we search for another point with a small probability, and maximize the
likelihood from there. We continue adding masspoints until we can no longer improve
the likelihood.
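A single step of this procedure can be sketched on a toy problem. The Python sketch below makes strong simplifying assumptions: one transition only, no covariates, invented durations, a grid search for the new point, and a plain EM algorithm standing in for whatever optimizer is actually used. It illustrates the idea of adding a masspoint with small probability and re-maximizing; it is not the package's algorithm.

```python
import math

# Toy data: durations until a single transition, no covariates (invented values).
durations = [0.1] * 10 + [10.0] * 10

def density(t, mu):
    """Exponential duration density with hazard exp(mu)."""
    h = math.exp(mu)
    return h * math.exp(-h * t)

def loglik(mus, probs):
    return sum(math.log(sum(p * density(t, m) for p, m in zip(probs, mus)))
               for t in durations)

def em(mus, probs, iters=500):
    """Plain EM for a mixture of exponentials (a stand-in for a general optimizer)."""
    for _ in range(iters):
        w = []
        for t in durations:
            row = [p * density(t, m) for p, m in zip(probs, mus)]
            s = sum(row)
            w.append([v / s for v in row])
        n_j = [sum(wi[j] for wi in w) for j in range(len(mus))]
        probs = [n / len(durations) for n in n_j]
        mus = [math.log(n_j[j] / sum(wi[j] * t for wi, t in zip(w, durations)))
               for j in range(len(mus))]
    return mus, probs

def best_new_point(mus, probs, grid):
    """Pick the grid point where moving a little probability mass helps most."""
    mix = [sum(p * density(t, m) for p, m in zip(probs, mus)) for t in durations]
    def gain(mu):
        return sum(density(t, mu) / l for t, l in zip(durations, mix)) - len(durations)
    return max(grid, key=gain)

# Step 1: a single masspoint with probability 1.
mus, probs = em([0.0], [1.0], iters=1)
ll_one = loglik(mus, probs)

# Step 2: add a second point with small probability eps, then maximize from there.
eps = 1e-3
grid = [g / 2 for g in range(-10, 11)]
new_mu = best_new_point(mus, probs, grid)
mus2, probs2 = em(mus + [new_mu], [p * (1 - eps) for p in probs] + [eps])
ll_two = loglik(mus2, probs2)
```

Repeating the second step until the log-likelihood stops improving gives the stepwise build-up described above.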