The dataset simulates a labour market programme. People entering the dataset are without a job.
They face two hazards, i.e. transition intensities (loosely, probabilities per unit of time). They can either get a job and exit
the dataset, or enter a labour market programme, e.g. a subsidised job or similar, in which case they remain
in the dataset and may get a job later.
In the terminology of this package, there are two transitions, "job" and "program".
The two hazards are influenced by covariates observed by the researcher, called "x1" and
"x2". In addition, unobserved characteristics influence the hazards. Being
on a programme also influences the hazard of getting a job. In the generated dataset,
programme participation is recorded in the indicator variable alpha. While on a programme, the only transition that can
be made is "job".
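As a hedged illustration of the process just described, the Python sketch below simulates one individual's path. It assumes that the hazards are constant over time, so that the two risks are competing exponential risks, and it folds covariates and unobserved heterogeneity into the given rates. The rates and the function names draw_exit and simulate_person are hypothetical; they are not part of the package.

```python
import random

def draw_exit(hazards, rng):
    """Draw (waiting time, transition) from competing constant hazards.

    Assumption: independent exponential risks. The first exit time is then
    exponential with rate sum(hazards), and risk k is the one that fires
    with probability hazards[k] / sum(hazards).
    """
    total = sum(hazards.values())
    wait = rng.expovariate(total)
    u = rng.random() * total
    for name, h in hazards.items():
        u -= h
        if u < 0:
            return wait, name
    return wait, name  # only reached through floating-point rounding

def simulate_person(h_job, h_program, h_job_on_program, rng):
    """Return a list of (alpha, duration, transition) spells for one person (hypothetical rates)."""
    spells = []
    # Unemployed (alpha = 0): both "job" and "program" are possible.
    wait, event = draw_exit({"job": h_job, "program": h_program}, rng)
    spells.append((0, wait, event))
    if event == "program":
        # On a programme (alpha = 1): the only possible transition is "job".
        wait, event = draw_exit({"job": h_job_on_program}, rng)
        spells.append((1, wait, event))
    return spells

rng = random.Random(1)
print(simulate_person(0.10, 0.05, 0.20, rng))
```

Because the exponential distribution is memoryless, drawing a fresh waiting time when the person enters the programme is consistent with constant hazards.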
The dataset is organized as a series of rows for each individual. Each row is a time period
with constant covariates.
The length of the period is stored in the variable duration.
The transition made at the end of the period is coded in the variable d. This
is an integer which is 0 if no transition occurs (e.g. if the period ends because a covariate changes), 1 for
the first transition and 2 for the second. It can also be a factor, in which case the
level marking no transition must be called "none".
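A small sketch of how one individual's rows might look under this layout. The covariate values and durations are made up for illustration, and the ordering job = 1, program = 2 is an assumption that follows the order in which the transitions are listed above. The helper names code_of and label_of are hypothetical.

```python
TRANSITIONS = ["job", "program"]  # assumed order: d = 1 is "job", d = 2 is "program"

def code_of(label):
    """Integer code of d for a factor-style label ("none" means no transition)."""
    return 0 if label == "none" else TRANSITIONS.index(label) + 1

def label_of(code):
    """Inverse of code_of."""
    return "none" if code == 0 else TRANSITIONS[code - 1]

# Hypothetical individual: x1 changes after 1.0 time units (ending a period
# with no transition), the person then enters a programme, and later gets a job.
rows = [
    {"x1": 0.3, "x2": 1.0, "alpha": 0, "duration": 1.0, "d": code_of("none")},
    {"x1": 0.5, "x2": 1.0, "alpha": 0, "duration": 2.2, "d": code_of("program")},
    {"x1": 0.5, "x2": 1.0, "alpha": 1, "duration": 1.5, "d": code_of("job")},
]

for r in rows:
    print(r["alpha"], r["duration"], r["d"], label_of(r["d"]))
```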
The variable alpha is 0 when unemployed and 1 when on a programme. It serves
two purposes. First, it is an explanatory variable for the transition to job, which yields
a coefficient that can be interpreted as the effect of being on the programme. Second, it is
used as a "state variable", i.e. an index into a list of "risksets": when estimating, the
mphcrm function must be told which risks (hazards) are present in each state.
When on a programme, the "program" transition cannot be made. This is implemented
by specifying a list of risksets and using alpha+1 as an index into this list.
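The riskset mechanism can be mirrored in a few lines of Python. This is only a sketch of the idea, not the package's interface; the names RISKSETS, risks_present and transition_allowed are hypothetical. Since Python lists are 0-based, the R-style index alpha+1 becomes plain alpha.

```python
# Risksets indexed by state. R lists are 1-based, hence alpha + 1 there;
# in 0-based Python the same lookup is simply RISKSETS[alpha].
RISKSETS = [
    ["job", "program"],  # alpha = 0: unemployed, both transitions possible
    ["job"],             # alpha = 1: on a programme, only "job" possible
]

def risks_present(alpha):
    """Names of the hazards that are active in state alpha."""
    return RISKSETS[alpha]

def transition_allowed(alpha, transition):
    """Check that an observed transition is possible in the given state."""
    return transition == "none" or transition in risks_present(alpha)

print(risks_present(0), risks_present(1))
print(transition_allowed(1, "program"))  # False: cannot enter a programme while on one
```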
The two hazards are modeled as \(\exp(X\beta + \mu)\), where \(X\) is a matrix of covariates,
\(\beta\) is a vector of coefficients to be estimated, and \(\mu\) is an intercept. All of
these quantities are transition specific. For given intercepts this yields an individual likelihood which we call
\(M_i(\mu)\). The idea behind the mixed proportional hazard model is to model the unobserved
individual heterogeneity as a discrete probability distribution of intercepts, with masspoints \(\mu_j\) (each holding one intercept per transition)
and probabilities \(p_j\). We obtain the individual likelihood \(L_i = \sum_j p_j M_i(\mu_j)\) and, thus, the
log-likelihood \(\log L = \sum_i \log L_i\). The log-likelihood is to be maximized over the parameter vectors \(\beta\) (one for each transition),
the masspoints \(\mu_j\), and the probabilities \(p_j\).
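The Python sketch below spells out these formulas under stated assumptions, and is not the package's implementation. It assumes that hazards are constant within each row, so a row of length \(t\) contributes the survival factor \(\exp(-t \sum_k h_k)\) over the hazards \(h_k\) in the current riskset, times the hazard of the observed transition when d is nonzero. The row and coefficient layouts, the coefficient values and all function names are hypothetical, and the mixture is summed with the log-sum-exp trick for numerical stability.

```python
import math

TRANSITIONS = ["job", "program"]          # assumed: d = 1 is "job", d = 2 is "program"
RISKSETS = [["job", "program"], ["job"]]  # indexed by alpha (0-based)

def log_hazard(row, beta, mu, k):
    """log of exp(X beta + mu) for transition k; beta[k] maps covariate -> coefficient."""
    return sum(c * row[name] for name, c in beta[k].items()) + mu[k]

def log_m(rows, beta, mu):
    """log M_i(mu): one individual's log-likelihood for a given intercept vector mu."""
    total = 0.0
    for row in rows:
        risks = RISKSETS[row["alpha"]]
        # Survival through the period: every hazard in the riskset acts for its whole length.
        total -= row["duration"] * sum(math.exp(log_hazard(row, beta, mu, k)) for k in risks)
        # A transition at the end of the period contributes that transition's hazard.
        if row["d"] != 0:
            total += log_hazard(row, beta, mu, TRANSITIONS[row["d"] - 1])
    return total

def mixture_loglik(individuals, beta, masspoints, probs):
    """log L = sum_i log sum_j p_j M_i(mu_j), computed with log-sum-exp."""
    total = 0.0
    for rows in individuals:
        terms = [math.log(p) + log_m(rows, beta, mu)
                 for mu, p in zip(masspoints, probs) if p > 0]
        top = max(terms)  # factor out the largest term before exponentiating
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

# Hypothetical coefficients: alpha enters the "job" hazard as the programme effect.
beta = {"job": {"x1": 0.5, "alpha": 0.3}, "program": {"x2": -0.2}}
masspoints = [{"job": -2.0, "program": -3.0}, {"job": -1.0, "program": -2.5}]
probs = [0.6, 0.4]
person = [
    {"x1": 0.5, "x2": 1.0, "alpha": 0, "duration": 2.2, "d": 2},
    {"x1": 0.5, "x2": 1.0, "alpha": 1, "duration": 1.5, "d": 1},
]
print(mixture_loglik([person], beta, masspoints, probs))
```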
The probability distribution is built up in steps. We start with a single masspoint with
probability 1. Then we search for another masspoint with a small probability and maximize the
likelihood from there. We continue adding masspoints until the likelihood can
no longer be improved.
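The stepwise construction can be illustrated on a deliberately simplified problem: a single transition, no covariates, and every individual observed until exit, so that \(\log M_i(\mu) = \mu - t_i e^{\mu}\). To keep the sketch short, two simplifications are swapped in and should not be read as the package's method: candidate locations for a new masspoint come from a fixed grid, and after a point is added only the probabilities are re-maximized, using the EM update for mixture weights with fixed components. The function names, eps, tol, the grid and the data are hypothetical choices.

```python
import math

def log_m(t, mu):
    """log M_i(mu) for one observed exit at time t under the constant hazard exp(mu)."""
    return mu - t * math.exp(mu)

def loglik(times, mus, probs):
    """Mixture log-likelihood sum_i log sum_j p_j M_i(mu_j), via log-sum-exp."""
    total = 0.0
    for t in times:
        terms = [math.log(p) + log_m(t, m) for m, p in zip(mus, probs) if p > 0]
        top = max(terms)
        total += top + math.log(sum(math.exp(x - top) for x in terms))
    return total

def em_probs(times, mus, probs, iters=200):
    """Swapped-in maximizer: EM updates of p_j with the locations mu_j held fixed."""
    for _ in range(iters):
        acc = [0.0] * len(mus)
        for t in times:
            lw = [math.log(p) + log_m(t, m) if p > 0 else -math.inf
                  for m, p in zip(mus, probs)]
            top = max(lw)
            w = [math.exp(x - top) for x in lw]
            s = sum(w)
            for j, wj in enumerate(w):
                acc[j] += wj / s  # posterior probability that t belongs to point j
        probs = [a / len(times) for a in acc]
    return probs

def add_point(mus, probs, new_mu, eps):
    """Add new_mu with probability eps and rescale the old probabilities to sum to 1."""
    return mus + [new_mu], [p * (1 - eps) for p in probs] + [eps]

def build_distribution(times, grid, eps=1e-3, tol=1e-6, max_points=5):
    # Start: one masspoint with probability 1, at its closed-form optimum log(n / sum t).
    mus, probs = [math.log(len(times) / sum(times))], [1.0]
    best = loglik(times, mus, probs)
    while len(mus) < max_points:
        # Search the grid for the location that looks best when added at probability eps.
        cand = max((c for c in grid if c not in mus), default=None,
                   key=lambda c: loglik(times, *add_point(mus, probs, c, eps)))
        if cand is None:
            break
        new_mus, new_probs = add_point(mus, probs, cand, eps)
        new_probs = em_probs(times, new_mus, new_probs)  # maximize from there
        new_ll = loglik(times, new_mus, new_probs)
        if new_ll <= best + tol:  # no improvement: stop adding masspoints
            break
        mus, probs, best = new_mus, new_probs, new_ll
    return mus, probs, best

times = [0.2, 0.3, 0.25, 0.15, 3.0, 4.0, 3.5, 5.0]
grid = [x / 2 for x in range(-6, 5)]
mus, probs, best = build_distribution(times, grid)
print(mus, probs, best)
```

In the full model the location of the new masspoint, all probabilities and the coefficients would be maximized jointly; the grid and the weight-only EM step only keep this sketch small.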