The main method of the package is mphcrm. It has an interface somewhat similar to lm. There is an example of use in datagen, with a generated dataset similar to the ones in Gaure et al. (2007). For those who have used the program from that paper, a mixture of R, Fortran, C, and Python, this is an entirely new self-contained package, written from scratch with 12 years of experience. Not all functionality from that behemoth has been implemented yet, but most of it has. A short description of the model follows.
There are some individuals with observed covariates \(X_i\). The individuals are observed over time, so there is typically more than one observation of each individual. At any point they experience one or more hazards. The hazards are assumed to be of the form \(h_i^j = \exp(X_i \beta_j)\), where \(\beta_j\) are the coefficients for hazard \(j\). The hazards themselves are not observed, but an event associated with them is, i.e. a transition of some kind. The time of the transition, either exactly recorded or within an interval, must also be in the data set. With enough observations it is then possible to estimate the coefficients \(\beta_j\).
However, contrary to ordinary linear models, unobserved heterogeneity may bias the estimates, not merely increase their uncertainty. To account for it, a random intercept is introduced, so that the hazards are of the form \(h_i^j(\mu_k) = \exp(X_i \beta_j + \mu_k)\) for \(k\) between 1 and some \(n\). The intercept may of course be written multiplicatively as \(\exp(X_i \beta_j) \exp(\mu_k)\); that is why these are called proportional hazards.
The individual likelihood depends on the intercept, i.e. \(L_i(\mu_k)\), but we integrate it out so that the individual likelihood becomes \(\sum_k p_k L_i(\mu_k)\). The resulting mixture likelihood is maximized over all the \(\beta\)s, \(n\), the \(\mu_k\)s, and the probabilities \(p_k\).
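Written out in the notation above, the model being fitted can be summarized as follows (a sketch, with the discrete mixing distribution made explicit):

```latex
% Mixed proportional hazard: hazard j for individual i,
% conditional on the random intercept taking value \mu_k
h_i^j(\mu_k) = \exp(X_i \beta_j + \mu_k)

% The intercept is integrated out over a discrete distribution
% with n support points \mu_1,\dots,\mu_n and probabilities p_k,
% yielding the mixture likelihood that is maximized:
\mathcal{L} = \prod_i \sum_{k=1}^{n} p_k \, L_i(\mu_k),
\qquad \sum_{k=1}^{n} p_k = 1
```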
Besides the function mphcrm, which does the actual estimation, there are functions for extracting the estimated mixture: mphdist, mphmoments, and a few more.
There is a summary function for the fitted model, and a data set is available with data(durdata), which is used for demonstration purposes. It also contains an already fitted model, fit.
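A minimal usage sketch, using only the bundled objects and functions named above (the exact arguments accepted by these functions are documented in the package itself):

```r
library(durmod)

# load the bundled demonstration data set, which also contains
# the pre-fitted model `fit`
data(durdata)

# summarize the fitted mixed proportional hazard model
summary(fit)

# extract the estimated mixing distribution and its moments
mphdist(fit)
mphmoments(fit)
```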
The package may use more than one CPU. The default number of threads is taken from getOption("durmod.threads"), which is initialized, when the package is loaded, from the environment variable DURMOD_THREADS, OMP_THREAD_LIMIT, OMP_NUM_THREADS or NUMBER_OF_PROCESSORS, or from parallel::detectCores().
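For example, the thread count can be capped explicitly before estimation (a configuration sketch using the option named above):

```r
# override the detected default with a fixed thread count
options(durmod.threads = 2)

# verify the setting
getOption("durmod.threads")
```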
For more demanding problems, a cluster of machines (from packages parallel or snow) can be used, in combination with the use of threads.
There is a vignette (vignette("whatmph")) with more details about durmod and data layout.
Gaure, S., K. Røed and T. Zhang (2007). Time and causality: A Monte-Carlo assessment of the timing-of-events approach. Journal of Econometrics 141(2), 1159-1195. https://doi.org/10.1016/j.jeconom.2007.01.015