LMest (version 3.2.5)

lmest: Estimate Latent Markov models for categorical responses


Main function for estimating Latent Markov (LM) models for categorical responses.


lmest(responsesFormula = NULL, latentFormula = NULL,
      data, index, k = 1:4, start = 0,
      modSel = c("BIC", "AIC"), modBasic = 0,
      modManifest = c("LM", "FM"),
      paramLatent = c("multilogit", "difflogit"),
      weights = NULL, tol = 10^-8, maxit = 1000,
      out_se = FALSE, q = NULL, output = FALSE,
      parInit = list(piv = NULL, Pi = NULL, Psi = NULL,
                     Be = NULL, Ga = NULL, mu = NULL,
                     al = NULL, be = NULL, si = NULL,
                     rho = NULL, la = NULL, PI = NULL,
                     fixPsi = FALSE),
      fort = TRUE, seed = NULL, ntry = 0)


Returns an object of class 'LMbasic' for the model without covariates (see LMbasic-class), or an object of class 'LMmanifest' for the model with covariates on the manifest model (see LMmanifest-class), or an object of class 'LMlatent' for the model with covariates on the latent model (see LMlatent-class).



responsesFormula, latentFormula: a symbolic description of the model to fit. A detailed description is given in the ‘Details’ section


data: a data.frame in long format


index: a character vector with two elements, the first indicating the name of the unit identifier, and the second the time occasions


k: an integer vector specifying the number of latent states (default: 1:4)


start: type of starting values (0 = deterministic, 1 = random, 2 = initial values in input)


modSel: a string indicating the model selection criterion: "BIC" for the Bayesian Information Criterion and "AIC" for the Akaike Information Criterion


modBasic: model on the transition probabilities (0 for time-heterogeneity, 1 for time-homogeneity, from 2 to (TT-1) partial time-homogeneity of a certain order)


modManifest: model for the manifest distribution when covariates are included in the measurement model ("LM" = Latent Markov with stationary transition, "FM" = finite mixture model where a mixture of AR(1) processes is estimated with common variance and specific correlation coefficients)


paramLatent: type of parametrization for the transition probabilities ("multilogit" = standard multinomial logit for every row of the transition matrix, "difflogit" = multinomial logit based on the difference between two sets of parameters)


weights: an optional vector of weights for the available responses


tol: tolerance level for convergence


maxit: maximum number of iterations of the algorithm


out_se: logical; whether to compute the information matrix and standard errors


q: number of support points for the AR(1) process (if modManifest = "FM")


output: logical; whether to return additional output: V, Ul, S, yv, Pmarg for the basic LM model and for the LM model with covariates on the latent model (LMbasic-class and LMlatent-class), and V, PRED1, S, yv, Pmarg for the LM model with covariates in the measurement model (LMmanifest-class)


parInit: list of initial model parameters when start = 2. For the list of parameters see LMbasic-class, LMlatent-class and LMmanifest-class


fort: logical; whether to use Fortran routines when possible


seed: an integer value with the random number generator state


ntry: the number of random initializations


Francesco Bartolucci, Silvia Pandolfi, Fulvia Pennoni, Alessio Farcomeni, Alessio Serafini


lmest is a general function for estimating LM models for categorical responses. The function requires data in long format and two additional columns indicating the unit identifier and the time occasions.
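As a rough illustration of the long format described above, the following base R sketch builds a small toy panel. The data frame toy, its column names and the 0-based response coding are assumptions chosen for illustration (the coding mirrors the examples on this page), not objects shipped with the package.

```r
# Toy panel: 3 units observed on 4 occasions (all names are illustrative)
set.seed(1)
n  <- 3   # number of units
TT <- 4   # number of time occasions

toy <- data.frame(
  id   = rep(seq_len(n), each = TT),          # unit identifier
  time = rep(seq_len(TT), times = n),         # time occasion
  y    = sample(0:2, n * TT, replace = TRUE)  # categorical response coded 0, 1, 2
)

# Long format: one row per unit-by-occasion pair, no duplicated pairs
stopifnot(nrow(toy) == n * TT,
          !anyDuplicated(toy[, c("id", "time")]))

# Character vector that would be passed as the 'index' argument:
# first the unit identifier, then the time occasions
index <- c("id", "time")
```

With data laid out this way, a call such as lmest(responsesFormula = y ~ NULL, index = index, data = toy, k = 2) would be the natural next step, provided the LMest package is installed.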

Covariates are allowed to affect the manifest distribution (measurement model) or the initial and transition probabilities (latent model). Two formulas, responsesFormula and latentFormula, are used to specify the different LM models:

  • responsesFormula is used to specify the measurement model:

    • responsesFormula = y1 + y2 ~ NULL
      the LM model without covariates and two responses (y1 and y2) is specified;

    • responsesFormula = NULL
      all the columns in the data except the "id" and "time" columns are used as responses to estimate the LM model without covariates;

    • responsesFormula = y1 ~ x1 + x2
      the univariate LM model with response (y1) and two covariates (x1 and x2) in the measurement model is specified;

  • latentFormula is used to specify the LM model with covariates in the latent model:

    • responsesFormula = y1 + y2 ~ NULL
      latentFormula = ~ x1 + x2 | x3 + x4
      the LM model with two responses (y1 and y2) and two covariates affecting the initial probabilities (x1 and x2) and other two affecting the transition probabilities (x3 and x4) is specified;

    • responsesFormula = y1 + y2 ~ NULL
      latentFormula = ~ 1 | x1 + x2
      (or latentFormula = ~ NULL | x1 + x2)
      the covariates affect only the transition probabilities and an intercept is specified for the initial probabilities;

    • responsesFormula = y1 + y2 ~ NULL
      latentFormula = ~ x1 + x2
      the LM model with two covariates (x1 and x2) affecting both the initial and transition probabilities is specified;

    • responsesFormula = y1 + y2 ~ NULL
      latentFormula = ~ NULL | NULL
      (or latentFormula = ~ 1 | 1)
      the LM model with only an intercept on the initial and transition probabilities is specified.
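The formulas listed above are ordinary R formula objects, so they can be built and inspected before being passed to lmest. The sketch below uses only base R; the object names (fmResp, fmLatent, initPart, transPart) are hypothetical, and the way the two sides of "|" are pulled apart here is only an illustration, not a description of how lmest parses its input internally.

```r
# Two responses, no covariates in the measurement model
fmResp <- y1 + y2 ~ NULL

# Covariates x1, x2 on the initial probabilities; x3, x4 on the transitions
fmLatent <- ~ x1 + x2 | x3 + x4

# Response names appearing on the left-hand side
respNames <- all.vars(fmResp)

# A one-sided formula has length 2: the `~` symbol and its right-hand side.
# Here the right-hand side is a call to `|` with two operands.
rhs       <- fmLatent[[2]]
initPart  <- rhs[[2]]  # expression to the left of "|"
transPart <- rhs[[3]]  # expression to the right of "|"

initVars  <- all.vars(initPart)
transVars <- all.vars(transPart)

print(respNames)
print(initVars)
print(transVars)
```

Inspecting the variable names this way is a cheap check that every covariate referenced in latentFormula is present as a column of the data before estimation starts.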

The function also allows us to deal with missing responses, including drop-out and non-monotonic missingness, under the missing-at-random assumption. Missing values for the covariates are not allowed.
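Because missing values are tolerated in the responses but not in the covariates, it can help to count NA values per column before fitting. The following base R sketch does this on a small invented data frame; the names toy, responses and covariates are assumptions made for the example.

```r
# Toy long-format data: response y has gaps, covariate x is complete
toy <- data.frame(
  id   = rep(1:2, each = 3),
  time = rep(1:3, times = 2),
  y    = c(0, NA, 1, 2, 1, NA),            # missing responses (intermittent and drop-out)
  x    = c(0.5, 0.1, 0.3, 1.2, 0.7, 0.9)   # covariate, must have no NA
)

responses  <- "y"
covariates <- "x"

# Number of missing values in each response and covariate column
naResp <- colSums(is.na(toy[responses]))
naCov  <- colSums(is.na(toy[covariates]))

# Stop early if any covariate has missing values
if (any(naCov > 0)) {
  stop("Missing values found in covariates: ",
       paste(names(naCov)[naCov > 0], collapse = ", "))
}
```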

The LM model with individual covariates in the measurement model is estimated only for complete univariate responses. In such a case, two possible formulations are allowed: modManifest="LM" is used to estimate the model illustrated in Bartolucci et al. (2017), where the latent process is of first order with initial probabilities equal to those of the stationary distribution of the chain; modManifest="FM" is used to estimate a model relying on the assumption that the distribution of the latent process is a mixture of AR(1) processes with common variance and specific correlation coefficients. This model is illustrated in Bartolucci et al. (2014).

For continuous outcomes see the function lmestCont.


Bartolucci, F., Bacci, S., and Pennoni, F. (2014). Longitudinal analysis of the self-reported health status by mixture latent autoregressive models, Journal of the Royal Statistical Society - Series C, 63, pp. 267-288.

Bartolucci, F., Pandolfi, S., and Pennoni, F. (2017). LMest: An R Package for Latent Markov Models for Longitudinal Categorical Data, Journal of Statistical Software, 81(4), pp. 1-38.

Bartolucci, F., Farcomeni, A., and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data, Chapman and Hall/CRC Press.


### Basic LM model

SRHS <- data_SRHS_long[1:2400,]

# Categories rescaled to vary from 0 (“poor”) to 4 (“excellent”)

SRHS$srhs <- 5 - SRHS$srhs

out <- lmest(responsesFormula = srhs ~ NULL,
             index = c("id","t"),
             data = SRHS,
             k = 3,
             start = 1,
             modBasic = 1,
             seed = 123)

if (FALSE) {

## Basic LM model with model selection using BIC

out1 <- lmest(responsesFormula = srhs ~ NULL,
              index = c("id","t"),
              data = SRHS,
              k = 1:5,
              tol = 1e-8,
              modBasic = 1,
              seed = 123, ntry = 2)

# Basic LM model with model selection using AIC

out2 <- lmest(responsesFormula = srhs ~ NULL,
              index = c("id","t"),
              data = SRHS,
              k = 1:5,
              tol = 1e-8,
              modBasic = 1,
              modSel = "AIC",
              seed = 123, ntry = 2)

# Criminal data

data_criminal_sim <- data.frame(data_criminal_sim)

responsesFormula <- lmestFormula(data = data_criminal_sim,
                                 response = "y")$responsesFormula

out3 <- lmest(responsesFormula = responsesFormula,
              index = c("id","time"),
              data = data_criminal_sim,
              k = 1:7,
              modBasic = 1,
              tol = 10^-4)

# Example of drug consumption data

long <- data_drug[,-6]-1
long <- data.frame(id = 1:nrow(long),long)
long <- reshape(long,direction = "long",
                idvar = "id",
                varying = list(2:ncol(long)))

out4 <- lmest(index = c("id","time"),
              k = 3, 
              data = long,
              weights = data_drug[,6],
              modBasic = 1)


### LM model with covariates in the latent model
# Covariates: gender, race, educational level (2 columns), age and age^2

out5 <- lmest(responsesFormula = srhs ~ NULL,
              latentFormula =  ~
              I(gender - 1) +
              I( 0 + (race == 2) + (race == 3)) +
              I(0 + (education == 4)) +
              I(0 + (education == 5)) +
              I(age - 50) + I((age-50)^2/100),
              index = c("id","t"),
              data = SRHS,
              k = 2,
              paramLatent = "multilogit",
              start = 0)


### LM model with the above covariates in the measurement model (stationary model)

out6 <- lmest(responsesFormula = srhs ~ -1 +
              I(gender - 1) +
              I( 0 + (race == 2) + (race == 3)) +
              I(0 + (education == 4)) +
              I(0 + (education == 5)) + I(age - 50) +
              I((age-50)^2/100),
              index = c("id","t"),
              data = SRHS,
              k = 2,
              modManifest = "LM",
              out_se = TRUE,
              tol = 1e-8,
              start = 1,
              seed = 123)

#### LM model with covariates in the measurement model (mixture latent auto-regressive model)

out7 <- lmest(responsesFormula = srhs ~ -1 +
              I(gender - 1) +
              I( 0 + (race == 2) + (race == 3)) +
              I(0 + (education == 4)) +
              I(0 + (education == 5)) + I(age - 50) +
              I((age-50)^2/100),
              index = c("id","t"),
              data = SRHS,
              k = 2,
              modManifest = "FM", q = 61,
              out_se = TRUE,
              tol = 1e-8)

}

