
HiddenMarkov (version 1.8-13)

dthmm: Discrete Time HMM Object (DTHMM)

Description

Creates a discrete time hidden Markov model object with class "dthmm". The observed process is univariate.

Usage

dthmm(x, Pi, delta, distn, pm, pn = NULL, discrete = NULL,
      nonstat = TRUE)

Arguments

x

is a vector of length \(n\) containing the univariate observed process. Alternatively, x could be specified as NULL, meaning that the data will be added later (e.g. simulated).

Pi

is the \(m \times m\) transition probability matrix of the homogeneous hidden Markov chain.

delta

is the marginal probability distribution of the \(m\) hidden states at the first time point.

distn

is a character string with the abbreviated distribution name. Distributions provided by the package are Beta ("beta"), Binomial ("binom"), Exponential ("exp"), GammaDist ("gamma"), Lognormal ("lnorm"), Logistic ("logis"), Normal ("norm"), and Poisson ("pois"). See topic Mstep, Section “Modifications and Extensions”, to extend to other distributions.

pm

is a list object containing the (Markov dependent) parameter values associated with the distribution of the observed process (see below).

pn

is a list object containing the observation dependent parameter values associated with the distribution of the observed process (see below).

discrete

is logical; TRUE if distn is a discrete distribution. It is set automatically for the distributions provided by the package.

nonstat

is logical; TRUE (the default) if the homogeneous Markov chain is assumed to be non-stationary. See the section “Stationarity” below.

Value

A list object with class "dthmm", containing the above arguments as named components.
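
As a quick illustration (a minimal sketch; the two-state parameter values below are arbitrary), the returned object is an ordinary list whose components can be inspected directly:

library(HiddenMarkov)

Pi    <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                byrow=TRUE, nrow=2)
delta <- c(0.5, 0.5)

obj <- dthmm(NULL, Pi, delta, "norm", list(mean=c(0, 5), sd=c(1, 1)))

class(obj)    # "dthmm"
names(obj)    # the named components correspond to the arguments above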

Notation

  1. MacDonald & Zucchini (1997) use \(t\) to denote the time, where \(t = 1, \cdots, T\). To avoid confusion with other uses of t and T in R, we use \(i = 1, \cdots, n\).

  2. We denote the observed sequence as \(\{X_i\},\ i = 1, \cdots, n\); and the hidden Markov chain as \(\{C_i\},\ i = 1, \cdots, n\).

  3. The history of the observed process up to time \(i\) is denoted by \(X^{(i)}\), i.e. $$ X^{(i)} = (X_1, \cdots, X_i) $$ where \(i = 1, \cdots, n\). Similarly for \(C^{(i)}\).

  4. The hidden Markov chain has \(m\) states denoted by \(1, \cdots, m\).

  5. The Markov chain transition probability matrix is denoted by \(\Pi\), where the \((j, k)\)th element is $$ \pi_{jk} = \Pr\{ C_{i+1}=k \, | \, C_i=j \} $$ for all \(i\) (i.e. all time points), and \(j,k = 1, \cdots, m\).

  6. The Markov chain is assumed to be homogeneous, i.e. for each \(j\) and \(k\), \(\pi_{jk}\) is constant over time.

  7. The Markov chain is said to be stationary if the marginal distribution is the same over time, i.e. for each \(j\), \(\delta_j = \Pr\{ C_i = j \}\) is constant for all \(i\). The marginal distribution is denoted by \(\delta = (\delta_1, \cdots, \delta_m)\).
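
For concreteness, the following small numerical sketch (with an arbitrary two-state \(\Pi\)) illustrates points 5--7; for this \(\Pi\), the marginal distribution \(\delta = (2/3, 1/3)\) happens to be stationary:

Pi <- matrix(c(0.9, 0.1,
               0.2, 0.8),
             byrow=TRUE, nrow=2)

#   each row of Pi is a conditional distribution, so the rows sum to one
rowSums(Pi)

delta <- c(2/3, 1/3)

#   stationarity check: delta %*% Pi reproduces delta
delta %*% Pi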

List Object pm

The list object pm contains parameter values for the probability distribution of the observed process that are dependent on the hidden Markov state. These parameters generally need to be estimated. See the section “Modifications” in the topic Mstep for the case where some of them do not require estimation.

Assume that the hidden Markov chain has \(m\) states, and that there are \(\ell\) parameters that are dependent on the hidden Markov state. Then the list object pm should contain \(\ell\) named vector components each of length \(m\). The names are determined by the required probability distribution.

For example, if distn == "norm", the argument names must coincide with those used by the functions dnorm or rnorm, which are mean and sd. Each must be specified in either pm or pn. If both vary according to the hidden Markov state, then pm should have the named components mean and sd. These are both vectors of length \(m\) containing the means and standard deviations of the observed process when the hidden Markov chain is in each of the \(m\) states. If, for example, sd were “time” dependent, then sd would be contained in pn (see below).

If distn == "pois", then pm should have one component named lambda, being the parameter name in the function dpois. Even if there is only one parameter, the vector component should still be within a list and named.

List Object pn

The list object pn contains parameter values of the probability distribution for the observed process that are dependent on the observation number or “time”. These parameters are assumed to be known.

Assume that the observed process is of length \(n\), and that there are \(\ell\) parameters that are dependent on the observation number or time. Then the list object pn should contain \(\ell\) named vector components each of length \(n\). The names, as in pm, are determined by the required probability distribution.

For example, in the observed process we may count the number of successes in a known number of Bernoulli trials, i.e. the number of Bernoulli trials is known at each time point, but the probability of success varies according to the hidden Markov state. The prob parameter of rbinom (or dbinom) would then be specified in pm and the size parameter would be specified in pn.

One could also have a situation where the observed process was Gaussian, with the means varying according to the hidden Markov state, but the variances varying non-randomly according to the observation number (or vice versa). Here mean would be specified within pm and sd within pn. Note that a given parameter can only occur within one of pm or pn.
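
As a sketch of the Binomial situation described above (the same setup appears in the Examples below; the parameter values are purely illustrative), prob is placed in pm and size in pn:

library(HiddenMarkov)

Pi    <- matrix(c(0.8, 0.2,
                  0.3, 0.7),
                byrow=TRUE, nrow=2)
delta <- c(0, 1)

#   the number of Bernoulli trials is known at each of the n = 1000
#   time points, so size goes in pn (one value per observation)
pn <- list(size=rpois(1000, 10) + 1)

#   the success probability depends on the hidden state, so prob goes
#   in pm (one value per state)
pm <- list(prob=c(0.2, 0.8))

x <- dthmm(NULL, Pi, delta, "binom", pm, pn, discrete=TRUE)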

Complete Data Likelihood

The “complete data likelihood”, \(L_c\), is $$ L_c = \Pr\{ X_1=x_1, \cdots, X_n=x_n, C_1=c_1, \cdots, C_n=c_n \}\,. $$ This can be shown to be $$ \Pr\{ X_1=x_1 \,|\, C_1=c_1 \} \Pr\{ C_1=c_1 \} \prod_{i=2}^n \Pr\{ X_i=x_i \,|\, C_i=c_i \} \Pr\{ C_i=c_i \,|\, C_{i-1}=c_{i-1} \}\,, $$ and hence, substituting model parameters, we get $$ L_c = \delta_{c_1} \pi_{c_1c_2} \pi_{c_2c_3} \cdots \pi_{c_{n-1}c_n} \prod_{i=1}^n \Pr\{ X_i=x_i \,|\, C_i=c_i \}\,, $$ and so $$ \log L_c = \log \delta_{c_1} + \sum_{i=2}^n \log \pi_{c_{i-1}c_i} + \sum_{i=1}^n \log \Pr\{ X_i=x_i \,|\, C_i=c_i \}\,. $$ Hence the “complete data likelihood” is split into three terms: the first relates to parameters of the marginal distribution (Markov chain), the second to the transition probabilities, and the third to the distribution parameters of the observed random variable. When the Markov chain is non-stationary, each term can be maximised separately.
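
The decomposition of \(\log L_c\) can be evaluated directly. The following sketch computes it for a single hypothetical realisation of a two-state Gaussian model (all values are made up for illustration):

Pi    <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                byrow=TRUE, nrow=2)
delta <- c(2/3, 1/3)
pm    <- list(mean=c(0, 5), sd=c(1, 1))

cseq <- c(1, 1, 2, 2, 1)               # hidden states c_1, ..., c_n
xseq <- c(-0.3, 0.8, 4.7, 5.2, 0.1)    # observations  x_1, ..., x_n
n    <- length(xseq)

#   log L_c = log delta_{c_1} + sum_i log pi_{c_{i-1} c_i}
#                             + sum_i log Pr{X_i = x_i | C_i = c_i}
logLc <- log(delta[cseq[1]]) +
         sum(log(Pi[cbind(cseq[-n], cseq[-1])])) +
         sum(dnorm(xseq, mean=pm$mean[cseq], sd=pm$sd[cseq], log=TRUE))
print(logLc)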

Stationarity

When the hidden Markov chain is assumed to be non-stationary, the complete data likelihood has a neat structure, in that \(\delta\) only occurs in the first term, \(\Pi\) only occurs in the second term, and the parameters associated with the observed probabilities only occur in the third term. Hence, the likelihood can easily be maximised by maximising each term individually. In this situation, the estimated parameters using BaumWelch will be the “exact” maximum likelihood estimates.

When the hidden Markov chain is assumed to be stationary, \(\delta = \Pi^\prime \delta\) (see topic compdelta), and then the first two terms of the complete data likelihood determine the transition probabilities \(\Pi\). This raises more complicated numerical problems, as the first term effectively acts as a constraint. In our implementation of the EM algorithm, we deal with this in a slightly ad hoc manner by effectively disregarding the first term, which is assumed to be relatively small. In the M-step, the transition matrix is determined by the second term, then \(\delta\) is estimated using the relation \(\delta = \delta \Pi\). Hence, using the BaumWelch function will only provide approximate maximum likelihood estimates. Exact solutions can be calculated by directly maximising the likelihood function; see the first example in the topic neglogLik.
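
As a short sketch of the stationary case, the stationary distribution implied by a given \(\Pi\) can be obtained by solving \(\delta = \delta \Pi\) subject to \(\sum_j \delta_j = 1\) (the package function compdelta, referenced above, performs this calculation):

Pi <- matrix(c(0.9, 0.1,
               0.2, 0.8),
             byrow=TRUE, nrow=2)
m  <- nrow(Pi)

#   solve delta (I - Pi + U) = 1', where U is an m x m matrix of ones;
#   this assumes the chain is irreducible so the solution is unique
delta <- solve(t(diag(m) - Pi + 1), rep(1, m))

print(delta)           # stationary distribution
print(delta %*% Pi)    # reproduces delta, as required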

References

Cited references are listed on the HiddenMarkov manual page.

Examples

library(HiddenMarkov)

#-----  Test Gaussian Distribution  -----

Pi <- matrix(c(1/2, 1/2,   0,
               1/3, 1/3, 1/3,
                 0, 1/2, 1/2),
             byrow=TRUE, nrow=3)

delta <- c(0, 1, 0)

x <- dthmm(NULL, Pi, delta, "norm",
           list(mean=c(1, 6, 3), sd=c(0.5, 1, 0.5)))

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Poisson Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

x <- dthmm(NULL, Pi, delta, "pois", list(lambda=c(4, 0.1)),
           discrete = TRUE)

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Exponential Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

x <- dthmm(NULL, Pi, delta, "exp", list(rate=c(2, 0.1)))

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Beta Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

x <- dthmm(NULL, Pi, delta, "beta", list(shape1=c(2, 6), shape2=c(6, 2)))

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Binomial Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

#   vector of "fixed & known" number of Bernoulli trials
pn <- list(size=rpois(1000, 10)+1)

x <- dthmm(NULL, Pi, delta, "binom", list(prob=c(0.2, 0.8)), pn,
           discrete=TRUE)

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))


#-----  Test Gamma Distribution  -----

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

delta <- c(0, 1)

pm <- list(rate=c(4, 0.5), shape=c(3, 3))

x <- seq(0.01, 10, 0.01)
plot(x, dgamma(x, rate=pm$rate[1], shape=pm$shape[1]),
     type="l", col="blue", ylab="Density")
points(x, dgamma(x, rate=pm$rate[2], shape=pm$shape[2]),
       type="l", col="red")

x <- dthmm(NULL, Pi, delta, "gamma", pm)

x <- simulate(x, nsim=1000)

#    use above parameter values as initial values
y <- BaumWelch(x)

print(summary(y))
print(logLik(y))
hist(residuals(y))

#   check parameter estimates
print(sum(y$delta))
print(y$Pi %*% rep(1, ncol(y$Pi)))
