The HiddenMarkov package calls the functions associated with the specified probability distribution in a generic way. For example, if the argument distn in dthmm is "xyz", it will expect to find the functions pxyz, dxyz and Mstep.xyz; and if simulations are to be performed, it will also require rxyz. In this section we describe the required format for the distribution related functions pxyz, dxyz and rxyz, and for the function Mstep.xyz required for the M-step of the EM algorithm.
Consider the examples below of distribution related functions and their arguments. Note that the probability functions all have q as their first argument, and their last two arguments are the same, with the same default values. Similarly, the density functions have x as their first argument, and their last argument is the same, with the same default. The arguments in between are specific to the given distribution, one argument for each distribution parameter. Note that the observed process x is univariate.
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
pbeta(q, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE)
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)
pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
dnorm(x, mean = 0, sd = 1, log = FALSE)
dbeta(x, shape1, shape2, ncp = 0, log = FALSE)
dpois(x, lambda, log = FALSE)
dbinom(x, size, prob, log = FALSE)
rnorm(n, mean = 0, sd = 1)
rbeta(n, shape1, shape2, ncp = 0)
rpois(n, lambda)
rbinom(n, size, prob)
The functions pxyz (distribution function), dxyz (density) and rxyz (random number generator) must be consistent with the conventions used in the examples above. The software deduces the distribution parameter names from what is specified in pm and pn, and it calls these functions assuming that their argument lists are consistent with those described above. The functions pxyz and dxyz are used in the forward and backward equations.
The functions dxyz, pxyz and rxyz must also behave in the same vectorised way as dnorm. For example, if x is a vector while mean and sd are scalars, then dnorm(x, mean, sd) calculates the density for each element of x using the scalar values of mean and sd; the returned value is thus of the same length as x. Alternatively, if x is a scalar and mean and sd are vectors of the same length, then the returned value has the same length as mean and is the density of x evaluated at the corresponding pairs of values of mean and sd. The third possibility is that x and one of the distribution parameters, say sd, are vectors of the same length, and mean is a scalar; then the returned vector has the same length as x, where the \(i\)th value is the density at x[i] with mean mean and standard deviation sd[i]. Note that the functions for the Multinomial distribution do not have this behaviour. There the vector x contains the counts for one multinomial experiment, so the vector characterises the multivariate nature of the random variable rather than multiple univariate realisations. Further, the distribution parameters (i.e. the category probabilities) are given as one vector rather than as a sequence of separate function arguments.
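The three vectorisation patterns described above can be checked directly with dnorm (the values here are illustrative, not from the package):

```r
# 1. Vector x, scalar mean and sd: one density per element of x.
d1 <- dnorm(c(-1, 0, 1), mean = 0, sd = 1)

# 2. Scalar x, vector mean and sd (same length): density of x at each
#    (mean, sd) pair.
d2 <- dnorm(0, mean = c(-1, 0, 1), sd = c(1, 2, 3))

# 3. Vector x and vector sd of the same length, scalar mean: the ith
#    value is the density at x[i] with standard deviation sd[i].
d3 <- dnorm(c(-1, 0, 1), mean = 0, sd = c(1, 2, 3))

length(d1)   # 3
length(d2)   # 3
length(d3)   # 3
```

In each case the returned vector has length 3; in the third case, for instance, the second element equals dnorm(0, mean = 0, sd = 2).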
The other calculation that is specific to the chosen distribution is the maximisation in the M-step. If we have distn="xyz", then there should be a function called Mstep.xyz. Further, it should have arguments (x, cond, pm, pn); see for example Mstep.norm. The parameters that are estimated within this function are named consistently with those defined within the dthmm arguments pm and pn. Notice that the estimates of mean and sd in Mstep.norm are weighted by cond$u. The calculations for cond$u are performed in the E-step, and utilise the distribution related functions "dxyz" and "pxyz". The values of cond$u are essentially the probabilities that the process belongs to the given Markov state; hence, when we calculate the distribution parameters (like mean and sd in Mstep.norm) we calculate weighted sums using the probabilities cond$u. This procedure can be shown to give the maximum likelihood estimates of mean and sd, and hence a similar weighting should be used for the distribution "xyz" (see Harte, 2006, for further mathematical detail). One needs to take a little more care when dealing with distributions like the beta, where the cross derivatives of the log-likelihood between the parameters, i.e. \(\partial^2 \log L /(\partial \alpha_1 \partial \alpha_2)\), are non-zero. See Mstep.beta for further details.
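The weighting by cond$u can be illustrated as follows. This is a sketch with assumed data, not the package source: cond$u is taken to be an n x m matrix whose column j holds the probabilities that each observation belongs to Markov state j.

```r
# Assumed data: six observations and hard (0/1) state memberships,
# so the weighted estimates reduce to per-state sample statistics.
x <- c(1, 2, 3, 4, 5, 6)
u <- cbind(c(1, 1, 1, 0, 0, 0),   # state 1 membership probabilities
           c(0, 0, 0, 1, 1, 1))   # state 2 membership probabilities

# Weighted mean for each state j: sum(u[, j] * x) / sum(u[, j]).
mean <- as.numeric(x %*% u) / colSums(u)

# Weighted standard deviation for each state.
n <- length(x); m <- ncol(u)
sd <- sqrt(colSums((matrix(x, nrow = n, ncol = m) -
                    matrix(mean, nrow = n, ncol = m, byrow = TRUE))^2 * u) /
           colSums(u))

mean   # 2 and 5
sd     # about 0.816 for each state
```

With soft (fractional) memberships the same formulae apply; each observation simply contributes to every state in proportion to its membership probability.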
Now consider a situation where we want to modify the way in which a normal distribution is fitted: say we know the Markov dependent means, and we only want to estimate the standard deviations. Since both parameters are Markov dependent, they both still need to be specified in the pm argument of dthmm. The estimation of the distribution specific parameters takes place in the M-step, in this case Mstep.norm, so to achieve what we want we need to modify that function. In this case it is relatively easy (see the code in “Examples” below). From the function Mstep.norm, take the code under the section if (all(nms == c("mean", "sd"))), i.e. where both parameters are Markov dependent, but replace the line where the mean is estimated with mean <- pm$mean, i.e. leave it as initially specified. Unfortunately, one cannot easily modify the functions in a package namespace. The simple work-around is to define a new distribution, say "xyz", and then define a new function with the above changes called Mstep.xyz. The distribution related functions are otherwise just the same as those for the normal distribution, hence define them as follows:
rxyz <- rnorm
dxyz <- dnorm
pxyz <- pnorm
qxyz <- qnorm
See the 2nd example below for full details.
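The modified M-step might then look like the following sketch (adapted from the structure of Mstep.norm as described above; not the package source). The means are taken directly from pm, and only the standard deviations are re-estimated using the weights cond$u:

```r
Mstep.xyz <- function(x, cond, pm, pn) {
    # The means are known: leave them as initially specified in pm.
    mean <- pm$mean
    n <- length(x)
    m <- ncol(cond$u)
    # Weighted estimate of the standard deviation for each state,
    # using the fixed means above.
    sd <- sqrt(colSums((matrix(x, nrow = n, ncol = m) -
                        matrix(mean, nrow = n, ncol = m, byrow = TRUE))^2 *
                       cond$u) / colSums(cond$u))
    return(list(mean = mean, sd = sd))
}
```

The returned list names (mean and sd) match the parameter names in pm, as required by the naming convention described earlier.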