Learn R Programming

VGAM (version 0.8-2)

mix2normal1: Mixture of Two Univariate Normal Distributions

Description

Estimates the five parameters of a mixture of two univariate normal distributions by maximum likelihood estimation.

Usage

mix2normal1(lphi="logit", lmu="identity", lsd="loge",
            ephi=list(), emu1=list(), emu2=list(), esd1=list(), esd2=list(),
            iphi=0.5, imu1=NULL, imu2=NULL, isd1=NULL, isd2=NULL,
            qmu=c(0.2, 0.8), equalsd=TRUE, nsimEIM=100, zero=1)

Arguments

lphi
Link function for the parameter $\phi$. See Links for more choices.
lmu
Link function applied to each $\mu$ parameter. See Links for more choices.
lsd
Link function applied to each $\sigma$ parameter. See Links for more choices.
ephi, emu1, emu2, esd1, esd2
List. Extra argument for each of the links. See earg in Links for general information. If equalsd=TRUE then esd1 must equal esd2.
iphi
Initial value for $\phi$, whose value must lie between 0 and 1.
imu1, imu2
Optional initial value for $\mu_1$ and $\mu_2$. The default is to compute initial values internally using the argument qmu.
isd1, isd2
Optional initial value for $\sigma_1$ and $\sigma_2$. The default is to compute initial values internally based on the argument qmu. Currently these are not great, therefore using these arguments where practical is a good ide
qmu
Vector with two values giving the probabilities relating to the sample quantiles for obtaining initial values for $\mu_1$ and $\mu_2$. The two values are fed in as the probs argument into
equalsd
Logical indicating whether the two standard deviations should be constrained to be equal. If TRUE then the appropriate constraint matrices will be used.
zero
An integer specifying which linear/additive predictor is modelled as intercepts only. If given, the value or values must be from the set ${1,2,\ldots,5}$. The default is the first one only, meaning $\phi$ is a single parameter even when there ar

Value

  • An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

Warning

Numerical problems can occur and half-stepping is not uncommon. If failure to converge occurs, try inputting better initial values, e.g., by using iphi, qmu, imu1, imu2, isd1, isd2, etc. This VGAM family function should be used with care.

Details

The probability density function can be loosely written as $$f(y) = \phi \, N(\mu_1,\sigma_1) + (1-\phi) \, N(\mu_2, \sigma_2)$$ where $\phi$ is the probability an observation belongs to the first group. The parameters $\mu_1$ and $\mu_2$ are the means, and $\sigma_1$ and $\sigma_2$ are the standard deviations. The parameter $\phi$ satisfies $0 < \phi < 1$. The mean of $Y$ is $\phi \mu_1 + (1-\phi) \mu_2$ and this is returned as the fitted values. By default, the five linear/additive predictors are $(logit(\phi), \mu_1, \log(\sigma_1), \mu_2, \log(\sigma_2))^T$. If equalsd=TRUE then $\sigma_1 = \sigma_2$ is enforced.

References

McLachlan, G. J. and Peel, D. (2000) Finite Mixture Models. New York: Wiley.

Everitt, B. S. and Hand, D. J. (1981) Finite Mixture Distributions. London: Chapman & Hall.

See Also

normal1, Normal, mix2poisson.

Examples

Run this code
mu1 =  99; mu2 = 150
sd1 = sd2 = exp(3)
(phi = logit(-1, inverse=TRUE))
nn = 1000
mdata = data.frame(y = ifelse(runif(nn) < phi, rnorm(nn, mu1, sd1),
                                               rnorm(nn, mu2, sd2)))
fit = vglm(y ~ 1, mix2normal1(equalsd=TRUE), mdata)

# Compare the results
cfit = coef(fit)
round(rbind('Estimated'=c(logit(cfit[1], inv=TRUE),
    cfit[2], exp(cfit[3]), cfit[4]), 'Truth'=c(phi, mu1, sd1, mu2)), dig=2)

# Plot the results
xx = with(mdata, seq(min(y), max(y), len=200))
plot(xx, (1-phi)*dnorm(xx, mu2, sd2), type="l", xlab="y",
     main="Red=estimate, blue=truth", col="blue", ylab="Density")
phi.est = logit(coef(fit)[1], inverse=TRUE)
sd.est = exp(coef(fit)[3])
lines(xx, phi*dnorm(xx, mu1, sd1), col="blue")
lines(xx, phi.est * dnorm(xx, Coef(fit)[2], sd.est), col="red")
lines(xx, (1-phi.est) * dnorm(xx, Coef(fit)[4], sd.est), col="red")
abline(v=Coef(fit)[c(2,4)], lty=2, col="red")
abline(v=c(mu1, mu2), lty=2, col="blue")

Run the code above in your browser using DataLab