Fit mixtures of von Mises-Fisher distributions.
movMF(x, k, control = list(), ...)
An object of class "movMF"
representing the fitted mixture of
von Mises-Fisher distributions, which is a list containing at least
the following components:
a matrix with rows giving the fitted parameters of the mixture components.
a numeric vector with the fitted mixture probabilities.
See vMF for the employed parametrization of the von Mises-Fisher distribution.
a numeric data matrix, with rows corresponding to observations. Standardized to unit row lengths if necessary. Can be a dense matrix, a simple triplet matrix (package slam), or a dgTMatrix (package Matrix).
an integer giving the desired number of mixture components (classes).
a list of control parameters. See Details.
a list of control parameters (overriding those specified
in control
).
movMF
returns an object of class "movMF"
representing
the fitted mixture of von Mises-Fisher distributions model. Available
methods for such objects include coef
,
logLik
, print
and predict
.
predict
has an extra type
argument with possible
values "class_ids"
(default) and "memberships"
for
indicating hard or soft prediction, respectively.
The mixture of von Mises-Fisher distributions is fitted using EM
variants as specified by control option E
(specifying the
E-step employed), with possible values "softmax"
(default),
"hardmax"
or "stochmax"
where the first two implement
algorithms soft-moVMF and hard-moVMF of Banerjee et al (2005). For
"stochmax"
, class assignments are drawn from the posteriors for
each observation in the E-step as outlined as SEM in Celeux and
Govaert (1992). The stopping criterion for this algorithm is by
default changed to not check for convergence (logical control option
converge
), but to return the parameters with the maximum
likelihood encountered. E
may be abbreviated.
In the M-step, the parameters \(\theta\) of the respective component
distributions are estimated via maximum likelihood, which is
accomplished by taking \(\theta\) proportional to suitable weighted
sample means \(\bar{x}\), with length \(\kappa\) solving the
equation \(A_d(\kappa) = \|\bar{x}\|\), where
\(A_d(\kappa) = I_{d/2}(\kappa) / I_{d/2-1}(\kappa)\) with \(I\)
the modified Bessel function of the first kind. Via control argument
kappa
, one can specify how to (approximately) solve these
equations, and whether a common (possibly given) length \(\kappa\)
should be employed. If kappa
is a number, it gives a common
length to be employed. If it is a character string, it specifies the
method to be used for solving the \(\kappa\) equation. The possible
methods are:
"Banerjee_et_al_2005"
uses the approximation of Banerjee et al (2005).
"Tanabe_et_al_2007"
uses the fixed-point iteration of
Tanabe et al (2007) with starting point for \(\kappa\) in the
interval established by Tanabe et al (2007) implied by a given
c
with values in [0, 2]. The default is c
= 1, the
mid-point of the interval.
"Sra_2012"
uses two Newton steps as suggested in Sra (2012) starting in the approximation of Banerjee et al (2005).
"Song_et_al_2012"
uses two Halley steps as suggested in Song et al (2012) starting in the approximation of Banerjee et al (2005).
"uniroot"
uses a straightforward call to
uniroot
with the bounds established in Hornik and
Grün (2014).
"Newton"
uses a full Newton algorithm started in the approximation of Hornik and Grün (2014).
"Halley"
uses a full Halley algorithm started in the approximation of Hornik and Grün (2014).
"hybrid"
implements a combination of a
derivative-based step (Newton or Halley) and a bisection step as
outlined in Press et al. (2002). The derivative-based step can be
specified via the argument step
which expects a function
performing this step. Currently step_Newton
and
step_Halley
(default) are available.
"Newton_Fourier"
(default)uses a variant of the Newton-Fourier method for strictly increasing concave functions as for example given in Atkinson (1989, pp. 62--64). Concavity can be established using Hornik and Grün (2013).
The lower-cased version of the given kappa
specification is
matched against the lower-cased names of the available methods using
pmatch
. Finally, to indicate using a common (but not
given) \(\kappa\) for all component distributions, kappa
should be a list with element common = TRUE
(and optionally a
character string giving the estimation method).
Additional control parameters are as follows.
maxiter
an integer giving the maximal number of EM iterations to be performed. Default: 100.
reltol
the minimum relative improvement of the
objective function per iteration. If improvement is less, the EM
algorithm will stop under the assumption that no further
significant improvement can be made. Defaults to
sqrt(.Machine$double.eps)
.
ids
either a vector of class memberships or
TRUE
which implies that the class memberships are obtained
from the attribute named "z"
of the input data; these class
memberships are used for initializing the EM algorithm and the
algorithm is stopped after the first iteration.
start
a specification of the starting values to be
employed. Can be a list of matrices giving the memberships of
objects to components, or of vectors giving component ids
(numbers from 1 to the given k
). Can also be a character
vector with elements "i"
(randomly pick component ids for
the observations), or one of "p"
, "S"
or "s"
.
The latter first determine component “prototypes”, and then
determine an optimal “fuzzy” membership matrix from the
implied cosine dissimilarities between observations and
prototypes. Prototypes are obtained as follows: for "p"
,
observations are randomly picked. For "S"
, one takes the
first prototype to minimize total cosine dissimilarity to the
observations, and then successively picks observations farthest
away from the already picked prototypes. For "s"
, one
takes a randomly chosen observation as the first prototype, and
then proceeds as for "S"
.
By default, initialization method "p"
is used.
If several starting values are specified, the EM algorithm is performed individually to each starting value, and the best solution found is returned.
nruns
an integer giving the number of EM runs to be
performed. Default: 1.
Only used if start
is not given.
minalpha
a numeric indicating the minimum prior probability. Components falling below this threshold are removed during the iteration. If \(\ge 1\), this is taken as the minimal number of observations in a component. Default: 0.
converge
a logical, if TRUE
the EM algorithm is
stopped if the reltol
criterion is met and the current
parameter estimate is returned. If FALSE
the EM algorithm
is run for maxiter
iterations and the parametrizations
with the maximum likelihood encountered during the EM algorithm is
returned. Default: TRUE
, changed to FALSE
if
E="stochmax"
.
verbose
a logical indicating whether to provide
some output on algorithmic progress.
Defaults to getOption("verbose")
.
One popular application context of mixtures of von Mises-Fisher distributions is text mining, where the data matrices are typically very large and sparse. The provided implementation should be able to handle such large corpora with reasonable efficiency by employing suitable sparse matrix representations and computations. In addition, straightforward computations of the normalizing constants in the von Mises-Fisher densities (see movMF_distribution) by directly employing the modified Bessel functions of the first kind are computationally infeasible for large \(d\) (dimension of the observations) and/or values of the parameter lengths \(\kappa\). Instead, we use suitably scaled hypergeometric-type power series for computing (the logarithms of) the normalizing constants.
K. E. Atkinson (1989). An Introduction to Numerical Analysis. 2nd edition. John Wiley & Sons.
A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6, 1345--1382. https://jmlr.csail.mit.edu/papers/v6/banerjee05a.html.
G. Celeux, and G. Govaert (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis, 14, 315--332. tools:::Rd_expr_doi("10.1016/0167-9473(92)90042-E").
K. Hornik, and B. Grün (2013). Amos-type bounds for modified Bessel function ratios. Journal of Mathematical Analysis and Applications, 408(1), 91--101. tools:::Rd_expr_doi("10.1016/j.jmaa.2013.05.070").
K. Hornik, and B. Grün (2014). On maximum likelihood estimation of the concentration parameter of von Mises-Fisher distributions. Computational Statistics, 29, 945--957. tools:::Rd_expr_doi("10.1007/s00180-013-0471-0").
W. H. Press, S. A. Teukolsky, W. T. Vetterling and Brian P. Flannery (2002). Numerical Recipes in C: The Art of Scientific Computing. 2nd edition. Cambridge University Press.
H. Song, J. Liu, and G. Wang. High-order parameter approximation for von Mises-Fisher distributions. Applied Mathematics and Computation, 218, 11880--11890. tools:::Rd_expr_doi("10.1016/j.amc.2012.05.050").
S. Sra (2012). A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of \(I_s(x)\). Computational Statistics, 27, 177--190. tools:::Rd_expr_doi("10.1007/s00180-011-0232-x").
A. Tanabe, K. Fukumizu, S. Oba, T. Takenouchi, and S. Ishii. Parameter estimation for von Mises-Fisher distributions. Computational Statistics, 22, 145--157. tools:::Rd_expr_doi("10.1007/s00180-007-0030-7").
## Generate and fit a "small-mix" data set a la Banerjee et al.
mu <- rbind(c(-0.251, -0.968),
c(0.399, 0.917))
kappa <- c(4, 4)
theta <- kappa * mu
theta
alpha <- c(0.48, 0.52)
## Generate a sample of size n = 50 from the von Mises-Fisher mixture
## with the above parameters.
set.seed(123)
x <- rmovMF(50, theta, alpha)
## Fit a von Mises-Fisher mixture with the "right" number of components,
## using 10 EM runs.
set.seed(123)
y2 <- movMF(x, 2, nruns = 10)
## Inspect the fitted parameters:
y2
## Compare the fitted classes to the true ones:
table(True = attr(x, "z"), Fitted = predict(y2))
## To use a common kappa:
y2cv <- movMF(x, 2, nruns = 10, kappa = list(common = TRUE))
## To use a common kappa fixed to the true value of 4:
y2cf <- movMF(x, 2, nruns = 10, kappa = 4)
## Comparing solutions via BIC:
sapply(list(y2, y2cf, y2cv), BIC)
## Use a different kappa solver:
set.seed(123)
y2a <- movMF(x, 2, nruns = 10, kappa = "uniroot")
y2a
Run the code above in your browser using DataLab