lnre.fzm: The finite Zipf-Mandelbrot (fZM) LNRE Model (zipfR)

Description

The finite Zipf-Mandelbrot (fZM) LNRE model of Evert (2004).

The constructor function lnre.fzm is not user-visible. It is invoked implicitly when lnre is called with LNRE model type "fzm".

Usage

lnre.fzm(alpha=.8, A=1e-9, B=.01, param=list())
  ## user call: lnre("fzm", spc=spc) or lnre("fzm", alpha=.8, A=1e-9, B=.01)

Arguments

alpha

the shape parameter $\alpha$, a number in the range $(0,1)$

A

the lower cutoff parameter $A$, a positive number. Note that a valid set of parameters must satisfy $0 < A < B$.

B

the upper cutoff parameter $B$, a positive number ($B > 1$ is allowed although it is inconsistent with the interpretation of $B$)

param

a list of parameters given as name-value pairs (alternative method of parameter specification)

Value

A partially initialized object of class lnre.fzm, which is completed and passed back to the user by the lnre function. See lnre for a detailed description of lnre.fzm objects (as a subclass of lnre).

Mathematical Details

Like the ZM model, the fZM model is an LNRE re-formulation of the Zipf-Mandelbrot law, here for a population with a finite vocabulary size $S$, i.e.

$$ \pi_k = \frac{C}{(k + b) ^ a} $$

for $k = 1, \ldots, S$. The parameters of the Zipf-Mandelbrot law are $a > 1$, $b \ge 0$ and $S$ (see also Baayen 2001, 101ff). The fZM model is given by the type density function

$$ g(\pi) := C\cdot \pi^{-\alpha-1} $$

for $A \le \pi \le B$ (and $g(\pi) = 0$ otherwise). It has three parameters: the shape parameter $\alpha$ with $0 < \alpha < 1$, and the cutoff parameters $A$ and $B$ with $0 < A < B \le 1$. The normalizing constant is

$$ C = \frac{ 1 - \alpha }{ B^{1 - \alpha} - A^{1 - \alpha} } $$

and the population vocabulary size is

$$ S = \frac{1 - \alpha}{\alpha} \cdot \frac{ A^{-\alpha} - B^{-\alpha} }{ B^{1 - \alpha} - A^{1 - \alpha} } $$
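Both constants can be recovered from the type density by direct integration. The following is a brief sketch, assuming the usual LNRE conventions that the type probabilities sum to one and that $S$ is the total mass of the type density:

$$ \int_A^B \pi \, g(\pi) \, d\pi = C \int_A^B \pi^{-\alpha} \, d\pi = C \cdot \frac{ B^{1 - \alpha} - A^{1 - \alpha} }{ 1 - \alpha } = 1 $$

which yields the normalizing constant $C$, and

$$ S = \int_A^B g(\pi) \, d\pi = C \int_A^B \pi^{-\alpha - 1} \, d\pi = \frac{C}{\alpha} \cdot \left( A^{-\alpha} - B^{-\alpha} \right) $$

which gives the population vocabulary size after substituting $C$.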

See Evert (2004) and the lnre.zm manpage for further details.
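The closed-form expressions for $C$ and $S$ can be checked numerically. The sketch below uses base R only; the helper name fzm_constants is hypothetical and is not part of zipfR. The integrals are evaluated on a log scale (substituting $\pi = e^u$), an implementation choice made here because the density spans many orders of magnitude between $A$ and $B$.

```r
# Hypothetical helper (not a zipfR function): closed-form constants of the
# fZM model as given in the formulas above.
fzm_constants <- function(alpha, A, B) {
  C <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))  # normalizing constant
  S <- C * (A^(-alpha) - B^(-alpha)) / alpha          # population vocabulary size
  list(C = C, S = S)
}

alpha <- .8; A <- 1e-9; B <- .01                      # defaults from Usage
k <- fzm_constants(alpha, A, B)

# With pi = exp(u):  pi * g(pi) dpi = C * exp((1 - alpha) * u) du
#                         g(pi) dpi = C * exp(-alpha * u) du
mass  <- integrate(function(u) k$C * exp((1 - alpha) * u),
                   log(A), log(B), rel.tol = 1e-8)$value
types <- integrate(function(u) k$C * exp(-alpha * u),
                   log(A), log(B), rel.tol = 1e-8)$value

all.equal(mass, 1)       # total probability mass should be 1
all.equal(types, k$S)    # numerical integral should match closed-form S
```

If the formulas are implemented correctly, both comparisons should report agreement within numerical tolerance.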

Details

The parameters of the fZM model can either be specified as immediate arguments:

    lnre.fzm(alpha=.5, A=5e-12, B=.1)

or as a list of name-value pairs:

    lnre.fzm(param=list(alpha=.5, A=5e-12, B=.1))

which is usually more convenient when the constructor is invoked by another function (such as lnre). If both immediate arguments and the param list are given, the immediate arguments override conflicting values in param. For any parameters that are neither specified as immediate arguments nor listed in param, the defaults from the function prototype are inserted.
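The precedence rule (immediate arguments over param, param over defaults) can be illustrated with a small base R sketch. This is not the zipfR source code: the function name resolve_fzm_params and the use of NULL to mark unspecified arguments are assumptions made for illustration only.

```r
# Hypothetical sketch of the precedence rule; NOT the actual zipfR code.
resolve_fzm_params <- function(alpha = NULL, A = NULL, B = NULL, param = list()) {
  defaults  <- list(alpha = .8, A = 1e-9, B = .01)  # defaults from the Usage section
  # keep only the immediate arguments that were actually supplied
  immediate <- Filter(Negate(is.null), list(alpha = alpha, A = A, B = B))
  merged <- utils::modifyList(defaults, param)      # param overrides defaults
  utils::modifyList(merged, immediate)              # immediate args override param
}

# alpha is given twice: the immediate value .5 wins over param's .7;
# A comes from param, B falls back to the default
p <- resolve_fzm_params(alpha = .5, param = list(alpha = .7, A = 5e-12))
str(p)
```

In this example the resolved values are alpha = 0.5, A = 5e-12 and B = 0.01.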

The lnre.fzm constructor also checks the types and ranges of parameter values and aborts with an error message if an invalid parameter is detected.
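A minimal sketch of such checks, based only on the constraints stated in the Arguments section, is shown below. The function name check_fzm_params and the error messages are hypothetical; the actual checks and messages in zipfR may differ.

```r
# Hypothetical validation sketch; NOT the actual zipfR implementation.
check_fzm_params <- function(alpha, A, B) {
  params <- list(alpha = alpha, A = A, B = B)
  for (nm in names(params)) {
    x <- params[[nm]]
    # type check: each parameter must be a single non-missing number
    if (!is.numeric(x) || length(x) != 1 || is.na(x))
      stop("parameter ", nm, " must be a single number")
  }
  # range checks from the Arguments section: alpha in (0,1) and 0 < A < B
  if (alpha <= 0 || alpha >= 1) stop("parameter alpha must be in range (0,1)")
  if (A <= 0) stop("parameter A must be positive")
  if (B <= A) stop("parameter B must be larger than A")
  invisible(TRUE)   # note: B > 1 is deliberately not rejected
}

check_fzm_params(alpha = .5, A = 5e-12, B = .1)                  # passes silently
res <- try(check_fzm_params(alpha = 1.2, A = 5e-12, B = .1), silent = TRUE)
inherits(res, "try-error")                                       # invalid alpha is rejected
```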

NB: Parameter estimation is faster and more robust for the inexact fZM model, so consider passing the exact=FALSE option to lnre unless you intend to use the model to make predictions for small sample sizes $N$ and/or for high spectrum elements $E[V_m(N)]$ ($m \gg 1$).

References

Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.

Evert, Stefan (2004). A simple LNRE model for random character sequences. Proceedings of JADT 2004, 411-422.