lnre.gigp: The Generalized Inverse Gauss-Poisson (GIGP) LNRE Model (zipfR)

Description

The Generalized Inverse Gauss-Poisson (GIGP) LNRE model of Sichel (1971).

The constructor function lnre.gigp is not user-visible. It is invoked implicitly when lnre is called with LNRE model type "gigp".

Usage

lnre.gigp(gamma=-.5, B=.01, C=.01, param=list())
  ## user call: lnre("gigp", spc=spc) or lnre("gigp", gamma=-.5, B=.01, C=.01)

Arguments

gamma

the shape parameter $\gamma$, a negative number in the range $(-1,0)$. $\gamma$ corresponds to $-\alpha$ in the Zipf-Mandelbrot notation.

B

the low-frequency decay parameter $b$, a non-negative number. This parameter determines how quickly the type density function vanishes for $\pi \to 0$, with larger values corresponding to faster decay.

C

the high-frequency decay parameter $c$, a non-negative number. This parameter determines how quickly the type density function vanishes for large values of $\pi$, with smaller values corresponding to faster decay.

param

a list of parameters given as name-value pairs (alternative method of parameter specification)

Value

A partially initialized object of class lnre.gigp, which is completed and passed back to the user by the lnre function. See lnre for a detailed description of lnre.gigp objects (as a subclass of lnre).

Mathematical Details

Despite its fancy name, the Generalized Inverse Gauss-Poisson or GIGP model belongs to the same class of LNRE models as ZM and fZM. This class of models is characterized by a power law in the type density function and derives from the Zipf-Mandelbrot law (see lnre.zm for details on the relationship between power-law LNRE models and the Zipf-Mandelbrot law).

The GIGP model is given by the type density function

$$ g(\pi) := C\cdot \pi^{\gamma - 1} \cdot e^{- \frac{\pi}{c} - \frac{b^2 c}{4 \pi}} $$

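with parameters $-1 < \gamma < 0$ and $b, c \ge 0$. With $K_{\nu}$ denoting the modified Bessel function of the second kind, the normalizing constant, fixed by the condition $\int_0^\infty \pi\, g(\pi)\, d\pi = 1$, is

$$ C = \frac{(2 / bc)^{\gamma+1}}{2 K_{\gamma+1}(b)} $$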

and the population vocabulary size is

$$ S = \frac{2}{bc} \cdot \frac{K_{\gamma}(b)}{K_{\gamma+1}(b)} $$

Note that the "shape" parameter $\gamma$ corresponds to $-\alpha$ in the ZM and fZM models. The GIGP model was introduced by Sichel (1971). See Baayen (2001, 89-93) for further details.
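The formula for $S$ can be checked numerically in base R. The following is a minimal sketch, not code from zipfR. It assumes the usual LNRE normalization (the integral of $\pi \cdot g(\pi)$ equals 1) and determines the normalizing constant by numerical integration instead of a closed form. The parameter values and the names `g0`, `b_par` and `c_par` are illustrative assumptions, chosen so that `integrate` needs no rescaling; they are not the package defaults.

```r
# Illustrative parameter values (assumptions, not the zipfR defaults)
gamma <- -0.47
b_par <- 0.5
c_par <- 2

# Unnormalized type density: pi^(gamma - 1) * exp(-pi/c - b^2 c / (4 pi))
g0 <- function(p) p^(gamma - 1) * exp(-p / c_par - b_par^2 * c_par / (4 * p))

# Assumed LNRE normalization: choose C so that the integral of pi * g(pi) is 1
norm_int <- integrate(function(p) p * g0(p), 0, Inf, rel.tol = 1e-8)$value
C_num <- 1 / norm_int

# Population vocabulary size as the integral of g(pi) ...
S_num <- C_num * integrate(g0, 0, Inf, rel.tol = 1e-8)$value

# ... and from the closed form for S, using the modified Bessel function besselK()
S_closed <- (2 / (b_par * c_par)) * besselK(b_par, gamma) / besselK(b_par, gamma + 1)

print(c(numeric = S_num, closed_form = S_closed))
```

The two printed values should agree up to the integration tolerance. For parameters as small as the defaults, the density is concentrated on a very narrow range of $\pi$, so a direct call to `integrate` over $(0, \infty)$ may need a change of variables.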

Details

The parameters of the GIGP model can either be specified as immediate arguments:

    lnre.gigp(gamma=-.47, B=.001, C=.001)

or as a list of name-value pairs:

    lnre.gigp(param=list(gamma=-.47, B=.001, C=.001))

which is usually more convenient when the constructor is invoked by another function (such as lnre). If both immediate arguments and the param list are given, the immediate arguments override conflicting values in param. For any parameters that are neither specified as immediate arguments nor listed in param, the defaults from the function prototype are inserted.

The lnre.gigp constructor also checks the types and ranges of parameter values and aborts with an error message if an invalid parameter is detected.
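The precedence rules and range checks described above can be illustrated with a small base R sketch. The helper `resolve_gigp_params` is hypothetical and is not the zipfR implementation; it only mimics the documented behaviour (immediate arguments override `param`, which overrides the prototype defaults), and the error messages are made up for illustration.

```r
# Hypothetical helper (not part of zipfR): immediate arguments > param list > defaults
resolve_gigp_params <- function(gamma = -0.5, B = 0.01, C = 0.01, param = list()) {
  defaults <- list(gamma = -0.5, B = 0.01, C = 0.01)
  # Collect only the arguments the caller actually supplied
  immediate <- list()
  if (!missing(gamma)) immediate$gamma <- gamma
  if (!missing(B)) immediate$B <- B
  if (!missing(C)) immediate$C <- C
  # Later lists override earlier ones
  p <- modifyList(modifyList(defaults, param), immediate)
  # Type checks, then range checks following the Arguments section
  for (name in names(defaults)) {
    if (!is.numeric(p[[name]]) || length(p[[name]]) != 1) {
      stop("parameter ", name, " must be a single numeric value")
    }
  }
  if (p$gamma <= -1 || p$gamma >= 0) stop("gamma must lie in the range (-1, 0)")
  if (p$B < 0) stop("B must be non-negative")
  if (p$C < 0) stop("C must be non-negative")
  p
}

# B is given both ways: the immediate argument wins; C falls back to its default
str(resolve_gigp_params(B = 0.001, param = list(gamma = -0.47, B = 0.5)))
```

Using `missing()` distinguishes an argument that was explicitly supplied from one that merely carries its default, which is what allows a value in `param` to take effect when the corresponding immediate argument is omitted.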

Note that the implementation of GIGP runs into numerical problems when computing the expected frequencies of high spectrum elements (results should be treated with caution above roughly $m=150$).
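One way to look for such problems is to inspect the expected spectrum elements for a range of $m$. The sketch below assumes that the zipfR package is installed and that `EVm` accepts a vector of frequency classes; the sample size `N = 1e6`, the chosen values of `m`, and the "suspicious" criterion (non-finite or negative values) are illustrative assumptions, not a test prescribed by the package.

```r
# Assumes the zipfR package is installed; the check is skipped otherwise
if (requireNamespace("zipfR", quietly = TRUE)) {
  library(zipfR)
  # Parameter values taken from the examples in this section
  model <- lnre("gigp", gamma = -0.47, B = 0.001, C = 0.001)
  m <- c(1, 10, 50, 100, 150, 200)
  ev <- EVm(model, m, N = 1e6)  # expected spectrum elements E[V_m(N)]
  # Flag values that cannot be valid expectations (illustrative criterion)
  print(data.frame(m = m, EVm = ev, suspicious = !is.finite(ev) | ev < 0))
}
```

A clean result from this check does not guarantee accuracy; it only catches outright failures such as `NaN`, infinite, or negative values.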

Note that the parameters $b$ and $c$ are normally written in lowercase (e.g. Baayen 2001). For technical reasons, it was necessary to use the uppercase letters B and C in this implementation.

References

Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.

Sichel, H. S. (1971). On a family of discrete distributions particularly suited to represent long-tailed frequency data. Proceedings of the Third Symposium on Mathematical Statistics, 51-97.