The Generalized Inverse Gauss-Poisson (GIGP) LNRE model of Sichel (1971).
The constructor function lnre.gigp is not user-visible. It is invoked implicitly when lnre is called with LNRE model type "gigp".
lnre.gigp(gamma=-.5, B=.01, C=.01, param=list())
## user call: lnre("gigp", spc=spc) or lnre("gigp", gamma=-.5, B=.01, C=.01)
gamma: the shape parameter \(\gamma\), a negative number in the range \((-1,0)\). \(\gamma\) corresponds to \(-\alpha\) in the Zipf-Mandelbrot notation.
B: the low-frequency decay parameter \(b\), a non-negative number. This parameter determines how quickly the type density function vanishes for \(\pi \to 0\), with larger values corresponding to faster decay.
C: the high-frequency decay parameter \(c\), a non-negative number. This parameter determines how quickly the type density function vanishes for large values of \(\pi\), with smaller values corresponding to faster decay.
param: a list of parameters given as name-value pairs (alternative method of parameter specification)
A partially initialized object of class lnre.gigp, which is completed and passed back to the user by the lnre function. See lnre for a detailed description of lnre.gigp objects (as a subclass of lnre).
Despite its fancy name, the Generalized Inverse Gauss-Poisson
or GIGP model belongs to the same class of LNRE models as ZM
and fZM. This class of models is characterized by a power-law in the
type density function and derives from the Zipf-Mandelbrot law
(see lnre.zm
for details on the relationship between
power-law LNRE models and the Zipf-Mandelbrot law).
The GIGP model is given by the type density function
$$ g(\pi) := C\cdot \pi^{\gamma - 1} \cdot e^{- \frac{\pi}{c} - \frac{b^2 c}{4 \pi}} $$
with parameters \(-1 < \gamma < 0\) and \(b, c \ge 0\). The normalizing constant is
$$ C = \frac{(2 / bc)^{\gamma+1}}{2 K_{\gamma+1}(b)} $$
and the population vocabulary size is
$$ S = \frac{2}{bc} \cdot \frac{K_{\gamma}(b)}{K_{\gamma+1}(b)} $$
Note that the "shape" parameter \(\gamma\) corresponds to \(-\alpha\) in the ZM and fZM models. The GIGP model was introduced by Sichel (1971). See Baayen (2001, 89-93) for further details.
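The closed form for the population vocabulary size \(S\) given above can be sanity-checked numerically. The following is a minimal sketch in base R, assuming the usual LNRE normalization (that \(\pi \cdot g(\pi)\) integrates to 1), so that \(S\) is the ratio of the integral of \(g\) to the integral of \(\pi \cdot g\) and the normalizing constant cancels. The variable names are illustrative, and the values \(b = c = 1\) are chosen only so that integrate() converges easily; they are not typical fitted parameters.

```r
# Illustrative parameter values (assumptions for this sketch, not fitted values).
shape <- -0.47  # gamma
b <- 1          # low-frequency decay parameter
cc <- 1         # high-frequency decay parameter c (renamed to avoid clashing with c())

# Unnormalized density kernel pi^(gamma - 1) * exp(-pi/c - b^2 c / (4 pi)),
# evaluated in log space to avoid Inf * 0 for very small pi.
gigp_kernel <- function(p) {
  ifelse(p > 0,
         exp((shape - 1) * log(p) - p / cc - b^2 * cc / (4 * p)),
         0)
}

# The integral of the kernel is proportional to the number of types;
# the integral of p * kernel is proportional to the total probability mass.
types <- integrate(gigp_kernel, 0, Inf)$value
mass <- integrate(function(p) p * gigp_kernel(p), 0, Inf)$value

S_numeric <- types / mass  # the normalizing constant cancels in this ratio
S_closed <- (2 / (b * cc)) * besselK(b, abs(shape)) / besselK(b, shape + 1)

print(c(numeric = S_numeric, closed_form = S_closed))
```

The two printed values should agree up to the tolerance of integrate(). For realistic parameters such as B = C = .01 the integrand is concentrated very close to zero, and a naive call to integrate() may need rescaling to remain accurate.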
The parameters of the GIGP model can either be specified as immediate arguments:
lnre.gigp(gamma=-.47, B=.001, C=.001)
or as a list of name-value pairs:
lnre.gigp(param=list(gamma=-.47, B=.001, C=.001))
which is usually more convenient when the constructor is invoked by another function (such as lnre). If both immediate arguments and the param list are given, the immediate arguments override conflicting values in param. For any parameters that are neither specified as immediate arguments nor listed in param, the defaults from the function prototype are inserted.
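The precedence rules just described can be sketched in a few lines of base R. The helper resolve_gigp_params below is hypothetical and not part of zipfR; it only illustrates the order "defaults, then param, then immediate arguments" and makes no claim about how lnre.gigp is actually implemented.

```r
# Hypothetical helper (not part of zipfR): resolves GIGP parameters by precedence.
resolve_gigp_params <- function(gamma, B, C, param = list()) {
  # Defaults copied from the documented function prototype
  defaults <- list(gamma = -0.5, B = 0.01, C = 0.01)

  # Collect only those immediate arguments that the caller actually supplied
  immediate <- list()
  if (!missing(gamma)) immediate$gamma <- gamma
  if (!missing(B)) immediate$B <- B
  if (!missing(C)) immediate$C <- C

  # Later lists win: defaults < param < immediate arguments
  modifyList(modifyList(defaults, param), immediate)
}

# B is given both ways, gamma only in param, C not at all
resolved <- resolve_gigp_params(B = 0.001, param = list(gamma = -0.47, B = 0.5))
str(resolved)
```

In this call, gamma is taken from param, B from the immediate argument (overriding the conflicting value in param), and C falls back to its default.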
The lnre.gigp
constructor also checks the types and ranges of
parameter values and aborts with an error message if an invalid
parameter is detected.
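As a rough illustration of such checks, the sketch below validates the three parameters against the ranges documented above. The function check_gigp_params and its error messages are assumptions made for this example; the actual messages produced by lnre.gigp may differ.

```r
# Hypothetical validator (not part of zipfR) for the documented parameter ranges.
check_gigp_params <- function(gamma, B, C) {
  is_number <- function(x) is.numeric(x) && length(x) == 1 && !is.na(x)
  if (!is_number(gamma) || !is_number(B) || !is_number(C)) {
    stop("gamma, B and C must each be a single number")
  }
  if (gamma <= -1 || gamma >= 0) stop("gamma = ", gamma, " is outside (-1, 0)")
  if (B < 0) stop("B = ", B, " must be non-negative")
  if (C < 0) stop("C = ", C, " must be non-negative")
  invisible(TRUE)
}

# Valid parameters pass silently
check_gigp_params(gamma = -0.47, B = 0.001, C = 0.001)

# An out-of-range shape parameter aborts with an error; capture its message
msg <- tryCatch(check_gigp_params(gamma = 0.5, B = 0.001, C = 0.001),
                error = function(e) conditionMessage(e))
print(msg)
```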
Note that the implementation of GIGP runs into numerical problems when estimating the expected frequency of high spectrum elements; results should be treated with caution if you need to go above \(m=150\).
Note that the parameters \(b\) and \(c\) are normally written in lowercase (e.g. Baayen 2001). For technical reasons, it was necessary to use the uppercase letters B and C in this implementation.
Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.
Sichel, H. S. (1971). On a family of discrete distributions particularly suited to represent long-tailed frequency data. Proceedings of the Third Symposium on Mathematical Statistics, 51-97.
lnre for pointers to relevant methods and functions for objects of class lnre, as well as a complete listing of LNRE models implemented in the zipfR library.