This manpage describes technical details of LNRE models and parameter
estimation. It is intended for developers who want to implement new LNRE
models, improve the parameter estimation algorithms, or work directly
with the internals of lnre objects. All information required
for standard applications of LNRE models can be found on the
lnre manpage.
An LNRE model with estimated (or manually specified) parameter values
is represented by an object belonging to a suitable subclass of
lnre. The specific class depends on the type of LNRE model, as
specified in the type argument to the lnre constructor
function (e.g. lnre.fzm for a fZM model selected with type="fzm").
All subclasses of lnre objects share the same data format, viz. a
list with the following components:
a character string specifying the class of LNRE model,
e.g. "fzm"
for a finite Zipf-Mandelbrot model
a character string specifying a human-readable name for
the LNRE model, e.g. "finite Zipf-Mandelbrot"
list of named model parameters, e.g. (alpha=.8,
B=.01)
for a ZM model
a list of "secondary" parameters, i.e. constants that
can be determined from the model parameters but are frequently used
in the formulae for expected values, variances, etc.;
e.g. (C=.5)
for the ZM model above
population size, i.e. number of types in the population
described by the LNRE model (may be Inf
, e.g. for a ZM
model)
whether approximations are allowed when calculating
expectations and variances (FALSE
) or not (TRUE
)
whether to use equations for multinomial sampling
(TRUE) or independent Poisson sampling (FALSE)
an object of class spc
, the observed frequency
spectrum from which the model parameters have been estimated (only
if the LNRE model is based on empirical data)
an object of class lnre.gof
with goodness-of-fit
information for the estimated LNRE model (only if based on empirical
data, i.e. if the spc
component is also present)
a set of utility functions, given as a list with the following components:
update: function with signature (self, param, transformed=FALSE),
which updates the parameters of the LNRE model self with the values
in param, checks that their values are in the allowed range, and
re-calculates "secondary" parameters and lexicon size if necessary. If
transformed=TRUE, the specified parameters are translated
back to normal scale before the update (see below). Of course,
self should be the object from which the utility function
was called. update returns a modified version of the object self.
transform: function with signature (param, inverse=FALSE),
which transforms model parameters (given as a list in the argument
param) to an unbounded range centered at 0, and back (with option
inverse=TRUE). The transformed model parameters are used for
parameter estimation, so that unconstrained minimization algorithms
can be applied. The link function for the transformation depends on
the LNRE model and the "distribution" of each parameter. A felicitous
choice can be crucial for robust and quick parameter estimation,
especially with Newton-like gradient algorithms. Note that
setting all transformed parameters to 0 should provide a
reasonable starting point for the parameter estimation.
print: partial print method for this subclass of
LNRE model, which displays the name of the model, its
parameters, and optionally some additional information (invoked
internally by print.lnre and summary.lnre)
label: returns a string with a short description of the LNRE model, including its subclass and approximate values for its parameters (e.g. for use in legend text).
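The role of the transform utility can be illustrated with a minimal sketch. It assumes a hypothetical model with one parameter alpha restricted to (0, 1) and one parameter B restricted to positive values, and uses a logit and a log link respectively. The function name transform_sketch and the choice of links are assumptions made for illustration, not the implementation used by any existing model.

```r
# Hypothetical transform utility (illustration only):
# alpha in (0, 1) uses a logit link, B > 0 uses a log link.
transform_sketch <- function(param, inverse = FALSE) {
  if (inverse) {
    # unbounded scale -> normal scale
    list(alpha = plogis(param$alpha), B = exp(param$B))
  } else {
    # normal scale -> unbounded scale
    list(alpha = qlogis(param$alpha), B = log(param$B))
  }
}

# All-zero transformed parameters map to the centre of each range,
# which serves as a starting point for the optimizer.
start <- transform_sketch(list(alpha = 0, B = 0), inverse = TRUE)

# Transforming and transforming back recovers the original values
# (up to floating-point error).
original <- list(alpha = 0.8, B = 0.01)
round_trip <- transform_sketch(transform_sketch(original), inverse = TRUE)
```

With these links, an unconstrained optimizer can move freely on the transformed scale while update (called with transformed=TRUE) always receives values inside the allowed range.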
In order to implement a new class of LNRE models, the following steps
are necessary (illustrated with the example of a lognormal type density
function, introducing the new LNRE class lnre.lognormal):
Provide a constructor function for LNRE models of this type
(here, lnre.lognormal
), which must accept the parameters of
the LNRE model as named arguments with reasonable default values (or
alternatively as a list passed in the param
argument). The
constructor must return a partially initialized object of an
appropriate subclass of lnre
(lnre.lognormal
in our
example), and make sure that this object also inherits from the
lnre
class.
Provide the update
, transform
, print
and label
utility functions for the LNRE model, which must be returned in the
util
field of the LNRE model object (see "Value" above).
Add the new type of LNRE model to the type
argument of
the generic lnre
constructor, and insert the new constructor
function (lnre.lognormal
) in the switch
call in the
body of lnre
.
As a minimum requirement, implementations of the EV
and
EVm
methods must be provided for the new LNRE model (in our
example, they will be named EV.lnre.lognormal
and
EVm.lnre.lognormal
).
If possible, provide equations for the type density,
probability density, type distribution, distribution function
and posterior distribution of
the new LNRE model, as implementations of the tdlnre
,
dlnre
, tplnre
/tqlnre
,
plnre
/qlnre
and postplnre
/postqlnre
methods for the new LNRE model class. If
all these functions are defined, log-scaled densities and random
number generation are automatically handled by generic
implementations.
Optionally, provide a custom function for parameter estimation
of the new LNRE model, as an implementation of the
estimate.model
method (here,
estimate.model.lnre.lognormal
). Custom parameter estimation
can considerably improve convergence and goodness-of-fit if it is
possible to obtain direct estimates for one or more of the
parameters, e.g. from the condition \(E[V] = V\). However, the
default Nelder-Mead algorithm is robust and produces satisfactory
results, as long as the LNRE model defines an appropriate parameter
transformation mapping. It is thus often more profitable to
optimize the transform
utility than to spend a lot of time
implementing a complicated parameter estimation function.
The best way to get started is to take a look at one of the existing implementations of LNRE models. The GIGP model represents a "minimum" implementation (without custom parameter estimation and distribution functions), whereas ZM and fZM provide good examples of custom parameter estimation functions.
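The constructor and utility steps above can be sketched as a standalone skeleton. Everything here is hypothetical: the parameter names mu and sigma, the field names type, name and param, and the signatures of the print and label utilities are assumptions chosen for illustration (only the util field and the update and transform signatures are taken from the description above). The sketch does not implement any expectations or distributions.

```r
# Hypothetical utilities for a lognormal LNRE class (illustration only).
update_lognormal <- function(self, param, transformed = FALSE) {
  # translate back to normal scale if parameters were transformed
  if (transformed) param <- self$util$transform(param, inverse = TRUE)
  new <- utils::modifyList(self$param, param)
  # range check: sigma must be strictly positive
  if (!is.null(new$sigma) && new$sigma <= 0) stop("sigma must be positive")
  self$param <- new
  self  # return the modified object
}

transform_lognormal <- function(param, inverse = FALSE) {
  # mu is already unbounded; sigma > 0 uses a log link
  if (!is.null(param$sigma)) {
    param$sigma <- if (inverse) exp(param$sigma) else log(param$sigma)
  }
  param
}

label_lognormal <- function(self) {
  sprintf("lognormal (mu=%g, sigma=%g)", self$param$mu, self$param$sigma)
}

print_lognormal <- function(self) {
  cat(self$name, "LNRE model (sketch).\n")
  cat("Parameters: mu =", self$param$mu, " sigma =", self$param$sigma, "\n")
}

# Constructor: accepts named arguments with defaults, or a list in `param`.
lnre.lognormal <- function(mu = 0, sigma = 1, param = list()) {
  values <- utils::modifyList(list(mu = mu, sigma = sigma), param)
  self <- list(
    type = "lognormal",
    name = "lognormal",
    param = list(),
    util = list(update = update_lognormal, transform = transform_lognormal,
                print = print_lognormal, label = label_lognormal)
  )
  # the object must also inherit from the lnre base class
  class(self) <- c("lnre.lognormal", "lnre")
  self$util$update(self, values)
}

model <- lnre.lognormal(sigma = 2)
# update with transformed parameters: sigma = 0 on the log scale is 1
model2 <- model$util$update(model, list(sigma = 0), transformed = TRUE)
```

A real implementation would additionally need EV and EVm methods for the new class and an entry in the generic constructor, as listed in the steps above.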
Most operations on LNRE models (in particular, computation of expected
values and variances, distribution function and type distribution,
random sampling, etc.) are realized as S3 methods, so they are
automatically dispatched to appropriate implementations for the
various types of LNRE models (e.g., EV.lnre.zm
,
EV.lnre.fzm
and EV.lnre.gigp
for the EV
method).
For some methods (e.g. estimated variances VV
and VVm
),
a single generic implementation can be used for all model types,
provided through the base class (VV.lnre
and VVm.lnre
for variances).
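The dispatch mechanism described here can be illustrated with a toy example that does not depend on the package. The generics EV and VV are redefined locally, the class lnre.toy and its scale field are hypothetical, and the formulas are placeholders rather than actual LNRE expectations or variances.

```r
# Toy generics for illustration; the package provides its own.
EV <- function(obj, N, ...) UseMethod("EV")
VV <- function(obj, N, ...) UseMethod("VV")

# Model-specific method (placeholder formula)
EV.lnre.toy <- function(obj, N, ...) obj$scale * sqrt(N)

# Generic implementation provided through the base class; it only
# relies on the model-specific EV method (placeholder formula).
VV.lnre <- function(obj, N, ...) EV(obj, 2 * N) - EV(obj, N)

# The object inherits from both the specific class and the base class
model <- structure(list(scale = 3), class = c("lnre.toy", "lnre"))

ev <- EV(model, 100)  # dispatches to EV.lnre.toy
vv <- VV(model, 100)  # no VV.lnre.toy exists, so VV.lnre is used
```

Because the class vector lists the specific class first, a model type can later override the base implementation simply by defining its own method (e.g. a hypothetical VV.lnre.toy).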
If you want to implement new LNRE models, have a look at the section "Implementing LNRE Models".
Important note: LNRE model parameters can be passed as named
arguments to the lnre
constructor function when they are not
estimated automatically from an observed frequency spectrum. For this
reason, parameter names must be carefully chosen so that they do not
clash with other arguments of the lnre
function. Note that
because of R's argument matching rules, any parameter name that is a
prefix of a standard argument name will lead to such a clash.
In particular, single-letter parameters (such as \(b\) and \(c\)
for the GIGP model) should always be written in uppercase (B
and C
in lnre.gigp
).
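The clash caused by partial argument matching can be reproduced with a small standalone function. The function make_model and its arguments cost and bootstrap are hypothetical stand-ins for a constructor with standard arguments followed by "..."; they are not meant to reproduce the actual signature of lnre.

```r
# Hypothetical constructor: standard arguments first, model parameters in `...`
make_model <- function(type, cost = "chisq", bootstrap = 0, ...) {
  list(type = type, cost = cost, bootstrap = bootstrap, param = list(...))
}

# Lowercase names are prefixes of standard arguments, so R partially
# matches b to bootstrap and c to cost; no model parameters are passed on.
clash <- make_model("gigp", b = 0.1, c = 0.5)

# Uppercase names match no formal argument (matching is case-sensitive)
# and are collected in `...` as intended.
safe <- make_model("gigp", B = 0.1, C = 0.5)
```

In the first call the parameter values silently overwrite the standard arguments without any warning, which is why the naming convention matters.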
User-level information about LNRE models and parameter estimation can
be found on the lnre
manpage.
Descriptions of the different LNRE models implemented in zipfR
and their parameters are given on separate manpages
lnre.zm
, lnre.fzm
and
lnre.gigp
. These descriptions are intended for
interested end users, but are not required for standard applications
of the models.
The estimate.model
manpage explains details of the
parameter estimation procedure (intended for developers).
See lnre.goodness.of.fit
for a description of the
goodness-of-fit test performed after parameter estimation of an LNRE
model. This function can also be used to evaluate the predictions of
the model on a different data set.