Compute the expected vocabulary size \(E[V(N)]\) (with function EV.spc) or expected frequency spectrum \(E[V_m(N)]\) (with function EVm.spc) for a random sample of size \(N\) from a given frequency spectrum (i.e., an object of class spc). The expectations are calculated by binomial interpolation (following Baayen 2001, pp. 64-69).
Note that these functions are not user-visible. They can be called implicitly through the generic methods EV and EVm, applied to an object of class spc.
# S3 method for spc
EV(obj, N, allow.extrapolation=FALSE, ...)

# S3 method for spc
EVm(obj, m, N, allow.extrapolation=FALSE, ...)
obj
an object of class spc, representing a frequency spectrum
m
positive integer value determining the frequency class \(m\) for which \(E[V_m(N)]\) is returned (or a vector of such values)
N
sample size \(N\) for which the expected vocabulary size or frequency spectrum is calculated (or a vector of sample sizes)
allow.extrapolation
if TRUE, the requested sample size \(N\) may be larger than the sample size of the frequency spectrum obj, for binomial extrapolation. This option should be used with great caution (see "Details" below).
...
additional arguments passed on from generic methods will be ignored
EV returns the expected vocabulary size \(E[V(N)]\) for a random sample of \(N\) tokens from the frequency spectrum obj, and EVm returns the expected spectrum elements \(E[V_m(N)]\) for a random sample of \(N\) tokens from obj, calculated by binomial interpolation.
These functions are naive implementations of binomial interpolation, using Equations (2.41) and (2.43) from Baayen (2001). No guarantees are made concerning their numerical accuracy, especially for extreme values of \(m\) and \(N\).
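The idea behind binomial interpolation can be sketched in a few lines of base R. This is an illustrative sketch, not the package's own code: the helper names ev_sketch and evm_sketch and the toy spectrum Vm are made up for the example, and the formulas are the standard binomial interpolation estimators, which are assumed here to be what the cited equations express. Each token of the original sample is taken to be kept independently with probability \(N/N_0\), where \(N_0\) is the observed sample size.

```r
# Toy frequency spectrum (made-up numbers): Vm[k] is the number of types
# that occur exactly k times in the observed sample.
Vm <- c(5, 3, 2, 1)

# Hypothetical helper: expected vocabulary size for a subsample of N tokens.
# A type with frequency k is missed entirely with probability (1 - N/N0)^k.
ev_sketch <- function(Vm, N) {
  k  <- seq_along(Vm)
  N0 <- sum(k * Vm)                 # observed sample size
  sum(Vm) - sum(Vm * (1 - N / N0)^k)
}

# Hypothetical helper: expected number of types with frequency m in the
# subsample. A type with frequency k ends up in class m with binomial
# probability dbinom(m, k, N/N0), which is 0 whenever k < m.
evm_sketch <- function(Vm, m, N) {
  k  <- seq_along(Vm)
  N0 <- sum(k * Vm)
  sum(Vm * dbinom(m, size = k, prob = N / N0))
}

N0 <- sum(seq_along(Vm) * Vm)       # 21 tokens
ev_sketch(Vm, N0)                   # full sample: equals sum(Vm)
evm_sketch(Vm, 1, N0)               # full sample: equals Vm[1]
ev_sketch(Vm, 10)                   # interpolated to 10 tokens
```

As a sanity check, summing evm_sketch over all frequency classes m gives the same value as ev_sketch for the same N. A production implementation would also need to handle vectors of m and N and guard against numerical problems for large k.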
According to Baayen (2001, pp. 69-73), the same equations can also be used for binomial extrapolation of a given frequency spectrum to larger sample sizes. However, they become numerically unstable in this case and will typically break down when extrapolating to more than twice the size of the observed sample (Baayen 2001, p. 75). Therefore, extrapolation has to be enabled explicitly with the option allow.extrapolation=TRUE and should be used with great caution.
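A small base R sketch may help to show why extrapolation breaks down beyond twice the observed sample size. It assumes the standard binomial estimator \(E[V(N)] = V(N_0) - \sum_k V_k(N_0) (1 - N/N_0)^k\); the helper name ev_sketch and the toy spectrum are invented for illustration and are not part of the package. For \(N > 2 N_0\) the base \(1 - N/N_0\) has absolute value greater than 1, so the terms alternate in sign and grow exponentially with \(k\).

```r
# Toy frequency spectrum (made-up numbers): Vm[k] = number of types
# occurring exactly k times; observed sample size N0 = 21, V = 11.
Vm <- c(5, 3, 2, 1)
N0 <- sum(seq_along(Vm) * Vm)

# Hypothetical helper: binomial estimate of the vocabulary size at N tokens.
ev_sketch <- function(Vm, N) {
  k  <- seq_along(Vm)
  N0 <- sum(k * Vm)
  sum(Vm) - sum(Vm * (1 - N / N0)^k)
}

ev_sketch(Vm, 2 * N0)   # 14: still plausible (larger than the observed 11)
ev_sketch(Vm, 3 * N0)   # 9:  smaller than the observed vocabulary
ev_sketch(Vm, 4 * N0)   # -28: a negative "vocabulary size"
```

Even in this tiny example the estimate becomes meaningless at three times the observed sample size. Real spectra contain much higher frequency classes, so the alternating terms are far larger and the breakdown is correspondingly more severe.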
Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.
EV and EVm for the generic methods and links to other implementations.
spc.interp and vgc.interp are convenience functions that compute an expected frequency spectrum or vocabulary growth curve by binomial interpolation.