BrownAdj.spc, BrownNoun.spc and BrownVer.spc are frequency spectra of
all the Brown corpus words tagged as adjectives, nouns and verbs,
respectively. BrownAdj.emp.vgc, BrownNoun.emp.vgc and BrownVer.emp.vgc
are the corresponding observed vocabulary growth curves (tracking the
development of V and V(1), like all the files with suffix .emp.vgc
below).

BrownImag.spc and BrownInform.spc are frequency spectra of the Brown
corpus words subdivided into the two main stylistic partitions of the
corpus, i.e., imaginative and informative prose, respectively.
BrownImag.emp.vgc and BrownInform.emp.vgc are the corresponding
observed vocabulary growth curves.

Brown100k.spc is the spectrum of the first 100,000 tokens in the Brown
(useful, e.g., for extrapolation experiments in which we want to
estimate an lnre model on a subset of the data available). The
corresponding observed growth curve can easily be obtained from the
one for the whole Brown (Brown.emp.vgc) by keeping only the sample
points that fall within the first 100,000 tokens.

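As a minimal sketch of how that growth curve might be derived, the R code below truncates Brown.emp.vgc at 100,000 tokens. It assumes the zipfR package is installed and that the data set loads via data(); the object name Brown100k.emp.vgc is hypothetical and does not refer to a data set shipped with the package.

```r
library(zipfR)  # assumption: the zipfR package is installed

data(Brown.emp.vgc)

# Select the sample points that lie within the first 100,000 tokens.
keep <- N(Brown.emp.vgc) <= 100000

# Rebuild a growth curve object from the retained points.
# Brown100k.emp.vgc is a hypothetical name chosen for this illustration.
Brown100k.emp.vgc <- vgc(
  N  = N(Brown.emp.vgc)[keep],
  V  = V(Brown.emp.vgc)[keep],
  Vm = list(Vm(Brown.emp.vgc, 1)[keep])  # V(1), the hapax legomena curve
)

summary(Brown100k.emp.vgc)
```

Rebuilding the object with vgc() rather than subsetting the underlying data frame directly is a cautious choice here, since it does not rely on how row subsetting treats the object's attributes.
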
Notice that we removed numbers and other forms of non-linguistic
material before collecting any data from the Brown.