languageR (version 1.5.0)

growth.fnc: Calculate vocabulary growth curve and vocabulary richness measures

Description

This function calculates, for an increasing sequence of text sizes, the observed number of types, hapax legomena (types occurring once), dis legomena (types occurring twice), and tris legomena (types occurring three times), together with selected measures of lexical richness.
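To make the idea of a vocabulary growth curve concrete, here is a minimal base R sketch that counts types and low-frequency words at increasing text sizes. It does not call growth.fnc; the toy token vector and the measurement sizes are made up for illustration, and the column names simply mirror those listed under Value.

```r
# Toy text: one token per element (hypothetical data, not from languageR).
tokens <- c("the", "cat", "sat", "on", "the", "mat",
            "and", "the", "dog", "sat", "too", "the")

# Token sizes at which to take measurements (3 chunks of 4 tokens).
sizes <- c(4, 8, 12)

# For each size, tabulate the first n tokens and count the types
# that occur exactly once, twice, and three times.
growth <- do.call(rbind, lapply(sizes, function(n) {
  freq <- table(tokens[seq_len(n)])
  data.frame(
    Tokens        = n,
    Types         = length(freq),
    HapaxLegomena = sum(freq == 1),
    DisLegomena   = sum(freq == 2),
    TrisLegomena  = sum(freq == 3)
  )
}))

print(growth)
```

Each row describes the text up to and including that point, so Tokens and Types never decrease, while the counts of hapax, dis, and tris legomena can move in either direction as words are repeated.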

Usage

growth.fnc(text = languageR::alice, size = 646, nchunks = 40, chunks = 0)

Arguments

text

A vector of strings representing a text.

size

An integer giving the size of a text chunk when the text is to be split into a series of equally-sized text chunks.

nchunks

An integer denoting the number of desired equally-sized text chunks.

chunks

An integer vector denoting the token sizes for which growth measures are required. When chunks is specified, size and nchunks are ignored.
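The following base R sketch illustrates how the three sizing arguments might relate to one another. It rests on an assumption not stated on this page: that when chunks is left at its default of 0, the measurement points fall at successive multiples of size. The variable names endpoints and custom are hypothetical.

```r
# Default values taken from the Usage section.
size    <- 646
nchunks <- 40

# Assumption: with chunks = 0, measurements are taken at multiples of size.
endpoints <- seq(size, size * nchunks, by = size)

length(endpoints)  # number of measurement points
range(endpoints)   # first and last token size

# An explicit vector of token sizes, of the kind one would pass as chunks;
# when chunks is given, size and nchunks are ignored.
custom <- c(1000, 5000, 10000, 25000)
```

Under this reading, the defaults cover the first 646 * 40 = 25840 tokens of the text.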

Value

A growth object with methods for plotting and printing. Because running this function on large texts may take some time, a period is printed on the output device for each completed chunk to indicate progress.

The data frame with the actual measures, which can be extracted with object.name@data$data, has the following columns.

Chunk

a numeric vector with chunk numbers.

Tokens

a numeric vector with the number of tokens up to and including the current chunk.

Types

a numeric vector with the number of types up to and including the current chunk.

HapaxLegomena

a numeric vector with the corresponding count of hapax legomena.

DisLegomena

a numeric vector with the corresponding count of dis legomena.

TrisLegomena

a numeric vector with the corresponding count of tris legomena.

Yule

a numeric vector with Yule's K.

Zipf

a numeric vector with the slope of Zipf's rank-frequency curve in the double-logarithmic plane.

TypeTokenRatio

a numeric vector with the ratio of types to tokens.

Herdan

a numeric vector with Herdan's C.

Guiraud

a numeric vector with Guiraud's R.

Sichel

a numeric vector with Sichel's S.

Lognormal

a numeric vector with mean log frequency.
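As a rough guide to what these columns contain, the sketch below computes the richness measures for a single toy text in base R, using the textbook definitions of each measure. Whether growth.fnc computes them in exactly this way (for instance, the scaling of Yule's K or the regression used for the Zipf slope) is an assumption; the toy data and variable names are hypothetical.

```r
# Toy text: one token per element (hypothetical data).
tokens <- c("the", "cat", "sat", "on", "the", "mat",
            "and", "the", "dog", "sat", "too", "the")

freq <- as.vector(table(tokens))  # frequency of each type
N    <- length(tokens)            # number of tokens
V    <- length(freq)              # number of types

measures <- c(
  TypeTokenRatio = V / N,
  Herdan         = log(V) / log(N),                  # Herdan's C
  Guiraud        = V / sqrt(N),                      # Guiraud's R
  Sichel         = sum(freq == 2) / V,               # Sichel's S: dis legomena / types
  Yule           = 1e4 * (sum(freq^2) - N) / N^2,    # Yule's K
  Lognormal      = mean(log(freq))                   # mean log frequency
)

# Zipf: slope of log frequency regressed on log rank.
ranked     <- sort(freq, decreasing = TRUE)
zipf_slope <- unname(coef(lm(log(ranked) ~ log(seq_along(ranked))))[2])

print(round(measures, 3))
print(zipf_slope)
```

Applying the same calculation to the first n tokens for each measurement point would yield one row of the data frame per chunk.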

References

Baayen, R. H. (2001) Word Frequency Distributions, Dordrecht: Kluwer Academic Publishers.

Tweedie, F. J. & Baayen, R. H. (1998) How variable may a constant be? Measures of lexical richness in perspective, Computers and the Humanities, 32, 323-352.

See Also

plot.growth and the zipfR package.

Examples

# NOT RUN {
  library(languageR)
  data(alice)
  # Growth curve at the default 40 chunks of 646 tokens each
  alice.growth = growth.fnc(alice)
  plot(alice.growth)
# }
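A further sketch, showing how the data frame of measures described under Value might be extracted and inspected. It assumes the languageR package is installed and skips the computation otherwise; the variable name growth_df and the choice of 10 chunks are only for illustration.

```r
growth_df <- NULL

# Run only if languageR is available (assumption: it is installed locally).
if (requireNamespace("languageR", quietly = TRUE)) {
  data(alice, package = "languageR")

  # Ten chunks of 646 tokens each, to keep the run short.
  alice.growth <- languageR::growth.fnc(text = alice, size = 646, nchunks = 10)

  # Extract the data frame with the actual measures.
  growth_df <- alice.growth@data$data

  # Inspect a few of the documented columns.
  print(head(growth_df[, c("Tokens", "Types", "HapaxLegomena")]))
}
```

The resulting data frame can then be passed to ordinary modelling or plotting functions when the default plot method is not sufficient.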