languageR (version 1.5.0)

compare.richness.fnc: Compare Lexical Richness of Two Texts

Description

Comparisons of lexical richness between two texts are carried out on the basis of the vocabulary size (number of types) and on the basis of the vocabulary growth rate. Variances of the number of types and of the number of hapax legomena required for the tests are estimated with the help of LNRE models.

Usage

compare.richness.fnc(text1, text2, digits = 5)

Arguments

text1

First text in the comparison.

text2

Second text in the comparison.

digits

Number of decimal digits required for the growth rate.

Value

A summary listing the Chi-Squared measure of goodness of fit for the LNRE models (available in the zipfR package) used to estimate variances, a table listing tokens, types, hapax legomena and the vocabulary growth rate, and two-tailed tests for differences in the vocabulary sizes and growth rates with Z-score and p-value.
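The descriptive part of that table (tokens, types, hapax legomena, growth rate) can be approximated in base R. The following is a minimal sketch, not the package's own code: `richness_counts` is a hypothetical helper, the toy vector is made up for illustration, and the growth rate is assumed to be the number of hapax legomena divided by the number of tokens, as in the Details section.

```r
# Hypothetical helper (not part of languageR): descriptive counts for one text,
# given as a character vector with one word token per element.
richness_counts <- function(text) {
  freqs <- table(text)        # frequency of each word type
  N  <- length(text)          # sample size in tokens
  V  <- length(freqs)         # vocabulary size (number of types)
  V1 <- sum(freqs == 1)       # hapax legomena (types occurring exactly once)
  c(tokens = N, types = V, hapax = V1, growth.rate = V1 / N)
}

# Made-up toy text, for illustration only
toy <- c("the", "cat", "saw", "the", "dog", "and", "the", "bird")
counts <- richness_counts(toy)
print(counts)
```

These are observed counts only; the variances needed for the tests come from the fitted LNRE models and are not reproduced here.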

Details

The comparison for the vocabulary size is carried out with the test statistic

$$Z = \frac{E[V_1] - E[V_2]}{\sqrt{\sigma(V_1)^2 + \sigma(V_2)^2}}$$

and the comparison of the growth rates with the test statistic

$$Z = \frac{\frac{1}{N_1}E[V_1(1)] - \frac{1}{N_2}E[V_2(1)]}{\sqrt{\frac{1}{N_1^2}\sigma(V_1(1))^2 + \frac{1}{N_2^2}\sigma(V_2(1))^2}}$$

where \(N\) denotes the sample size in tokens, \(V\) the vocabulary size, and \(V(1)\) the number of hapax legomena.
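Both statistics have the same shape: a difference of two estimates divided by the square root of the sum of their variances. A minimal sketch in base R, assuming the expectations and variances are already available (in the package they are estimated from LNRE models); `z_two_tailed` is a hypothetical helper and the input numbers are invented for illustration.

```r
# Hypothetical helper (not part of languageR): two-tailed Z test for the
# difference between two estimates with known variances.
z_two_tailed <- function(est1, est2, var1, var2) {
  z <- (est1 - est2) / sqrt(var1 + var2)
  p <- 2 * pnorm(-abs(z))     # two-tailed p-value under the standard normal
  c(Z = z, p = p)
}

# Vocabulary size: pass E[V] and sigma(V)^2 directly (invented values).
res_size <- z_two_tailed(est1 = 2600, est2 = 2400, var1 = 900, var2 = 1600)
print(res_size)

# Growth rate: scale E[V(1)] by 1/N and the variance by 1/N^2 (invented values).
N1 <- 25000; N2 <- 25000
res_growth <- z_two_tailed(1200 / N1, 1100 / N2, 400 / N1^2, 500 / N2^2)
print(res_growth)
```

Because both toy samples share the same N, the growth-rate statistic reduces to (1200 - 1100) / sqrt(400 + 500); with unequal sample sizes the 1/N scaling changes the result.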

References

Baayen, R. H. (2001) Word Frequency Distributions, Kluwer Academic Publishers, Dordrecht.

Examples

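# NOT RUN {
# Load the package and the example texts (vectors of word tokens)
library(languageR)
data(alice, through, oz)

# Compare alice with an equally long initial segment of through
compare.richness.fnc(tolower(alice), tolower(through[1:length(alice)]))

# Compare alice with the first 25942 tokens of oz
compare.richness.fnc(tolower(alice), tolower(oz[1:25942]))
# }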
