corpora (version 0.6)

z.score: The z-score statistic for frequency counts (corpora)

Description

This function computes a z-score statistic for frequency counts, based on a normal approximation to the correct binomial distribution under the random sampling model.

Usage

z.score(k, n, p = 0.5, correct = TRUE)

Value

The \(z\)-score corresponding to the specified data (or a vector of \(z\)-scores).

Arguments

k

frequency of a type in the corpus (or an integer vector of frequencies)

n

number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)

p

null hypothesis, giving the assumed proportion of this type in the population (or a vector of proportions for different types and/or different populations)

correct

if TRUE, apply Yates' continuity correction (default)

Author

Stephanie Evert (https://purl.org/stephanie.evert)

Details

The \(z\) statistic is given by $$ z := \dfrac{k - np}{\sqrt{n p (1-p)}} $$ When Yates' continuity correction is enabled, the absolute value of the numerator \(d := k - np\) is reduced by \(1/2\), but clamped to a non-negative value.
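
The formula and the continuity correction described above can be sketched in a few lines of base R. This is an illustrative re-implementation only: the name `z_score_sketch` is hypothetical and not part of the corpora package, and the package's actual source may differ in detail (for example in argument validation).

```r
# Hypothetical helper for illustration; NOT the corpora implementation.
z_score_sketch <- function(k, n, p = 0.5, correct = TRUE) {
  d <- k - n * p                        # numerator: observed minus expected frequency
  if (correct) {
    # Yates' correction: reduce |d| by 1/2, clamped so it never drops below zero
    d <- sign(d) * pmax(abs(d) - 0.5, 0)
  }
  d / sqrt(n * p * (1 - p))             # scale by the binomial standard deviation
}

# Vectorised over k, as in the description of the arguments
z_score_sketch(c(10, 15, 20), 100, p = 0.15)
```

Because of the clamping, an observed count within 1/2 of the expected frequency \(np\) gives a corrected score of exactly zero rather than a score whose sign has flipped.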

See Also

z.score.pval

Examples

# z-test for H0: pi = 0.15 with observed counts 10..30 in a sample of n=100 tokens
k <- c(10:30)
z <- z.score(k, 100, p=.15)
names(z) <- k
round(z, 3)

abs(z) >= 1.96  # significant results at p < .05
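
The cutoff 1.96 used above is the two-sided 5% critical value of the standard normal distribution. As a minimal sketch of that relationship, the helper below converts \(z\)-scores to two-sided p-values with base R's `pnorm`. The name `two_sided_p` is hypothetical, and this does not claim to reproduce what `z.score.pval` does internally.

```r
# Hypothetical helper for illustration; uses only base R (stats::pnorm).
two_sided_p <- function(z) {
  2 * pnorm(-abs(z))    # probability mass in both tails beyond |z|
}

z_example <- c(-2.5, 0, 1.96)     # made-up z-scores, not output of z.score()
p_example <- two_sided_p(z_example)
p_example < 0.05                  # same decision as abs(z_example) >= 1.96
```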