corpora (version 0.6)

Statistics and Data Sets for Corpus Frequency Data

Description

Utility functions for the statistical analysis of corpus frequency data. This package is a companion to the open-source course "Statistical Inference: A Gentle Introduction for Computational Linguists and Similar Creatures" ('SIGIL').

Install

install.packages('corpora')

Monthly Downloads

520

Version

0.6

License

GPL-3

Maintainer

Last Published

August 20th, 2023

Functions in corpora (0.6)

VSS

A small corpus of very short stories with linguistic annotations

KrennPPV

German PP-Verb collocation candidates annotated by Brigitte Krenn (2000)

LOBPassives

Frequency counts of passive verb phrases in the LOB corpus

PassiveBrownFam

By-text frequencies of passive verb phrases in the Brown Family corpora

chisq.pval

P-values of Pearson's chi-squared test for frequency comparisons (corpora)

DistFeatBrownFam

Latent dimension scores from a distributional analysis of the Brown Family corpora

cont.table

Build contingency tables for frequency comparison (corpora)

chisq

Pearson's chi-squared statistic for frequency comparisons (corpora)

LOBStats

Basic statistics of texts in the LOB corpus

binom.pval

P-values of the binomial test for frequency counts (corpora)

sample.df

Random samples from data frames (corpora)

prop.cint

Confidence interval for proportion based on frequency counts (corpora)

keyness

Compute best-practice keyness measures (corpora)

qw

Split string into words, similar to qw() in Perl (corpora)

rowColVector

Propagate vector to single-row or single-column matrix (corpora)

simulated.language.course

Simulated study on effectiveness of language course (corpora)

corpora.palette

Colour palettes for linguistic visualization (corpora)

simulated.census

Simulated census data for examples and illustrations (corpora)

z.score

The z-score statistic for frequency counts (corpora)

z.score.pval

P-values of the z-score test for frequency counts (corpora)

fisher.pval

P-values of Fisher's exact test for frequency comparisons (corpora)

stars.pval

Show p-values as significance stars (corpora)

corpora-package

corpora: Statistical Inference from Corpus Frequency Data

simulated.wikipedia

Simulated type and token counts for Wikipedia articles (corpora)

BNCInChargeOf

Collocations of the phrase "in charge of" (BNC)

BNCmeta

Metadata for the British National Corpus (XML edition)

BrownPassives

Frequency counts of passive verb phrases in the Brown corpus

BNCcomparison

Comparison of written and spoken noun frequencies in the British National Corpus

BNCbiber

Biber's (1988) register features for the British National Corpus

BNCdomains

Distribution of domains in the British National Corpus (BNC)

BNCqueries

Per-text frequency counts for a selection of BNCweb corpus queries

BrownLOBPassives

Frequency counts of passive verb phrases in the Brown and LOB corpora

BrownBigrams

Bigrams of adjacent words from the Brown corpus

BrownStats

Basic statistics of texts in the Brown corpus
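
Several of the functions listed above (cont.table, chisq, chisq.pval, fisher.pval, prop.cint) deal with comparing the frequency of an item in two corpora. As a rough illustration of that kind of comparison, the sketch below uses only base R (chisq.test, fisher.test, binom.test) on made-up counts. The variable names k1, n1, k2, n2 and all numbers are hypothetical, and the comments naming corpora functions are assumptions about the corresponding calls; consult the package help pages for the actual arguments.

```r
# Hypothetical counts: a word occurs k1 times among n1 tokens of corpus A
# and k2 times among n2 tokens of corpus B (illustrative numbers only).
k1 <- 150; n1 <- 100000
k2 <- 90;  n2 <- 120000

# 2 x 2 contingency table: rows = word / other tokens, columns = corpora.
# (Assumed to correspond to what cont.table builds; check ?cont.table.)
tbl <- matrix(c(k1, n1 - k1, k2, n2 - k2), nrow = 2,
              dimnames = list(c("word", "other"), c("A", "B")))

# Pearson's chi-squared test with Yates' continuity correction.
# (Assumed counterpart in corpora: chisq / chisq.pval.)
chi <- chisq.test(tbl, correct = TRUE)

# Fisher's exact test on the same table.
# (Assumed counterpart in corpora: fisher.pval.)
fis <- fisher.test(tbl)

# Exact binomial confidence interval for the relative frequency in corpus A.
# (Assumed counterpart in corpora: prop.cint.)
ci <- binom.test(k1, n1)$conf.int

print(chi$statistic)
print(chi$p.value)
print(fis$p.value)
print(ci)
```

The base R tests work on one table at a time; for many items at once, the package functions would be the natural choice if they accept vectors of counts, which should be confirmed in their documentation.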