This data set provides basic quantitative measures for each text in the LOB corpus of written British English (Johansson et al. 1978).
LOBStats
A data frame with 500 rows and the following columns:
ty
:number of distinct types
to
:number of tokens (including punctuation)
se
:number of sentences
towl
:mean word length in characters, averaged over tokens
tywl
:mean word length in characters, averaged over types
Marco Baroni <baroni@sslmit.unibo.it>
Johansson, Stig; Leech, Geoffrey; Goodluck, Helen (1978). Manual of information to accompany the Lancaster-Oslo/Bergen corpus of British English, for use with digital computers. Technical report, Department of English, University of Oslo, Oslo.
BrownStats