This data set provides some basic quantitative measures for all texts in the Brown corpus of written American English (Francis & Kucera 1964).
BrownStats
A data frame with 500 rows and the following columns:
ty
:number of distinct types
to
:number of tokens (including punctuation)
se
:number of sentences
towl
:mean word length in characters, averaged over tokens
tywl
:mean word length in characters, averaged over types
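
The columns above can be combined into derived per-text measures. The following is a minimal sketch that assumes a data frame with the same column names. The `toy` data frame, its values, and the derived columns `ttr` and `msl` are hypothetical and are not taken from the corpus.

```r
# Hypothetical stand-in with the same columns as BrownStats;
# the numbers are invented for illustration only.
toy <- data.frame(
  ty   = c(800, 750),    # distinct types
  to   = c(2300, 2250),  # tokens (including punctuation)
  se   = c(100, 90),     # sentences
  towl = c(4.2, 4.5),    # mean word length over tokens
  tywl = c(6.1, 6.4)     # mean word length over types
)

# Type-token ratio: distinct types per token.
toy$ttr <- toy$ty / toy$to

# Mean sentence length in tokens (punctuation is counted in `to`).
toy$msl <- toy$to / toy$se

print(toy[, c("ty", "to", "se", "ttr", "msl")])
```

The same expressions should apply unchanged to `BrownStats` once it has been loaded, since they rely only on the documented column names.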
Marco Baroni <baroni@sslmit.unibo.it>
Francis, W. N. and Kucera, H. (1964). Manual of information to accompany a standard sample of present-day edited American English, for use with digital computers. Technical report, Department of Linguistics, Brown University, Providence, RI.
LOBStats