readability: Measure readability

Description

This function calculates several readability indices.

Usage

readability(txt.file, hyphen=NULL, index=c("ARI",
    "Bormuth", "Coleman", "Coleman.Liau", "Dale.Chall",
    "Danielson.Bryan", "Dickes.Steiwer","DRP", "ELF",
    "Farr.Jenkins.Paterson", "Flesch", "Flesch.Kincaid",
    "FOG", "FORCAST", "Fucks", "Harris.Jacobson",
    "Linsear.Write", "LIX", "nWS", "RIX", "SMOG", "Spache",
    "Strain", "Traenkle.Bailer", "TRI", "Wheeler.Smith"),
    parameters=list(), word.lists=list(Bormuth=NULL,
    Dale.Chall=NULL, Harris.Jacobson=NULL, Spache=NULL),
    fileEncoding="UTF-8", tagger="kRp.env",
    force.lang=NULL, sentc.tag="sentc",
    nonword.class="nonpunct", nonword.tag=c(), quiet=FALSE,
    ...)

Arguments

txt.file

Either an object of class kRp.tagged-class, kRp.txt.freq-class, …

hyphen

An object of class kRp.hyphen. If NULL, the text will be hyphenated automatically. All syllable handling will be skipped automatically if it's not needed for the selected indices.

index

A character vector naming the indices to compute. Valid names are those listed in the default value shown under Usage.

parameters

A list with named magic numbers, defining the relevant parameters for each index. If none are given, the default values are used.

word.lists

A named list providing the word lists for the indices that need one. If an entry is NULL or missing, the corresponding index is skipped and a warning is given. Actual word lists can be provided as either a vector (or a matrix or data.frame with only one column) …

fileEncoding

A character string naming the encoding of the word list files (if they are files). "ISO_8859-1" or "UTF-8" should work in most cases.

tagger

A character string pointing to the tokenizer/tagger command you want to use for basic text analysis. It can be omitted if txt.file is already of class kRp.tagged-class. Defaults to tagger="kRp.env" to get the se…

force.lang

A character string naming the language to assume for the text, overriding the language that would otherwise be used (see Details).

sentc.tag

A character vector of POS tags that indicate a sentence ending. The default value "sentc" has a special meaning: it causes the result of

kRp.POS.tags(lang, tags="sentc",
  list.tags=TRUE)

to be used.

nonword.class

A character vector of word classes to be ignored in the readability analysis. The default value "nonpunct" has a special meaning: it causes the result of

kRp.POS.tags(lang,
  c("punct","sentc"), list.classes=TRUE)

to be used.

nonword.tag

A character vector of POS tags to be ignored in the readability analysis. This only has an effect if hyphen is not set.

quiet

Logical. If FALSE, short status messages will be shown.

...

Additional options for the specified tagger function.
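How sentc.tag, nonword.class and nonword.tag split a tagged text into a sentence count and a set of countable words can be illustrated with a small token table. The following base-R sketch does not reproduce koRpus internals: the data.frame columns token, tag and wclass, the tag values, and the class names "fullstop" and "comma" are all assumptions standing in for what kRp.POS.tags() would return for a real language.

```r
# Hypothetical token table; column names, tags and classes are illustrative only
tokens <- data.frame(
  token  = c("The", "cat", "sat", ".", "It", "slept", ",", "too", "."),
  tag    = c("DT", "NN", "VBD", "SENT", "PP", "VBD", ",", "RB", "SENT"),
  wclass = c("determiner", "noun", "verb", "fullstop", "pronoun", "verb",
             "comma", "adverb", "fullstop"),
  stringsAsFactors = FALSE
)

# Assumed stand-ins for kRp.POS.tags(lang, tags="sentc", list.tags=TRUE)
# and kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE)
sentc.tag     <- c("SENT")
nonword.class <- c("fullstop", "comma")
nonword.tag   <- c()  # no extra tags to drop, as in the default

# Sentences are counted via their ending tags ...
St <- sum(tokens$tag %in% sentc.tag)

# ... while words are the tokens left after dropping non-word classes and tags
is.word <- !(tokens$wclass %in% nonword.class) & !(tokens$tag %in% nonword.tag)
words   <- tokens$token[is.word]
W       <- length(words)
```

Under these assumptions St is 2 and W is 6: the two full stops mark sentence endings, and they are dropped together with the comma before words are counted.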

Value

An object of class kRp.readability-class.

Details

In the formulae for the individual indices, $W$ stands for the number of words, $St$ for the number of sentences, $C$ for the number of characters (usually meaning letters), $Sy$ for the number of syllables, $W_{3Sy}$ for the number of words with at least three syllables, $W_{<3Sy}$ for the number of words with fewer than three syllables, $W^{1Sy}$ for the number of words with exactly one syllable, $W_{6C}$ for the number of words with at least six letters, and $W_{-WL}$ for the number of words which are not on a certain word list (explained where needed).
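The quantities defined above can be computed from a list of words once each word's syllable count is known. The sketch below uses plain base R on a made-up ten-word sentence: the syllable counts are supplied by hand (in practice they would come from hyphenation), the word list is hypothetical, and the sentence count St is simply assumed to be 1. It then plugs the counts into the English Flesch Reading Ease formula (Flesch, 1948), keeping the magic numbers in a named list that the caller can partially override. This mirrors the idea behind the parameters argument, but the names const, asl and asw are assumptions, not the structure readability() expects.

```r
# Illustrative input; syllable counts are given by hand instead of hyphenation
words     <- c("Readability", "formulas", "estimate", "how", "hard", "a",
               "text", "is", "to", "read")
syllables <- c(5, 3, 3, 1, 1, 1, 1, 1, 1, 1)
St        <- 1                                                   # assumed
word.list <- c("a", "how", "hard", "is", "to", "read", "text")  # hypothetical

W       <- length(words)                                # words
C       <- sum(nchar(gsub("[^[:alpha:]]", "", words)))  # letters only
Sy      <- sum(syllables)                               # syllables
W.3Sy   <- sum(syllables >= 3)                          # at least 3 syllables
W.lt3Sy <- sum(syllables < 3)                           # fewer than 3 syllables
W.1Sy   <- sum(syllables == 1)                          # exactly 1 syllable
W.6C    <- sum(nchar(words) >= 6)                       # at least 6 letters
# Words not on the (lower-case) word list; case-insensitive matching is assumed
W.notWL <- sum(!(tolower(words) %in% word.list))

# Flesch Reading Ease with overridable magic numbers (names are assumptions)
flesch <- function(W, St, Sy, parameters = list()) {
  p <- modifyList(list(const = 206.835, asl = 1.015, asw = 84.6), parameters)
  p$const - p$asl * (W / St) - p$asw * (Sy / W)
}

flesch(W, St, Sy)
flesch(W, St, Sy, parameters = list(const = 200))  # override one constant only
```

For this sentence W is 10, Sy is 18 and C is 47, so the default call works out to 206.835 - 1.015 * 10 - 84.6 * 1.8 = 44.405.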

By default, if the text still has to be tagged, the language definition is queried by calling get.kRp.env(lang=TRUE) internally. If txt.file has already been tagged, the language definition of that tagged object is read and used instead. Set force.lang=get.kRp.env(lang=TRUE), or any other valid value, only if you want to forcibly override this default behaviour. See kRp.POS.tags for all supported languages.
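The order of precedence for the language can be summarised as a small decision function. This is a hypothetical sketch, not koRpus code: choose.lang() and its arguments are invented for illustration, tagged.lang stands for the language stored in an already tagged object (NULL if the text still has to be tagged), and env.lang stands in for the value of get.kRp.env(lang=TRUE), with "en" as a placeholder default.

```r
# Hypothetical helper illustrating the precedence rules; not part of koRpus
choose.lang <- function(force.lang = NULL, tagged.lang = NULL, env.lang = "en") {
  if (!is.null(force.lang)) {
    return(force.lang)   # an explicit force.lang always wins
  }
  if (!is.null(tagged.lang)) {
    return(tagged.lang)  # already tagged: reuse that object's language
  }
  env.lang               # still untagged: fall back to the session setting
}

choose.lang(tagged.lang = "de")                     # the tagged object's language
choose.lang(force.lang = "fr", tagged.lang = "de")  # forced language overrides it
```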

References

Anderson, J. (1981). Analysing the readability of English and non-English texts in the classroom with Lix. In Annual Meeting of the Australian Reading Association, Darwin, Australia.

Anderson, J. (1983). Lix and Rix: Variations on a little-known readability index. Journal of Reading, 26(6), 490–496.

Bamberger, R. & Vanecek, E. (1984). Lesen–Verstehen–Lernen–Schreiben. Wien: Jugend und Volk.

Coleman, M. & Liau, T.L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283–284.

Dickes, P. & Steiwer, L. (1977). Ausarbeitung von Lesbarkeitsformeln für die deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 9(1), 20–28.

DuBay, W.H. (2004). The Principles of Readability. Costa Mesa: Impact Information. WWW: http://www.impact-information.com/impactinfo/readability02.pdf; 22.03.2011.

Farr, J.N., Jenkins, J.J. & Paterson, D.G. (1951). Simplification of Flesch Reading Ease formula. Journal of Applied Psychology, 35(5), 333–337.

Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.

Fucks, W. (1955). Der Unterschied des Prosastils von Dichtern und anderen Schriftstellern. Sprachforum, 1, 233–244.

Harris, A.J. & Jacobson, M.D. (1974). Revised Harris-Jacobson readability formulas. In 18th Annual Meeting of the College Reading Association, Bethesda.

Klare, G.R. (1975). Assessing readability. Reading Research Quarterly, 10(1), 62–102.

McLaughlin, G.H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12(8), 639–646.

Powers, R.D., Sumner, W.A. & Kearl, B.E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105.

Smith, E.A. & Senter, R.J. (1967). Automated readability index. AMRL-TR-66-22. Wright-Patterson AFB, Ohio: Aerospace Medical Division.

Spache, G. (1953). A new readability formula for primary-grade reading materials. The Elementary School Journal, 53, 410–413.

Tränkle, U. & Bailer, H. (1984). Kreuzvalidierung und Neuberechnung von Lesbarkeitsformeln für die deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 16(3), 231–244.

Wheeler, L.R. & Smith, E.H. (1954). A practical readability formula for the classroom teacher in the primary grades. Elementary English, 31, 397–399.

[1] http://strainindex.wordpress.com/2007/09/25/hello-world/