
quanteda (version 0.9.9-50)

textstat_readability: calculate readability

Description

Calculate the readability of text(s) using one of a variety of computed indexes.
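To show what one of these indexes computes, here is a minimal sketch of the standard published Flesch-Kincaid grade-level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. It assumes the word, sentence, and syllable counts are already available. `flesch_kincaid_grade()` is a hypothetical helper, not part of quanteda, and the package's own counting rules (for example, how syllables are estimated) may make its results differ.

```r
# Hypothetical helper, not part of quanteda: the standard Flesch-Kincaid
# grade-level formula, assuming counts have already been computed.
flesch_kincaid_grade <- function(n_words, n_sentences, n_syllables) {
  words_per_sentence <- n_words / n_sentences     # average sentence length
  syllables_per_word <- n_syllables / n_words     # average word length in syllables
  0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
}

# Made-up counts: 100 words in 5 sentences, 150 syllables in total
flesch_kincaid_grade(100, 5, 150)   # about 9.91
```

Longer sentences and more syllables per word both raise the grade level, which is the general pattern most of the indexes listed under `measure` share.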

Usage

textstat_readability(x, measure = c("all", "ARI", "ARI.simple", "Bormuth",
  "Bormuth.GP", "Coleman", "Coleman.C2", "Coleman.Liau", "Coleman.Liau.grade",
  "Coleman.Liau.short", "Dale.Chall", "Dale.Chall.old", "Dale.Chall.PSK",
  "Danielson.Bryan", "Danielson.Bryan.2", "Dickes.Steiwer", "DRP", "ELF",
  "Farr.Jenkins.Paterson", "Flesch", "Flesch.PSK", "Flesch.Kincaid", "FOG",
  "FOG.PSK", "FOG.NRI", "FORCAST", "FORCAST.RGL", "Fucks", "Linsear.Write",
  "LIW", "nWS", "nWS.2", "nWS.3", "nWS.4", "RIX", "Scrabble", "SMOG", "SMOG.C",
  "SMOG.simple", "SMOG.de", "Spache", "Spache.old", "Strain",
  "Traenkle.Bailer", "Traenkle.Bailer.2", "Wheeler.Smith", "meanSentenceLength",
  "meanWordSyllables"), remove_hyphens = TRUE, min_sentence_length = 1,
  max_sentence_length = 10000, drop = TRUE, ...)

Arguments

x
a character vector or corpus object containing the texts
measure
character vector defining the readability measure to calculate
remove_hyphens
if TRUE, treat the constituent words of hyphenated words as separate terms when computing word lengths; e.g. "decision-making" counts as two terms of 8 and 6 characters respectively, rather than as a single word of 15 characters
min_sentence_length, max_sentence_length
set the minimum and maximum sentence lengths (in tokens, excluding punctuation) to include in the computation of readability. This makes it easy to exclude "sentences" that are not really sentences, such as section titles, table elements, and other cruft that may remain in the texts after format conversion.

For finer-grained control, consider filtering sentences beforehand, including through pattern-matching, using corpus_trimsentences.

drop
if TRUE, the result is returned as a numeric vector if only a single measure is requested; otherwise, a data.frame is returned with each column consisting of a requested measure.
...
not used
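The effect of the two preprocessing options, remove_hyphens and min_sentence_length / max_sentence_length, can be sketched in base R. This is an illustration under stated assumptions, not quanteda's internal code: tokenization here is a naive whitespace split with punctuation stripped, and `filter_sentences_by_length()` is a hypothetical helper.

```r
# Hypothetical base-R sketch, not quanteda's internal code.

# remove_hyphens: split hyphenated words before measuring word length
word <- "decision-making"
nchar(word)                                     # 15: one word, hyphen included
parts <- strsplit(word, "-", fixed = TRUE)[[1]]
nchar(parts)                                    # 8 6: two constituent words

# min_sentence_length / max_sentence_length: keep sentences whose token
# count (ignoring punctuation) lies within [min_len, max_len].
# Tokenization is a naive whitespace split, assumed for illustration.
filter_sentences_by_length <- function(sentences, min_len = 1, max_len = 10000) {
  n_tokens <- vapply(sentences, function(s) {
    tokens <- strsplit(s, "\\s+")[[1]]
    tokens <- gsub("[[:punct:]]", "", tokens)   # strip punctuation characters
    sum(nzchar(tokens))                         # count tokens still non-empty
  }, integer(1))
  sentences[n_tokens >= min_len & n_tokens <= max_len]
}

sents <- c("1. Introduction", "The cat sat on the mat.", "Table 3")
filter_sentences_by_length(sents, min_len = 3)  # "The cat sat on the mat."
```

Setting a minimum of 3 tokens drops the heading-like "1. Introduction" and "Table 3", which is the kind of conversion cruft the sentence-length arguments are meant to exclude.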

Value

a data.frame object consisting of the documents as rows and the readability statistics as columns; if drop = TRUE and only a single measure is requested, a numeric vector is returned instead
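The documented effect of drop on the return type can be mimicked with a hypothetical helper, `simplify_result()`, which is not part of quanteda. The values are placeholders rather than real readability scores, and whether quanteda's numeric vector carries document names is not covered by this sketch.

```r
# Hypothetical helper mirroring the documented `drop` behaviour; it is not
# part of quanteda. Values below are placeholders, not readability scores.
simplify_result <- function(result, drop = TRUE) {
  # A single requested measure collapses to a plain numeric vector
  if (drop && ncol(result) == 1) {
    return(result[[1]])
  }
  result
}

one_measure  <- data.frame(Flesch.Kincaid = c(1.5, 2.5))
two_measures <- data.frame(FOG = c(1, 2), FOG.PSK = c(3, 4))

simplify_result(one_measure)                 # numeric vector: 1.5 2.5
simplify_result(one_measure, drop = FALSE)   # still a one-column data.frame
simplify_result(two_measures)                # data.frame: drop has no effect
```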

Examples

txt <- c("Readability zero one.  Ten, Eleven.", "The cat in a dilapidated tophat.")
textstat_readability(txt, "Flesch.Kincaid")
textstat_readability(txt, "Flesch.Kincaid", drop = FALSE)
textstat_readability(txt, c("FOG", "FOG.PSK", "FOG.NRI"))
inaugReadability <- textstat_readability(data_corpus_inaugural, "all")
round(cor(inaugReadability), 2)

textstat_readability(data_corpus_inaugural, measure = "Flesch.Kincaid")
