koRpus (version 0.13-8)

textFeatures: Extract text features for authorship analysis

Description

This function combines several of koRpus' methods to extract the 9-Feature Set for authorship detection (Brennan, Afroz & Greenstadt, 2011; Brennan & Greenstadt, 2009).

Usage

textFeatures(text, hyphen = NULL)

Arguments

text

An object of class kRp.text. Can also be a list of these objects, if you want to analyze more than one text at once.

hyphen

An object of class kRp.hyphen, if text has already been hyphenated. If text is a list and hyphen is not NULL, it must also be a list with one object for each text, in the same order.
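
The list form of both arguments can be sketched as follows. This is a minimal illustration, not an official example: it assumes koRpus and koRpus.lang.en are installed, reuses the sample file shipped with koRpus, and analyzes the same text twice only to show how the two lists line up. The object names (hyph.obj, features) are made up for this sketch.

```r
# sketch only: runs when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # hyphenate once up front and hand the result to textFeatures()
  hyph.obj <- hyphen(tokenized.obj)
  # one hyphenation object per text, in the same order as the texts
  features <- textFeatures(
    text=list(tokenized.obj, tokenized.obj),
    hyphen=list(hyph.obj, hyph.obj)
  )
  features
} else {}
```

Passing precomputed hyphenation objects is mainly useful when the same texts are analyzed by several functions that need syllable counts.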

Value

A data.frame:

uniqWd

Number of unique words (tokens)

cmplx

Complexity (TTR)

sntCt

Sentence count

sntLen

Average sentence length

syllCt

Average syllable count per word

charCt

Character count (all characters, including spaces)

lttrCt

Letter count (without spaces, punctuation and digits)

FOG

Gunning FOG index

flesch

Flesch Reading Ease index
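
To make the columns above more concrete, the following base R sketch approximates each feature on a short string. It is an illustration under simplifying assumptions, not koRpus' implementation: words are plain runs of letters, sentences are counted by terminal punctuation, and syllables are estimated by counting vowel groups, which is much cruder than real hyphenation. The FOG and Flesch lines use the standard published formulas; all variable names are chosen for this sketch only.

```r
# illustrative only: naive approximations, not koRpus' tokenizer or hyphenation
txt <- "The cat sat on the mat. The dog barked!"

# lowercase word tokens: runs of letters
low <- tolower(txt)
words <- unlist(regmatches(low, gregexpr("[a-z]+", low)))

# naive syllable estimate: vowel groups per word, at least one per word
syll <- pmax(lengths(regmatches(words, gregexpr("[aeiouy]+", words))), 1L)

uniqWd <- length(unique(words))                        # unique words
cmplx  <- uniqWd / length(words)                       # type-token ratio (TTR)
sntCt  <- lengths(regmatches(txt, gregexpr("[.!?]+", txt)))  # sentence count
sntLen <- length(words) / sntCt                        # words per sentence
syllCt <- mean(syll)                                   # syllables per word
charCt <- nchar(txt)                                   # all characters, with spaces
lttrCt <- nchar(gsub("[^A-Za-z]", "", txt))            # letters only

# Gunning FOG: 0.4 * (words per sentence + percentage of words with 3+ syllables)
FOG <- 0.4 * (sntLen + 100 * sum(syll >= 3) / length(words))
# Flesch Reading Ease: 206.835 - 1.015 * words/sentence - 84.6 * syllables/word
flesch <- 206.835 - 1.015 * sntLen - 84.6 * syllCt

data.frame(uniqWd, cmplx, sntCt, sntLen, syllCt, charCt, lttrCt, FOG, flesch)
```

Values from textFeatures() will generally differ from such a sketch, since koRpus uses its own tokenization and language-specific hyphenation patterns.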

References

Brennan, M., Afroz, S., & Greenstadt, R. (2011). Deceiving authorship detection. Presentation at 28th Chaos Communication Congress (28C3), Berlin, Germany.

Brennan, M. & Greenstadt, R. (2009). Practical Attacks Against Authorship Recognition Techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA.

Tweedie, F.J., Singh, S., & Holmes, D.I. (1996). Neural Network Applications in Stylometry: The Federalist Papers. Computers and the Humanities, 30, 1-10.

Examples

# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  textFeatures(tokenized.obj)
} else {}