Representing and computing on text documents.
Text documents are documents containing (natural language) text. The
tm package employs the infrastructure provided by package NLP and
represents text documents via the virtual S3 class TextDocument.
Actual S3 text document classes (such as PlainTextDocument) then extend
the virtual base class.
All extension classes must provide an as.character method which extracts
the natural language text in documents of the respective classes in a
“suitable” (not necessarily structured) form, as well as content and meta
methods for accessing the (possibly raw) document content and metadata.
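
As a minimal sketch of the three required methods in use, the following
creates a PlainTextDocument and calls as.character, content and meta on it.
It assumes package tm is installed; the sample text and the id "doc1" are
made up for illustration.

```r
library(tm)

# Build a plain text document; the text and the id are illustrative values.
doc <- PlainTextDocument("The quick brown fox.", id = "doc1", language = "en")

# PlainTextDocument extends the virtual base class TextDocument.
inherits(doc, "TextDocument")

# Extract the natural language text as a character vector.
txt <- as.character(doc)

# Access the (possibly raw) document content and a single metadata tag.
raw <- content(doc)
doc_id <- meta(doc, "id")
```

For a PlainTextDocument the content is already plain text, so as.character
and content should return the same character vector; for other extension
classes the two may differ.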
See PlainTextDocument and XMLTextDocument for the text document classes
provided by package tm.
See TextDocument for text documents in package NLP.