
Create permanent corpora.
PCorpus(x,
        readerControl = list(reader = reader(x), language = "en"),
        dbControl = list(dbName = "", dbType = "DB1"))
x: A Source object.
readerControl: a named list of control parameters for reading in content from x.
  reader: a function capable of reading in and processing the format delivered by x.
  language: a character string giving the language (preferably as an IETF language tag; see language in package NLP). The default language is English ("en").
dbControl: a named list of control parameters for the underlying database storage provided by package filehash.
  dbName: a character string giving the filename of the database.
  dbType: a character string giving the database format (see filehashOption for the possible formats).
PCorpus returns an object inheriting from PCorpus and Corpus.
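The arguments above can be exercised together in a short session. This is a minimal sketch, assuming packages tm and filehash are installed; the German sample strings and the temporary database file are illustrative only. It overrides both readerControl entries (using tm's plain-text reader readPlain) and points dbControl at a fresh DB1 file, then checks the class of the result and the language stored with a document.

```r
library(tm)

# Illustrative database file in a temporary directory.
db_file <- tempfile(fileext = ".db")

pc <- PCorpus(VectorSource(c("Guten Tag", "Auf Wiedersehen")),
              readerControl = list(reader = readPlain, language = "de"),
              dbControl = list(dbName = db_file, dbType = "DB1"))

class(pc)                   # should inherit from "PCorpus" and "Corpus"
file.exists(db_file)        # the documents are stored in this file
meta(pc[[1]], "language")   # language taken from readerControl
```

Using tempfile() keeps the sketch from leaving a database behind in the working directory; in real use dbName would normally be a persistent path.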
A permanent corpus stores its documents outside of R in a database. Because several PCorpus objects backed by the same database can exist in memory at the same time, a change made through one of them is visible through all of them (unlike R's usual copy-on-modify semantics).
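The shared-storage behaviour can be sketched as follows. This assumes tm and filehash are installed, and also assumes (as in current tm releases) that tm_map() on a PCorpus writes the transformed documents back into the database rather than building a new corpus. The copy pc2 is made by plain assignment, which under default R semantics would be fully independent of pc1.

```r
library(tm)

pc1 <- PCorpus(VectorSource(c("Hello World", "Foo Bar")),
               dbControl = list(dbName = tempfile(fileext = ".db"),
                                dbType = "DB1"))

# An ordinary R copy: both objects refer to the same database file.
pc2 <- pc1

# Transform the documents through pc1 only; the result is not assigned.
invisible(tm_map(pc1, content_transformer(toupper)))

# Reading through pc2 fetches the updated document from the database.
content(pc2[[1]])
```

With a VCorpus the same steps would leave pc2 untouched, since tm_map() then returns a new in-memory corpus.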
See Corpus for basic information on the corpus infrastructure employed by package tm, and VCorpus for an implementation with volatile (in-memory) storage semantics.
txt <- system.file("texts", "txt", package = "tm")
# Not run: creates the database file pcorpus.db in the working directory.
PCorpus(DirSource(txt),
        dbControl = list(dbName = "pcorpus.db", dbType = "DB1"))