Read data from a custom corpus into a valid object of class kRp.corp.freq.
read.corp.custom(corpus, caseSens = TRUE, log.base = 10, ...)

# S4 method for kRp.text
read.corp.custom(
  corpus,
  caseSens = TRUE,
  log.base = 10,
  dtm = docTermMatrix(obj = corpus, case.sens = caseSens),
  as.feature = FALSE
)
corpus: An object of class kRp.text (the column "token" of the tokens slot is then used).
caseSens: Logical. If FALSE, all tokens will be matched in their lower case form.
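The effect of lower-case matching can be illustrated in base R without the package. This is only a sketch of the general idea, using a made-up token vector; it does not reproduce the package's internal matching code.

```r
# Made-up tokens for illustration; not taken from any real corpus.
tokens <- c("The", "the", "Cat", "cat", "sat")

# Case-sensitive counting: "The" and "the" are distinct types.
freq_case_sensitive <- table(tokens)

# Case-insensitive counting: all tokens are lower-cased before counting,
# so "The"/"the" and "Cat"/"cat" are merged.
freq_lower <- table(tolower(tokens))

print(freq_case_sensitive)
print(freq_lower)
```

With case-insensitive matching the number of distinct types drops, and the counts of merged forms add up.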
log.base: A numeric value defining the base of the logarithm used for inverse document frequency (idf). See log for details.
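To show what the logarithm base changes, here is a minimal base R sketch that assumes the common textbook definition idf = log(N / df), where N is the number of documents and df the number of documents containing a term. The exact formula used by the package is not stated here, and the toy matrix is invented for illustration.

```r
# Toy document term matrix (made-up counts): rows are documents, columns are terms.
dtm <- matrix(
  c(2, 0, 1,
    1, 1, 0,
    3, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("the", "cat", "sat"))
)

log.base <- 10

# N: number of documents; df: number of documents in which each term occurs.
n_docs   <- nrow(dtm)
doc_freq <- colSums(dtm > 0)

# Assumed textbook idf; base() only rescales the values, it does not reorder terms.
idf <- log(n_docs / doc_freq, base = log.base)
print(idf)
```

A term that occurs in every document gets an idf of 0 regardless of the base, while rarer terms get larger values.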
...: Additional options for methods of the generic.
dtm: A document term matrix of the corpus object as generated by docTermMatrix. This argument merely exists for cases where you want to re-use an already existing matrix. By default, it is created from the corpus object.
as.feature: Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use corpusCorpFreq to get the results from such an aggregated object.
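The following sketch shows how an existing document term matrix might be re-used and how the results could be retrieved when they are stored as a feature. It relies only on the calls named on this page (docTermMatrix, read.corp.custom, corpusCorpFreq). The wrapper function corp_freq_as_feature and its argument tagged.text are hypothetical names introduced for illustration, and tagged.text is assumed to be an already tokenized object of class kRp.text.

```r
# Hypothetical helper; `tagged.text` is assumed to be a kRp.text object.
corp_freq_as_feature <- function(tagged.text, caseSens = TRUE) {
  # The sketch needs the koRpus package to be installed.
  stopifnot(requireNamespace("koRpus", quietly = TRUE))

  # Build the document term matrix once so it can be re-used.
  dtm <- koRpus::docTermMatrix(obj = tagged.text, case.sens = caseSens)

  # Pass the existing matrix and ask for the input object back,
  # with the frequency results attached as a feature.
  tagged.text <- koRpus::read.corp.custom(
    tagged.text,
    caseSens = caseSens,
    dtm = dtm,
    as.feature = TRUE
  )

  # Pull the kRp.corp.freq results out of the aggregated object.
  koRpus::corpusCorpFreq(tagged.text)
}
```

Keeping the matrix in a variable avoids recomputing it when the same corpus is analyzed more than once.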
An object of class kRp.corp.freq.
Depending on as.feature, either an object of class kRp.corp.freq, or an object of class kRp.text with the added feature corp_freq containing it.
The methods should enable you to perform a basic text corpus frequency analysis. That is, they let you import not just analysis results like LCC files, but the corpus material itself. The resulting object is of class kRp.corp.freq, so it can be used for frequency analysis by other functions and methods of this package.
# NOT RUN {
ru.corp <- read.corp.custom("~/mydata/corpora/russian_corpus/")
# }