quanteda (version 0.9.8.5)

compress: compress a dfm by combining similarly named dimensions

Description

"Compresses" a dfm by combining documents or features that have identical names. Duplicate names can arise, for instance, when features are made equivalent through application of a thesaurus. They can also arise after lower-casing or stemming the features of a dfm, although this should be done only in very rare cases (approaching never: it is better to do this before constructing the dfm). Compression may also be needed after a cbind.dfm or rbind.dfm operation.

Usage

compress(x, ...)
# method for dfm objects
compress(x, margin = c("both", "documents", "features"), ...)

Arguments

x
input object, a dfm
...
additional arguments passed from generic to specific methods
margin
character indicating which margin to compress on, either "documents", "features", or "both" (default)
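
To make the effect of margin concrete, the following base R sketch mimics the idea on an ordinary matrix. It is only an illustration under stated assumptions, not quanteda's implementation: combine_dims is a hypothetical helper written for this example, and it relies on base R's rowsum() rather than on anything from the package.

```r
# Hypothetical helper (not part of quanteda): sums rows and/or columns
# of a plain matrix that share the same dimension name.
combine_dims <- function(m, margin = c("both", "documents", "features")) {
  margin <- match.arg(margin)
  if (margin %in% c("both", "documents")) {
    # rowsum() adds up rows that share a group label (here, the row name)
    m <- rowsum(m, group = rownames(m), reorder = FALSE)
  }
  if (margin %in% c("both", "features")) {
    # same operation on the transpose, to add up identically named columns
    m <- t(rowsum(t(m), group = colnames(m), reorder = FALSE))
  }
  m
}

# Toy count matrix with a repeated document name and a repeated feature name
m <- matrix(c(1, 2, 0,
              1, 1, 2,
              1, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("text1", "text2", "text1"), c("a", "a", "c")))

combine_dims(m, "features")   # 3 x 2: the two "a" columns are summed
combine_dims(m, "documents")  # 2 x 3: the two "text1" rows are summed
combine_dims(m)               # 2 x 2: both margins combined
```

In this sketch the counts are simply added, so the total of all cells is unchanged by any choice of margin.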

Examples

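# build a dfm with case-sensitive features by row-binding two dfms
mat <- rbind(dfm(c("b A A", "C C a b B"), toLower = FALSE, verbose = FALSE),
             dfm("A C C C C C", toLower = FALSE, verbose = FALSE))
# lower-casing the feature names produces duplicate column names
colnames(mat) <- toLower(features(mat))
mat
# combine documents that share a name
compress(mat, margin = "documents")
# combine features that share a name
compress(mat, margin = "features")
# combine on both margins (the default)
compress(mat)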

# no effect if no compression needed
compress(dfm(inaugTexts, verbose = FALSE))
