preText (version 0.6.2)

remove_infrequent_terms: Remove infrequently occurring terms from quanteda dfm.

Description

Removes from a quanteda dfm those terms that appear in fewer than a specified proportion of the documents in the corpus.

Usage

remove_infrequent_terms(dfm_object, proportion_threshold = 0.01,
  indices = NULL, verbose = TRUE)

Arguments

dfm_object

A quanteda dfm object.

proportion_threshold

The minimum proportion of documents in which a term must appear in order to be retained in the dfm. Defaults to 0.01.

indices

Defaults to NULL. If not NULL, then it must be a numeric vector specifying the column indices of terms the user would like to remove. Useful for removing specific terms.

verbose

Logical indicating whether more information should be printed to the screen to let the user know about progress in preprocessing. Defaults to TRUE.

Value

A reduced dfm.
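
To make the thresholding rule concrete, here is a minimal base R sketch on a plain matrix. It is an illustration of the idea described above, not the package's own implementation: the toy matrix, the variable names, and the choice to keep terms whose proportion is greater than or equal to the threshold are all assumptions.

```r
# Toy document-term matrix (hypothetical data): rows are documents,
# columns are terms, cells are term counts.
toy_dtm <- matrix(
  c(1, 0, 2, 1,
    0, 0, 1, 3,
    2, 0, 0, 1,
    1, 1, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("tax", "rare", "health", "the"))
)

# Proportion of documents in which each term appears at least once.
doc_proportion <- colSums(toy_dtm > 0) / nrow(toy_dtm)

# Keep only the terms that meet the threshold
# (assumption: the boundary case is kept).
proportion_threshold <- 0.5
reduced_dtm <- toy_dtm[, doc_proportion >= proportion_threshold, drop = FALSE]

colnames(reduced_dtm)
```

In this toy case "rare" appears in one of four documents (0.25) and is dropped, while "tax", "health", and "the" are kept.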

Examples

# NOT RUN {
# load the package
library(preText)
# load in the data
data("UK_Manifestos")
# preprocess data
preprocessed_documents <- factorial_preprocessing(
    UK_Manifestos,
    use_ngrams = TRUE,
    infrequent_term_threshold = 0.02,
    verbose = TRUE)
updated_dfm <- remove_infrequent_terms(preprocessed_documents$dfm_list[[1]],
                                       proportion_threshold = 0.5,
                                       indices = NULL,
                                       verbose = TRUE)
# }
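
The indices argument takes column positions rather than term names. The base R sketch below shows one way to look up positions for specific terms and what removing those columns amounts to. It uses a plain matrix with hypothetical term names, and it does not show how remove_infrequent_terms combines indices with proportion_threshold internally.

```r
# Toy document-term matrix (hypothetical data).
toy_dtm <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("doc", 1:3), c("alpha", "beta", "gamma", "delta"))
)

# Look up the column positions of the terms to drop.
terms_to_drop <- c("beta", "delta")
indices <- which(colnames(toy_dtm) %in% terms_to_drop)

# Removing columns by position; drop = FALSE keeps the result a matrix.
reduced_dtm <- toy_dtm[, -indices, drop = FALSE]

colnames(reduced_dtm)
```

A numeric vector built this way could then be supplied as the indices argument in place of NULL.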
