
quanteda (version 4.2.0)

dfm_subset: Extract a subset of a dfm

Description

Returns the documents of a dfm that meet certain conditions, including direct logical operations on docvars (document-level variables). dfm_subset() works like subset.data.frame(), using non-standard evaluation to evaluate conditions based on the docvars in the dfm.

Usage

dfm_subset(
  x,
  subset,
  min_ntoken = NULL,
  max_ntoken = NULL,
  drop_docid = TRUE,
  verbose = quanteda_options("verbose"),
  ...
)

Value

dfm object, with a subset of documents (and docvars) selected according to arguments

Arguments

x

dfm object to be subsetted.

subset

logical expression indicating the documents to keep: missing values are taken as false.

min_ntoken, max_ntoken

minimum and maximum lengths, in tokens, of the documents to extract.

drop_docid

if TRUE, docids for documents are removed as a result of subsetting.

verbose

if TRUE, print the number of tokens and documents before and after the function is applied. The number of tokens does not include padding.

...

not used
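
As a minimal sketch of the min_ntoken and max_ntoken arguments described above, the following filters documents by length. The corpus, object names, and thresholds are made up for illustration, and the sketch assumes that document length here corresponds to the per-document token counts reported by ntoken().

```r
library(quanteda)

# Toy corpus (illustrative only): documents with 2, 4, and 7 tokens
corp_len <- corpus(c(short  = "a b",
                     medium = "a b c d",
                     long   = "a b c d e f g"))
dfmat_len <- dfm(tokens(corp_len))

# Token counts per document, for reference
ntoken(dfmat_len)

# Keep documents with at least 3 tokens
dfmat_min <- dfm_subset(dfmat_len, min_ntoken = 3)
docnames(dfmat_min)

# Keep documents with at most 5 tokens
dfmat_max <- dfm_subset(dfmat_len, max_ntoken = 5)
docnames(dfmat_max)

# Both bounds can be supplied together
dfmat_mid <- dfm_subset(dfmat_len, min_ntoken = 3, max_ntoken = 5)
docnames(dfmat_mid)
```

The thresholds are deliberately chosen away from the actual document lengths, so the sketch does not depend on how values exactly equal to a bound are treated.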

Details

To select or subset features, see dfm_select() instead.
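
To illustrate the distinction, here is a small sketch contrasting the two functions: dfm_subset() filters documents (rows), while dfm_select() filters features (columns). The toy corpus mirrors the one in the Examples section; the object names dfmat_docs and dfmat_feats are illustrative.

```r
library(quanteda)

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
dfmat <- dfm(tokens(corp))

# dfm_subset() drops documents; the feature set is left as it was
dfmat_docs <- dfm_subset(dfmat, grp > 1)
docnames(dfmat_docs)
nfeat(dfmat_docs) == nfeat(dfmat)

# dfm_select() drops features; the document set is left as it was
dfmat_feats <- dfm_select(dfmat, pattern = c("a", "b"), selection = "keep")
featnames(dfmat_feats)
ndoc(dfmat_feats) == ndoc(dfmat)
```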

Examples

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
dfmat <- dfm(tokens(corp))
# selecting on a docvars condition
dfm_subset(dfmat, grp > 1)
# selecting on a supplied vector
dfm_subset(dfmat, c(TRUE, FALSE, TRUE, FALSE))
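
A further sketch, not part of the original examples, shows how missing values in a docvar are handled: as the subset argument notes, they are taken as false, so the affected document is dropped. The corpus below is a variant of the one above with an NA introduced for illustration, and the object names are hypothetical.

```r
library(quanteda)

# Same texts as above, but d2 has a missing value for grp (illustrative)
corp_na <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                    d3 = "b b c e", d4 = "e e f a b"),
                  docvars = data.frame(grp = c(1, NA, 2, 3)))
dfmat_na <- dfm(tokens(corp_na))

# grp > 1 evaluates to FALSE, NA, TRUE, TRUE; the NA is treated as FALSE
dfmat_kept <- dfm_subset(dfmat_na, grp > 1)
docnames(dfmat_kept)
docvars(dfmat_kept, "grp")

# Conditions on docvars can be combined with logical operators
dfmat_two <- dfm_subset(dfmat_na, grp > 1 & grp < 3)
docnames(dfmat_two)
```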
