quanteda (version 0.99.12)

dfm_subset: extract a subset of a dfm

Description

Returns the documents of a dfm that meet certain conditions, including direct logical operations on docvars (document-level variables). dfm_subset works in the same way as subset.data.frame, using non-standard evaluation to evaluate conditions against the docvars in the dfm.
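
As a rough illustration of the parallel with subset.data.frame, the sketch below applies the same unquoted condition to a plain data frame and to a dfm. The object names (meta, corp, d, kept) are made up for this example, and the dfm is built with dfm(tokens(...)) on the assumption that the installed quanteda version accepts a tokens object there.

```r
library(quanteda)

# Hypothetical document-level metadata, one row per document
meta <- data.frame(doc = c("d1", "d2", "d3", "d4"),
                   grp = c(1, 1, 2, 3),
                   stringsAsFactors = FALSE)

# Base R: the condition is evaluated inside the data frame
base_kept <- subset(meta, grp > 1)$doc

# quanteda: the same unquoted condition is evaluated against the docvars
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = meta["grp"])
d <- dfm(tokens(corp))
kept <- dfm_subset(d, grp > 1)

base_kept       # documents selected from the data frame
docnames(kept)  # documents retained in the dfm
```

In both calls grp is resolved as a column name rather than as a variable in the calling environment, which is what non-standard evaluation refers to here.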

Usage

dfm_subset(x, subset, select, ...)

Arguments

x

dfm object to be subsetted

subset

logical expression indicating the documents to keep: missing values are taken as FALSE

select

expression, indicating the docvars to select from the dfm; or a dfm, in which case the returned dfm will contain the same documents as the original dfm, even if these are empty. See Details.

...

not used
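
The treatment of missing values described for subset can be checked with a small sketch. The corpus and the grp docvar are invented for illustration, and dfm(tokens(...)) is used on the assumption that the installed quanteda version accepts a tokens object.

```r
library(quanteda)

# d2 has a missing value for the (hypothetical) docvar grp
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, NA, 2, 3)))
d <- dfm(tokens(corp))

# grp > 1 evaluates to FALSE, NA, TRUE, TRUE; the NA is treated as FALSE,
# so d2 is dropped along with d1
kept <- dfm_subset(d, grp > 1)
docnames(kept)
```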

Value

dfm object, with a subset of documents (and docvars) selected according to the arguments

Details

To select or subset features, see dfm_select instead.

When select is a dfm, the returned dfm will match the dfm used for selection in its row dimension and document order. This is the document-level counterpart of using dfm_select where pattern is a dfm: that function matches features, while dfm_subset matches documents.
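
The difference between operating on documents and operating on features can be seen in a minimal sketch. The toy texts and object names are assumptions made for this example; it uses only a logical vector for dfm_subset and a character pattern for dfm_select, not the dfm-matching form.

```r
library(quanteda)

d <- dfm(tokens(c(d1 = "a b c d", d2 = "a a b e", d3 = "b b c e")))

# dfm_subset works on rows: documents are dropped, features are left alone
by_doc <- dfm_subset(d, c(TRUE, FALSE, TRUE))

# dfm_select works on columns: features are dropped, documents are left alone
by_feat <- dfm_select(d, pattern = c("a", "b"))

dim(d)        # documents x features before any selection
dim(by_doc)   # fewer documents, same features
dim(by_feat)  # same documents, fewer features
```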

See Also

subset.data.frame

Examples

# NOT RUN {
testcorp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                     d3 = "b b c e", d4 = "e e f a b"),
                   docvars = data.frame(grp = c(1, 1, 2, 3)))
testdfm <- dfm(testcorp)
# selecting on a docvars condition
dfm_subset(testdfm, grp > 1)
# selecting on a supplied vector
dfm_subset(testdfm, c(TRUE, FALSE, TRUE, FALSE))

# selecting on a dfm
dfm1 <- dfm(c(d1 = "a b b c", d2 = "b b c d"))
dfm2 <- dfm(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
dfm_subset(dfm1, subset = dfm2)
dfm_subset(dfm1, subset = dfm2[c(3, 1, 2), ])
# }