quanteda (version 4.2.0)

dfm_match: Match the feature set of a dfm to given feature names

Description

Match the feature set of a dfm to a specified vector of feature names. Features in x that exactly match an element of features are kept. Features in x that are not in features are discarded, and feature names in features that are not found in x are added with all-zero counts.
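
The three cases described here (keep, discard, add as zeros) can be seen on a one-document dfm. This is a minimal sketch rather than part of the package's own examples; the text "a b b c" and the object names dfmat and out are made up for illustration.

```r
library(quanteda)

# Hypothetical one-document dfm with features "a", "b", "c"
dfmat <- dfm(tokens(c(doc1 = "a b b c")))

# "a" and "b" match exactly and are kept, "c" is not requested and is
# discarded, "z" is requested but absent and is added as a zero column
out <- dfm_match(dfmat, c("b", "a", "z"))

featnames(out)
as.matrix(out)
```

Matching is exact, so a requested name that differs from an existing feature only in case would be treated as a new, all-zero feature.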

Usage

dfm_match(x, features, verbose = quanteda_options("verbose"))

Value

A dfm whose features are identical to those specified in features.

Arguments

x

a dfm

features

character; the feature names to be matched in the output dfm

verbose

if TRUE, print the number of tokens and documents before and after the function is applied. The number of tokens does not include paddings.

Details

Selecting on another dfm's featnames() is useful when you have trained a model on one dfm, and need to project this onto a test set whose features must be identical. It is also used in bootstrap_dfm().
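
The train/test use case might look like the sketch below. The texts and object names (txt_train, txt_test, dfmat_train, dfmat_test, dfmat_test_matched) are hypothetical, and the model-fitting step is left out because it depends on the modelling package in use; only the feature alignment is shown.

```r
library(quanteda)

# Hypothetical toy texts standing in for a real training/test split
txt_train <- c(d1 = "good great film", d2 = "bad awful film")
txt_test  <- c(d3 = "great plot but awful acting")

dfmat_train <- dfm(tokens(txt_train))
dfmat_test  <- dfm(tokens(txt_test))

# Conform the test dfm to the feature set the model was trained on
dfmat_test_matched <- dfm_match(dfmat_test, featnames(dfmat_train))

# The matched test dfm now has the same feature set as the training dfm
setequal(featnames(dfmat_test_matched), featnames(dfmat_train))

# Test-only features ("plot", "but", "acting") are gone; training-only
# features ("good", "film", "bad") appear with zero counts
as.matrix(dfmat_test_matched)
```

A model fitted on dfmat_train could then be applied to dfmat_test_matched, since both share one feature set.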

See Also

dfm_select()

Examples

# matching a dfm to a feature vector
dfm_match(dfm(tokens("")), letters[1:5])
dfm_match(data_dfm_lbgexample, c("A", "B", "Z"))
dfm_match(data_dfm_lbgexample, c("B", "newfeat1", "A", "newfeat2"))

# matching one dfm to another
txt <- c("This is text one", "The second text", "This is text three")
(dfmat1 <- dfm(tokens(txt[1:2])))
(dfmat2 <- dfm(tokens(txt[2:3])))
(dfmat3 <- dfm_match(dfmat1, featnames(dfmat2)))
setequal(featnames(dfmat2), featnames(dfmat3))