udpipe (version 0.8.11)

strsplit.data.frame: Obtain a tokenised data frame by splitting text on a regular expression

Description

Obtain a tokenised data frame by splitting text on a regular expression. This is the inverse operation of paste.data.frame.

Usage

strsplit.data.frame(
  data,
  term,
  group,
  split = "[[:space:][:punct:][:digit:]]+",
  ...
)

Value

A tokenised data frame containing one row per token.

This data.frame has the columns from group and term, where the text in column term is split into tokens by the provided regular expression.
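
The shape of the returned value can be illustrated with base R alone. The following is a minimal sketch, not the udpipe implementation: the data frame docs and its contents are made up for illustration, and the splitting is done with plain strsplit using the default pattern from the Usage section.

```r
# Illustrative base R sketch (NOT the udpipe implementation):
# split a text column and return one row per token, keeping the group column.
docs <- data.frame(id = c(1L, 2L),                                # hypothetical data
                   feedback = c("nice quiet room", "very central"),
                   stringsAsFactors = FALSE)

# One character vector of tokens per row of docs
pieces <- strsplit(docs$feedback, split = "[[:space:][:punct:][:digit:]]+")

# Repeat each group identifier once per token and flatten the tokens
tokenised <- data.frame(id = rep(docs$id, times = lengths(pieces)),
                        feedback = unlist(pieces),
                        stringsAsFactors = FALSE)
print(tokenised)
```

The result has five rows: three for id 1 and two for id 2, with the token stored in the column named after term.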

Arguments

data

a data.frame or data.table

term

a string with the name of the column in data which you want to split into tokens

group

a string with a column name or a character vector of column names from data indicating identifiers of groups. The text in term will be split into tokens by group.

split

a regular expression indicating how to split the term column. Defaults to splitting by spaces, punctuation symbols or digits. This will be passed on to strsplit.
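
To see what the default pattern does, it can be applied directly with base strsplit. This is a small sketch on a made-up sentence; the variable names are illustrative only.

```r
# Base R only: the default split pattern applied directly with strsplit().
split_default <- "[[:space:][:punct:][:digit:]]+"
txt <- "Great flat, 2 minutes from Grand-Place!"   # made-up example text

# Any run of spaces, punctuation or digits counts as a single separator
tokens <- strsplit(txt, split = split_default)[[1]]
print(tokens)
```

Because digits belong to the separator class, the "2" is discarded rather than returned as a token, and "Grand-Place" is split into two tokens at the hyphen.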

...

further arguments passed on to strsplit
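
As an example of such an argument, fixed = TRUE makes strsplit treat split as a literal string instead of a regular expression. A brief sketch with base strsplit on a made-up string:

```r
# Base R only: effect of fixed = TRUE, an argument that can be passed via ...
txt <- "room 12, very clean"   # made-up example text

# split = " " is matched literally, so digits and punctuation stay in the tokens
tokens_fixed <- strsplit(txt, split = " ", fixed = TRUE)[[1]]
print(tokens_fixed)
```

Here the comma remains attached to "12," because only the single space acts as a separator.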

See Also

paste.data.frame, strsplit

Examples

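library(udpipe)
data(brussels_reviews, package = "udpipe")

# Split the feedback text into tokens, one row per token, grouped by review id
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id")
head(x)

# Several grouping columns can be given as a character vector
x <- strsplit.data.frame(brussels_reviews,
                         term = "feedback",
                         group = c("listing_id", "language"))
head(x)

# Arguments in ... are passed on to strsplit: split on a literal single space only
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id",
                         split = " ", fixed = TRUE)
head(x)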