which_are: Detect/Locate Potential Non-Normalized Text
Description
Detect/Locate potential issues with text data. This family of functions
generates an environment of detection/location functions that can be accessed via
the dollar sign ($) or square bracket ([[) operators. The accessible functions are listed under Details.
Usage
which_are()
is_it()
Arguments
None; both functions are called without arguments.
Value
which_are returns an environment of functions. Each function returns the
integer locations (indices) of the input elements that contain the type of
non-normalized text named by the function.
is_it returns an environment of functions. Each function returns a logical
vector of the same length as the input, indicating whether each element contains
the type of non-normalized text named by the function. The exception is the meta
functions, which return a single logical value.
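The difference between the two return types can be made concrete with a minimal base-R sketch of one detector. This is not textclean's implementation: the constructor names make_detectors and make_locators are hypothetical, and the digit pattern "[0-9]" is an assumption. It builds environments of functions, as the two constructors above do, and shows that a which_are-style result is the set of positions where an is_it-style result is TRUE.

```r
# Hypothetical sketch, not textclean's source code.
# is_it-style constructor: each function returns one logical per element.
make_detectors <- function() {
  env <- new.env()
  env$digit <- function(x) grepl("[0-9]", x)  # assumed digit pattern
  env
}

# which_are-style constructor: each function returns integer locations.
make_locators <- function() {
  detectors <- make_detectors()
  env <- new.env()
  env$digit <- function(x) which(detectors$digit(x))
  env
}

it <- make_detectors()
wa <- make_locators()
x <- c("The dog", "I like 2", "3 cats")

it$digit(x)       # logical vector, same length as x
wa[["digit"]](x)  # integer positions; [[ works like $ on an environment
```

Because the functions live in an environment, both `$` and `[[` retrieve them by name, which matches the access styles described above.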
Details
contraction
Contains contractions
date
Contains dates
digit
Contains digits
email
Contains email addresses
emoticon
Contains emoticons
empty
Contains only white space
escaped
Contains an escaped backslash character
hash
Contains Twitter-style hashtags
html
Contains HTML markup
incomplete
Contains incomplete sentences (e.g., ends with "...")
kern
Contains kerning (e.g., "The B O M B!")
list_column
Is a list of atomic vectors (not provided by which_are)
misspelled
Contains potentially misspelled words
no_endmark
Contains a sentence with no ending punctuation
no_space_after_comma
Contains commas with no space after them
non_ascii
Contains non-ASCII characters
non_character
Is a non-character vector (not provided by which_are)
non_split_sentence
Contains non-split sentences (more than one sentence in a single element)
tag
Contains a Twitter-style handle used to tag others (i.e., uses the @ symbol)
time
Contains a time stamp
url
Contains a URL
Functions whose descriptions begin with 'Is' rather than 'Contains' are meta
functions: they describe an attribute of the column/vector as a whole rather
than attributes of its individual elements. Meta functions return a single
logical value (length one) and are not available from which_are.
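The split between element-level and meta functions can be illustrated with another hypothetical base-R sketch. The checks below are assumptions about what such tests might look like, not textclean's own definitions: no_space_after_comma returns one logical per element, while non_character and list_column return a single logical describing the whole vector.

```r
# Hypothetical checks, not textclean's implementation.

# Element-level: TRUE for each element with a comma followed by a non-space.
no_space_after_comma <- function(x) grepl(",[^[:space:]]", x)

# Meta: one logical for the whole vector.
non_character <- function(x) !is.character(x)

# Meta: TRUE only if x is a list whose elements are all atomic vectors.
list_column <- function(x) {
  is.list(x) && all(vapply(x, is.atomic, logical(1)))
}

no_space_after_comma(c("a,b", "a, b"))        # one value per element
non_character(1:3)                            # single value
list_column(list(1:2, c("a", "b")))           # single value
list_column(c("the dog", "ate the chicken"))  # single value
```

Using `&&` makes list_column short-circuit, so a character vector returns FALSE without inspecting its elements.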
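Examples
library(textclean)
wa <- which_are()
it <- is_it()
# Integer locations of the elements that contain digits
wa$digit(c("The dog", "I like 2", NA))
# Logical vector with one value per element
it$digit(c("The dog", "I like 2", NA))
# Meta function: a single logical describing the whole vector
is_it()$list_column(c("the dog", "ate the chicken"))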