
textclean (version 0.9.3)

which_are: Detect/Locate Potential Non-Normalized Text

Description

Detect or locate potential issues with text data. Each function in this family returns a collection of detection or location functions, which can be accessed via the dollar sign or square bracket operators. The accessible functions are listed under Details.

Usage

which_are()

is_it()

Arguments

Value

which_are returns an environment of functions. Each function returns the integer locations of the elements that contain the non-normalized text named by the function.

is_it returns an environment of functions. Each function returns a logical atomic vector of the same length as the input vector, indicating which elements contain the non-normalized text named by the function. The meta functions are the exception: they return a single logical value.
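The difference between the two return types can be illustrated with a minimal base-R sketch. The helpers make_detectors and make_locators below are hypothetical stand-ins written for this illustration; they are not textclean's implementation, and the regular expressions and NA handling are assumptions.

```r
# Hypothetical sketch (NOT textclean's code): an environment of
# detector functions and a matching environment of locator functions.
make_detectors <- function() {
  env <- new.env()
  # Element-wise checks; NA input is assumed to stay NA
  env$digit <- function(x) ifelse(is.na(x), NA, grepl("[0-9]", x))
  env$empty <- function(x) ifelse(is.na(x), NA, grepl("^[[:space:]]*$", x))
  env
}

make_locators <- function() {
  detectors <- make_detectors()
  env <- new.env()
  for (nm in ls(detectors)) {
    local({
      f <- get(nm, envir = detectors)
      # Locator = integer positions where the detector is TRUE
      assign(nm, function(x) which(f(x)), envir = env)
    })
  }
  env
}

x <- c("The dog", "I like 2", NA)
detect <- make_detectors()
locate <- make_locators()

detect$digit(x)       # logical vector, same length as x
locate$digit(x)       # integer positions only
locate[["digit"]](x)  # same function, reached with double brackets
```

In this sketch the locators are derived from the detectors with which(), so the two environments always expose the same element-wise names.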

Details

contraction

Contains contractions

date

Contains dates

digit

Contains digits

email

Contains email addresses

emoticon

Contains emoticons

empty

Contains just white space

escaped

Contains escaped backslash character

hash

Contains Twitter-style hashtags

html

Contains HTML markup

incomplete

Contains incomplete sentences (e.g., ends with ...)

kern

Contains kerning (e.g. "The B O M B!")

list_column

Is a list of atomic vectors (not provided by which_are)

misspelled

Contains potentially misspelled words

no_endmark

Contains a sentence with no ending punctuation

no_space_after_comma

Contains commas with no space after them

non_ascii

Contains non-ASCII characters

non_character

Is a non-character vector (not provided by which_are)

non_split_sentence

Contains non-split sentences

tag

Contains a Twitter style handle used to tag others (use of the at symbol)

time

Contains a time stamp

url

Contains a URL

The functions above whose descriptions start with 'Is' rather than 'Contains' are meta functions. They describe an attribute of the column/vector being passed rather than attributes of its individual elements. The meta functions return a logical vector of length one and are not available under which_are.
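As a rough illustration of what a meta check looks like, the base-R sketch below tests an attribute of the whole object and returns a single logical value. The names is_list_column and is_non_character are hypothetical helpers modelled on the descriptions above; they are assumptions, not textclean's own code.

```r
# Hypothetical meta checks (NOT textclean's code): each one inspects
# the whole object and returns a logical of length one.
is_list_column <- function(x) {
  # "Is a list of atomic vectors"
  is.list(x) && all(vapply(x, is.atomic, logical(1)))
}

is_non_character <- function(x) {
  # "Is a non-character vector"
  !is.character(x)
}

chr <- c("the dog", "ate the chicken")
lst <- list("the dog", c("ate", "the", "chicken"))

is_list_column(chr)    # FALSE: a plain character vector
is_list_column(lst)    # TRUE: a list whose elements are atomic
is_non_character(chr)  # FALSE
is_non_character(1:3)  # TRUE
```

Because the answer describes the object rather than its elements, there are no positions to report, which is consistent with these checks being unavailable as locators.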

Examples

# NOT RUN {
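library(textclean)

wa <- which_are()  # environment of locator functions
it <- is_it()      # environment of detector functions

# Locate vs. detect the elements that contain digits
wa$digit(c('The dog',  "I like 2", NA))
it$digit(c('The dog',  "I like 2", NA))

# Meta function: describes the whole vector, not its elements
is_it()$list_column(c('the dog', 'ate the chicken'))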

# }
