Remove/replace/extract words with repeating characters. The word must contain characters, each repeating at least 2 times.
rm_repeated_characters(
text.var,
trim = !extract,
clean = TRUE,
pattern = "@rm_repeated_characters",
replacement = "",
extract = FALSE,
dictionary = getOption("regex.library"),
...
)

ex_repeated_characters(
text.var,
trim = !extract,
clean = TRUE,
pattern = "@rm_repeated_characters",
replacement = "",
extract = TRUE,
dictionary = getOption("regex.library"),
...
)
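Returns a character string with words containing repeated characters removed. If extract = TRUE, the matched words are returned as a list of vectors instead.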
text.var: The text variable.
trim: logical. If TRUE, removes leading and trailing white
spaces.
clean: logical. If TRUE, extra white spaces and escaped
characters will be removed.
pattern: A character string containing a regular expression (or
character string for fixed = TRUE) to be matched in the given
character vector. The default, @rm_repeated_characters, uses the
rm_repeated_characters regex from the regular expression dictionary given in
the dictionary argument.
replacement: Replacement for the matched pattern.
extract: logical. If TRUE, the words with repeating characters
are extracted into a list of vectors.
dictionary: A dictionary of canned regular expressions to search within
if pattern begins with "@rm_".
...: Other arguments passed to gsub.
stackoverflow's vks and Tyler Rinker <tyler.rinker@gmail.com>.
Other rm_ functions:
rm_abbreviation(),
rm_between(),
rm_bracket(),
rm_caps_phrase(),
rm_caps(),
rm_citation_tex(),
rm_citation(),
rm_city_state_zip(),
rm_city_state(),
rm_date(),
rm_default(),
rm_dollar(),
rm_email(),
rm_emoticon(),
rm_endmark(),
rm_hash(),
rm_nchar_words(),
rm_non_ascii(),
rm_non_words(),
rm_number(),
rm_percent(),
rm_phone(),
rm_postal_code(),
rm_repeated_phrases(),
rm_repeated_words(),
rm_tag(),
rm_time(),
rm_title_name(),
rm_url(),
rm_white(),
rm_zip()
x <- "aaaahahahahaha that was a good joke peep and pepper and pepe"
rm_repeated_characters(x)
ex_repeated_characters(x)
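The remove and extract modes can be approximated in base R, which may help when reasoning about what counts as a match. This is only a sketch: the pattern `pat` below is an assumption chosen for illustration (a word that starts with some chunk immediately repeated), and the regex actually stored under "@rm_repeated_characters" in the dictionary may differ. The whitespace cleanup loosely mirrors what the trim and clean arguments are described as doing.

```r
# ASSUMED pattern, for illustration only; not taken from the regex dictionary.
# \\b(\\S+?)\\1 : a word that begins with a chunk immediately followed by itself
# \\S*\\b       : the rest of that word
pat <- "\\b(\\S+?)\\1\\S*\\b"

x <- "aaaahahahahaha that was a good joke peep and pepper and pepe"

# Remove mode: drop the matches, then collapse and trim leftover white space
removed <- gsub(pat, "", x, perl = TRUE)
removed <- trimws(gsub("\\s+", " ", removed))

# Extract mode: collect the matches as a list with one vector per input string
extracted <- regmatches(x, gregexpr(pat, x, perl = TRUE))

print(removed)
print(extracted)
```

Under this assumed pattern, words such as "peep" and "pepper" are left alone because no leading chunk is immediately repeated, whereas "pepe" is matched.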