This function removes words that are shared between the elements of a character vector, i.e. it trims each element down to its non-redundant words.
rmSharedWords(
x,
sep = c("_", " ", "."),
anySep = TRUE,
newSep = NULL,
minLe = 2,
na.omit = FALSE,
fixed = TRUE,
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
This function returns a character vector of the same length as x (unless na.omit=TRUE), simply with modified text content.
x: (character) main input, the text to be made non-redundant
sep: (character) separator(s) to be used
anySep: (logical) if TRUE, all separators are considered at once, so combinations with different separators won't be distinguished
newSep: (character) new (uniform) separator between words; if NULL, the first value/separator of sep will be used
minLe: (integer) minimum length for a string to be recognised as a 'word'
na.omit: (logical) if TRUE, NAs will be removed from the output
fixed: (logical) will be passed on to the argument fixed of strsplit(); note that in strsplit(), fixed=TRUE means the separators are matched literally and not as regular expressions
silent: (logical) suppress messages
debug: (logical) additional messages for debugging
callFrom: (character) allows easier tracking of messages produced
Leading separators will be removed in any case (even if not followed by a 'word').
Special characters will be automatically protected. When looking for repeated words, the order of such words does NOT matter; multiple repeats will be removed, too.
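The idea described above can be illustrated with a minimal base-R sketch. This is not the implementation of rmSharedWords(); the helper name dropSharedWords and its simplified argument set are hypothetical. It assumes that a 'shared' word is one that occurs in every non-NA element, and it treats all separators as equivalent (comparable to anySep=TRUE).

```r
# Hypothetical helper for illustration only (NOT the package's implementation).
# Assumption: a word is 'shared' if it occurs in every non-NA element of x.
dropSharedWords <- function(x, sep = c("_", " ", "."), newSep = sep[1], minLe = 2) {
  isNa <- is.na(x)
  # Unify all separators to the first one, matching them literally (fixed = TRUE)
  uni <- x
  for (s in sep[-1]) uni <- gsub(s, sep[1], uni, fixed = TRUE)
  # Split into words and drop empty strings (from leading or consecutive separators)
  words <- strsplit(uni, sep[1], fixed = TRUE)
  words <- lapply(words, function(w) w[nzchar(w)])
  # Words present in all non-NA elements, long enough to count as a 'word'
  shared <- Reduce(intersect, words[!isNa])
  shared <- shared[nchar(shared) >= minLe]
  # Remove every occurrence of a shared word (order and repeats do not matter)
  out <- vapply(words, function(w) paste(w[!w %in% shared], collapse = newSep),
                character(1))
  out[isNa] <- NA_character_
  out
}

x1 <- c("aa_A1 yy_zz.txt", NA, "B2 yy_aa_aa_zz.txt")
dropSharedWords(x1)
```

In this sketch the words "aa", "yy", "zz" and "txt" occur in both non-NA elements, so only "A1" and "B2" remain and the NA is kept in place. The documented function may differ in details such as the handling of minLe or of separators when anySep=FALSE.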
See also: trimRedundText
x1 <- c("aa_A1 yy_zz.txt", NA, "B2 yy_aa_aa_zz.txt")
rmSharedWords(x1)