Standardise text by:
Converting the text from UTF-8 to ASCII
Keeping only alphanumeric characters: letters and numbers
Removing multiple spaces
Removing leading/trailing spaces
Lowercasing
txt_clean_word2vec(x, ascii = TRUE, alpha = TRUE, tolower = TRUE, trim = TRUE)
a character vector of the same length as x, standardised by converting the encoding to ASCII, lowercasing, and keeping only alphanumeric characters
x: a character vector in UTF-8 encoding
ascii: logical indicating whether to use iconv to convert the input from UTF-8 to ASCII. Defaults to TRUE.
alpha: logical indicating whether to keep only alphanumeric characters. Defaults to TRUE.
tolower: logical indicating whether to lowercase x. Defaults to TRUE.
trim: logical indicating whether to trim leading/trailing white space. Defaults to TRUE.
x <- c(" Just some.texts, ok?", "123.456 and\tsome MORE! ")
txt_clean_word2vec(x)
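To make the cleaning steps concrete, here is a minimal base-R sketch of the same sequence of operations. It is an illustration only, not the package source: the name clean_sketch is hypothetical, and both the order of the steps and the use of "ASCII//TRANSLIT" are assumptions.

```r
# Hypothetical re-implementation for illustration; NOT the package's own code.
clean_sketch <- function(x, ascii = TRUE, alpha = TRUE, tolower = TRUE, trim = TRUE) {
  if (ascii) {
    # Assumption: transliterate to ASCII; results for non-ASCII input
    # depend on the platform's iconv implementation.
    x <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
  }
  if (alpha) {
    # Replace every character that is not a letter or digit with a space.
    x <- gsub("[^[:alnum:]]", " ", x)
  }
  # Collapse runs of white space into a single space.
  x <- gsub("[[:space:]]+", " ", x)
  if (trim) {
    # Drop leading and trailing white space.
    x <- trimws(x)
  }
  if (tolower) {
    x <- tolower(x)
  }
  x
}

x <- c(" Just some.texts, ok?", "123.456 and\tsome MORE! ")
clean_sketch(x)
# The sketch returns: "just some texts ok" "123 456 and some more"
```

Punctuation is replaced by a space rather than deleted, so "some.texts" becomes two tokens instead of being fused into one. That is a design choice of this sketch and may differ from what txt_clean_word2vec does.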