detect noise
noise(.Object, ...)
# S4 method for DocumentTermMatrix
noise(
.Object,
minTotal = 2,
minTfIdfMean = 0.005,
sparse = 0.995,
stopwordsLanguage = "german",
minNchar = 2L,
specialChars = getOption("polmineR.specialChars"),
numbers = "^[0-9\\.,]+$",
verbose = TRUE
)
# S4 method for TermDocumentMatrix
noise(.Object, ...)
# S4 method for character
noise(
.Object,
stopwordsLanguage = "german",
minNchar = 2,
specialChars = getOption("polmineR.specialChars"),
numbers = "^[0-9\\.,]+$",
verbose = TRUE
)
# S4 method for textstat
noise(.Object, p_attribute, ...)
Value: a list.
.Object: An object of class DocumentTermMatrix.
...: Further parameters.
minTotal: Minimum column sum (for a DocumentTermMatrix) to qualify a term as non-noise.
minTfIdfMean: Minimum mean tf-idf value to qualify a term as non-noise.
sparse: Passed to tm::removeSparseTerms().
stopwordsLanguage: Language of the stopword list defined in the tm package, e.g. "german".
minNchar: Minimum number of characters to qualify a term as non-noise.
specialChars: Special characters to drop.
numbers: Regular expression matching numbers to drop.
verbose: Logical.
p_attribute: Relevant if applied to a textstat object.
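
To make the criteria for the character method more concrete, here is a minimal base-R sketch of how stopwords, minNchar and numbers could each flag tokens as noise. It is not polmineR's implementation: the helper `flag_noise()` and its three-word stopword vector are hypothetical stand-ins (the package takes its stopwords from tm), the names of the returned list elements are assumptions, and the specialChars criterion is left out because its default comes from a package option not shown here.

```r
# Hypothetical helper, for illustration only; NOT the polmineR implementation.
flag_noise <- function(tokens,
                       stopwords = c("der", "die", "und"), # assumed stand-in for the tm stopword list
                       minNchar = 2,
                       numbers = "^[0-9\\.,]+$") {
  list(
    # tokens found in the stopword list
    stopwords = tokens[tolower(tokens) %in% stopwords],
    # tokens with fewer than minNchar characters
    tooShort = tokens[nchar(tokens) < minNchar],
    # tokens consisting only of digits, dots and commas
    numbers = grep(numbers, tokens, value = TRUE)
  )
}

tokens <- c("der", "Bundestag", "2,5", "x", "und", "1.000", "Debatte")
flagged <- flag_noise(tokens)

# Drop everything flagged by any criterion
keep <- setdiff(tokens, unlist(flagged))
print(keep)
```

Returning one list element per criterion, in line with the documented list return value, keeps it visible why a token was flagged, so individual criteria can be applied or ignored when filtering.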