detect noise
noise(.Object, ...)
# S4 method for DocumentTermMatrix
noise(
  .Object,
  minTotal = 2,
  minTfIdfMean = 0.005,
  sparse = 0.995,
  stopwordsLanguage = "german",
  minNchar = 2,
  specialChars = getOption("polmineR.specialChars"),
  numbers = "^[0-9\\.,]+$",
  verbose = TRUE
)
# S4 method for TermDocumentMatrix
noise(.Object, ...)
# S4 method for character
noise(
  .Object,
  stopwordsLanguage = "german",
  minNchar = 2,
  specialChars = getOption("polmineR.specialChars"),
  numbers = "^[0-9\\.,]+$",
  verbose = TRUE
)
# S4 method for textstat
noise(.Object, p_attribute, ...)
.Object: an object of class "DocumentTermMatrix"
...: further parameters
minTotal: minimum column sum (for DocumentTermMatrix) to qualify a term as non-noise
minTfIdfMean: minimum mean tf-idf value to qualify a term as non-noise
sparse: passed to "removeSparseTerms" from the "tm" package
stopwordsLanguage: e.g. "german", to get the stopwords defined in the tm package
minNchar: minimum character length to qualify a term as non-noise
specialChars: special characters to drop
numbers: regex, to drop numbers
verbose: logical
p_attribute: relevant if applied to a textstat object
Returns a list.
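
The character method shown in the usage above could be called roughly as follows. This is a minimal sketch, not an official example: the token vector is made up for illustration, and it assumes that polmineR and tm are installed and that attaching polmineR sets the "polmineR.specialChars" option. The structure of the returned list is not documented here, so the sketch only inspects it with str().

```r
# Hypothetical tokens, invented for illustration only.
tokens <- c("der", "Haus", "2019", "a", "§", "Bundestag")

# Assumption: polmineR and tm are installed; skip the call otherwise.
if (requireNamespace("polmineR", quietly = TRUE) &&
    requireNamespace("tm", quietly = TRUE)) {
  library(polmineR)

  # Argument names and defaults are taken from the usage section above.
  flagged <- noise(
    tokens,
    stopwordsLanguage = "german",
    minNchar = 2,
    verbose = FALSE
  )

  # The documented return value is a list; inspect it rather than
  # assuming particular element names.
  str(flagged)
}
```

Guarding the call with requireNamespace() keeps the snippet from failing on systems where the packages are not available.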