The StopWordPipe class detects the stop words present in the data field of each Instance. Identified stop words are stored in the Instance property named by propertyName ("stopWord" by default). If required, it can also remove the stop words inline.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> StopWordPipe
new()
Creates a StopWordPipe object.
StopWordPipe$new(
propertyName = "stopWord",
propertyLanguageName = "language",
alwaysBeforeDeps = list("GuessLanguagePipe"),
notAfterDeps = list("AbbreviationPipe"),
removeStopWords = TRUE,
resourcesStopWordsPath = NULL
)
propertyName
A character value. Name of the property associated with the GenericPipe.
propertyLanguageName
A character value. Name of the language property.
alwaysBeforeDeps
A list value. The alwaysBefore dependencies (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The notAfter dependencies (GenericPipes that cannot be executed after this one).
removeStopWords
A logical value. Indicates whether the stop words are removed.
resourcesStopWordsPath
A character value. Path of the resource files (in JSON format) containing the stop words.
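As an illustration of the constructor, the sketch below builds a StopWordPipe that only detects stop words and then reads back two of its settings. It assumes the bdpar package is installed (the code is skipped otherwise), and the directory stopword_dir is a hypothetical location created only for this example; a real setup would point at a directory that actually holds the JSON resources.

```r
# Hypothetical resource directory, created here only so the path exists.
stopword_dir <- file.path(tempdir(), "stopwords")
dir.create(stopword_dir, showWarnings = FALSE)

# Run the bdpar part only when the package is available.
if (requireNamespace("bdpar", quietly = TRUE)) {
  pipe <- bdpar::StopWordPipe$new(
    propertyName = "stopWord",
    propertyLanguageName = "language",
    removeStopWords = FALSE,               # detect only, keep the text intact
    resourcesStopWordsPath = stopword_dir  # explicit path instead of NULL
  )

  # Getters documented on this page
  print(pipe$getPropertyLanguageName())
  print(pipe$getResourcesStopWordsPath())
}
```

Leaving resourcesStopWordsPath as NULL instead relies on the "resources.stopwords.path" field of bdpar.Options, as described further down this page.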
pipe()
Preprocesses the Instance to obtain/remove the stop words. The stop words found in the data are added to the list of properties of the Instance.
StopWordPipe$pipe(instance)
instance
An Instance value. The Instance to preprocess.
The Instance with the modifications that have occurred in the pipe.
findStopWord()
Checks if the stop word is in the data.
StopWordPipe$findStopWord(data, stopWord)
data
A character value. The text in which the stop word is searched for.
stopWord
A character value. Indicates the stop word to find.
A logical value indicating whether the stop word is in the data.
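The kind of check described here can be sketched in base R. This is not the bdpar implementation: find_stop_word is a hypothetical helper, and matching whole words case-insensitively is an assumption made for the example.

```r
# Hypothetical helper, not the bdpar implementation.
find_stop_word <- function(data, stop_word) {
  # \b anchors the match at word boundaries; \Q...\E makes PCRE treat the
  # stop word literally, so characters such as "." are not read as regex syntax.
  pattern <- paste0("\\b\\Q", stop_word, "\\E\\b")
  grepl(pattern, data, perl = TRUE, ignore.case = TRUE)
}

find_stop_word("The cat sat on the mat", "the")  # TRUE
find_stop_word("Other theories", "the")          # FALSE: "the" only occurs inside longer words
```

Anchoring on word boundaries avoids reporting a stop word that merely appears as a substring of a longer word.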
removeStopWord()
Removes the stop word in the data.
StopWordPipe$removeStopWord(stopWord, data)
stopWord
A character value. Indicates the stop word to remove.
data
A character value. The text from which the stop word is removed.
The data with the stop words removed.
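A removal step of this kind might look as follows in base R. Again, remove_stop_word is a hypothetical helper rather than the package's code; the argument order mirrors the signature above, and collapsing the leftover whitespace is an assumption of this sketch.

```r
# Hypothetical helper, not the bdpar implementation.
remove_stop_word <- function(stop_word, data) {
  # Match the stop word literally (\Q...\E) and only as a whole word (\b).
  pattern <- paste0("\\b\\Q", stop_word, "\\E\\b")
  without <- gsub(pattern, "", data, perl = TRUE, ignore.case = TRUE)
  # Collapse the whitespace left behind by the removed words.
  trimws(gsub("[[:space:]]+", " ", without))
}

remove_stop_word("the", "The cat sat on the mat")  # "cat sat on mat"
remove_stop_word("the", "Other theories")          # "Other theories"
```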
getPropertyLanguageName()
Gets the name of the language property.
StopWordPipe$getPropertyLanguageName()
The name of the language property.
getResourcesStopWordsPath()
Gets the path of the stop word resources.
StopWordPipe$getResourcesStopWordsPath()
The path of the stop word resources.
setResourcesStopWordsPath()
Sets the path of the stop word resources.
StopWordPipe$setResourcesStopWordsPath(path)
path
A character value. The new path of the stop word resources.
clone()
The objects of this class are cloneable with this method.
StopWordPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
The StopWordPipe class requires resource files (in JSON format) containing the list of stop words. The language of the text, as indicated by the propertyLanguageName property, must appear in the resource file name (i.e. xxx.json, where xxx is the value of the language property named by propertyLanguageName). The location of the resources should be defined in the "resources.stopwords.path" field of the bdpar.Options variable.
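The naming convention can be illustrated with a short base R sketch. The helper stop_words_file, the demo directory, and the language code "en" are all hypothetical; the content of the JSON files is not specified on this page, so the example only creates an empty placeholder file and checks which paths resolve.

```r
# Hypothetical helper: builds the expected resource file name for a language.
stop_words_file <- function(resources_path, language) {
  file.path(resources_path, paste0(language, ".json"))
}

# Demo directory with an empty placeholder standing in for a real resource file.
resources_path <- file.path(tempdir(), "stopwords-demo")
dir.create(resources_path, showWarnings = FALSE)
invisible(file.create(stop_words_file(resources_path, "en")))

file.exists(stop_words_file(resources_path, "en"))  # TRUE
file.exists(stop_words_file(resources_path, "xx"))  # FALSE: no xx.json was created
```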
AbbreviationPipe, bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe