bdpar (version 3.1.0)

StopWordPipe: Class to find and/or remove the stop words in the data field of an Instance

Description

The StopWordPipe class detects the stop words present in the data field of each Instance. The stop words it identifies are stored in the properties of the Instance under the name given by propertyName ("stopWord" by default). If required, it can also remove the stop words from the data in place.

Inherit

This class inherits from GenericPipe and implements the pipe abstract function.

Super class

bdpar::GenericPipe -> StopWordPipe

Methods

Method new()

Creates a StopWordPipe object.

Usage

StopWordPipe$new(
  propertyName = "stopWord",
  propertyLanguageName = "language",
  alwaysBeforeDeps = list("GuessLanguagePipe"),
  notAfterDeps = list("AbbreviationPipe"),
  removeStopWords = TRUE,
  resourcesStopWordsPath = NULL
)

Arguments

propertyName

A character value. Name of the property associated with the GenericPipe.

propertyLanguageName

A character value. Name of the language property.

alwaysBeforeDeps

A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).

notAfterDeps

A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).

removeStopWords

A logical value. Indicates whether the stop words are removed from the data.

resourcesStopWordsPath

A character value. Path of resource files (in json format) containing the stop words.
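
A minimal usage sketch of the constructor, assuming the bdpar package is installed. The directory "resources/stopwords" is a placeholder chosen for illustration, not a path shipped with the package, and the bdpar calls are skipped when the package is not available.

```r
# Hypothetical directory holding the stop word resource files.
stopWordDir <- file.path("resources", "stopwords")

if (requireNamespace("bdpar", quietly = TRUE)) {
  library(bdpar)

  # Detect stop words and record them under the "stopWord" property,
  # without removing them from the data field.
  stopWordPipe <- StopWordPipe$new(
    propertyName = "stopWord",
    propertyLanguageName = "language",
    removeStopWords = FALSE,
    resourcesStopWordsPath = stopWordDir
  )

  # Inspect the configuration through the getters documented on this page.
  print(stopWordPipe$getPropertyLanguageName())
  print(stopWordPipe$getResourcesStopWordsPath())
}
```

Passing resourcesStopWordsPath explicitly keeps the example independent of any value previously stored in bdpar.Options.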


Method pipe()

Preprocesses the Instance to obtain/remove the stop words. The stop words found in the data are added to the list of properties of the Instance.

Usage

StopWordPipe$pipe(instance)

Arguments

instance

An Instance value. The Instance to preprocess.

Returns

The Instance with the modifications that have occurred in the pipe.


Method findStopWord()

Checks if the stop word is in the data.

Usage

StopWordPipe$findStopWord(data, stopWord)

Arguments

data

A character value. The text in which the stop word is searched for.

stopWord

A character value. Indicates the stop word to find.

Returns

A logical value: TRUE if the stop word is in the data, FALSE otherwise.
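
The idea behind this check can be illustrated with a short base R sketch. This is not bdpar's implementation: the helper name containsStopWord and the matching rule (case-insensitive, whole tokens only) are assumptions made for illustration.

```r
# Hypothetical helper, not part of bdpar.
# Returns TRUE when `stopWord` appears in `data` as a whole token.
containsStopWord <- function(data, stopWord) {
  # Lower-case the text and split it on anything that is not a letter,
  # a digit or an apostrophe.
  tokens <- strsplit(tolower(data), "[^[:alnum:]']+")[[1]]
  tolower(stopWord) %in% tokens
}

containsStopWord("The cat sat on the mat.", "the")  # TRUE
containsStopWord("The cat sat on the mat.", "at")   # FALSE: "at" only occurs inside other words
```

Matching whole tokens rather than substrings avoids reporting a stop word that merely occurs inside a longer word.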


Method removeStopWord()

Removes the stop word from the data.

Usage

StopWordPipe$removeStopWord(stopWord, data)

Arguments

stopWord

A character value. Indicates the stop word to remove.

data

A character value. The text from which the stop word will be removed.

Returns

The data with the stop word removed.
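
A base R sketch of the removal step, under the same caveat: dropStopWord is a hypothetical helper, not bdpar's implementation. It assumes words are separated by whitespace and compares tokens case-insensitively, so a stop word with punctuation attached (for example "the,") is left in place.

```r
# Hypothetical helper, not part of bdpar.
# The argument order mirrors removeStopWord(stopWord, data).
dropStopWord <- function(stopWord, data) {
  # Split the text on runs of whitespace.
  tokens <- strsplit(data, "[[:space:]]+")[[1]]
  # Keep every token that differs from the stop word, ignoring case.
  keep <- tolower(tokens) != tolower(stopWord)
  paste(tokens[keep], collapse = " ")
}

dropStopWord("the", "The cat sat on the mat")  # "cat sat on mat"
```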


Method getPropertyLanguageName()

Gets the name of the language property.

Usage

StopWordPipe$getPropertyLanguageName()

Returns

The name of the language property.


Method getResourcesStopWordsPath()

Gets the path of the stop words resources.

Usage

StopWordPipe$getResourcesStopWordsPath()

Returns

The path of the stop words resources.


Method setResourcesStopWordsPath()

Sets the path of the stop words resources.

Usage

StopWordPipe$setResourcesStopWordsPath(path)

Arguments

path

A character value. The new path of the stop words resources.


Method clone()

The objects of this class are cloneable with this method.

Usage

StopWordPipe$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

The StopWordPipe class requires resource files (in JSON format) containing the list of stop words. Each resource file must be named after the language it covers, i.e. xxx.json, where xxx is the language value held in the property named by propertyLanguageName. The location of the resources should be defined in the "resources.stopwords.path" field of the bdpar.Options variable.
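
The naming convention can be sketched as follows. The helper stopWordResourceFile, the directory "resources/stopwords" and the language code "en" are illustrative assumptions. The bdpar.Options$set(key, value) call is guarded so that it only runs when bdpar is installed, and should be checked against the bdpar.Options reference for your version.

```r
# Hypothetical helper, not part of bdpar: builds the resource file path
# expected for a given language, i.e. <resourcesPath>/<language>.json.
stopWordResourceFile <- function(resourcesPath, language) {
  file.path(resourcesPath, paste0(language, ".json"))
}

stopWordResourceFile("resources/stopwords", "en")  # "resources/stopwords/en.json"

if (requireNamespace("bdpar", quietly = TRUE)) {
  library(bdpar)
  # Register the resource directory so that pipes created with
  # resourcesStopWordsPath = NULL can pick it up.
  bdpar.Options$set("resources.stopwords.path", "resources/stopwords")
}
```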

See Also

AbbreviationPipe, bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe