
bdpar (version 3.1.0)

FindUrlPipe: Class to find and/or remove the URLs in the data field of an Instance

Description

This class is responsible for detecting the URLs present in the data field of each Instance. Identified URLs are stored in the URLs field of the Instance class. Moreover, if required, it can remove the URLs inline from the data.


Inherit

This class inherits from GenericPipe and implements the abstract pipe function.

Super class

bdpar::GenericPipe -> FindUrlPipe

Public fields

URLPattern

A character value. The regular expression to detect URLs.

EmailPattern

A character value. The regular expression to detect emails.
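To illustrate the kind of value these two fields hold, the sketch below defines two simplified regular expressions and tests them against a sample text. These patterns are hypothetical and much simpler than the URLPattern and EmailPattern values shipped with bdpar; the names urlPattern, emailPattern and text are assumptions made only for this example.

```r
# Hypothetical, simplified patterns for illustration only;
# they are NOT the URLPattern/EmailPattern values defined by bdpar.
urlPattern   <- "(?:https?://|www\\.)[^[:space:]]+"
emailPattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"

text <- "Visit https://example.org or write to info@example.org"

# Both patterns find a match in the sample text
grepl(urlPattern, text, perl = TRUE)    # TRUE
grepl(emailPattern, text, perl = TRUE)  # TRUE
```

Real-world URL and email syntax is far richer than these patterns cover, which is why the class keeps the expressions in fields that can be replaced.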

Methods



Method new()

Creates a FindUrlPipe object.

Usage

FindUrlPipe$new(
  propertyName = "URLs",
  alwaysBeforeDeps = list(),
  notAfterDeps = list("FindUrlPipe"),
  removeUrls = TRUE,
  URLPatterns = list(self$URLPattern, self$EmailPattern),
  namesURLPatterns = list("UrlPattern", "EmailPattern")
)

Arguments

propertyName

A character value. Name of the property associated with the GenericPipe.

alwaysBeforeDeps

A list value. The alwaysBefore dependencies (GenericPipes that must be executed before this one).

notAfterDeps

A list value. The notAfter dependencies (GenericPipes that cannot be executed after this one).

removeUrls

A logical value. Indicates whether the URLs are removed from the data.

URLPatterns

A list value. The regular expressions used to find URLs.

namesURLPatterns

A list value. The names of the regular expressions.

propertyLanguageName

A character value. Name of the language property.


Method pipe()

Preprocesses the Instance to find and, if required, remove the URLs. The URLs found in the data are added to the list of properties of the Instance.

Usage

FindUrlPipe$pipe(instance)

Arguments

instance

An Instance value. The Instance to preprocess.

Returns

The Instance with the modifications that have occurred in the pipe.
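The overall flow described for pipe() (collect the matches of every pattern, label them with the pattern names, optionally strip them from the text, and store them as a property) can be sketched in base R as follows. This is not bdpar's implementation: the Instance is modelled here as a plain list with hypothetical data and properties fields, whereas bdpar's Instance is an R6 object with its own accessor methods, and pipeSketch is a hypothetical name.

```r
# Minimal sketch of the documented pipe() behaviour, using plain R objects.
# Assumption: `instance` is a list with `data` (character) and `properties` (list);
# bdpar's real Instance class is NOT used here.
pipeSketch <- function(instance, URLPatterns, namesURLPatterns,
                       propertyName = "URLs", removeUrls = TRUE) {
  data <- instance$data
  # Collect the matches of every pattern, one list entry per pattern
  found <- lapply(URLPatterns, function(p) {
    regmatches(data, gregexpr(p, data, perl = TRUE))[[1]]
  })
  names(found) <- unlist(namesURLPatterns)
  # Optionally remove every match from the text, pattern by pattern
  if (removeUrls) {
    for (p in URLPatterns) data <- gsub(p, "", data, perl = TRUE)
  }
  instance$data <- data
  instance$properties[[propertyName]] <- found
  instance
}

# Hypothetical, simplified patterns (not bdpar's defaults)
patterns <- list("(?:https?://|www\\.)[^[:space:]]+",
                 "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}")
inst <- list(data = "Visit https://example.org or write to info@example.org",
             properties = list())
result <- pipeSketch(inst, patterns, list("UrlPattern", "EmailPattern"))
```

Matches are collected before any removal, so switching removeUrls off changes only the data field, not the stored property.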


Method findUrl()

Finds the URLs in the data.

Usage

FindUrlPipe$findUrl(pattern, data)

Arguments

pattern

A character value. The regex to find URLs.

data

A character value. The text in which to search for URLs.

Returns

The list of URLs found.
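The behaviour documented for findUrl(pattern, data), returning every substring of the text that matches the pattern, can be approximated with base R's gregexpr() and regmatches(). The helper name findUrlSketch and the simplified pattern are hypothetical, and bdpar's own method may differ in details such as the exact return type.

```r
# Hypothetical helper approximating findUrl(pattern, data):
# returns a list with every substring of `data` that matches `pattern`.
findUrlSketch <- function(pattern, data) {
  matches <- regmatches(data, gregexpr(pattern, data, perl = TRUE))[[1]]
  as.list(matches)  # empty list when nothing matches
}

urlPattern <- "(?:https?://|www\\.)[^[:space:]]+"  # simplified, not bdpar's
findUrlSketch(urlPattern, "See https://a.example/x and www.b.example today")
```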


Method removeUrl()

Removes the URLs from the data.

Usage

FindUrlPipe$removeUrl(pattern, data)

Arguments

pattern

A character value. The regex to find URLs.

data

A character value. The text from which to remove the URLs.

Returns

The data with URLs removed.
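Inline removal, as documented for removeUrl(pattern, data), amounts to replacing every match of the pattern with an empty string. The sketch below uses base R's gsub(); removeUrlSketch and the pattern are hypothetical names chosen for illustration.

```r
# Hypothetical helper approximating removeUrl(pattern, data):
# deletes every match of `pattern` from `data`.
removeUrlSketch <- function(pattern, data) {
  gsub(pattern, "", data, perl = TRUE)
}

urlPattern <- "(?:https?://|www\\.)[^[:space:]]+"  # simplified, not bdpar's
removeUrlSketch(urlPattern, "See https://a.example/x now")
```

Because the match is replaced by nothing, the surrounding spaces remain, so the result here is "See  now" with a double space; collapsing extra whitespace would be a separate step.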


Method putNamesURLPattern()

Assigns the names of the URL patterns to the URL pattern results.

Usage

FindUrlPipe$putNamesURLPattern(resultOfURLPatterns)

Arguments

resultOfURLPatterns

A list value. The list of URLs found.

Returns

The URLs found, labeled with the names of the URL patterns.
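Labeling each pattern's results with its name can be sketched with stats::setNames(). Unlike the documented method, which takes only resultOfURLPatterns, this hypothetical putNamesSketch receives the names explicitly because it has no object fields to read them from; the sample results are made up for illustration.

```r
# Hypothetical helper: label each pattern's results with the pattern's name.
putNamesSketch <- function(resultOfURLPatterns, namesURLPatterns) {
  stats::setNames(resultOfURLPatterns, unlist(namesURLPatterns))
}

namesURLPatterns    <- list("UrlPattern", "EmailPattern")
resultOfURLPatterns <- list(list("https://a.example"), list("info@a.example"))
named <- putNamesSketch(resultOfURLPatterns, namesURLPatterns)
names(named)  # "UrlPattern" "EmailPattern"
```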


Method getURLPatterns()

Gets the URL patterns.

Usage

FindUrlPipe$getURLPatterns()

Returns

The value of the URL patterns.


Method setURLPatterns()

Sets the URL patterns.

Usage

FindUrlPipe$setURLPatterns(URLPatterns)

Arguments

URLPatterns

A list value. The new value of the URL patterns.


Method getNamesURLPatterns()

Gets the names of the URL patterns.

Usage

FindUrlPipe$getNamesURLPatterns()

Returns

The names of the URL patterns.


Method setNamesURLPatterns()

Sets the names of the URL patterns.

Usage

FindUrlPipe$setNamesURLPatterns(namesURLPatterns)

Arguments

namesURLPatterns

A list value. The new names of the URL patterns.


Method clone()

The objects of this class are cloneable with this method.

Usage

FindUrlPipe$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

The regular expressions given in the URLPatterns argument are used to identify URLs.

See Also

AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe