This class is responsible for detecting the URLs present in the data field of each Instance. Identified URLs are stored in the URLs field of the Instance class.
If required, it can also remove the URLs inline from the data.
This class inherits from GenericPipe and implements the pipe abstract function.
bdpar::GenericPipe -> FindUrlPipe
new()
Creates a FindUrlPipe object.
FindUrlPipe$new(
propertyName = "URLs",
alwaysBeforeDeps = list(),
notAfterDeps = list("FindUrlPipe"),
removeUrls = TRUE,
URLPatterns = list(self$URLPattern, self$EmailPattern),
namesURLPatterns = list("UrlPattern", "EmailPattern")
)
propertyName
A character value. Name of the property associated with the GenericPipe.
alwaysBeforeDeps
A list value. The alwaysBefore dependencies (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The notAfter dependencies (GenericPipes that cannot be executed after this one).
removeUrls
A logical value. Indicates whether the URLs are removed from the data.
URLPatterns
A list value. The regular expressions used to find URLs.
namesURLPatterns
A list value. The names of the regular expressions.
propertyLanguageName
A character value. Name of the language property.
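As a rough illustration of the constructor and the accessor methods documented on this page, the snippet below builds a pipe with a single custom pattern. It is a sketch under assumptions: the regular expressions and the names "SimpleUrlPattern" and "FtpPattern" are made up for the example, and the code only runs when the bdpar package is installed.

```r
# Runs only if bdpar is available; otherwise the block is skipped.
if (requireNamespace("bdpar", quietly = TRUE)) {
  # Illustrative pattern and name; these are not the package defaults.
  pipe <- bdpar::FindUrlPipe$new(
    propertyName = "URLs",
    removeUrls = FALSE,
    URLPatterns = list("https?://[^[:space:]]+"),
    namesURLPatterns = list("SimpleUrlPattern")
  )

  # Inspect the configuration through the documented getters.
  print(pipe$getURLPatterns())
  print(pipe$getNamesURLPatterns())

  # Replace the patterns and their names through the documented setters.
  pipe$setURLPatterns(list("ftp://[^[:space:]]+"))
  pipe$setNamesURLPatterns(list("FtpPattern"))
}
```

Keeping URLPatterns and namesURLPatterns the same length seems the safest choice, since each name labels the results of one pattern.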
pipe()
Preprocesses the Instance to obtain/remove the URLs. The URLs found in the data are added to the list of properties of the Instance.
FindUrlPipe$pipe(instance)
instance
An Instance value. The Instance to preprocess.
The Instance with the modifications that have occurred in the pipe.
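The overall flow described for pipe() can be approximated in base R as follows. This is only a sketch, not the package's implementation: a plain list stands in for an Instance, pipe_sketch and its arguments are hypothetical names, and the two patterns are deliberately simple substitutes for the package's own URL and email patterns.

```r
# Hypothetical stand-in for an Instance: a plain list with data and properties.
instance <- list(
  data = "Mail me at jane@example.com or see https://example.org/faq now",
  properties = list()
)

# Illustrative patterns only; not the package's URLPattern or EmailPattern.
patterns <- list(
  UrlPattern = "https?://[^[:space:]]+",
  EmailPattern = "[[:alnum:]._-]+@[[:alnum:].-]+"
)

pipe_sketch <- function(instance, patterns, remove_urls = TRUE) {
  # Collect the matches of every pattern, keyed by pattern name.
  found <- lapply(patterns, function(p) {
    hits <- regmatches(instance$data, gregexpr(p, instance$data, perl = TRUE))
    as.list(unlist(hits))
  })
  # Store the matches as a property of the stand-in instance.
  instance$properties[["URLs"]] <- found

  if (remove_urls) {
    # Blank out each match, then collapse the leftover whitespace.
    for (p in patterns) {
      instance$data <- gsub(p, " ", instance$data, perl = TRUE)
    }
    instance$data <- trimws(gsub("[[:space:]]+", " ", instance$data))
  }
  instance
}

result <- pipe_sketch(instance, patterns)
print(result$properties[["URLs"]])
print(result$data)
```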
findUrl()
Finds the URLs in the data.
FindUrlPipe$findUrl(pattern, data)
pattern
A character value. The regex to find URLs.
data
A character value. The text in which to find the URLs.
The list with the URLs found.
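A minimal base R sketch of this kind of lookup is shown below. The function find_url_sketch and the pattern are assumptions made for illustration; the actual method may differ, for example in how it handles duplicates or which regex engine it uses.

```r
# Hypothetical stand-in for findUrl(): returns every match of `pattern` in `data`.
find_url_sketch <- function(pattern, data) {
  matches <- regmatches(data, gregexpr(pattern, data, perl = TRUE))
  # Flatten to a plain list with one element per match (empty list if none).
  as.list(unlist(matches))
}

# Deliberately simple pattern for illustration; not the package's URLPattern.
simple_url_pattern <- "https?://[^[:space:]]+"
text <- "Docs at https://example.org/docs and http://example.com today"

urls <- find_url_sketch(simple_url_pattern, text)
print(urls)
```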
removeUrl()
Removes the URLs from the data.
FindUrlPipe$removeUrl(pattern, data)
pattern
A character value. The regex to find URLs.
data
A character value. The text from which to remove the URLs.
The data with URLs removed.
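Removal can be sketched in base R with gsub. The helper remove_url_sketch is hypothetical, and replacing each match with a single space and then collapsing whitespace is a choice made for this example rather than documented behaviour of the method.

```r
# Hypothetical stand-in for removeUrl(): drops every match of `pattern` from `data`.
remove_url_sketch <- function(pattern, data) {
  # Replace each match with a space so neighbouring words do not get glued together.
  cleaned <- gsub(pattern, " ", data, perl = TRUE)
  # Collapse runs of whitespace left behind and trim the ends.
  trimws(gsub("[[:space:]]+", " ", cleaned))
}

text <- "Docs at https://example.org/docs and http://example.com today"
cleaned_text <- remove_url_sketch("https?://[^[:space:]]+", text)
print(cleaned_text)
```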
putNamesURLPattern()
Assigns the URL pattern names to the results obtained with the URL patterns.
FindUrlPipe$putNamesURLPattern(resultOfURLPatterns)
resultOfURLPatterns
A list value. The list with the URLs found.
The URLs found, labelled with the names of the URL patterns.
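The naming step can be pictured as pairing each per-pattern result with its pattern name. The sketch below assumes one result entry per pattern, in the same order as the names; name_results_sketch and the sample data are hypothetical.

```r
# One entry per pattern, in the same order as the pattern names (assumed).
results <- list(
  list("https://example.org/docs"),
  list("user@example.com")
)
pattern_names <- list("UrlPattern", "EmailPattern")

# Hypothetical stand-in for putNamesURLPattern().
name_results_sketch <- function(results, pattern_names) {
  # Each result needs exactly one name.
  stopifnot(length(results) == length(pattern_names))
  names(results) <- unlist(pattern_names)
  results
}

named <- name_results_sketch(results, pattern_names)
print(named[["EmailPattern"]])
```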
getURLPatterns()
Gets the URL patterns.
FindUrlPipe$getURLPatterns()
Value of URL patterns.
setURLPatterns()
Sets the URL patterns.
FindUrlPipe$setURLPatterns(URLPatterns)
URLPatterns
A list value. The new value of the URL patterns.
getNamesURLPatterns()
Gets the names of the URL patterns.
FindUrlPipe$getNamesURLPatterns()
Value of the names of the URL patterns.
setNamesURLPatterns()
Sets the names of the URL patterns.
FindUrlPipe$setNamesURLPatterns(namesURLPatterns)
namesURLPatterns
A list value. The new value of the names of the URL patterns.
clone()
The objects of this class are cloneable with this method.
FindUrlPipe$clone(deep = FALSE)
deep
Whether to make a deep clone.
The regular expressions indicated in the URLPatterns variable are used to identify URLs.
AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe