bdpar (version 3.1.0)

Big Data Preprocessing Architecture

Description

Provides a tool for easily building customized data flows to pre-process large volumes of information from different sources. To this end, 'bdpar' allows users to (i) easily use existing functionalities and create new ones and (ii) develop new data source extractors according to their needs. Additionally, the package provides a predefined default data flow to extract and pre-process the most relevant information (tokens, dates, ...) from some textual sources (SMS, email, YouTube comments).
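
As a rough illustration of the predefined data flow mentioned above, the following sketch runs the default pipeline over sample files. It rests on several assumptions that are not stated on this page: that the package bundles a sample folder named "example", that runPipeline accepts path, extractors and pipeline arguments, and that it returns the preprocessed Instances. Some sources may also need external tools or extra configuration before they can be processed.

```r
# Minimal sketch, not a definitive recipe.
# Install once from CRAN if needed: install.packages('bdpar')

# Folder of sample files assumed to ship with the package under "example".
# system.file() returns "" when the package or the folder is not available.
path <- system.file("example", package = "bdpar")

if (nzchar(path)) {
  library(bdpar)

  # Object that decides which extractor builds each Instance
  extractors <- ExtractorFactory$new()

  # Predefined flow of pipes
  pipeline <- DefaultPipeline$new()

  # Assumed signature: runPipeline(path, extractors, pipeline)
  instances <- runPipeline(path = path,
                           extractors = extractors,
                           pipeline = pipeline)
}
```

The guard on nzchar(path) simply skips the run when the sample folder cannot be found, so the snippet does not fail on a machine without the package.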

Install

install.packages('bdpar')

Monthly Downloads

325

Version

3.1.0

License

GPL-3

Maintainer

Miguel Ferreiro-Díaz

Last Published

December 12th, 2023

Functions in bdpar (3.1.0)

DynamicPipeline

Class implementing a dynamic pipelining process
ExtractorSms

Class to handle SMS files with tsms extension
Connections

Class to manage the connections with YouTube
ExtractorYtbid

Class to handle YouTube comment files with ytbid extension
ExtractorFactory

Class to handle the creation of Instance types
DefaultPipeline

Class implementing a default pipelining process
Bdpar

Class to manage the preprocessing of the files throughout the flow of pipes
AbbreviationPipe

Class to find and/or replace the abbreviations on the data field of an Instance
ContractionPipe

Class to find and/or replace the contractions on the data field of an Instance
ExtractorEml

Class to handle email files with eml extension
GuessLanguagePipe

Class to guess the language of an Instance
File2Pipe

Class to obtain the source field of an Instance
GuessDatePipe

Class to obtain the date field of an Instance
FindUserNamePipe

Class to find and/or remove the users on the data field of an Instance
FindEmojiPipe

Class to find and/or replace the emoji on the data field of an Instance
FindUrlPipe

Class to find and/or remove the URLs on the data field of an Instance
FindEmoticonPipe

Class to find and/or remove the emoticons on the data field of an Instance
FindHashtagPipe

Class to find and/or remove the hashtags on the data field of an Instance
GenericPipe

Abstract super class that handles the management of the Pipes
GenericPipeline

Abstract super class implementing the pipelining process
SlangPipe

Class to find and/or replace slang terms on the data field of an Instance
Instance

Abstract super class that handles the management of the Instances
TeeCSVPipe

Class to handle a CSV with the properties field of the preprocessed Instance
StopWordPipe

Class to find and/or remove the stop words on the data field of an Instance
ResourceHandler

Class that handles different types of resources
TargetAssigningPipe

Class to get the target field of the Instance
StoreFileExtPipe

Class to get the file's extension field of an Instance
MeasureLengthPipe

Class to obtain the length of the data field of an Instance
ToLowerCasePipe

Class to convert the data field of an Instance to lower case
runPipeline

Initiates the pipelining process
operator-pipe

bdpar customized forward-pipe operator
InterjectionPipe

Class to find and/or remove the interjections on the data field of an Instance
bdpar.log

Write messages to the log at a given priority level using the custom bdpar log
bdpar.Options

Object to handle the keys/attributes/options common to the whole pipeline flow
bdparData

Example of the content of the files to be preprocessed
emojisData

Emoji codes and descriptions data