bdpar (version 3.1.0)

Bdpar: Class to manage the preprocessing of files throughout the flow of pipes

Description

The Bdpar class provides the static variables required to perform the whole data flow process. To this end, Bdpar is in charge of (i) initializing the objects that handle the connections to APIs (Connections) and the JSON resources (ResourceHandler), and (ii) executing the flow of pipes (inherited from the GenericPipeline class) passed as an argument.

Arguments

Static variables

connections:

(Connections) object that handles the connections with YouTube and Twitter.

resourceHandler:

(ResourceHandler) object that handles the JSON resource files.

Methods


Method new()

Creates a Bdpar object. Initializes the static variables: connections and resourceHandler.

Usage

Bdpar$new()


Method execute()

Preprocesses files through the indicated flow of pipes.

Usage

Bdpar$execute(
  path,
  extractors = ExtractorFactory$new(),
  pipeline = DefaultPipeline$new(),
  cache = TRUE,
  verbose = FALSE,
  summary = FALSE
)

Arguments

path

A character value. The path where the files to be processed are located.

extractors

An ExtractorFactory value. Class which implements the createInstance method to choose which type of Instance is created.

pipeline

A GenericPipeline value. Subclass of GenericPipeline, which implements the execute method. By default, it is the DefaultPipeline pipeline.

cache

(logical) flag indicating whether the status of the instances is stored after each pipe. This makes it possible to avoid repeating previously executed tasks, provided the order and configuration of the pipes and the pipeline are the same as those stored in the cache.

verbose

(logical) flag indicating whether messages, warnings and errors are printed.

summary

(logical) flag indicating whether a summary of the pipeline execution is provided.

Details

To parallelize the execution, indicate the number of cores to be used through bdpar.Options$set("numCores", numCores).
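
As a minimal sketch of the parallelization setting, the snippet below derives a core count with parallel::detectCores() and passes it to bdpar.Options$set. Leaving one core free is an assumption of this sketch, not a bdpar requirement; any positive integer can be used. The bdpar call is guarded so that the snippet is simply skipped when the package is not installed.

```r
# Choose how many cores to use. Leaving one core free is an assumption of
# this sketch, not something bdpar requires.
detected <- parallel::detectCores()
if (is.na(detected)) detected <- 1L   # detectCores() may return NA
numCores <- max(1L, detected - 1L)

# Register the value with bdpar before calling execute().
if (requireNamespace("bdpar", quietly = TRUE)) {
  library(bdpar)
  bdpar.Options$set("numCores", numCores)
}
```

The option should be set before execute() is called, since it is read from bdpar.Options rather than passed as an argument.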

Returns

The list of Instances that have been preprocessed.


Method clone()

The objects of this class are cloneable with this method.

Usage

Bdpar$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

If a pipe defined in the workflow needs some type of configuration, it can be set through the bdpar.Options variable, which has different methods to support the functionality of different pipes.
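
The configuration mechanism can be sketched as follows, using only the add, set and configureLog methods that appear on this page. The key name "my.pipe.option" and its values are hypothetical and chosen purely for illustration; the keys that matter in practice depend on the pipes in use. The bdpar calls are guarded so that the snippet is skipped when the package is not installed.

```r
# Hypothetical key, for illustration only.
optionKey <- "my.pipe.option"

if (requireNamespace("bdpar", quietly = TRUE)) {
  library(bdpar)

  # A key that has not been initialized yet is registered with add() ...
  bdpar.Options$add(optionKey, "initial value")

  # ... and an already existing key is updated with set().
  bdpar.Options$set(optionKey, "updated value")

  # Send log messages of level INFO and above to the console only.
  bdpar.Options$configureLog(console = TRUE, threshold = "INFO", file = NULL)
}
```

Since add() is meant for keys that are not yet initialized, running this sketch twice in the same session may fail on the second add() call.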

See Also

bdpar.Options, Connections, DefaultPipeline, DynamicPipeline, GenericPipeline, Instance, ExtractorFactory, ResourceHandler, runPipeline

Examples

if (FALSE) {

#If it is necessary to indicate any configuration, do it through:
#bdpar.Options$set(key, value)
#If the key is not initialized, do it through:
#bdpar.Options$add(key, value)

#If it is necessary to parallelize, do it through:
#bdpar.Options$set("numCores", numCores)

#If it is necessary to change the behavior of the log, do it through:
#bdpar.Options$configureLog(console = TRUE, threshold = "INFO", file = NULL)

#Folder with the files to preprocess
path <- system.file("example",
                    package = "bdpar")

#Object which decides how the instances are created
extractors <- ExtractorFactory$new()

#Object which indicates the pipes' flow
pipeline <- DefaultPipeline$new()

objectBdpar <- Bdpar$new()

#Starting file preprocessing...
objectBdpar$execute(path = path,
                    extractors = extractors,
                    pipeline = pipeline,
                    cache = FALSE,
                    verbose = FALSE,
                    summary = TRUE)
}
