flowr (version 0.9.11)

to_flow: Create flow objects

Description

Use a set of shell commands (a flow mat) and a flow definition to create a flow object.

Usage

to_flow(x, ...)

is.flow(x)

# S3 method for character
to_flow(x, def, grp_col, jobname_col, cmd_col, ...)

# S3 method for flowmat
to_flow(
  x,
  def,
  flowname,
  grp_col,
  jobname_col,
  cmd_col,
  submit = FALSE,
  execute = FALSE,
  containerize = TRUE,
  platform,
  flow_run_path,
  qobj,
  verbose = opts_flow$get("verbose"),
  ...
)

# S3 method for data.frame
to_flow(x, ...)

# S3 method for list
to_flow(
  x,
  def,
  flowname,
  flow_run_path,
  desc,
  qobj,
  module_cmds = opts_flow$get("module_cmds"),
  verbose = opts_flow$get("verbose"),
  ...
)

Arguments

x

this can be a file name, a data.frame or a list. If it is a file name, it should be a tsv file representing a flow_mat. See to_flowmat for details.

...

Supplied to specific functions like to_flow.data.frame

def

a flow definition. Basically a table with resource requirements and mapping of the jobs in this flow. See to_flowdef for details on the format.

grp_col

name of the grouping column in the supplied flow_mat. See Details. Default value is [samplename].

jobname_col

name of the job name column in flow_mat. Default value is [jobname].

cmd_col

name of the command column in flow_mat. Default value is [cmd].

flowname

name of the flow; this is used as part of the execution folder name. Use a simple identifier without special characters: names may use letters (a-z) and numbers (0-9), with underscore (_) as a word separator. Default value is [flowname].

submit

after creating a flow object, should flowr also call submit_flow to perform a dry-run or a real submission? See below for details. Default value is [FALSE].

execute

when calling submit_flow, should flowr execute the flow or only perform a dry-run? See below for details. Default value is [FALSE].

containerize

if the flowmat has multiple samples, flowr creates a new date-stamped folder and places all flows in this batch inside it. This keeps the logs clean and containerizes each batch. To disable this behavior, set this to FALSE. Default value is [TRUE].

platform

a character value specifying the platform to use; possible values are local, lsf, torque, moab, sge and slurm. This overrides the platform column in the flowdef. (optional)

flow_run_path

base path to be used for execution of this flow. flowr would create a new time-stamped folder in this base path and use it for logs, scripts etc. The default is retrieved using opts_flow$get("flow_run_path").

qobj

Deprecated; modify cluster templates instead, as explained on flow-r.github.io/flowr. An object of class queue.

verbose

A numeric value indicating the amount of messages to produce. Values are integers: 0, 1, 2, 3, .... Please refer to the verbose page for more details. Default is retrieved using opts_flow$get("verbose").

desc

Advanced Use. final flow name.

module_cmds

A character vector of additional commands, which will be prepended to each script of the flow. Default is retrieved using opts_flow$get("module_cmds").
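
The naming rule given for flowname above can be checked before calling to_flow. The following is a minimal base R sketch; is_valid_flowname is a hypothetical helper (not part of flowr), and it assumes a strict reading of the rule (lowercase a-z, digits and underscore only). flowr itself may be more lenient.

```r
# Hypothetical helper, not part of flowr: accepts a single string made up
# only of lowercase letters, digits and underscores.
is_valid_flowname <- function(x) {
  is.character(x) && length(x) == 1 && !is.na(x) &&
    grepl("^[a-z0-9_]+$", x, perl = TRUE)  # perl = TRUE keeps [a-z] ASCII-only
}

is_valid_flowname("sleep_pipe")   # TRUE
is_valid_flowname("sleep pipe!")  # FALSE: space and "!" are not allowed
```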

Value

Returns a flow object. If execute=TRUE, the returned object (fobj) is rich with information about where and how the flow was executed, including details like job ids and paths to the exact scripts run. Killing all the jobs with kill_flow requires such a rich flow object, with job ids present.

Behaviour: What goes in, and what to expect in return?

  • submit=FALSE, execute=FALSE: create and return a flow object

  • submit=TRUE, execute=FALSE: dry-run; create a flow object, then create a structured execution folder with all the commands

  • submit=TRUE, execute=TRUE: do all of the above, then submit to the cluster
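
The three combinations above can be summarised in a small lookup function. This is only an illustrative sketch: flow_mode is a hypothetical helper, not a flowr function, and the combination submit=FALSE with execute=TRUE is left as NA because it is not described here. The guarded part runs only if flowr is installed and uses the default (create-only) mode, so nothing is written to disk or submitted.

```r
# Hypothetical helper, not part of flowr: maps the two flags to the
# behaviour documented in the list above.
flow_mode <- function(submit = FALSE, execute = FALSE) {
  if (!submit && !execute) return("create flow object only")
  if (submit && !execute)  return("dry-run: also write the execution folder")
  if (submit && execute)   return("submit to the cluster")
  NA_character_  # submit = FALSE with execute = TRUE is not described above
}

flow_mode()                              # "create flow object only"
flow_mode(submit = TRUE)                 # "dry-run: also write the execution folder"
flow_mode(submit = TRUE, execute = TRUE) # "submit to the cluster"

# Only attempted when flowr is available; mirrors the Examples section.
if (requireNamespace("flowr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(flowr))
  flowmat <- to_flowmat(list(sleep = c("sleep 1", "sleep 2")), samplename = "samp")
  flowdef <- to_flowdef(flowmat)
  # Default mode: build the object without a dry-run or submission.
  fobj <- to_flow(flowmat, flowdef, submit = FALSE, execute = FALSE)
}
```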

Details

The parameter x can be a path to a flow_mat, or a data.frame (as read by read_sheet). This is a table with at least three columns: samplename, jobname and cmd. See to_flowmat for details.
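
As an illustration of the three-column layout described above, the sketch below builds such a table in base R and checks that the required columns are present. The object names (flowmat_df, required, missing_cols) are illustrative only; the check is a convenience written for this example, not something flowr requires you to run.

```r
# A minimal flow_mat-style table: one row per shell command.
flowmat_df <- data.frame(
  samplename = "samp",                # grouping column (grp_col)
  jobname    = c("sleep", "sleep"),   # job name column (jobname_col)
  cmd        = c("sleep 1", "sleep 2"),  # command column (cmd_col)
  stringsAsFactors = FALSE
)

# Confirm the three documented columns exist before handing the table on.
required <- c("samplename", "jobname", "cmd")
missing_cols <- setdiff(required, names(flowmat_df))
if (length(missing_cols) > 0) {
  stop("missing columns: ", paste(missing_cols, collapse = ", "))
}
```

A table like this, together with a flow definition, is what the data.frame method of to_flow expects as x.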

See Also

to_flowmat, to_flowdef, to_flowdet, flowopts and submit_flow

Examples

# NOT RUN {
## Use this link for a few elaborate examples:
## http://flow-r.github.io/flowr/flowr/tutorial.html#define_modules

ex = file.path(system.file(package = "flowr"), "pipelines")
flowmat = as.flowmat(file.path(ex, "sleep_pipe.tsv"))
flowdef = as.flowdef(file.path(ex, "sleep_pipe.def"))
fobj = to_flow(x = flowmat, def = flowdef, flowname = "sleep_pipe", platform = "lsf")


## create a vector of shell commands
cmds = c("sleep 1", "sleep 2")
## create a named list
lst = list("sleep" = cmds)
## create a flowmat
flowmat = to_flowmat(lst, samplename = "samp")

## Use flowmat to create a skeleton flowdef
flowdef = to_flowdef(flowmat)

## use both (flowmat and flowdef) to create a flow
fobj = to_flow(flowmat, flowdef)

## submit the flow to the cluster (execute=TRUE) or do a dry-run (execute=FALSE)
# }
# NOT RUN {
fobj2 = submit_flow(fobj, execute=FALSE)
fobj3 = submit_flow(fobj, execute=TRUE)

## Get the status or kill all the jobs
status(fobj3)
kill(fobj3)
# }