rPlant (version 2.16)

SubmitJob: Executing analytical applications

Description

Functions for executing and managing analytical applications deployed in the iPlant infrastructure

Usage

SubmitJob(application, file.path="", file.list=NULL, input.list, 
          args.list=NULL, job.name, nprocs=1, private.APP=FALSE, 
          suppress.Warnings=FALSE,  shared.username=NULL,
          print.curl=FALSE)
Wait(job.id, minWaitsec, maxWaitsec, print=FALSE)
CheckJobStatus(job.id, history = FALSE, print.curl = FALSE)
KillJob(job.id, print.curl=FALSE)
ListJobOutput(job.id, print.curl=FALSE, print.total=TRUE)
RetrieveJob(job.id, file.vec=NULL, print.curl=FALSE, verbose=FALSE)
GetJobHistory(return.json=FALSE, print.curl=FALSE)
DeleteJob(job.id, print.curl=FALSE, ALL=FALSE)

Arguments

application
Name of the DE application. Use the ListApps() function for a list of eligible applications. To run your own private application, set private.APP=TRUE and suppress.Warnings=TRUE.
file.path
Optional path to a user's subdirectory on the DE; the default path is empty, which leads to the home directory.
file.list
A list of input files. Many applications take only one input file, but some take several. The entries of file.list and input.list should correspond, in the same order. See Details for more information.
job.name
The name to give the job being submitted.
nprocs
The number of processors to be allocated to the job, default = 1.
private.APP
Optional argument for submitting a job to your own private application; default is FALSE.
job.id
The unique ID number given to a submitted job.
input.list
A list of the input types specific to the application. See Details for more information.
args.list
A list of input options available for the application. These are usually the flagging options in command line invocations. See details for more information.
return.json
Optional screen output that displays all of the results from the api, default = FALSE.
file.vec
Names of output files to download, can be one or many. If left NULL, all the files in the job output will download.
minWaitsec
The Wait function requires a range of wait times. This is the minimum of that range, in seconds.
maxWaitsec
The Wait function requires a range of wait times. This is the maximum of that range, in seconds.
print.curl
Prints the curl statement that can be used in the terminal, if curl is installed on your computer.
print.total
Only for the ListJobOutput function. If TRUE, prints the total number of files in the folder.
print
Only for the Wait function. If print=TRUE, prints the status when the job is complete.
verbose
Only for the RetrieveJob function. If TRUE, prints the names of the files as they are downloaded.
shared.username
With iPlant you can share folders with other users. If someone has shared a folder with you and you want to run a job on the files in that folder, enter that user's username here.
suppress.Warnings
Turns off the warnings, which speeds up run time. Use with caution: if the inputs are incorrect, they will not be caught. If the application you are running is a private application, set suppress.Warnings=TRUE.
ALL
Only for the DeleteJob function. If ALL=TRUE, all jobs in the job history are deleted.
history
Only for the CheckJobStatus function. If TRUE, shows the entire history of the job.

Value

  • A list containing the job id and the job name is returned for each submitted job. If an error occurs, a message stating the error should also be reported.

Details

The SubmitJob function takes inputs and arguments and submits a job through the Agave API. It runs the application on the input files in file.list, which must be located in the directory file.path. The files in file.list need to match the file types the application expects (as named in the input.list argument). The options for the application are set in input.list and, where needed, args.list. SubmitJob returns the job id and the job name. With the job id you can run CheckJobStatus(job.id) to check the status of your job; the job name can be used in workflows. The stages for CheckJobStatus are:

PENDING, STAGING_INPUTS, CLEANING_UP, ARCHIVING, STAGING_JOB, FINISHED, KILLED, FAILED, STOPPED, RUNNING, PAUSED, QUEUED, SUBMITTING, STAGED, PROCESSING_INPUTS, ARCHIVING_FINISHED, ARCHIVING_FAILED
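When scripting around these statuses, it can help to collapse them into a few coarse groups. The sketch below is only an illustration: ClassifyStatus is a hypothetical helper, not an rPlant function. The "done" group follows this page, which names FINISHED and ARCHIVING_FINISHED as the statuses of a completed job. Treating FAILED, KILLED, STOPPED, and ARCHIVING_FAILED as failures is an assumption based on the status names alone.

```r
# Status names copied from the list of CheckJobStatus stages.
job.statuses <- c("PENDING", "STAGING_INPUTS", "CLEANING_UP", "ARCHIVING",
                  "STAGING_JOB", "FINISHED", "KILLED", "FAILED", "STOPPED",
                  "RUNNING", "PAUSED", "QUEUED", "SUBMITTING", "STAGED",
                  "PROCESSING_INPUTS", "ARCHIVING_FINISHED", "ARCHIVING_FAILED")

# "done" follows the documentation; "failed" is an assumption from the names.
done.statuses   <- c("FINISHED", "ARCHIVING_FINISHED")
failed.statuses <- c("FAILED", "KILLED", "STOPPED", "ARCHIVING_FAILED")

# ClassifyStatus is a hypothetical helper, not part of rPlant.
# It maps one status string to "done", "failed", or "active".
ClassifyStatus <- function(status) {
  status <- toupper(trimws(status))
  if (!status %in% job.statuses) {
    stop("Unknown job status: ", status)
  }
  if (status %in% done.statuses) {
    "done"
  } else if (status %in% failed.statuses) {
    "failed"
  } else {
    "active"
  }
}

ClassifyStatus("RUNNING")
ClassifyStatus("ARCHIVING_FINISHED")
ClassifyStatus("KILLED")
```

Any status outside the two terminal groups is reported as "active", so a caller only needs to keep checking while the result is "active".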

When the job is finished, the status reads either ARCHIVING_FINISHED or FINISHED, unless the job failed. Use the KillJob function to terminate a running job. Use the Wait function to wait until the job is finished. Be cautious with the Wait function, because it locks up the workspace until the job is finished.

When the job is finished, use the ListJobOutput function to see all of the files in your job. The number of output files varies by application. The RetrieveJob function takes the job.id and file.vec as input and downloads the files named in file.vec to your current working directory (getwd()). The file.vec vector is a subset of the output from ListJobOutput. The DeleteJob function deletes the job and the corresponding output folder that was generated by running the job. DeleteJob(ALL=TRUE) deletes all jobs in a user's job history. The GetJobHistory function displays all jobs in your history that have not been deleted.
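Because Wait blocks the workspace until the job ends, a script may prefer to check the status a bounded number of times and then give up. The sketch below shows one way to do that. PollUntilDone and MakeFakeStatus are hypothetical helpers, not rPlant functions, and the set of terminal statuses beyond FINISHED and ARCHIVING_FINISHED is an assumption. The status source is passed in as a function so the example runs offline with a stub. The return format of CheckJobStatus is not described on this page, so wrapping it this way would need checking against a real session.

```r
# PollUntilDone is a hypothetical helper, not part of rPlant.
# get.status: a zero-argument function returning the current status string.
# interval.sec: pause between checks; max.checks: upper bound on checks.
PollUntilDone <- function(get.status, interval.sec = 5, max.checks = 10) {
  stopifnot(max.checks >= 1)
  # FINISHED and ARCHIVING_FINISHED come from the documentation;
  # the remaining terminal statuses are assumed from their names.
  final <- c("FINISHED", "ARCHIVING_FINISHED",
             "FAILED", "KILLED", "STOPPED", "ARCHIVING_FAILED")
  for (i in seq_len(max.checks)) {
    status <- get.status()
    if (status %in% final) {
      return(list(status = status, checks = i, timed.out = FALSE))
    }
    if (i < max.checks) Sys.sleep(interval.sec)
  }
  # Gave up before the job reached a terminal status.
  list(status = status, checks = max.checks, timed.out = TRUE)
}

# Stub status source for offline testing: steps through a fixed sequence
# and then keeps returning the last element.
MakeFakeStatus <- function(statuses) {
  i <- 0
  function() {
    i <<- min(i + 1, length(statuses))
    statuses[i]
  }
}

fake.status <- MakeFakeStatus(c("PENDING", "RUNNING", "ARCHIVING", "FINISHED"))
result <- PollUntilDone(fake.status, interval.sec = 0, max.checks = 10)
result$status
result$checks
```

With the stub above, the loop stops on the fourth check, when the sequence reaches FINISHED. If the job never reaches a terminal status, the result has timed.out set to TRUE and control returns to the caller instead of blocking indefinitely.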

For the SubmitJob function, the application must match an application name in the output of the ListApps function. To build input.list, use the GetAppInfo function: the 'kind' column shows whether each entry is an "input" or an "output", and input.list takes only the names in the 'id' column for which the 'kind' column is "input". For example, when the application is "muscle-lonestar-3.8.31u2", GetAppInfo("muscle-lonestar-3.8.31u2")$Information shows that the application expects "stdin" as its first input file (input.list=list("stdin")). For the application "velveth-1.2.07u1", GetAppInfo("velveth-1.2.07u1")$Information shows that the application expects six input files, which should be in the order: input.list=list("reads1", "reads2", "reads3", "reads4", "reads5", "reads6").

A few things to note: 1) depending on the application, input.list can be shorter than the number of inputs; for example, with the "velveth-1.2.07u1" application, the input list could be input.list=list("reads1", "reads2", "reads3"); 2) file.list should always be the same length as input.list; 3) to build args.list, use the GetAppInfo function: the entries whose 'kind' column is 'parameters' are the inputs for args.list. For velveth-1.2.07u1, args.list is as follows: list(c("format1", value), c("kmer", value), c("Output", value)). The list can be as long as the number of options.
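The rules above can be checked locally before a job is submitted. The sketch below builds the three lists for "velveth-1.2.07u1" and tests that file.list and input.list have the same length and that every args.list entry is a name and value pair. CheckJobInputs is a hypothetical helper, not an rPlant function, and the file names and option values are invented for illustration.

```r
# Input ids for "velveth-1.2.07u1", taken from the text above.
input.list <- list("reads1", "reads2", "reads3")

# Hypothetical file names, one per entry of input.list, in the same order.
file.list <- list("lib1.fastq", "lib2.fastq", "lib3.fastq")

# Option names from the text above; the values are hypothetical.
args.list <- list(c("format1", "-fastq"), c("kmer", "31"),
                  c("Output", "velvet_out"))

# CheckJobInputs is a hypothetical helper, not part of rPlant.
# It returns a character vector of problems (empty when none are found).
CheckJobInputs <- function(file.list, input.list, args.list = NULL) {
  problems <- character(0)
  if (length(file.list) != length(input.list)) {
    problems <- c(problems,
                  sprintf("file.list has %d entries but input.list has %d",
                          length(file.list), length(input.list)))
  }
  # Each args.list entry should be a character vector c(name, value).
  bad.args <- !vapply(args.list,
                      function(a) is.character(a) && length(a) == 2,
                      logical(1))
  if (any(bad.args)) {
    problems <- c(problems,
                  sprintf("args.list entry %d is not a c(name, value) pair",
                          which(bad.args)))
  }
  problems
}

CheckJobInputs(file.list, input.list, args.list)
CheckJobInputs(list("lib1.fastq"), input.list, list(c("kmer")))
```

The first call returns an empty character vector. The second reports both a length mismatch and a malformed args.list entry. A check like this only covers the structure of the lists; whether the ids and values are valid for the application still has to be confirmed with GetAppInfo.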

See Also

ListApps, Validate, UploadFile

Examples

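library(rPlant)
library(seqinr)  # provides write.fasta()

# Write the example data set to a FASTA file and upload it
data(DNA.fasta)
write.fasta(sequences = DNA.fasta, names = names(DNA.fasta), file.out = "DNA.fasta")
Validate("username","password")  # replace with your own iPlant credentials
UploadFile("DNA.fasta", filetype="FASTA-0")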

# Submit a MUSCLE job using the data provided in the package. The call returns
# a list containing the job id and job name.
myJob <- SubmitJob(application="Muscle-3.8.32u4", file.list=list("DNA.fasta"),
                   input.list=list("stdin"),
                   args.list=list(c("arguments", "-phyiout")),
                   job.name="muscleDNA")

# Check the status of any job
CheckJobStatus(myJob$id)

# List the output files a job has created
ListJobOutput(myJob$id)

# Kill the job if it is running incorrectly (skip this call otherwise,
# because the steps below need a finished job)
KillJob(myJob$id)

# Wait for the job to finish
Wait(myJob$id, 5, 1800, print=TRUE)
 
# Download output files
RetrieveJob(myJob$id, ListJobOutput(myJob$id, print.total=FALSE))
     
# View job history
GetJobHistory()

# Delete Job
DeleteJob(myJob$id)
DeleteJob(ALL=TRUE)
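
The RetrieveJob call above downloads every listed file. To download only some of them, file.vec can be a subset of the listed names. The sketch below shows the filtering step on its own. It assumes that the output of ListJobOutput can be treated as a character vector of file names, which the example above suggests but this page does not state. SelectOutputs is a hypothetical helper, and the file names are invented stand-ins for real job output.

```r
# Hypothetical file names standing in for the result of
# ListJobOutput(myJob$id, print.total=FALSE).
all.files <- c("muscleDNA.out", "muscleDNA.err",
               "phylip_interleaved.aln", "DNA.fasta")

# SelectOutputs is a hypothetical helper, not part of rPlant.
# It keeps only the file names that match a regular expression.
SelectOutputs <- function(all.files, pattern) {
  grep(pattern, all.files, value = TRUE)
}

# Keep only the alignment file
file.vec <- SelectOutputs(all.files, "\\.aln$")
file.vec

# With a finished job and a validated session, the subset would then be
# passed on as:
# RetrieveJob(myJob$id, file.vec)
```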
