geomedb (version 2.0.1)

fasterqDump: Download or convert fastq data from NCBI Sequence Read Archive using multiple threads

Description

`fasterqDump()` uses the SRAtoolkit command-line tool `fasterq-dump` to download fastq files for all samples returned by a `queryMetadata` query of GEOME in which one of the entities queried was `fastqMetadata`.

Usage

fasterqDump(queryMetadata_object, sratoolkitPath = "",
  outputDirectory = "./", arguments = "-p", filenames = "accessions",
  source = "sra", cleanup = FALSE, fasterqDumpHelp = FALSE)

Arguments

queryMetadata_object

A list object returned from `queryMetadata` where one of the entities queried was `fastqMetadata`.

sratoolkitPath

String. A path to a local copy of sratoolkit. Only necessary if sratoolkit is not on your $PATH. Assumes executables are inside `bin`.

outputDirectory

String. A path to the directory where you would like the files to be stored.

arguments

A string variable of arguments to be passed directly to `fasterq-dump`. Defaults to "-p" to show progress. Use fasterqDumpHelp = TRUE to see a list of arguments.

filenames

String. How the downloaded fastq files should be named. "accessions" names files with their SRA accession numbers; "IDs" names files with their materialSampleID; "locality_IDs" names files with their locality and materialSampleID.

source

String. `fasterq-dump` can retrieve files directly from SRA, or it can convert .sra files that were previously downloaded with `prefetch` and are in the current working directory. "sra" downloads from SRA; "local" converts .sra files in the current working directory.

cleanup

Logical. cleanup = T will delete any intermediate .sra files.

fasterqDumpHelp

Logical. fasterqDumpHelp = T will show the help page for `fasterq-dump` and then quit.

Value

This function does not return anything in R; it only downloads fastq files. It prints the command-line stdout to the console, along with the start time, end time, and time elapsed during the download.

Details

The `fasterq-dump` tool uses temporary files and multi-threading to speed up the extraction of FASTQ from SRA-accessions. This function works best with sratoolkit functions of version 2.9.6 or greater. SRAtoolkit functions can (ideally) be in your $PATH, or you can supply a path to them using the sratoolkitPath argument.
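Before calling `fasterqDump()`, it can help to check whether `fasterq-dump` is actually reachable. The following is a minimal sketch using base R only; the install location `~/sratoolkit` is a hypothetical placeholder, and the check is not part of geomedb itself.

```r
# Look for fasterq-dump on the PATH; Sys.which() returns "" when it is not found.
onPath <- unname(Sys.which("fasterq-dump"))

# Hypothetical install location, used only if the tool is not on the PATH.
localToolkit <- path.expand("~/sratoolkit")
localBinary <- file.path(localToolkit, "bin", "fasterq-dump")

if (nzchar(onPath)) {
  sratoolkitPath <- ""            # default: rely on the PATH
} else if (file.exists(localBinary)) {
  sratoolkitPath <- localToolkit  # executables are expected inside bin/
} else {
  sratoolkitPath <- NA_character_
  message("fasterq-dump not found; install sratoolkit or set sratoolkitPath.")
}
```

If `sratoolkitPath` ends up as a non-missing string, it can be passed to the `sratoolkitPath` argument unchanged.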

`fasterqDump()` downloads files to the current working directory unless a different one is assigned through outputDirectory.

Change the number of threads by adding "-e X" to arguments where X is the number of threads.
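As an illustration, the argument string could be assembled as below. The cap of 4 threads is an arbitrary choice for this sketch, and `acaoli` refers to the query result built in the Examples section; the final call is commented out because it needs sratoolkit and network access.

```r
# Use at most 4 threads, leaving one core free when possible.
# parallel::detectCores() can return NA, so fall back to 1 in that case.
cores <- parallel::detectCores()
threads <- if (is.na(cores)) 1L else max(1L, min(4L, cores - 1L))

# Keep "-p" so progress is still shown, then append the thread count.
dumpArguments <- paste("-p", "-e", threads)

# Not run: requires sratoolkit and a queryMetadata result such as `acaoli`.
# fasterqDump(queryMetadata_object = acaoli, arguments = dumpArguments)
```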

`fasterq-dump` will automatically split paired-end data into three files with:

  • file_1.fastq having read 1

  • file_2.fastq having read 2

  • file.fastq having unmatched reads

`fasterqDump()` can then rename these files based on their materialSampleID and locality.
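To make the naming scheme concrete, here is a rough sketch of how accession-named files might be mapped to sample-based names. It is not the package's actual implementation: the accessions, sample IDs, and lookup vector are invented for illustration, and the files are empty placeholders created in a temporary directory.

```r
# Work in a throwaway directory with empty placeholder files.
workDir <- file.path(tempdir(), "fasterq_rename_demo")
dir.create(workDir, showWarnings = FALSE)

# Hypothetical accession-to-sample lookup (not real GEOME records).
lookup <- c(SRR0000001 = "SampleA", SRR0000002 = "SampleB")

# Simulate paired-end output: <accession>_1.fastq and <accession>_2.fastq.
oldNames <- as.vector(outer(names(lookup), c("_1.fastq", "_2.fastq"), paste0))
file.create(file.path(workDir, oldNames))

# Swap the leading accession for the sample ID, keeping the read suffix.
accession <- sub("(_[12])?\\.fastq$", "", oldNames)
newNames <- paste0(lookup[accession], substring(oldNames, nchar(accession) + 1))
file.rename(file.path(workDir, oldNames), file.path(workDir, newNames))

sort(list.files(workDir))
```

The same pattern also handles the unmatched-reads file, since the `_1`/`_2` part of the regular expression is optional.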

Note that `fasterq-dump` will store temporary files in ~/ncbi/public/sra by default unless you pass "-t /path/to/temp/dir" to arguments. Make sure to periodically delete these temporary files.
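One way to keep temporary files from piling up is to point `fasterq-dump` at a dedicated scratch directory and delete it after the run. The sketch below assumes a scratch location under R's session temporary directory (a hypothetical choice); the `fasterqDump()` call is commented out because it needs sratoolkit and the `acaoli` object from the Examples section.

```r
# Dedicated scratch directory (hypothetical location under the session tempdir).
scratchDir <- file.path(tempdir(), "fasterq_tmp")
dir.create(scratchDir, showWarnings = FALSE, recursive = TRUE)

# Keep "-p" for progress and add "-t" to redirect temporary files.
dumpArguments <- paste("-p", "-t", scratchDir)

# Not run:
# fasterqDump(queryMetadata_object = acaoli, arguments = dumpArguments)

# Afterwards, remove the scratch directory and everything in it.
unlink(scratchDir, recursive = TRUE)
```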

See Also

https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc to download pre-compiled executables for sratoolkit, or https://github.com/ncbi/sra-tools/wiki/Building-and-Installing-from-Source to install from source.

This function will not work on Windows systems because fasterq-dump is not currently available for Windows. See fastqDump if you use Windows. See prefetch to download .sra files prior to converting them locally.

Examples

# NOT RUN {
# Run a query of GEOME first
acaoli <- queryMetadata(
    entity = "fastqMetadata", 
    query = "genus = Acanthurus AND specificEpithet = olivaceus AND _exists_:bioSample", 
    select=c("Event"))

#trim to 3 entries for expediency
acaoli$fastqMetadata<-acaoli$fastqMetadata[1:3,]
acaoli$Event<-acaoli$Event[1:3,]

# Download straight from SRA, naming files with their materialSampleID
fasterqDump(queryMetadata_object = acaoli, filenames = "IDs", source = "sra")

# A generally faster option is to run prefetch first, followed by fasterqDump, with cleanup = T to 
# remove the prefetched .sra files.
prefetch(queryMetadata_object = acaoli)
fasterqDump(queryMetadata_object = acaoli, filenames = "IDs", source = "local", cleanup = T)
# }