qualityScores: Extract quality score data in a sequencing read dataset

Description

Extract quality strings from a read file and convert them to Phred scores.

Usage

qualityScores(filename, input_format="gzFASTQ", offset=33, nreads=10000)

Arguments

filename

character string giving the name of an input file containing sequence reads.

input_format

character string specifying the format of the input file. gzFASTQ (gzipped FASTQ) by default. Acceptable formats include gzFASTQ, FASTQ, SAM and BAM. The string is case-insensitive.

offset

numeric value giving the offset added to the base-calling Phred scores. Possible values include 33 and 64. By default, 33 is used.

nreads

numeric value giving the number of reads from which quality scores are extracted. 10000 by default.

Value

A data matrix containing Phred scores for read bases. Rows in the matrix are reads and columns are base positions in each read.

Details

Quality scores of read bases are represented by ASCII characters in next-gen sequencing data. This function extracts the quality characters from each base in each read and then converts them to Phred scores using the provided offset value (offset).
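The conversion described above can be sketched in a few lines of base R. This is only an illustration of the arithmetic (ASCII code minus offset), not the code Rsubread runs internally; the helper name phred_from_quality is hypothetical and is not part of the package.

```r
# Hypothetical helper (not an Rsubread function): convert one quality
# string to Phred scores by subtracting the offset from each ASCII code.
phred_from_quality <- function(qual, offset = 33) {
  utf8ToInt(qual) - offset
}

# With offset 33, "I" (ASCII 73) is Phred 40, "5" (ASCII 53) is Phred 20
# and "!" (ASCII 33) is Phred 0.
phred_from_quality("II5!", offset = 33)

# With offset 64, "h" (ASCII 104) is Phred 40 and "@" (ASCII 64) is Phred 0.
phred_from_quality("hh@", offset = 64)
```

Passing the wrong offset shifts every score by 31, so negative or implausibly large values in the output usually mean that offset should be changed.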

If the dataset contains n reads in total, every (n/nreads)-th read is extracted from the input data.
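As a rough sketch of this sampling rule, the indices of the extracted reads can be computed as below. The helper sample_read_indices is hypothetical, and the rounding of n/nreads and the choice of the first sampled read are assumptions; the documentation does not state how Rsubread handles either.

```r
# Hypothetical helper (not an Rsubread function): pick nreads evenly
# spaced read indices out of n reads.
sample_read_indices <- function(n, nreads) {
  # Assumption: if fewer reads exist than requested, take all of them.
  if (nreads >= n) return(seq_len(n))
  # Assumption: the step is n/nreads rounded down.
  step <- n %/% nreads
  seq(from = step, by = step, length.out = nreads)
}

# 100 reads, 10 requested: every 10th read is taken.
idx <- sample_read_indices(100, 10)
idx
```

Under these assumptions, a file with 1,000,000 reads and the default nreads=10000 would be sampled at every 100th read.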

Examples

library(Rsubread)
reads <- system.file("extdata","reads.txt.gz",package="Rsubread")
x <- qualityScores(filename=reads,offset=64,nreads=1000)
x[1:10,1:10]