readFastqDb adds sequencing quality scores from a FASTQ file to a data.frame. Records are matched by `sequence_id`.
readFastqDb(
  data,
  fastq_file,
  quality_offset = -33,
  header = c("presto", "asis"),
  sequence_id = "sequence_id",
  sequence = "sequence",
  sequence_alignment = "sequence_alignment",
  v_cigar = "v_cigar",
  d_cigar = "d_cigar",
  j_cigar = "j_cigar",
  np1_length = "np1_length",
  np2_length = "np2_length",
  v_sequence_end = "v_sequence_end",
  d_sequence_end = "d_sequence_end",
  style = c("num", "ascii", "both"),
  quality_sequence = FALSE
)
Modified data with additional fields:
quality_alignment
: A character vector with ASCII Phred scores for sequence_alignment.
quality_alignment_num
: A character vector with comma-separated numerical quality values for each position in sequence_alignment.
quality
: A character vector with ASCII Phred scores for sequence.
quality_num
: A character vector with comma-separated numerical quality values for each position in sequence.
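
The two representations described above (an ASCII Phred string and a comma-separated string of numbers) are related through the quality offset. The following is a minimal base R sketch of that relationship, not the package's internal code. The helper names `phred_to_num`, `num_to_string` and `string_to_num` are hypothetical, and the toy string `"II5+!"` is made up for illustration. It assumes the Sanger convention, where the offset (-33) is added to each character's ASCII code.

```r
# Hypothetical helpers (not part of alakazam); base R only.

# ASCII Phred string -> numeric scores.
# The offset is ADDED to the ASCII code, mirroring quality_offset = -33.
phred_to_num <- function(ascii, offset = -33) {
    utf8ToInt(ascii) + offset
}

# Numeric scores -> comma-separated string (the layout of the *_num fields)
num_to_string <- function(scores) {
    paste(scores, collapse = ",")
}

# Comma-separated string -> numeric vector, e.g. for downstream filtering
string_to_num <- function(x) {
    as.numeric(strsplit(x, ",", fixed = TRUE)[[1]])
}

quality <- "II5+!"                       # toy ASCII Phred string
scores <- phred_to_num(quality)          # 40 40 20 10 0
quality_num <- num_to_string(scores)     # "40,40,20,10,0"
round_trip <- string_to_num(quality_num) # back to a numeric vector
```

A helper like `string_to_num` can be useful when working with quality_num or quality_alignment_num, since those fields store each record's scores as a single string rather than as a numeric vector.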
data
: data.frame containing sequence data.
fastq_file
: path to the FASTQ file.
quality_offset
: offset value to be used by ape::read.fastq. It is the value added to the quality scores (the default -33 applies to the Sanger format and should work for most recent FASTQ files).
header
: FASTQ file header format; one of "presto" or "asis". Use "presto" to specify that the FASTQ file headers use the pRESTO format and can be parsed to extract the sequence_id. Use "asis" to skip any processing and use the sequence names as they are.
sequence_id
: column in data that contains sequence identifiers to be matched to sequence identifiers in fastq_file.
sequence
: column in data that contains sequence data.
sequence_alignment
: column in data that contains IMGT aligned sequence data.
v_cigar
: column in data that contains CIGAR strings for the V gene alignments.
d_cigar
: column in data that contains CIGAR strings for the D gene alignments.
j_cigar
: column in data that contains CIGAR strings for the J gene alignments.
np1_length
: column in data that contains the number of nucleotides between the V gene and first D gene alignments or between the V gene and J gene alignments.
np2_length
: column in data that contains the number of nucleotides between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
v_sequence_end
: column in data that contains the end position of the V gene in sequence.
d_sequence_end
: column in data that contains the end position of the D gene in sequence.
style
: how the sequencing quality should be returned; one of "num", "ascii", or "both". Specify "num" to store the quality scores as strings of comma-separated numeric values. Use "ascii" to have the function return the scores as Phred (ASCII) scores. Use "both" to retrieve both.
quality_sequence
: specify TRUE to keep the quality scores for sequence. If FALSE, only the quality scores for sequence_alignment will be added to data.
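
To make the header and sequence_id arguments more concrete, here is a minimal base R sketch of identifier extraction and matching. It is an illustration under stated assumptions, not the function's actual implementation: the read names, quality strings and the toy `db` table are invented, and it assumes that a pRESTO-style header has the form `ID|FIELD=VALUE|...`, so the identifier is the text before the first pipe. With "asis" the `sub()` step would simply be skipped.

```r
# Hypothetical FASTQ read names in a pRESTO-like layout: ID|FIELD=VALUE|...
fastq_names <- c("SEQ2|PRIMER=IGHJ|CONSCOUNT=3",
                 "SEQ1|PRIMER=IGHJ|CONSCOUNT=5",
                 "SEQ3|PRIMER=IGHJ|CONSCOUNT=1")
# Made-up ASCII quality strings, one per read
fastq_quality <- c("IIII", "5555", "++++")

# Assumption: the identifier is everything before the first pipe
fastq_ids <- sub("\\|.*$", "", fastq_names)

# Toy table standing in for `data`; SEQ4 has no read in the FASTQ file
db <- data.frame(sequence_id = c("SEQ1", "SEQ2", "SEQ4"),
                 stringsAsFactors = FALSE)

# Look up each row's read by identifier; rows without a match get NA
idx <- match(db$sequence_id, fastq_ids)
db$quality <- fastq_quality[idx]
```

In this sketch unmatched rows are left as NA. How readFastqDb itself treats identifiers that are missing from the FASTQ file is not stated here, so that behavior should be checked against the package before relying on it.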
See also: maskPositionsByQuality and getPositionQuality
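library(alakazam)
# Load the example rearrangement table and locate the matching FASTQ file
db <- airr::read_rearrangement(system.file("extdata", "example_quality.tsv", package="alakazam"))
fastq_file <- system.file("extdata", "example_quality.fastq", package="alakazam")
# Add quality scores to db, matching records by sequence_id
db <- readFastqDb(db, fastq_file, quality_offset=-33)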