qAlign(sampleFile, genome, auxiliaryFile=NULL, aligner="Rbowtie", maxHits=1, paired=NULL, splicedAlignment=FALSE, snpFile=NULL, bisulfite="no", alignmentParameter=NULL, projectName="qProject", alignmentsDir=NULL, lib.loc=NULL, cacheDir=NULL, clObj=NULL, checkOnly=FALSE)
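sampleFile: the name of a tab-delimited text file listing the input sequence files and sample names (see Details).
genome: the reference genome, given either as the name of a BSgenome package or as the path to a fasta file (see Details).
auxiliaryFile: the name of a tab-delimited text file listing additional target sequence files in fasta format (see Details).
aligner: the aligner to use. The default, "Rbowtie", uses the aligners provided by the Rbowtie package.
maxHits: an integer. If a read has more than maxHits alignments, no alignments will be reported for it. In case of a multi-mapping read, a single alignment is randomly selected.
paired: the type of paired-end library, one of "no" (single-read experiment, default), "fr" (fw/rev), "ff" (fw/fw) or "rf" (rev/fw).
splicedAlignment: if TRUE, reads will be aligned by SpliceMap to produce spliced alignments (without using a database of known exon-exon junctions). Using splicedAlignment=TRUE increases alignment times roughly by a factor of ten. The option can only be used for reads with a minimal length of 50 nt; SpliceMap ignores shorter reads, and such reads are not contained in the BAM file, neither as mapped nor as unmapped reads.
snpFile: the name of a tab-delimited text file listing SNPs for allele-specific alignment (see Details). If NULL (default), no allele-specific alignment is performed.
bisulfite: "no" (default) for experiments without bisulfite conversion, or "dir" (directional) or "undir" (undirectional) for bisulfite-converted libraries (see Details).
alignmentParameter: an optional string of parameters passed to the aligner, replacing the defaults selected by QuasR. Please use with caution; some alignment parameters may break assumptions made by QuasR. Default parameters are listed in Details.
projectName: a name for the project (default "qProject").
alignmentsDir: the directory where the generated bam files are stored. If NULL (default), bam files will be generated at the location of the input sequence files.
lib.loc: an alternative R library path. It is used by QuasR to store aligner index packages created from BSgenome reference genomes, or to install newly downloaded BSgenome packages.
cacheDir: the directory for temporary files. If NULL (default), the temporary directory of the current R session, as returned by tempdir(), will be used.
clObj: a cluster object, e.g. created by makeCluster from package parallel, to enable parallel processing and speed up the alignment process.
checkOnly: if TRUE, prevents the automatic creation of alignments or aligner indices. This allows a quick check for missing alignment files without starting the potentially long process of creating them. If alignments or indices are missing, an exception is thrown.

qAlign returns a qProject object.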
qAlign looks for previously generated alignments as well as for an aligner index. If no aligner index exists, it will be created automatically and stored in the same directory as the provided fasta file, or as an R package in the case of a BSgenome reference. The name of this R package is the BSgenome package name with an additional suffix from the aligner (e.g. BSgenome.Hsapiens.UCSC.hg19.Rbowtie). The generated bam files contain both aligned and unaligned reads. For paired-end samples, by default no alignments are reported for read pairs where only one of the reads could be aligned.
sampleFile is a tab-delimited text file listing all the input sequences to be included in a given analysis. The file has either two (single-end) or three (paired-end) columns. The first row contains the column names, and each additional row contains the relative or absolute path and name of the input sequence file(s), as well as the corresponding sample name. Three input file formats are supported (fastq, fasta and bam). All input files in one sampleFile need to be in the same format and are recognized by their extension (.fq, .fastq, .fa, .fasta, .fna, .bam), in raw or compressed form (e.g. .fastq.gz). If bam files are provided, qAlign generates no alignments and uses the alignments contained in the bam files instead. The column names in sampleFile have to match those in the examples below. For a single-read experiment:
FileName | SampleName
chip_1_1.fq.bz2 | Sample1

For a paired-end experiment:

FileName1 | FileName2 | SampleName
rna_1_1.fq.bz2 | rna_1_2.fq.bz2 | Sample1
The SampleName column contains a human-readable name for each sample, which is used as the sample label. Multiple sequence files may be associated with the same sample name, which instructs QuasR to combine those files.
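The sampleFile rules above (header names, one format per file, recognition by extension) can be checked before calling qAlign. The following is a minimal base-R sketch, not part of QuasR: write_sample_file() and check_sample_file() are hypothetical helpers, and the set of accepted compression suffixes (.gz, .bz2, .xz) is an assumption.

```r
# Hypothetical helpers, not part of QuasR.
# Write a sampleFile: tab-delimited, one header row, no quotes, no row names.
write_sample_file <- function(path, df) {
  write.table(df, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = TRUE)
  invisible(path)
}

# Check column names and file formats against the rules described above.
check_sample_file <- function(path) {
  tab <- read.delim(path, colClasses = "character")
  if (identical(colnames(tab), c("FileName", "SampleName"))) {
    type <- "single"
    files <- tab$FileName
  } else if (identical(colnames(tab), c("FileName1", "FileName2", "SampleName"))) {
    type <- "paired"
    files <- c(tab$FileName1, tab$FileName2)
  } else {
    stop("unexpected column names: ", paste(colnames(tab), collapse = ", "))
  }
  # Drop an optional compression suffix (assumed set), then map the extension.
  raw <- sub("\\.(gz|bz2|xz)$", "", files, ignore.case = TRUE)
  ext <- tolower(sub("^.*\\.", "", raw))
  fmt <- ifelse(ext %in% c("fq", "fastq"), "fastq",
         ifelse(ext %in% c("fa", "fasta", "fna"), "fasta",
         ifelse(ext == "bam", "bam", NA_character_)))
  if (anyNA(fmt)) stop("unrecognized file extension")
  if (length(unique(fmt)) != 1L) stop("all input files must be in the same format")
  list(type = type, format = fmt[[1]], samples = unique(tab$SampleName))
}

# Usage with the single-read example from the sampleFile description.
sf <- file.path(tempdir(), "samples.txt")
write_sample_file(sf, data.frame(FileName = "chip_1_1.fq.bz2", SampleName = "Sample1"))
res <- check_sample_file(sf)
```

The checks run only on file names; they do not open the sequence files, so a mislabeled extension would still pass.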
auxiliaryFile is a tab-delimited text file listing one or several additional target sequence files in fasta format. Reads that do not map against the reference genome will be aligned against each of these target sequence files. The first row contains the column names, which have to match those in the example below:

FileName | AuxName
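An auxiliaryFile can be written with base R in the same way as a sampleFile. In this sketch, "phiX.fa" and "phiX" are hypothetical placeholder values for the fasta file and its auxiliary name.

```r
# Sketch: create an auxiliaryFile with base R.
# "phiX.fa" and "phiX" are hypothetical placeholder values.
aux <- data.frame(FileName = "phiX.fa", AuxName = "phiX")
auxFile <- file.path(tempdir(), "auxiliaries.txt")
write.table(aux, auxFile, sep = "\t", quote = FALSE, row.names = FALSE)
readLines(auxFile)
# The path would then be passed as qAlign(..., auxiliaryFile = auxFile).
```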
snpFile is a tab-delimited text file without a header, containing four columns with chromosome name, position, reference allele and alternative allele, as in the example below:

chr1 | 8596 | G | A
chr1 | 18443 | G | A
chr1 | 18981 | C | T
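Because snpFile has no header row, it must be written with col.names = FALSE. A minimal base-R sketch that reproduces the three example SNPs above; the data frame column names (chrom, pos, ref, alt) are arbitrary and are not written to the file.

```r
# Sketch: write the example snpFile (four columns, no header) with base R.
snps <- data.frame(chrom = "chr1",
                   pos   = c(8596L, 18443L, 18981L),
                   ref   = c("G", "G", "C"),
                   alt   = c("A", "A", "T"))
snpFile <- file.path(tempdir(), "snps.txt")
write.table(snps, snpFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(snpFile)
```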
The reference and alternative alleles will be injected into the reference genome, resulting in two separate genomes. All reads will be aligned separately to both of these genomes, and the alignments will be combined, retaining only the best alignment for each read. In the final alignment, each read is marked with a tag that classifies it as reference (R), alternative (A) or unknown (U), the latter if the read maps equally well to both genomes.
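The R/A/U tagging rule can be illustrated with a small hypothetical function. It assumes that "best alignment" means the fewest mismatches and that an unaligned read is recorded as NA; QuasR performs this combination internally, and its exact criterion may differ.

```r
# Hypothetical illustration, not QuasR code.
# mm_ref, mm_alt: mismatches of each read's best alignment to the
# reference-allele and alternative-allele genome (NA = not aligned).
classify_allele <- function(mm_ref, mm_alt) {
  # Treat an unaligned read as infinitely bad.
  r <- ifelse(is.na(mm_ref), Inf, mm_ref)
  a <- ifelse(is.na(mm_alt), Inf, mm_alt)
  # Fewer mismatches wins; a tie means the read is uninformative ("U").
  tag <- ifelse(r < a, "R", ifelse(a < r, "A", "U"))
  # Reads that align to neither genome get no tag.
  tag[is.infinite(r) & is.infinite(a)] <- NA_character_
  tag
}

classify_allele(c(0, 1, 0, NA), c(1, 0, 0, 2))
```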
If bisulfite is set to "dir" or "undir", reads will be C-to-T converted and aligned to a similarly converted genome.
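The in-silico conversion can be sketched with base R's chartr(). After bisulfite treatment, unmethylated cytosines are read as T while methylated ones stay C; converting every remaining C to T in both reads and genome makes the alignment independent of methylation state. This sketch uses made-up read sequences and shows only the read-side C-to-T step; QuasR performs the conversion internally, and any strand-specific handling it applies is not shown here.

```r
# Sketch of the C-to-T conversion applied to reads (made-up sequences).
reads <- c("ACGTCCGA", "TTCGCA")
converted <- chartr("C", "T", reads)
converted
```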
If alignmentParameter is NULL (recommended), qAlign selects default parameters that are suitable for the experiment type. Please note that for bisulfite or allele-specific experiments, each read is aligned multiple times, and the resulting alignments need to be combined. This requires special alignment parameter settings, which should not be changed. For simple experiments (neither bisulfite, allele-specific, nor spliced), alignments are generated using the parameters -m maxHits --best --strata. This aligns reads with up to maxHits best hits in the genome and selects one of them randomly.
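The default parameter string and the read selection it requests can be sketched as follows. Both functions are hypothetical illustrations, not QuasR code; pick_alignment() assumes each read is represented by the integer positions of its best-stratum alignments.

```r
# Hypothetical sketch, not QuasR code.
# 1) The default parameter string for simple experiments.
default_bowtie_params <- function(maxHits = 1L) {
  paste("-m", maxHits, "--best --strata")
}

# 2) The selection rule the string requests. `hits` is a list with, for each
#    read, the integer positions of its best-stratum alignments. Reads with no
#    hit or with more than maxHits hits get NA; otherwise one position is
#    chosen at random.
pick_alignment <- function(hits, maxHits = 1L) {
  vapply(hits, function(h) {
    if (length(h) == 0L || length(h) > maxHits) {
      NA_integer_
    } else {
      h[sample.int(length(h), 1L)]
    }
  }, integer(1))
}

default_bowtie_params(1L)
set.seed(1)
pick_alignment(list(100L, c(5L, 9L), integer(0)), maxHits = 2L)
```

With maxHits = 1, the second read (two equally good hits) would be dropped, which matches the maxHits argument description.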
See also: qProject, makeCluster from package parallel, and the Rbowtie package.
## Not run:
# see qCount, qMeth and qProfile manual pages for examples
example(qCount)
example(qMeth)
example(qProfile)
## End(Not run)