Usage
simulate_experiment(fasta = NULL, gtf = NULL, seqpath = NULL, num_reps = 10, fraglen = 250, fragsd = 25, readlen = 100, error_rate = 0.005, paired = TRUE, reads_per_transcript = 300, fold_changes, size = NULL, outdir = ".", write_info = TRUE, transcriptid = NULL, seed = NULL, ...)
Arguments
fasta
path to FASTA file containing transcripts from which to simulate
reads. See details.
gtf
path to GTF file containing transcript structures from which reads
should be simulated. See details.
seqpath
path to folder containing one FASTA file (.fa
extension) for each chromosome in gtf
. See details.
num_reps
How many biological replicates should be in each group? If
num_reps
is a single integer, num_reps
replicates will be
simulated in each group. Otherwise, num_reps
can be a length-2
vector, where num_reps[1]
and num_reps[2]
replicates will be
simulated in each of the two groups.
fraglen
Mean RNA fragment length. Sequences will be read off the
end(s) of these fragments.
fragsd
Standard deviation of fragment lengths.
error_rate
Sequencing error rate. Must be between 0 and 1. A uniform
error model is assumed.
paired
If TRUE
, paired-end reads are simulated; else
single-end reads are simulated.
reads_per_transcript
baseline mean number of reads to simulate
from each transcript. Can be an integer, in which case this many reads
are simulated from each transcript, or an integer vector whose length
matches the number of transcripts in fasta
.
fold_changes
Vector of multiplicative fold changes between groups,
one entry per transcript in fasta
. A fold change > 1 means the
transcript is overexpressed in the first num_reps
(or
num_reps[1]
) samples. Fold change < 1 means transcript is
overexpressed in the last num_reps
(or num_reps[2]
) samples.
The change is in the mean number of reads generated from the transcript,
between groups.
size
the negative binomial size
parameter (see
NegBinomial
) for the number of reads drawn per transcript.
If left blank, defaults to reads_per_transcript / 3
. Negative
binomial variance is mean + mean^2 / size. Can either be left at default,
a vector of the same length as number of transcripts in fasta
,
if the two groups should have the same size parameters, or a list with 2
elements, each of which is a vector with length equal to the number of
transcripts in fasta
, which represent the size parameters for each
transcript in groups 1 and 2, respectively. outdir
character, path to folder where simulated reads should be
written, with *no* slash at the end. By default, reads are
written to current working directory.
write_info
If TRUE
, write a file matching transcript IDs to
differential expression status into the file outdir/sim_info.txt
.
transcriptid
optional vector of transcript IDs to be written into
sim_info.txt
and used as transcript identifiers in the fasta files.
Defaults to names(readDNAStringSet(fasta))
. This option is useful
if default names are very long or contain special characters.
seed
Optional seed to set before simulating reads, for
reproducibility.
...
additional arguments to pass to seq_gtf
if using
gtf
and seqpath