readGeneric(files, dir = ".", replicates, libnames, chrs, chrlens, cols, header = TRUE, minlen = 15, maxlen = 1000, multireads = 1000, polyLength, estimationType = "quantile", verbose = TRUE, filterReport = NULL, ...)
readBAM(files, dir = ".", replicates, libnames, chrs, chrlens, countID = NULL, minlen = 15, maxlen = 1000, multireads = 1000, polyLength, estimationType = "quantile", verbose = TRUE, filterReport)
@replicates[i] == @replicates[j]. This argument may be given in any form but will be stored as a factor.
getLibsizes to infer the library sizes of the samples.
read.table. In particular, the `sep' and `skip' arguments may be useful.
alignmentData object.
readBAM:
This function takes a set of BAM files and generates an 'alignmentData' object from them. If a character string for `countID' is given, the function assumes that the data are non-redundant and that `countID' identifies the count data (i.e., how many times each read appears in the sequenced library) in each BAM file. If `countID' is NULL, the data are assumed to be redundant, and the count data are inferred from the file.
readGeneric:
The purpose of this function is to take a set of plain text files and produce an 'alignmentData' object. The function uses read.table to read in the columns of data in the files, and so by default columns are separated by any white space. Alternative separators can be used by passing the appropriate value for 'sep' to read.table.
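As a minimal sketch of the separator behaviour described above, the following uses base R's read.table directly rather than readGeneric. The comma-separated content and the object names are hypothetical and chosen only for illustration.

```r
# Hypothetical comma-separated alignment data with a header row.
csv_text <- "chr,start,end\nChr1,10,33\nChr2,5,28"

# With the default separator (any white space) each line would be read as a
# single column; setting sep = "," splits the three fields correctly.
aln <- read.table(text = csv_text, header = TRUE, sep = ",")

print(aln)
```

The same `sep' value could presumably be supplied to readGeneric, which forwards additional arguments to read.table.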
The files may contain columns with column names 'chr', 'tag', 'count', 'start', 'end', 'strand', in which case the `cols' argument can be omitted and `header' set to TRUE. If this is the case, there is no requirement for all the files to have the same ordering of columns (although all must have these column names).
Alternatively, the columns of data in the input files can be specified by the `cols' argument in the form of a named vector; e.g., 'cols = c(chr = 1, tag = 2, count = 3, start = 4, end = 5, strand = 6)' would cause the function to assume that the first column contains the chromosome information, the second column contains the tag information, etc. If `cols' is specified, then information in the header is ignored. If `cols' is missing and `header' is FALSE, then it is assumed that the data take the form described in the example above.
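The mapping performed by a `cols' vector can be pictured with the base R sketch below. It is not the package's internal code; the headerless input and the names `raw' and `aln' are assumptions made for illustration.

```r
# Hypothetical headerless input: chromosome, tag, count, start, end, strand.
raw <- read.table(text = "Chr1 ACGT 3 10 13 +\nChr2 TTGA 1 5 8 -",
                  header = FALSE, stringsAsFactors = FALSE)

# Named vector mapping each expected field to a column position,
# in the same form as the `cols' argument described above.
cols <- c(chr = 1, tag = 2, count = 3, start = 4, end = 5, strand = 6)

# Select the columns by position, then label them with the field names.
aln <- raw[, cols]
names(aln) <- names(cols)

print(aln)
```

Reordering the positions in `cols' (for example, chr = 6 and strand = 1) would pick the same fields from a file whose columns are arranged differently.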
The 'tag', 'count' and 'strand' columns may optionally be omitted from either the file column headers or the `cols' argument. If the 'tag' column is omitted, then the data will not account for duplicated sequences when estimating the number of counts in loci. If the 'count' column is omitted, the 'readGeneric' function will assume that the file contains the alignments of each copy of each sequence tag, rather than an aggregated alignment of each unique sequence. The unique alignments will be identified and the number of sequence tags aligning to each position will be calculated. If 'strand' is omitted, the strand will simply be ignored.
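When the 'count' column is absent, the collapsing of redundant alignments can be illustrated in base R as follows. This is a sketch of the idea only, under the assumption that identical alignments share chromosome, start and end; it is not how segmentSeq is known to implement the step, and the data are made up.

```r
# Hypothetical redundant alignments: one row per sequenced read copy.
reads <- data.frame(
  chr   = c("Chr1", "Chr1", "Chr1", "Chr2"),
  start = c(10, 10, 40, 5),
  end   = c(33, 33, 63, 28),
  stringsAsFactors = FALSE
)

# Give each read a count of one, then sum the counts over identical
# alignments to obtain one row per unique alignment.
reads$count <- 1
unique_aln <- aggregate(count ~ chr + start + end, data = reads, FUN = sum)

print(unique_aln)
```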
alignmentData
# Define the chromosome lengths for the genome of interest.
chrlens <- c(2e6, 1e6)
# Define the files containing sample information.
datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")
# Establish the library names and replicate structure.
libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1,1,2,2)
# Process the files to produce an `alignmentData' object.
alignData <- readGeneric(files = libfiles, dir = datadir,
                         replicates = replicates, libnames = libnames,
                         chrs = c(">Chr1", ">Chr2"), chrlens = chrlens)