read.snps.long
read.long(file, samples, snps, fields = c(snp = 1, sample = 2, genotype = 3, confidence = 4, allele.A = NA, allele.B = NA), split = "\t| +", gcodes, no.call = "", threshold = NULL, lex.order = FALSE, verbose = FALSE)
lex.order: If TRUE, the alleles at each locus will be in lexicographical order. Otherwise, the ordering of alleles is arbitrary, depending on the order in which they are encountered.
verbose: If TRUE, this turns on output from the function. Otherwise, only error and warning messages are produced.
SnpMatrix. Otherwise, it returns a list whose first element is the SnpMatrix object and whose second element is a data frame containing the allele codes, with the SNP identifiers as row names. Note that an allele code appears in this data frame only if it occurs in a genotype that was accepted. Thus, monomorphic SNPs have allele.B coded as NA, and SNPs that never pass the confidence score filter have both alleles coded as NA.
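The bookkeeping behind the allele-code data frame can be sketched in base R. This is not the package's implementation: the helper allele_table(), its input layout (a data frame with columns snp, a1, a2 and confidence) and the rule that calls with confidence below threshold are rejected are all assumptions made for illustration. It shows why a monomorphic SNP ends up with allele.B as NA, why a SNP with no accepted call has both alleles NA, and how lex.order changes which allele is called A.

```r
# Hypothetical helper (not part of snpStats): derive per-SNP allele codes
# from accepted calls only.
# calls: data frame with columns snp, a1, a2, confidence (assumed layout)
# threshold: ASSUMED to reject calls whose confidence is below it
allele_table <- function(calls, threshold = NULL, no.call = "",
                         lex.order = FALSE) {
  ok <- calls$a1 != no.call & calls$a2 != no.call
  if (!is.null(threshold)) ok <- ok & calls$confidence >= threshold
  ids <- unique(calls$snp)
  out <- data.frame(allele.A = rep(NA_character_, length(ids)),
                    allele.B = rep(NA_character_, length(ids)),
                    row.names = ids, stringsAsFactors = FALSE)
  for (id in ids) {
    sel <- ok & calls$snp == id
    if (!any(sel)) next  # no accepted call: both alleles stay NA
    # alleles in the order they are encountered, call by call
    alleles <- unique(as.vector(rbind(calls$a1[sel], calls$a2[sel])))
    if (lex.order) alleles <- sort(alleles)
    out[id, "allele.A"] <- alleles[1]
    # monomorphic SNP: allele.B stays NA (a third allele is simply ignored here)
    if (length(alleles) > 1) out[id, "allele.B"] <- alleles[2]
  }
  out
}

calls <- data.frame(
  snp        = c("rs1", "rs1", "rs2", "rs2", "rs3"),
  a1         = c("G",   "A",   "C",   "C",   "G"),
  a2         = c("G",   "G",   "C",   "C",   "T"),
  confidence = c(0.90,  0.95,  0.99,  0.80,  0.10),
  stringsAsFactors = FALSE
)
allele_table(calls, threshold = 0.5)
allele_table(calls, threshold = 0.5, lex.order = TRUE)
```

With encounter order, rs1 gets G as allele A because G is seen first; with lex.order = TRUE it gets A. rs2 is monomorphic, and the only call for rs3 falls below the threshold.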
Input lines are split into fields using strsplit, with split as the regular expression. The required fields are then extracted according to the fields argument. This must contain the locations of the sample and snp identifier fields, and either the location of a genotype field or the locations of two allele fields.

If the samples and snps arguments contain vectors of character strings, a SnpMatrix is created with these row and column names, and the genotype values are "cherry-picked" from the input file. If either or both of these arguments are specified simply as numbers, these numbers determine the dimensions of the SnpMatrix created; samples and/or SNPs are then included in the SnpMatrix on a first-come, first-served basis. If either or both of these arguments are omitted, a preliminary scan of the input file is carried out to find the missing sample and/or SNP identifiers. In this scan, when a sample or SNP identifier differs from the one on the previous line but is identical to one found earlier, all the relevant identifiers are assumed to have been found. This implies that the file must be sorted in some consistent order by sample and by SNP (although either one may vary fastest).
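The stopping rule of the preliminary scan can be illustrated with a short base-R sketch. scan_ids() is a hypothetical helper, not a snpStats function, and the real scan may differ in detail. It splits each line with strsplit using the default split expression, reads the identifiers from the default positions snp = 1 and sample = 2, and stops once both identifier lists are judged complete. It shows why sorting matters: the fast-varying identifier is complete as soon as it wraps around, while the slow-varying one keeps being collected to the end of the file.

```r
# Hypothetical helper (not in snpStats): collect sample and SNP identifiers
# using the stopping rule described in the text. Field positions default to
# the snp = 1, sample = 2 entries of read.long's default `fields`.
scan_ids <- function(lines, fields = c(snp = 1, sample = 2), split = "\t| +") {
  samples <- character(0); snps <- character(0)
  done_sample <- FALSE; done_snp <- FALSE
  prev_sample <- NA_character_; prev_snp <- NA_character_
  for (line in lines) {
    f <- strsplit(line, split)[[1]]
    s <- f[fields[["sample"]]]
    p <- f[fields[["snp"]]]
    # an identifier that changes back to one already seen has wrapped
    # around, so its list is taken as complete
    if (!done_sample && !identical(s, prev_sample)) {
      if (s %in% samples) done_sample <- TRUE else samples <- c(samples, s)
    }
    if (!done_snp && !identical(p, prev_snp)) {
      if (p %in% snps) done_snp <- TRUE else snps <- c(snps, p)
    }
    if (done_sample && done_snp) break
    prev_sample <- s; prev_snp <- p
  }
  list(samples = samples, snps = snps)
}

# sorted by sample, with SNP varying fastest
by_sample <- c("r1 s1 AA", "r2 s1 AB", "r3 s1 BB",
               "r1 s2 AA", "r2 s2 AA", "r3 s2 AB")
scan_ids(by_sample)
```

With an unsorted file the rule fires too early: for the lines "r1 s1", "r2 s1", "r1 s2", "r3 s2" the SNP list is closed at r1, r2 and r3 is never found.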
If the genotype is to be read as a single field, the genotype element of the fields argument must be set to the appropriate value, and the allele.A and allele.B elements should be set to NA. The handling of this field is controlled by the gcodes argument. If gcodes is missing or NA, the genotype is assumed to be represented by a two-character field, the two characters representing the two alleles. If gcodes is a single string, it is assumed to contain a regular expression that splits the genotype field into two allele fields. Otherwise, gcodes must be a character vector of length three, specifying the three genotype codes in the order "AA", "AB", "BB".
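The three ways a single genotype field can be interpreted, depending on gcodes, can be sketched as follows. decode_genotype() is a hypothetical helper written for illustration, not part of snpStats. It returns the two alleles for the first two forms, a genotype index (1 = AA, 2 = AB, 3 = BB) for the length-three form, and NA for the no.call string or an unrecognised code.

```r
# Hypothetical helper (not in snpStats) mirroring the three gcodes cases
decode_genotype <- function(g, gcodes, no.call = "") {
  if (identical(g, no.call)) return(NA)
  if (missing(gcodes) || (length(gcodes) == 1 && is.na(gcodes))) {
    # two-character field: one character per allele
    return(c(substr(g, 1, 1), substr(g, 2, 2)))
  }
  if (length(gcodes) == 1) {
    # single string: a regular expression separating the two alleles
    return(strsplit(g, gcodes)[[1]])
  }
  if (length(gcodes) == 3) {
    # three codes, in the order "AA", "AB", "BB"; unknown codes give NA
    return(match(g, gcodes))
  }
  stop("gcodes must be missing, NA, a single regular expression or of length three")
}

decode_genotype("AG")                                # "A" "G"
decode_genotype("A/G", gcodes = "/")                 # "A" "G"
decode_genotype("AB", gcodes = c("AA", "AB", "BB"))  # 2
```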
If the two alleles of the genotype are to be read from two separate fields, the genotype element should be set to NA and the allele.A and allele.B elements set to the appropriate values. The gcodes argument should then be missing or set to NA.
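When the alleles come from two separate fields, each call can be reduced to a genotype code once allele A and allele B are known for the SNP. The sketch below is hypothetical (genotype_code() is not a snpStats function). It uses the coding in which SnpMatrix objects store genotypes, 0 for a missing call and 1, 2, 3 for AA, AB, BB, counting copies of allele B so that AB and BA give the same code.

```r
# Hypothetical helper (not in snpStats): turn two allele fields into a
# genotype code, 0 = no call, 1 = AA, 2 = AB (or BA), 3 = BB
genotype_code <- function(a1, a2, allele.A, allele.B, no.call = "") {
  if (a1 == no.call || a2 == no.call) return(0L)
  if (!all(c(a1, a2) %in% c(allele.A, allele.B)))
    stop("allele not among allele.A / allele.B")
  # one plus the number of copies of allele B; %in% also copes with a
  # monomorphic SNP whose allele.B is NA
  1L + sum(c(a1, a2) %in% allele.B)
}

genotype_code("C", "C", allele.A = "C", allele.B = "T")  # 1
genotype_code("T", "C", allele.A = "C", allele.B = "T")  # 2
genotype_code("T", "T", allele.A = "C", allele.B = "T")  # 3
```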
See also: SnpMatrix-class, XSnpMatrix-class
##
## No example supplied yet
##