"SnpMatrix". This function has been replaced in versions 1.3 and later
by the more flexible function read.long.
read.snps.long(files, sample.id = NULL, snp.id = NULL, diploid = NULL, fields = c(sample = 1, snp = 2, genotype = 3, confidence = 4), codes = c("0", "1", "2"), threshold = 0.9, lower = TRUE, sep = " ", comment = "#", skip = 0, simplify = c(FALSE,FALSE), verbose = FALSE, in.order=TRUE, every = 1000)
diploid: [...] sample.id, required if reading data into an XSnpMatrix
    rather than a SnpMatrix. This vector gives the expected ploidy for
    each row. If the same value suffices for all rows, then a scalar may
    be supplied.

fields: Positions of the fields in the input record, named sample and snp
    for the sample and SNP identifier fields, confidence for a call
    confidence score (if present) and either genotype if genotype calls
    occur as a single field, or allele1 and allele2 if the two alleles
    are coded in different fields.

codes: Either "nucleotide", denoting coding in terms of nucleotides
    (A, C, G or T, case insensitive), or a character vector giving
    genotype or allele codes (see below).

lower: If TRUE, then threshold represents a lower bound. Otherwise it is
    an upper bound.

simplify: If TRUE, sample and SNP identifying strings will be shortened
    by removal of any common leading or trailing sequences when they are
    used as row and column names of the output SnpMatrix.

verbose: If TRUE, a progress report is generated as every "every" lines
    of data are read.

in.order: If TRUE, input lines are assumed to be in the correct order
    (see details).

every: See verbose.
An object of class "SnpMatrix" or "XSnpMatrix".
Unless nucleotide coding is used, the codes argument should be a
character array giving the valid codes.
For genotype coding of autosomal SNPs, this should be an array of length 3
giving the codes for the three genotypes, in the order homozygous (AA),
heterozygous (AB), homozygous (BB). All other codes will be treated
as "no call". The default codes are "0", "1", "2". For X SNPs, males are
assumed to be coded as homozygous, unless an additional two codes are
supplied (representing the AY and BY genotypes). For allele coding, the
codes array should be of length 2 and should specify the codes
for the two alleles. Again, any other code is treated as
"missing" and, for X SNPs, males should be coded either as
homozygous or by omission of the second allele.
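
The genotype coding rule above can be illustrated with a small base R sketch. The helper decode_genotype is hypothetical and is not part of the package; it simply shows how a length-3 codes vector could be interpreted, with the three valid codes mapped to 0, 1 or 2 copies of the nominal B allele and every other code treated as a "no call". The numeric representation is an assumption chosen for illustration, not a description of how the package stores genotypes.

```r
# Hypothetical helper (not part of snpStats): translate raw genotype codes
# into B-allele counts 0, 1, 2. Any code that is not listed in `codes`
# becomes NA, mirroring the "no call" rule described above.
decode_genotype <- function(raw, codes = c("0", "1", "2")) {
  stopifnot(is.character(codes), length(codes) == 3)
  # match() gives the position (1, 2, 3) of each raw code in `codes`,
  # or NA when the code is not one of the three valid values.
  match(raw, codes) - 1L
}

# Default coding: "-" and "NN" are not valid codes, so they become NA.
decode_genotype(c("0", "2", "1", "-", "NN"))

# A custom coding in the order homozygous (AA), heterozygous (AB),
# homozygous (BB).
decode_genotype(c("AA", "AB", "BB", "??"), codes = c("AA", "AB", "BB"))
```

The order of the codes vector matters: swapping its first and last elements would swap the roles of the nominal A and B alleles.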
For nucleotide coding, nucleotides are assigned to the nominal alleles
in alphabetic order. Thus, for a SNP with either "T" or "A"
nucleotides in the variant position,
the nominal genotypes AA, AB and BB will refer to A/A,
A/T and T/T.
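
As a rough illustration of the alphabetic assignment just described, the following base R sketch maps a pair of observed nucleotides to a nominal genotype label. The function nominal_genotype and its arguments are hypothetical names introduced here; this is not the package's own code.

```r
# Hypothetical helper (not part of snpStats): given the two nucleotides that
# occur at a SNP, label an observed pair of alleles as "AA", "AB" or "BB".
nominal_genotype <- function(allele1, allele2, snp_nucleotides) {
  # Comparison is case insensitive, as for the "nucleotide" coding.
  alleles <- toupper(c(allele1, allele2))
  # Alphabetic order: the first nucleotide is nominal allele A,
  # the second is nominal allele B.
  ordered <- sort(unique(toupper(snp_nucleotides)))
  stopifnot(length(ordered) == 2, all(alleles %in% ordered))
  n_b <- sum(alleles == ordered[2])  # number of nominal B alleles
  c("AA", "AB", "BB")[n_b + 1]
}

# For a SNP with nucleotides T and A, A is nominal allele A and T is B.
nominal_genotype("A", "A", c("T", "A"))  # "AA"
nominal_genotype("T", "A", c("T", "A"))  # "AB"
nominal_genotype("t", "t", c("T", "A"))  # "BB"
```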
Although the function allows for reading into an object of class
"XSnpMatrix" directly, it is usually preferable to read such data as a
"SnpMatrix" (i.e. as autosomal) and to coerce it to an object of type
"XSnpMatrix" later using as(..., "XSnpMatrix") or
new("XSnpMatrix", ..., diploid=...). If diploid is coded NA for any
subject the latter course must be followed, since NAs are not accepted
in the diploid argument.

If the in.order argument is set to TRUE, then the vectors sample.id and
snp.id must be in the same order as they vary on the input file(s) and
this ordering must be consistent. However, there is no requirement that
either SNP or sample should vary fastest, as this is detected from the
input. If in.order is FALSE, then no assumptions are made about the
ordering of the input file, and SNP and sample identifiers are looked up
in hash tables as they are read. This option can therefore be expected
to be somewhat slower.
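
One way to prepare sample.id and snp.id in file order is sketched below in base R. It works on a toy set of long-format records and makes two assumptions that the text above does not state: that the file holds exactly one record for every sample/SNP pair, and that a consistent ordering means the records follow the full crossing of the two identifier vectors. The variable names are illustrative only.

```r
# Toy long-format input: one call per line (sample, SNP, genotype).
input_lines <- c("s1 rs1 0", "s1 rs2 1", "s2 rs1 2", "s2 rs2 1")
records <- read.table(text = input_lines,
                      col.names = c("sample", "snp", "genotype"),
                      colClasses = "character")

# Identifiers in order of first appearance in the file.
sample.id <- unique(records$sample)
snp.id <- unique(records$snp)

# Detect which identifier varies fastest from the first two records.
fastest <- if (records$snp[1] != records$snp[2]) "snp" else "sample"

# Build the ordering implied by the identifier vectors; expand.grid()
# varies its first argument fastest.
expected <- if (fastest == "snp") {
  expand.grid(snp = snp.id, sample = sample.id, stringsAsFactors = FALSE)
} else {
  expand.grid(sample = sample.id, snp = snp.id, stringsAsFactors = FALSE)
}

# TRUE when the file follows that ordering throughout (assumed here to be
# what a consistent ordering means for a complete file).
consistent <- nrow(expected) == nrow(records) &&
  all(expected$sample == records$sample) &&
  all(expected$snp == records$snp)
```

If a check of this kind fails, setting in.order to FALSE is the safer choice, at the cost of the slower hash-table lookup.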
Each file may represent a separate sample or SNP, in which case the
appropriate .id argument can be omitted; row or column names
are then taken from the file names.
read.plink, SnpMatrix-class, XSnpMatrix-class