chopsticks (version 1.36.0)

read.snps.long.old: Read SNP input data in "long" format (old version)

Description

This function reads SNP genotype data and creates an object of class "snp.matrix" or "X.snp.matrix". Input data are assumed to be arranged as one line per SNP-call (without any headers). This function can read gzipped files.

Usage

read.snps.long.old(file, chip.id, snp.id, codes, female, conf = 1, threshold = 0.9, drop=FALSE, sorted=FALSE, progress=interactive())

Arguments

file
Name of file containing the input data. Input files which have been compressed by the gzip utility are recognized
chip.id
Array of type "character" containing (unique) identifiers for the chips, samples, or subjects for which calls are to be read. Other samples in the input data will be ignored
snp.id
Array of type "character" containing (unique) identifiers of the SNPs for which data will be read. Again, further SNPs in the input data will be ignored
codes
For autosomal SNPs, an array of length 3 giving the codes for the three genotypes, in the order homozygous (AA), heterozygous (AB), homozygous (BB). For X SNPs, an additional two codes for the male genotypes (AY and BY) must be supplied. All other codes will be treated as "no call". The default codes are "0", "1", "2" [, "0", "2"].
female
If the data to be read refer to SNPs on the X chromosome, this argument must be supplied and should indicate whether each row of data refers to a female (TRUE) or to a male (FALSE). The output object will then be of class "X.snp.matrix".
conf
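Controls how the confidence score is used: 0 if no score is present, a positive value to accept calls scoring above threshold, a negative value to accept calls scoring below threshold. See Details.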
drop
If TRUE, any rows or columns without genotype calls will be dropped from the output matrix. Otherwise the full matrix, with rows and columns defined by the chip.id and snp.id arguments, will be returned
threshold
Acceptance threshold for confidence score
sorted
Is input file already sorted into the correct order (see details)?
progress
If TRUE, progress will be reported to the standard output stream
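
The arguments above can be put together as in the following minimal sketch. It assumes each input line holds a chip identifier, a SNP identifier, a genotype code and a confidence score, in that order; the identifiers, scores and temporary file are invented for illustration. The call is guarded so that the script still runs when the chopsticks package is not installed.

```r
# Write a tiny long-format file, one line per call (no header).
# Assumed field order: chip id, SNP id, genotype code, confidence score.
long_file <- tempfile(fileext = ".txt")
writeLines(c(
  "chipA rs1 0 0.99",
  "chipA rs2 1 0.95",
  "chipB rs1 2 0.97",
  "chipB rs2 1 0.50"   # score below the 0.9 threshold used below
), long_file)

# Only these chips and SNPs are read; anything else in the file is ignored.
chips <- c("chipA", "chipB")
snps  <- c("rs1", "rs2")

if (requireNamespace("chopsticks", quietly = TRUE)) {
  geno <- chopsticks::read.snps.long.old(
    file      = long_file,
    chip.id   = chips,
    snp.id    = snps,
    codes     = c("0", "1", "2"),  # AA, AB, BB for autosomal SNPs
    conf      = 1,                 # positive: accept scores above threshold
    threshold = 0.9,
    progress  = FALSE
  )
  print(class(geno))
}
```

For X-chromosome data, the same call would additionally need the female argument and a five-element codes vector, as described above.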

Value

An object of class "snp.matrix", or of class "X.snp.matrix" when the female argument is supplied for X chromosome data.

Details

Data are assumed to be input with one line per call, in free format:

<chip-id> <snp-id> <code> [<confidence>] ...

Currently, any fields following the first three (or four) are ignored.

If the argument sorted is TRUE, the file is assumed to be sorted with snp-id as primary key and chip-id as secondary key, using the current locale. The rows and columns of the returned matrix will also be ordered in this manner. If sorted is FALSE, an algorithm which avoids this assumption is used, and the rows and columns of the returned matrix will be in the same order as the input chip.id and snp.id vectors.

Calls in which both id fields match elements in the chip.id and snp.id arguments are read in, after (optionally) checking that the level of confidence achieves a given threshold. Confidence level checking is controlled by the conf argument: conf=0 indicates that no confidence score is present and no checking is done; conf>0 indicates that calls with scores above threshold are accepted; conf<0 indicates that only calls with scores below threshold are accepted.
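
The confidence rule can be restated as a small base R sketch. The helper accept_call is hypothetical and is not part of chopsticks; it only mirrors the rule as described here. How the package treats a score exactly equal to the threshold is not stated, so the strict comparisons below are an assumption.

```r
# Hypothetical helper illustrating the documented meaning of conf.
accept_call <- function(score, conf = 1, threshold = 0.9) {
  if (conf == 0) {
    rep(TRUE, length(score))   # no confidence score present, no checking
  } else if (conf > 0) {
    score > threshold          # accept calls scoring above the threshold
  } else {
    score < threshold          # accept calls scoring below the threshold
  }
}

scores <- c(0.99, 0.95, 0.50)
accept_call(scores, conf = 1)    # TRUE  TRUE  FALSE
accept_call(scores, conf = -1)   # FALSE FALSE TRUE
accept_call(scores, conf = 0)    # TRUE  TRUE  TRUE
```

A negative conf suits scores where smaller values mean a more reliable call, such as an error probability.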

The routine is case-sensitive, and it is important that the chip and SNP identifiers in the input file match the case of chip.id and snp.id exactly.
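
Because a case mismatch leads to calls being ignored rather than to an error, it can help to check identifiers before reading. The following base R sketch flags identifiers that would match only if case were ignored; the vectors are invented for illustration, and file_ids stands for identifiers extracted from the input file by some other means.

```r
# Identifiers as they might appear in the file and in the chip.id argument.
file_ids <- c("ChipA", "chipB", "chipC")
chip.id  <- c("chipA", "chipB")

exact    <- file_ids %in% chip.id                     # case-sensitive match
caseless <- tolower(file_ids) %in% tolower(chip.id)   # match ignoring case

# Identifiers that differ from a requested id only by case
file_ids[caseless & !exact]   # "ChipA"
```

The same check applies to SNP identifiers and the snp.id argument.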

References

http://www-gene.cimr.cam.ac.uk/clayton

See Also

snp.matrix-class, X.snp.matrix-class