
snpStats (version 1.22.0)

read.pedfile: Read a pedfile as a "SnpMatrix" object

Description

Reads diallelic data in linkage "pedfile" format, with one line of data per sample (subject) containing six mandatory fields followed by pairs of fields, one pair for each locus, giving the two alleles observed.
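The layout described above can be illustrated in base R without snpStats. This is only a sketch: the two subjects and their alleles are made up, and the six leading fields are assumed to follow the conventional linkage order (pedigree, member, father, mother, sex, affected status).

```r
# Two hypothetical subjects, three loci; fields are separated by single spaces.
# Six leading fields: pedigree, member, father, mother, sex, affected status.
ped_lines <- c(
  "FAM1 1 0 0 1 2 A A A G C C",
  "FAM2 1 0 0 2 1 A G G G 0 0"
)

# Split each line with the same pattern as the default `split` argument.
fields <- strsplit(ped_lines, "\t| +")
n_fields <- lengths(fields)

# Everything after the six mandatory fields is a pair of alleles per locus.
n_loci <- (n_fields - 6L) / 2L

# Each line should describe the same whole number of loci.
stopifnot(all(n_loci == n_loci[1]), n_loci[1] == round(n_loci[1]))

# Alleles of the second subject, one row per locus.
alleles_subject2 <- matrix(fields[[2]][-(1:6)], ncol = 2, byrow = TRUE)
print(n_loci[1])
print(alleles_subject2)
```

The "0 0" pair at the last locus of the second subject is a missing genotype under the default na.strings = "0".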

Usage

read.pedfile(file, n, snps, which, split = "\t| +", sep = ".", na.strings = "0", lex.order = FALSE)

Arguments

file
The input pedfile. This may be (but need not be) gzipped
n
(Optional) The number of lines of data to be read. If not supplied, the pedfile is first read through once to count its lines and then rewound
snps
(Optional) Either a character vector giving the names of the loci, or a single character string giving the name of a locus information file from which these can be read. This file is assumed to be white-space delimited, with one line per locus and no header line. If this argument is not supplied, locus names are generated as a numerical sequence, prefixed by the string "locus" and a separator character
which
(Optional) If locus names are to be read from a file, this argument should specify which column contains the names. If not supplied, the first column giving unique locus names is used
split
A "regexp" specifying how the input pedfile will be split into fields. The default value specifies either a TAB character or one or more spaces
sep
The separator character used in constructing row and column names of the output SnpMatrix object
na.strings
One or more strings to be set to NA. Any field taking one of these values will be set to NA
lex.order
If TRUE, then alleles will be allocated to internal 1 and 2 values in lexicographic order. Otherwise they are converted in the order in which they are encountered when reading the file (the default setting)
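The effect of split, na.strings and lex.order can be sketched in base R. The code below only mimics the documented behaviour on made-up data; it is not how snpStats implements it, and code_alleles is a hypothetical helper.

```r
# The default `split` pattern treats a TAB or a run of spaces as one delimiter.
strsplit("FAM1\t1   0 0", "\t| +")[[1]]

# Hypothetical alleles at one locus, in the order they appear in the file.
# "0" is the default missing-value string (na.strings = "0").
observed <- c("G", "G", "A", "G", "0", "0", "A", "A")
observed[observed %in% "0"] <- NA

# Order of encounter (analogous to lex.order = FALSE): first allele seen is coded 1.
by_encounter <- unique(observed[!is.na(observed)])

# Lexicographic order (analogous to lex.order = TRUE).
by_lex <- sort(by_encounter)

# Hypothetical helper: map each allele to its position in `levels`.
code_alleles <- function(x, levels) match(x, levels)

print(by_encounter)
print(code_alleles(observed, by_encounter))
print(code_alleles(observed, by_lex))
```

With these data, "G" is coded 1 under order of encounter but 2 under lexicographic order, so the same file can yield different internal codings depending on lex.order.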

Value

A list, comprising
genotypes
The output genotype data as an object of class "SnpMatrix". If either the pedigree identifiers or the pedigree-member identifiers in the pedfile are unique, they are used as the row names of the output object. Otherwise these two fields are concatenated, separated by sep
fam
A dataframe containing the first six fields in the pedfile. The row names will correspond with those of the SnpMatrix
map
A dataframe giving the alleles at each locus. If locus names were obtained from a dataframe read from an existing file, then the allele information is simply appended to this frame. Otherwise a new dataframe is created. The row names will correspond with the column names of the SnpMatrix

Details

Row names for the output SnpMatrix object and for the accompanying subject description dataframe are taken from the pedigree identifiers when these are unique. If they are duplicated, the pedigree-member identifiers are tried instead. If these too are duplicated, row names are formed by concatenating the pedigree and pedigree-member identifiers, joined by the separator character sep.
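The row-naming rule can be written out as a small base R function. This is a sketch of the rule as documented here, not the package's own code; choose_row_names is a hypothetical helper that is not part of snpStats.

```r
# Hypothetical helper mirroring the documented rule; not part of snpStats.
choose_row_names <- function(pedigree, member, sep = ".") {
  # Use pedigree identifiers if they are unique.
  if (!anyDuplicated(pedigree)) return(as.character(pedigree))
  # Otherwise try the pedigree-member identifiers.
  if (!anyDuplicated(member)) return(as.character(member))
  # Both are duplicated: concatenate the two fields.
  paste(pedigree, member, sep = sep)
}

choose_row_names(c("F1", "F2", "F3"), c(1, 1, 1))   # pedigree ids are unique
choose_row_names(c("F1", "F1", "F2"), c(1, 2, 3))   # falls back to member ids
choose_row_names(c("F1", "F1", "F2"), c(1, 2, 1))   # both duplicated: concatenated
```

The third call returns names such as "F1.1", which shows where the sep argument enters.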

See Also

SnpMatrix-class, XSnpMatrix-class

Examples

##
## No example supplied yet
##
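Since the package supplies no example, the following is a minimal sketch of a typical call. The pedfile contents and the locus names rs1 to rs3 are made up, and the call is guarded so that it runs only if the Bioconductor package snpStats is installed.

```r
# Write a small, made-up pedfile to a temporary location.
ped_path <- tempfile(fileext = ".ped")
writeLines(c(
  "FAM1 1 0 0 1 2 A A A G C C",
  "FAM2 1 0 0 2 1 A G G G 0 0",
  "FAM3 1 0 0 1 1 G G A A C T"
), ped_path)

# snpStats is a Bioconductor package; run the call only if it is installed.
if (requireNamespace("snpStats", quietly = TRUE)) {
  # Locus names are hypothetical and supplied directly as a character vector.
  res <- snpStats::read.pedfile(ped_path, snps = c("rs1", "rs2", "rs3"))
  print(res$genotypes)   # SnpMatrix with one row per subject
  print(res$fam)         # first six pedfile fields
  print(res$map)         # alleles recorded at each locus
}
```

Here the pedigree identifiers are unique, so they should serve as the row names of both res$genotypes and res$fam.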
