The package PLINK saves genome-wide association data in groups of three files, with the extensions .bed, .bim, and .fam.
This function reads these files and creates a matrix with numeric genotypes and two data frames with information from the .bim and .fam files.
read.plink(bed, bim, fam, na.strings = c("0", "-9"), sep = ".",
           select.subjects = NULL, select.snps = NULL)
bed: The name of the file containing the packed binary SNP genotype data. It should have the extension .bed; if it doesn't, then this extension will be appended.
bim: The file containing the SNP descriptions.
fam: The file containing subject (and, possibly, family) identifiers. This is essentially a tab-delimited "pedfile".
na.strings: Strings in .bim and .fam files to be recoded as NA.
sep: A separator character for constructing unique subject identifiers.
select.subjects: A numeric vector indicating a subset of subjects to be selected from the input file (see Details).
select.snps: Either a numeric or a character vector indicating a subset of SNPs to be selected from the input file (see Details).
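The effect of na.strings can be illustrated with base R alone. The sketch below is not the package's own code: it assumes a whitespace-delimited .fam layout of six fields, and the column names and sample values are hypothetical, chosen only for illustration.

```r
# Hypothetical two-subject .fam content with six fields per line:
# pedigree ID, member ID, father, mother, sex, phenotype
fam_text <- "F1 1 0 0 1 -9\nF1 2 0 0 2 2"

fam <- read.table(
  text = fam_text,
  col.names = c("pedigree", "member", "father", "mother", "sex", "affected"),
  na.strings = c("0", "-9"),  # same values as the read.plink default
  stringsAsFactors = FALSE
)

# Unknown parents ("0") and the missing phenotype ("-9") are now NA
print(fam)
```

With the default na.strings, any field equal to "0" or "-9" is treated as missing, so a different vector should be supplied if either value is a legitimate code in your files.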
A list with three elements:
The output genotype data as a numeric matrix.
A data frame corresponding to the .fam file, containing the first six fields in a standard pedfile. The row names will correspond with those of the genotype matrix.
A data frame corresponding to the .bim file. The row names correspond with the column names of the genotype matrix.
If the bed argument does not contain a file name with the file extension .bed, then this extension is appended to the argument. The remaining two arguments are optional; their default values are obtained by replacing the .bed file name extension by .bim and .fam respectively. See the PLINK documentation for the detailed specification of these files.
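The file-name rules above can be sketched in base R. The helper plink_paths is hypothetical and is not part of the package; it also assumes NULL as the "not supplied" marker, which may differ from how read.plink itself detects missing arguments.

```r
# Hypothetical helper (not part of the package): derive the three file names
plink_paths <- function(bed, bim = NULL, fam = NULL) {
  # Append ".bed" when the argument does not already end with it
  if (!grepl("\\.bed$", bed)) bed <- paste0(bed, ".bed")
  # Strip the extension to get the shared stub of the file group
  stub <- sub("\\.bed$", "", bed)
  # Default the companion files by swapping the extension
  if (is.null(bim)) bim <- paste0(stub, ".bim")
  if (is.null(fam)) fam <- paste0(stub, ".fam")
  list(bed = bed, bim = bim, fam = fam)
}

paths <- plink_paths("study/sample")
print(paths)
```

Passing bim or fam explicitly overrides the derived name, which is useful when the three files do not share a common stub.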
The select.subjects or select.snps argument can be used to read a subset of the data. Use of select.snps requires that the .bed file is in SNP-major order (the default in PLINK). Likewise, use of select.subjects requires that the .bed file is in individual-major order. Subjects are selected by their numeric order in the PLINK files, while SNPs are selected either by order or by name. Note that the order of selected SNPs/subjects in the output objects will be the same as their order in the PLINK files.
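Because each selection argument depends on the storage order of the .bed file, it can help to check that order before calling the function. The sketch below relies on the PLINK binary format, in which a .bed file starts with the bytes 0x6c 0x1b followed by a mode byte (0x01 for SNP-major, 0x00 for individual-major). The helper bed_order is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): report the storage order
# recorded in the three-byte header of a PLINK .bed file
bed_order <- function(path) {
  header <- readBin(path, what = "raw", n = 3)
  if (length(header) < 3 ||
      header[1] != as.raw(0x6c) || header[2] != as.raw(0x1b)) {
    stop("not a PLINK binary .bed file: ", path)
  }
  if (header[3] == as.raw(0x01)) {
    "SNP-major"
  } else if (header[3] == as.raw(0x00)) {
    "individual-major"
  } else {
    stop("unrecognised mode byte in ", path)
  }
}

# Demonstrate on a temporary file that holds only a SNP-major header
tmp <- tempfile(fileext = ".bed")
writeBin(as.raw(c(0x6c, 0x1b, 0x01)), tmp)
order_found <- bed_order(tmp)
unlink(tmp)
print(order_found)
```

A result of "SNP-major" means select.snps is usable, while "individual-major" means select.subjects is usable.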
Row names for the output object and for the accompanying subject description data frame are taken as the pedigree identifiers, when these provide the required unique identifiers. When these are duplicated, an attempt is made to use the pedigree-member identifiers instead but, when these too are duplicated, row names are obtained by concatenating, with a separator character, the pedigree and pedigree-member identifiers.
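The fallback order for row names described above can be sketched as follows. This is an illustration of the stated rules under assumed inputs, not the package's implementation; make_row_names and its argument names are hypothetical.

```r
# Hypothetical helper (not part of the package): choose unique row names
make_row_names <- function(pedigree, member, sep = ".") {
  # First choice: pedigree identifiers, when they are all distinct
  if (!anyDuplicated(pedigree)) return(as.character(pedigree))
  # Second choice: pedigree-member identifiers, when they are all distinct
  if (!anyDuplicated(member)) return(as.character(member))
  # Last resort: concatenate both identifiers with the separator
  paste(pedigree, member, sep = sep)
}

# Both identifier sets contain duplicates, so the two are concatenated
ids <- make_row_names(c("F1", "F1", "F2"), c(1, 2, 1))
print(ids)
```

The sep argument of read.plink plays the role of sep here and only matters in the last case, where neither identifier is unique on its own.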
PLINK: Whole genome association analysis toolset. http://zzz.bwh.harvard.edu/plink/
# NOT RUN {
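library(FREGAT)
# Locate the sample .bed file shipped with the package
bedFile <- system.file("testfiles/sample.bed", package = "FREGAT")
# The .bim and .fam file names are derived from bedFile by default
data <- read.plink(bedFile)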
# }