pcair is used to perform a Principal Components Analysis using genome-wide SNP data for the detection of population structure in a sample. Unlike a standard PCA, PC-AiR accounts for sample relatedness (known or cryptic) to provide accurate ancestry inference that is not confounded by family structure.
pcair(genoData, v = 20, kinMat = NULL, kin.thresh = 2^(-11/2), divMat = NULL, div.thresh = -2^(-11/2), unrel.set = NULL, scan.include = NULL, snp.include = NULL, chromosome = NULL, snp.block.size = 10000, MAF = 0.01, verbose = TRUE)
"print"(x, ...)
"summary"(object, ...)
"print"(x, ...)
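genoData: A GenotypeData object from the package GWASTools containing the genotype data for SNPs and samples to be used for the analysis. This object can easily be created from a matrix of SNP genotype data, PLINK files, or GDS files.
v: The number of principal components to return (default 20). If v = NULL, then all the principal components are returned.
kinMat: A matrix of pairwise kinship coefficient estimates. See 'Details' for how this interacts with kin.thresh and unrel.set. IDs for each individual must be set as the row and column names of the matrix.
kin.thresh: Threshold on kinMat used for declaring each pair of individuals as related or unrelated. The default value is 2^(-11/2) ~ 0.022. See 'Details' for how this interacts with kinMat.
divMat: A matrix of pairwise ancestry divergence measures. See 'Details' for how this interacts with div.thresh. IDs for each individual must be set as the row and column names of the matrix.
div.thresh: Threshold on divMat used for deciding if each pair of individuals is ancestrally divergent. The default value is -2^(-11/2) ~ -0.022. See 'Details' for how this interacts with divMat.
unrel.set: IDs of individuals to be forced into the 'unrelated subset'. See 'Details' for how this interacts with kinMat.
snp.include: SNPs to include in the analysis. See chromosome for further details.
chromosome: Only considered when snp.include is NULL; if chromosome is also NULL, then all SNPs are included.
x: An object of class 'pcair', i.e. output from the pcair function.
object: An object of class 'pcair', i.e. output from the pcair function.
An object of class 'pcair'. A list including: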
vectors: A matrix of v principal components; each column is a principal component. Sample IDs are provided as rownames.
values: The eigenvalues matching the v principal components. These values are determined from the standard PCA run on the 'unrelated subset'.
Pairwise kinship coefficient estimates are used as measures of relatedness in kinMat. Any pair of individuals with a pairwise kinship greater than kin.thresh will be declared 'related.' Kinship coefficient estimates from the KING-robust software are used as measures of ancestry divergence in divMat. Any pair of individuals with a pairwise divergence measure less than div.thresh will be declared ancestrally 'divergent'. Typically, kin.thresh and div.thresh are set to be the amount of error around 0 expected in the estimate for a pair of truly unrelated individuals.
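The way the two thresholds classify pairs can be illustrated with a toy matrix. The following is a minimal base R sketch, not GENESIS code: the 3 x 3 matrix `est` and its values are invented for illustration, and the same matrix stands in for both kinMat and divMat.

```r
# Default thresholds taken from the pcair() signature
kin.thresh <- 2^(-11/2)    # ~  0.022
div.thresh <- -2^(-11/2)   # ~ -0.022

# Toy symmetric matrix of pairwise estimates (values invented for illustration);
# IDs are set as the row and column names
ids <- c("A", "B", "C")
est <- matrix(c( 0.50, 0.250, -0.050,
                 0.25, 0.500,  0.001,
                -0.05, 0.001,  0.500),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Pairs above kin.thresh are 'related'; the diagonal (self-pairs) is ignored
related <- est > kin.thresh
diag(related) <- FALSE

# Pairs below div.thresh are ancestrally 'divergent'
divergent <- est < div.thresh

related["A", "B"]     # TRUE: 0.25 is above kin.thresh
divergent["A", "C"]   # TRUE: -0.05 is below div.thresh
related["B", "C"]     # FALSE: 0.001 lies within the error band around 0
```

The pair B and C falls between the two thresholds, so it is treated as neither related nor divergent.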
If divMat = NULL and kinMat is specified, the kinship coefficient estimates in kinMat will also be used as divergence measures in place of divMat.
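The fallback described above amounts to a one-line rule. The sketch below expresses it in base R; `resolve_divMat` is a hypothetical helper written for illustration and is not a function in GENESIS.

```r
# Hypothetical helper (not part of GENESIS): returns the matrix that would
# serve as the divergence measure under the rule described above
resolve_divMat <- function(kinMat = NULL, divMat = NULL) {
  if (is.null(divMat) && !is.null(kinMat)) {
    kinMat   # kinship estimates double as divergence measures
  } else {
    divMat   # an explicit divMat is used as given (may be NULL)
  }
}

# Toy matrices, invented for illustration
k <- matrix(0.1, nrow = 2, ncol = 2)
d <- matrix(-0.1, nrow = 2, ncol = 2)

identical(resolve_divMat(kinMat = k), k)              # TRUE
identical(resolve_divMat(kinMat = k, divMat = d), d)  # TRUE
```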
It is important that the order of individuals in the matrices kinMat and divMat match the order of individuals in the genoData.
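One way to guard against a mismatch is to reorder the matrix by name before calling pcair. The sketch below uses base R only; `sample.ids` is a hypothetical vector standing in for the sample IDs in the order they appear in genoData (with a real GenotypeData object these would presumably come from GWASTools' getScanID, converted to character if needed), and the matrix values are invented.

```r
# Hypothetical sample IDs, in the order they appear in genoData
sample.ids <- c("s1", "s2", "s3")

# Toy kinship matrix whose rows and columns are in a different order
mat.ids <- c("s3", "s1", "s2")
kinMat <- matrix(0, nrow = 3, ncol = 3, dimnames = list(mat.ids, mat.ids))
diag(kinMat) <- 0.5
kinMat["s1", "s2"] <- kinMat["s2", "s1"] <- 0.25

# Every sample must be present in the matrix before reordering
stopifnot(setequal(rownames(kinMat), sample.ids))

# Reorder rows and columns by name so they match sample.ids
kinMat <- kinMat[sample.ids, sample.ids]

identical(rownames(kinMat), sample.ids)   # TRUE
identical(colnames(kinMat), sample.ids)   # TRUE
```

Indexing by name rather than by position keeps each pairwise value attached to the right pair of individuals; the same step would apply to divMat.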
There are multiple ways to partition the sample into an ancestry representative 'unrelated subset' and a 'related subset'. If kinMat is specified and unrel.set = NULL, then the PC-AiR algorithm is used to find an 'optimal' partition (see 'References' for a paper describing the algorithm). If kinMat = NULL and unrel.set is specified, then the individuals with IDs in unrel.set are used as the 'unrelated subset'. If both kinMat and unrel.set are specified, then all individuals with IDs in unrel.set are forced in the 'unrelated subset' and the PC-AiR algorithm is used to partition the rest of the sample; this is especially useful for including reference samples of known ancestry in the 'unrelated subset'. If kinMat = NULL and unrel.set = NULL, then a standard principal components analysis that does not account for relatedness is performed.
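The four cases above form a small decision table over whether kinMat and unrel.set are supplied. The sketch below encodes that table in base R; `partition_mode` is a hypothetical helper for illustration only and does not reproduce any GENESIS internals.

```r
# Hypothetical helper (not part of GENESIS): names the partitioning behaviour
# described above for each combination of kinMat and unrel.set
partition_mode <- function(kinMat = NULL, unrel.set = NULL) {
  has.kin <- !is.null(kinMat)
  has.set <- !is.null(unrel.set)
  if (has.kin && !has.set) {
    "PC-AiR algorithm partitions the whole sample"
  } else if (!has.kin && has.set) {
    "unrel.set is used as the 'unrelated subset'"
  } else if (has.kin && has.set) {
    "unrel.set is forced in; PC-AiR algorithm partitions the rest"
  } else {
    "standard PCA; relatedness is not accounted for"
  }
}

# Toy inputs, invented for illustration
k <- matrix(0, nrow = 2, ncol = 2)
refs <- c("ref1", "ref2")

partition_mode(kinMat = k)
partition_mode(unrel.set = refs)
partition_mode(kinMat = k, unrel.set = refs)
partition_mode()
```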
pcairPartition for a description of the function used by pcair that can be used to partition the sample into 'unrelated' and 'related' subsets without performing PCA.
plot.pcair for plotting.
king2mat for creating a matrix of pairwise kinship coefficient estimates from KING output text files that can be used for kinMat or divMat.
GWASTools for a description of the package containing the following functions: GenotypeData for a description of creating a GenotypeData class object for storing sample and SNP genotype data, MatrixGenotypeReader for a description of reading in genotype data stored as a matrix, and GdsGenotypeReader for a description of reading in genotype data stored as a GDS file. Also see snpgdsBED2GDS in the SNPRelate package for a description of converting binary PLINK files to GDS.
The generic functions summary and print.
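# load GWASTools (GdsGenotypeReader, GenotypeData) and GENESIS (pcair)
library(GWASTools)
library(GENESIS)
# file path to GDS file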
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
# read in GDS data
HapMap_geno <- GdsGenotypeReader(filename = gdsfile)
# create a GenotypeData class object
HapMap_genoData <- GenotypeData(HapMap_geno)
# load saved matrix of KING-robust estimates
data("HapMap_ASW_MXL_KINGmat")
# run PC-AiR
mypcair <- pcair(genoData = HapMap_genoData,
                 kinMat = HapMap_ASW_MXL_KINGmat,
                 divMat = HapMap_ASW_MXL_KINGmat)
close(HapMap_genoData)