pcair is used to perform a Principal Components Analysis using genome-wide SNP data for the detection of population structure in a sample. Unlike a standard PCA, PC-AiR accounts for sample relatedness (known or cryptic) to provide accurate ancestry inference that is not confounded by family structure.
pcair(genoData, v = 20, kinMat = NULL, kin.thresh = 2^(-11/2),
      divMat = NULL, div.thresh = -2^(-11/2), unrel.set = NULL,
      scan.include = NULL, snp.include = NULL, chromosome = NULL,
      snp.block.size = 10000, MAF = 0.01, verbose = TRUE)
"print"(x, ...)
"summary"(object, ...)
"print"(x, ...)
genoData: [...] GenotypeData from the package GWASTools containing the genotype data for SNPs and samples to be used for the analysis. This object can easily be created from a matrix of SNP genotype data, PLINK files, or GDS files.
v: [...] v = NULL, then all the principal components are returned.
kinMat: [...] kin.thresh and unrel.set. IDs for each individual must be set as the row and column names of the matrix.
kin.thresh: [...] kinMat used for declaring each pair of individuals as related or unrelated. The default value is 2^(-11/2) ~ 0.022. See 'Details' for how this interacts with kinMat.
divMat: [...] div.thresh. IDs for each individual must be set as the row and column names of the matrix.
div.thresh: [...] divMat used for deciding if each pair of individuals is ancestrally divergent. The default value is -2^(-11/2) ~ -0.022. See 'Details' for how this interacts with divMat.
unrel.set: [...] kinMat.
snp.include: [...] chromosome for further details.
chromosome: [...] snp.include is NULL; if chromosome is also NULL, then all SNPs are included.
[...] 'pcair', i.e. output from the pcair function.
[...] 'pcair', i.e. output from the pcair function.
[...] 'pcair'. A list including:
[...] v principal components; each column is a principal component. Sample IDs are provided as rownames.
[...] v principal components. These values are determined from the standard PCA run on the 'unrelated subset'.
[...] MAF.
[...] pcair.
[...] kinMat. Any pair of individuals with a pairwise kinship greater than kin.thresh will be declared 'related'. Kinship coefficient estimates from the KING-robust software are used as measures of ancestry divergence in divMat. Any pair of individuals with a pairwise divergence measure less than div.thresh will be declared ancestrally 'divergent'. Typically, kin.thresh and div.thresh are set to be the amount of error around 0 expected in the estimate for a pair of truly unrelated individuals.
If divMat = NULL and kinMat is specified, the kinship coefficient estimates in kinMat will also be used as divergence measures in place of divMat.
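The threshold rules above can be illustrated with a small base R sketch. The three individuals and their kinship values are made up for illustration, and the code only mimics the comparisons described in the text; it is not the internal implementation of pcair.

```r
# Toy kinship matrix for three hypothetical individuals (values are invented).
ids <- c("A", "B", "C")
kin <- matrix(c( 0.50,  0.25, -0.05,
                 0.25,  0.50,  0.00,
                -0.05,  0.00,  0.50),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

kin.thresh <- 2^(-11/2)    # default, roughly 0.022
div.thresh <- -2^(-11/2)   # default, roughly -0.022

# When divMat is NULL, the kinship estimates double as divergence measures.
divMat <- NULL
div <- if (is.null(divMat)) kin else divMat

# Flag pairs as 'related' or 'divergent'; the diagonal is self-kinship,
# not a pair, so it is masked out.
related   <- kin > kin.thresh
divergent <- div < div.thresh
diag(related)   <- FALSE
diag(divergent) <- FALSE

related
divergent
```

In this toy data, A and B exceed kin.thresh and are flagged as related, A and C fall below div.thresh and are flagged as divergent, and B and C are neither.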
It is important that the order of individuals in the matrices kinMat and divMat match the order of individuals in the genoData.
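One way to check and repair the ordering before calling pcair is sketched below in base R. The sample IDs and the helper ids_match are hypothetical; with a real GenotypeData object, the IDs could presumably be taken from getScanID in GWASTools instead of the toy vector used here.

```r
# Hypothetical sample IDs, in the order they appear in the genotype data.
scan_ids <- c("S1", "S2", "S3")

# A toy kinship matrix whose rows and columns are in a different order.
ids_shuffled <- c("S3", "S1", "S2")
kinMat <- matrix(0, nrow = 3, ncol = 3,
                 dimnames = list(ids_shuffled, ids_shuffled))
diag(kinMat) <- 0.5
kinMat["S1", "S2"] <- kinMat["S2", "S1"] <- 0.25

# Hypothetical helper: TRUE only if row and column names equal ids, in order.
ids_match <- function(mat, ids) {
  identical(rownames(mat), ids) && identical(colnames(mat), ids)
}

ids_match(kinMat, scan_ids)

# Reorder by name so the matrix lines up with the genotype data.
kinMat_ordered <- kinMat[scan_ids, scan_ids]
ids_match(kinMat_ordered, scan_ids)
```

Indexing by name rather than by position keeps each pairwise estimate attached to the correct pair of individuals after reordering.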
There are multiple ways to partition the sample into an ancestry representative 'unrelated subset' and a 'related subset'. If kinMat is specified and unrel.set = NULL, then the PC-AiR algorithm is used to find an 'optimal' partition (see 'References' for a paper describing the algorithm). If kinMat = NULL and unrel.set is specified, then the individuals with IDs in unrel.set are used as the 'unrelated subset'. If both kinMat and unrel.set are specified, then all individuals with IDs in unrel.set are forced into the 'unrelated subset' and the PC-AiR algorithm is used to partition the rest of the sample; this is especially useful for including reference samples of known ancestry in the 'unrelated subset'. If kinMat = NULL and unrel.set = NULL, then a standard principal components analysis that does not account for relatedness is performed.
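The four combinations of kinMat and unrel.set described above can be summarized as a small decision function. The function describe_partition is a hypothetical helper written only to restate the rules in code; it is not part of GENESIS and does not perform any partitioning.

```r
# Hypothetical helper: report which partitioning behaviour the text
# describes for a given combination of kinMat and unrel.set.
describe_partition <- function(kinMat = NULL, unrel.set = NULL) {
  has_kin <- !is.null(kinMat)
  has_set <- !is.null(unrel.set)
  if (has_kin && !has_set) {
    "PC-AiR algorithm partitions the whole sample"
  } else if (!has_kin && has_set) {
    "unrel.set is used as the unrelated subset"
  } else if (has_kin && has_set) {
    "unrel.set is forced into the unrelated subset; PC-AiR partitions the rest"
  } else {
    "standard PCA without accounting for relatedness"
  }
}

# Toy inputs: a 2 x 2 kinship matrix and a single reference sample ID.
toy_kin <- matrix(c(0.5, 0, 0, 0.5), nrow = 2,
                  dimnames = list(c("A", "B"), c("A", "B")))

describe_partition(kinMat = toy_kin)
describe_partition(unrel.set = "A")
describe_partition(kinMat = toy_kin, unrel.set = "A")
describe_partition()
```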
pcairPartition for a description of the function used by pcair that can be used to partition the sample into 'unrelated' and 'related' subsets without performing PCA.
plot.pcair for plotting.
king2mat for creating a matrix of pairwise kinship coefficient estimates from KING output text files that can be used for kinMat or divMat.
GWASTools for a description of the package containing the following functions: GenotypeData for a description of creating a GenotypeData class object for storing sample and SNP genotype data, MatrixGenotypeReader for a description of reading in genotype data stored as a matrix, and GdsGenotypeReader for a description of reading in genotype data stored as a GDS file. Also see snpgdsBED2GDS in the SNPRelate package for a description of converting binary PLINK files to GDS. The generic functions summary and print.
# load the packages that provide pcair and the genotype reader classes
library(GENESIS)
library(GWASTools)
# file path to GDS file
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
# read in GDS data
HapMap_geno <- GdsGenotypeReader(filename = gdsfile)
# create a GenotypeData class object
HapMap_genoData <- GenotypeData(HapMap_geno)
# load saved matrix of KING-robust estimates
data("HapMap_ASW_MXL_KINGmat")
# run PC-AiR, using the KING-robust estimates for both kinship and divergence
mypcair <- pcair(genoData = HapMap_genoData, kinMat = HapMap_ASW_MXL_KINGmat,
                 divMat = HapMap_ASW_MXL_KINGmat)
# close the connection to the GDS file
close(HapMap_genoData)