
AssocTests (version 1.0-1)

pcoc: PCoC for correcting for population stratification

Description

Identifies the clustered and continuous patterns of genetic variation using PCoC, which calculates the principal coordinates and the clustering of the subjects in order to correct for population stratification (PS).

Usage

pcoc(
  genoFile,
  outFile.txt = "pcoc.result.txt",
  n.MonteCarlo = 1000,
  num.splits = 10,
  miss.val = 9
)

Arguments

genoFile

a txt file containing the genotypes (0, 1, 2, or 9). The element of the file in Row i and Column j represents the genotype at the ith marker of the jth subject. 0, 1, and 2 denote the number of risk alleles, and 9 (default) is for the missing genotype.

outFile.txt

a txt file for saving the result of this function. The default is "pcoc.result.txt".

n.MonteCarlo

the number of times the Monte Carlo procedure is repeated. The default is 1000.

num.splits

the number of groups into which the markers are split. The default is 10.

miss.val

the number representing missing data in the input data. The default is 9. If miss.val is set to a different value, missing genotypes in the genoFile must be coded with that value instead of 9.
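
The coding rule for missing genotypes can be illustrated with a small sketch. It assumes the genotypes start out as an R matrix with missing calls stored as NA, which is not something this page specifies; the matrix geno and the file name geno.file are hypothetical. The write.table() call mirrors the one in the Examples section.

```r
# Hypothetical genotype matrix: 5 markers (rows) x 4 subjects (columns),
# with missing calls stored as NA (an assumption for this sketch).
geno <- matrix(c(0, 1, 2, NA,
                 1, 1, 0, 2,
                 NA, 2, 2, 1,
                 0, 0, 1, 1,
                 2, NA, 0, 0), nrow = 5, byrow = TRUE)

# The code written to the file must be the same value passed as miss.val.
miss.val <- 9
geno[is.na(geno)] <- miss.val

# Write the file in the same way as the Examples section:
# no separator, no quotes, no row or column names.
geno.file <- tempfile(fileext = ".txt")
write.table(geno, file = geno.file, quote = FALSE,
            sep = "", row.names = FALSE, col.names = FALSE)
readLines(geno.file)
```

The resulting file could then be passed as genoFile together with the same miss.val.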

Value

A list with the components principal.coordinates and cluster. principal.coordinates contains the principal coordinates, and cluster contains the clustering of the subjects. If there is only one cluster, cluster is omitted.
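
Because cluster may be absent, code that consumes the result may want to check for it first. The following is a minimal sketch using a hand-built stand-in for the returned list; the object res and the assumption that principal.coordinates has one row per subject are illustrative and are not taken from the package.

```r
# Hypothetical stand-in for a pcoc() result in the single-cluster case:
# the 'cluster' component is absent. One row per subject is assumed.
res <- list(principal.coordinates = matrix(rnorm(40 * 2), nrow = 40))

# Accessing a missing list component returns NULL, so test with is.null().
cluster <- if (is.null(res$cluster)) {
  # Treat all subjects as belonging to one cluster.
  rep(1L, nrow(res$principal.coordinates))
} else {
  res$cluster
}
table(cluster)
```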

Details

Hidden population structure is a possible confounder in large-scale genome-wide association studies: cases and controls might differ systematically because of unrecognized population structure. The PCoC procedure uses techniques from multidimensional scaling and clustering to correct for population stratification. PCoC can be seen as an extension of EIGENSTRAT.
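
The two ingredients named above, multidimensional scaling and clustering, can be sketched with base R. This is only an illustration of the general idea and not the PCoC algorithm: cmdscale() and kmeans() stand in for the package's internal steps, and the simulated allele frequencies, the number of coordinates, and the number of clusters are all assumptions chosen for the sketch.

```r
set.seed(1)

# Simulated genotypes: 100 markers (rows) x 40 subjects (columns), drawn
# from two hypothetical subpopulations with different allele frequencies.
n.markers <- 100
geno <- cbind(
  matrix(rbinom(n.markers * 20, 2, 0.2), nrow = n.markers),
  matrix(rbinom(n.markers * 20, 2, 0.8), nrow = n.markers)
)

# Euclidean distances between subjects (subjects are columns, so transpose).
d <- dist(t(geno))

# Classical multidimensional scaling yields principal coordinates,
# here the first two (one row per subject).
pc <- cmdscale(d, k = 2)

# k-means on the coordinates as a stand-in for the clustering step.
cl <- kmeans(pc, centers = 2, nstart = 10)$cluster

# Cross-tabulate the recovered clusters against the simulated subpopulations.
table(cluster = cl, subpopulation = rep(1:2, each = 20))
```

In an association analysis, coordinates and cluster labels of this kind would typically enter the model as adjustment covariates.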

References

Lin Wang, Wei Zhang, and Qizhai Li. AssocTests: An R Package for Genetic Association Studies. Journal of Statistical Software. 2020; 94(5): 1-26.

Q Li and K Yu. Improved Correction for Population Stratification in Genome-Wide Association Studies by Identifying Hidden Population Structures. Genetic Epidemiology. 2008; 32(3): 215-226.

KV Mardia, JT Kent, and JM Bibby. Multivariate Analysis. New York: Academic Press. 1976.

Examples

# NOT RUN {
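library(AssocTests)
# Simulate genotypes: 100 markers (rows) x 40 subjects (columns).
pcocG.eg <- matrix(rbinom(4000, 2, 0.5), ncol = 40)
# Write the genotypes with no separator, quotes, or row/column names.
write.table(pcocG.eg, file = "pcocG.eg.txt", quote = FALSE,
       sep = "", row.names = FALSE, col.names = FALSE)
# Run PCoC with 50 Monte Carlo repetitions instead of the default 1000.
pcoc(genoFile = "pcocG.eg.txt", outFile.txt = "pcoc.result.txt",
       n.MonteCarlo = 50, num.splits = 10, miss.val = 9)
# Remove the files created by this example.
file.remove("pcocG.eg.txt", "pcoc.result.txt")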
# }
