Estimate pairwise linkage disequilibrium (LD) between markers, measured as \(r^2\), using an object of class gpData. In the general case, LD is estimated through an interface to the software PLINK (Purcell et al. 2007). A within-R solution is available only for marker data with exactly 2 genotypes, i.e. homozygous inbred lines. The return value is either an object of class LDdf, which is a data.frame with one row per marker pair, or an object of class LDmat, which is a matrix with all marker pairs. Additionally, the Euclidean distance between marker positions is computed and returned.
pairwiseLD(gpData, chr = NULL, type = c("data.frame", "matrix"), use.plink = FALSE,
           ld.threshold = 0, ld.window = 99999, rm.unmapped = TRUE, cores = 1)
gpData: object of class gpData with elements geno and map.
chr: numeric scalar or vector. The return value is a list with the pairwise LD of all markers for each chromosome in chr.
type: character. Specifies the type of return value (see 'Value').
use.plink: logical. Should the software PLINK be used for the computation?
ld.threshold: numeric. Threshold for the LD to thin the output. Only pairwise LD > ld.threshold is reported when PLINK is used. This argument can only be used with type="data.frame".
ld.window: numeric. Window size for the marker pairs reported by PLINK (only for use.plink=TRUE; argument --ld-window-kb in PLINK), used to thin the output. Only SNP pairs with a distance < ld.window are reported (default = 99999).
rm.unmapped: logical. Remove markers with unknown position in map before using PLINK?
cores: numeric. Number of cores to use.
For type="data.frame", an object of class LDdf with one element for each chromosome is returned. Each element is a data.frame with columns marker1, marker2, r2 and distance for all \(p(p-1)/2\) marker pairs (or thinned, see 'Details').
For type="matrix", an object of class LDmat with one element for each chromosome is returned. Each element is a list of 2: a \(p \times p\) matrix with pairwise LD and the corresponding \(p \times p\) matrix with pairwise distances.
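To make the shape of the matrix-type result concrete, the following base-R sketch builds a comparable pair of \(p \times p\) matrices by hand. It does not call pairwiseLD; the marker data, the positions, and the list element names `LD` and `distance` are assumptions chosen for illustration, and squared correlation is used as a stand-in for \(r^2\) in two-genotype data.

```r
# Hypothetical data: 6 inbred lines (rows) x 3 markers (columns), coded 0/1
M <- cbind(M1 = c(1, 1, 1, 0, 0, 0),
           M2 = c(1, 1, 0, 0, 0, 1),
           M3 = c(1, 0, 1, 0, 1, 0))

# Hypothetical marker positions on one chromosome
pos <- c(M1 = 0, M2 = 2.5, M3 = 10)

# p x p matrix of pairwise LD (squared correlation between marker columns)
ldMat <- cor(M)^2

# p x p matrix of pairwise Euclidean distances between positions
distMat <- as.matrix(dist(pos))

# Assumed layout of one chromosome's element: a list of two matrices
res <- list(LD = ldMat, distance = distMat)

# The upper triangle holds the p(p-1)/2 distinct marker pairs
nPairs <- sum(upper.tri(res$LD))
```

With \(p = 3\) markers, the upper triangle contains 3 distinct pairs, which is the number of rows the data.frame form would hold for the same chromosome.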
The function write.plink is called to prepare the input files and the script for PLINK. The PLINK executable (plink.exe) must be available, e.g. in the working directory or through the path variables. The function pairwiseLD calls PLINK and reads the results. The evaluation is performed separately for every chromosome. The measure for LD is \(r^2\), which is defined as
$$D= p_{AB} - p_Ap_B $$ and
$$r^2=\frac{D^2}{p_Ap_Bp_ap_b}$$
where \(p_{AB}\) is the observed frequency of haplotype \(AB\), and \(p_A=1-p_a\) and \(p_B=1-p_b\) are the observed frequencies of alleles \(A\) and \(B\).
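The definition above can be checked by hand for two markers in homozygous inbred lines, where each line carries a single observable haplotype. The sketch below uses made-up genotypes and plain base R; it illustrates the formula only and is not meant to describe how pairwiseLD computes LD internally.

```r
# Hypothetical genotypes of 8 homozygous inbred lines at two markers,
# coded 1 for allele A (or B) and 0 for allele a (or b)
m1 <- c(1, 1, 1, 1, 0, 0, 0, 0)
m2 <- c(1, 1, 1, 0, 0, 0, 0, 1)

# Observed allele and haplotype frequencies
pA  <- mean(m1 == 1)
pB  <- mean(m2 == 1)
pAB <- mean(m1 == 1 & m2 == 1)

# D = p_AB - p_A * p_B
D <- pAB - pA * pB

# r^2 = D^2 / (p_A * p_B * p_a * p_b), with p_a = 1 - p_A and p_b = 1 - p_B
r2 <- D^2 / (pA * pB * (1 - pA) * (1 - pB))

# For 0/1 coded data this equals the squared Pearson correlation
sameAsCor <- isTRUE(all.equal(r2, cor(m1, m2)^2))
```

Here \(p_A = p_B = 0.5\) and \(p_{AB} = 0.375\), so \(D = 0.125\) and \(r^2 = 0.25\).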
If the number of markers is large, a threshold for the LD can be used to thin the output. In this case, only pairwise LD above the threshold is reported (argument --ld-window-r2 in PLINK).
Default PLINK options used: --no-parents --no-sex --no-pheno --allow-no-sex --ld-window p --ld-window-kb 99999
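The effect of thinning can be pictured on a small table with the same columns as one element of an LDdf object. The sketch below is base R only, with invented marker names, LD values and distances; the cut-offs reuse the argument names ld.threshold and ld.window purely as local variables and do not invoke PLINK.

```r
# Hypothetical LD table with the columns marker1, marker2, r2 and distance
ld <- data.frame(
  marker1  = c("M1", "M1", "M2"),
  marker2  = c("M2", "M3", "M3"),
  r2       = c(0.82, 0.05, 0.31),
  distance = c(1.2, 7.5, 6.3),
  stringsAsFactors = FALSE
)

# Illustrative cut-offs (local variables, not passed to any function)
ld.threshold <- 0.2
ld.window    <- 7

# Keep only pairs with LD above the threshold and distance inside the window
thinned <- ld[ld$r2 > ld.threshold & ld$distance < ld.window, ]
```

In this toy table the pair M1/M3 is dropped because its \(r^2\) of 0.05 falls below the threshold, leaving two rows.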
Hill WG, Robertson A (1968). Linkage Disequilibrium in Finite Populations. Theoretical and Applied Genetics, 38(6), 226-231.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.
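# NOT RUN {
library(synbreed)      # provides codeGeno() and pairwiseLD()
library(synbreedData)  # provides the maize example data
data(maize)
# recode the marker genotypes before estimating LD
maizeC <- codeGeno(maize)
# pairwise LD for chromosome 1, returned with one row per marker pair
maizeLD <- pairwiseLD(maizeC, chr = 1, type = "data.frame")
# }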