This function performs an eQTL analysis and offers different types of tests. The type of test
can be specified with the method
option and possible options are "LM"
and "directional"
.
The option "LM"
fits for each SNP within a predefined window of size windowSize
(in MB) around a gene
a linear model for the genotype information and the corresponding gene expression. The null hypothesis
for each test is then that the slope is equal to zero and the alternative is that it is not zero.
The option "directional"
applies a new directional test based on probabilistic indices for triples as described
in Fischer, Oja, et al. (2014). Being \(\mathbf{x}_0=(x_{01},x_{02},\dots,x_{0N_0})', \mathbf{x}_1=(x_{11},x_{12},\dots,x_{1N_1})'\)
and \(\mathbf{x}_2=(x_{21},x_{22},\dots,x_{2N_2})'\) the expression values that are linked to the three genotype
groups \(0,1\) and \(2\) with underlying distributions \(F_0, F_1\) and \(F_2\). We first calculate the probabilistic
indices \(P_{0,1,2} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{0i} < x_{1j} < x_{2k})\)
and \(P_{2,1,0} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{2i} < x_{1j} < x_{0k})\). These are the probabilities that the expression
values of the three groups follow a certain order, that is what we would expect for possible eQTLs. The null hypothesis that we have then
in mind is that the expression values from these three groups have the same distribution \(H_0: F_0 = F_1 = F_2\) and the
two alternatives are that the distributions have a certain stochastical order \(H_1: F_0 < F_1 < F_2\) and \(H_2: F_2 < F_1 < F_0\).
The test is applied for the two probabilistic indices \(P_{0,1,2}\) and
\(P_{2,1,0}\) and combines the two resulting p-values \(p_{012}=p_1\) and \(p_{210}=p_2\) from previous tests then as overall p-value
\(\min(2 \min(p_1 , p_2 ), 1)\). In the two-group case (this means only two different genotypes are present for a certain SNP) a two-sided
Wilcoxon rank-sum test is applied.
The gene expressions used in the eQTL are specified in gex
. If several genes should be tested, then gex
is a matrix and each
column refers to a gene and each row to an individuum. The column names of this matrix must match then with the
names used in the annotation object xAnnot
. Sample names can either be given as row names in the matrix or as separate
vector in xSamples
. If only gene expressions of one gene should be tested then gex
can be a vector.
The genotype information is provided in the geno
object. Here one can i.a. specify the ped-file
name of a ped/map file pair. The function then imports the genotype information using the
function importPED
. In that case, the map file has to have the same filename as the ped file (despite the file extension...).
In case the genotype information has been imported already earlier using importPED
the resulting PedMap
object can
also be given as a parameter for geno
.
The xAnnot
object carries the annotation information for the gene expressions. Usually, the annotations are downloaded e.g. from the
Ensembl page and then imported to R with the importGTF
function. The resulting object can then be used as input for xAnnot
.
Casting options to feed in also other formats/objects are currently under development.
The option genoSamples
is used in case that the sample names in the ped/map file (or geno
object)
do not match with rownames(gex)
given in the expression matrix. The vector genoSamples
is as
long as the geno
object has samples, but gives then for each row in geno
the corresponding
name in the gex
object. The function finds then also the smallest union between the two data objects.
If there are repeated measurements per individual for the genotypes we take by default only the first
appearance in the data and neglect all successive values. Currently this cannot be changed. In case
this behavior is not desired, the user has to remove the corresponding rows from geno
before starting the
calculation.
If the code is executed on a Linux OS the user can specify with the mc
option the amount of CPU cores used for the
calculation.
If the sig
option is set to a certain significance level, then the method only reports those SNPs that are tested to be
significant. This can reduce the required memory drastically, especially in the case of trans-eQTL.
The method tests for trans-eQTLs (all combinations of SNPs and genes) if the windowSize
is set to NULL
.
Be aware that this might lead to long lasting calculations. For trans-eQTL calculations it is advisable to define a significance level in
sig
so that only significant results are stored. If all results are required, the option sig=NULL
has to be set and in
addition IHaveSpace=TRUE
. This additional parameter is necessary as a trans-eQTL with full output creates a huge output that often
exceeds the system properties of a normal desktop PC.
Note: The directional test currently supports only exact p-values based on permutation tests, but asymptotic implementations are developed and
will be soon available also.