eQTL: Perform an eQTL Analysis

Description

This function performs an eQTL analysis.

Usage

eQTL(gex=NULL, xAnnot = NULL, xSamples = NULL, geno=NULL, genoSamples = NULL,
       windowSize = 0.5, method = "directional", mc = 1, sig = NULL, which = NULL,
       testType = "asymptotic", nper = 2000, verbose = TRUE, MAF=0.05,
       IHaveSpace = FALSE)

Arguments

gex

Matrix or Vector with expression values.

xAnnot

Location annotations for the expression values.

xSamples

Sample names for the expression values (optional).

geno

Genotype data.

genoSamples

Sample names for the genotype values (optional).

windowSize

Size of the window around the center gene (in Mb).

method

Method of choice for the eQTL.

Amount of cores for parallel computing.

sig

Significance level for the eQTL testing.

which

Names of genes for that the eQTL should be performed.

testType

Type of test, either permutation or asymptotic.

nper

Sets the amount of permutations, if permuation tests are used.

verbose

Logical, shall the method give extensive feedback.

MAF

Filter for Minor Allele Frequency.

IHaveSpace

Logical, security switch for operations that require lots of memory.

Value

A list of class eqtl containing the values

gex

The gex object from the function call.

geno

The geno object from the function call.

xAnnot

The xAnnot object from the function call.

genoSamples

The genoSamples object from the function call.

windowSize

The windowSize object from the function call.

and an incapsulated list eqtl where each list item is a tested gene location and contains the items

ProbeLoc

Used position of that gene. (Only different from 1 if multiple locations are considered.)

TestedSNP

Details about the considered SNPs.

p.values

P values of the test.

GeneInfo

Details about the center gene.

Details

This function performs an eQTL analysis and offers different types of tests. The type of test can be specified with the method option and possible options are "LM" and "directional". The option "LM" fits for each SNP within a predefined window of size windowSize (in MB) around a gene a linear model for the genotype information and the corresponding gene expression. The null hypothesis for each test is then that the slope is equal to zero and the alternative is that it is not zero.

The option "directional" applies a new directional test based on probabilistic indices for triples as described in Fischer, Oja, et al. (2014). Being \(\mathbf{x}_0=(x_{01},x_{02},\dots,x_{0N_0})', \mathbf{x}_1=(x_{11},x_{12},\dots,x_{1N_1})'\) and \(\mathbf{x}_2=(x_{21},x_{22},\dots,x_{2N_2})'\) the expression values that are linked to the three genotype groups \(0,1\) and \(2\) with underlying distributions \(F_0, F_1\) and \(F_2\). We first calculate the probabilistic indices \(P_{0,1,2} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{0i} < x_{1j} < x_{2k})\) and \(P_{2,1,0} = \frac{1}{N_0 N_1 N_2} \sum_i \sum_j \sum_k I(x_{2i} < x_{1j} < x_{0k})\). These are the probabilities that the expression values of the three groups follow a certain order, that is what we would expect for possible eQTLs. The null hypothesis that we have then in mind is that the expression values from these three groups have the same distribution \(H_0: F_0 = F_1 = F_2\) and the two alternatives are that the distributions have a certain stochastical order \(H_1: F_0 < F_1 < F_2\) and \(H_2: F_2 < F_1 < F_0\).

The test is applied for the two probabilistic indices \(P_{0,1,2}\) and \(P_{2,1,0}\) and combines the two resulting p-values \(p_{012}=p_1\) and \(p_{210}=p_2\) from previous tests then as overall p-value \(\min(2 \min(p_1 , p_2 ), 1)\). In the two-group case (this means only two different genotypes are present for a certain SNP) a two-sided Wilcoxon rank-sum test is applied.

The gene expressions used in the eQTL are specified in gex. If several genes should be tested, then gex is a matrix and each column refers to a gene and each row to an individuum. The column names of this matrix must match then with the names used in the annotation object xAnnot. Sample names can either be given as row names in the matrix or as separate vector in xSamples. If only gene expressions of one gene should be tested then gex can be a vector.

The genotype information is provided in the geno object. Here one can i.a. specify the ped-file name of a ped/map file pair. The function then imports the genotype information using the function importPED. In that case, the map file has to have the same filename as the ped file (despite the file extension...). In case the genotype information has been imported already earlier using importPED the resulting PedMap object can also be given as a parameter for geno.

The xAnnot object carries the annotation information for the gene expressions. Usually, the annotations are downloaded e.g. from the Ensembl page and then imported to R with the importGTF function. The resulting object can then be used as input for xAnnot. Casting options to feed in also other formats/objects are currently under development.

The option genoSamples is used in case that the sample names in the ped/map file (or geno object) do not match with rownames(gex) given in the expression matrix. The vector genoSamples is as long as the geno object has samples, but gives then for each row in geno the corresponding name in the gex object. The function finds then also the smallest union between the two data objects. If there are repeated measurements per individual for the genotypes we take by default only the first appearance in the data and neglect all successive values. Currently this cannot be changed. In case this behavior is not desired, the user has to remove the corresponding rows from geno before starting the calculation.

If the code is executed on a Linux OS the user can specify with the mc option the amount of CPU cores used for the calculation.

If the sig option is set to a certain significance level, then the method only reports those SNPs that are tested to be significant. This can reduce the required memory drastically, especially in the case of trans-eQTL.

The method tests for trans-eQTLs (all combinations of SNPs and genes) if the windowSize is set to NULL. Be aware that this might lead to long lasting calculations. For trans-eQTL calculations it is advisable to define a significance level in sig so that only significant results are stored. If all results are required, the option sig=NULL has to be set and in addition IHaveSpace=TRUE. This additional parameter is necessary as a trans-eQTL with full output creates a huge output that often exceeds the system properties of a normal desktop PC.

Note: The directional test currently supports only exact p-values based on permutation tests, but asymptotic implementations are developed and will be soon available also.

References

Fischer, D., Oja, H., Sen, P.K., Schleutker, J., Wahlfors, T. (2014): Generalized Mann-Whitney Type Tests for Microarray Experiments, Scandinavian Journal of Statistics, 3, pages 672-692, doi: 10.1111/sjos.12055.

Fischer, D., Oja, H. (2013): Mann-Whitney Type Tests for Microarray Experiments: The R Package gMWT, submitted article.

Examples

Run this code

# NOT RUN {
# }
# NOT RUN {
# Make the example data available
data("annotTrack")   # Standard gtf file, imported with importGTF
data("geneEXP")      # Matrix with gene expression
data("genotData")    # A imported Ped/Map filepair, using importPED

# Transform gtf to bed format (not necessarily required)
annot.bed <- gtfToBed(annotTrack)

# cis-eQTL
###############################################

# Most basic cis-eQTL runs:
EQTL1 <- eQTL(gex=geneEXP[,1:10] , xAnnot = annotTrack, geno= genotData)

# Same run, if gtf has been transformed to bed previously
EQTL1.1   <- eQTL(gex=geneEXP[,1:10] , xAnnot = annot.bed, geno= genotData)

# Same run, when the genotype data wasn't loaded and should be loaded
# here instead
EQTL1.2 <- eQTL(gex=geneEXP[,1:10] , xAnnot = annotTrack, 
+               geno= file.path("Datasets","genotypes.ped"))

# Full set of genes, this time filtered with column names
EQTL2   <- eQTL(gex=geneEXP, xAnnot = annot.bed, geno= genotData, 
+               which = colnames(geneEXP)[1:20])

# Single vector of gene expression values, underlying gene is specified
# in the which option
EQTL3   <- eQTL(gex=as.vector(geneEXP[,1]) , xAnnot = annot.bed, 
+               geno= genotData, which="ENSG00000223972")

# Same call, but this time is the corresponding column not casted
EQTL3.1 <- eQTL(gex=geneEXP[,1] , xAnnot = annot.bed, geno= genotData, 
+               which="ENSG00000223972")

# Same call, but instead of the name the row number in the gtf/bed
# file is provided
EQTL3.2 <- eQTL(gex=geneEXP[,1] , xAnnot = annot.bed, geno= genotData,
+               which=1)

# The same expression values are now assigned to three different genes
EQTL4   <- eQTL(gex=as.vector(geneEXP[,1]) , xAnnot = annot.bed,
+               geno= genotData, which=1:3)

# Trans-eQTL
######################################

# Trans eQTL for the first and the last gene in our expression matrix
EQTL5   <- eQTL(gex=geneEXP[,c(1,1000)] , xAnnot = annot.bed, 
+               geno= genotData, windowSize = NULL)

# Same call, this time distributed to 8 cores (ony available on 
# Linux computers)
EQTL5   <- eQTL(gex=geneEXP[,c(1,1000)] , xAnnot = annot.bed, 
+               geno= genotData, windowSize = NULL, mc=8)

# Expression values from the first gene are used to test the 100st 
# gene for trans-eQTL
EQTL6   <- eQTL(gex=as.vector(geneEXP[,1]) , xAnnot = annot.bed, 
+               geno= genotData, windowSize = NULL, which=100)
# }

Run the code above in your browser using DataLab